Cost mode:

Category: Infrastructure & Utility · Rail: absolute · Typical I/O: 1984→4033 tokens

Models

Frontier on this task: GPT-5.5 at 8.21 / 10. Quality bar at 95%: 7.79.

Chart (axis ticks 0, 2, 4, 6, 8, 10; marker: 95% bar). Bar labels, in page order:

Qwen 3.5 Flash · $0.001108/call · 98% cheaper
Gemini 3 Flash Preview · $0.006546/call · 90% cheaper
Kimi K2.6 · $0.010810/call · 83% cheaper
Gemini 3.1 Pro Preview · $0.026182/call · 60% cheaper
GPT-5.5 · $0.065455/call · 0% cheaper
Qwen 3.6 Plus · $0.008509/call · 87% cheaper
Haiku 4.5 · $0.011074/call · 83% cheaper
Claude Sonnet 4.6 · $0.033224/call · 49% cheaper
DeepSeek V4 Flash · $0.001407/call · 98% cheaper
DeepSeek V4 Pro · $0.017487/call · 73% cheaper
Gemini 3.1 Flash Lite · $0.003273/call · 95% cheaper
MiniMax M2.5 · $0.005435/call · 92% cheaper
GPT-5.4 mini · $0.009818/call · 85% cheaper
GPT-5.4 nano · $0.002719/call · 96% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Cost breakdown

| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
|---|---|---|---|---|---|
| Qwen 3.5 Flash (Alibaba Cloud, DashScope) | 7.94 / 10, CI [7.76, 8.12] | n=100 · ranked | $0.001108 | 98% cheaper | sync |
| Gemini 3 Flash Preview (Gemini) | 7.81 / 10, CI [7.63, 7.99] | n=92 · ranked | $0.006546 | 90% cheaper | batch |
| Kimi K2.6 (Moonshot AI) | 8.04 / 10, CI [7.85, 8.22] | n=100 · ranked | $0.010810 | 83% cheaper | batch |
| Gemini 3.1 Pro Preview (Gemini) | 8.02 / 10, CI [7.85, 8.18] | n=92 · ranked | $0.026182 | 60% cheaper | batch |
| GPT-5.5 (OpenAI) · best | 8.21 / 10, CI [8.00, 8.41] | n=90 · high | $0.065455 | (anchor) | batch |
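
The comparison against the quality bar can be sketched as below. This is a minimal sketch, not the page's actual logic. The three labels and the `classify` function are hypothetical names, and the sketch ignores the "MEDIUM+" tier because the page does not define it. The scores and intervals are copied from the table, and the bar is the stated 7.79.

```python
from typing import NamedTuple

QUALITY_BAR = 7.79  # "Quality bar at 95%" as stated on the page


class Row(NamedTuple):
    model: str
    quality: float  # point estimate, out of 10
    ci_low: float   # lower end of the confidence interval
    ci_high: float  # upper end of the confidence interval


# Values copied from the cost breakdown table.
ROWS = [
    Row("Qwen 3.5 Flash", 7.94, 7.76, 8.12),
    Row("Gemini 3 Flash Preview", 7.81, 7.63, 7.99),
    Row("Kimi K2.6", 8.04, 7.85, 8.22),
    Row("Gemini 3.1 Pro Preview", 8.02, 7.85, 8.18),
    Row("GPT-5.5", 8.21, 8.00, 8.41),
]


def classify(row: Row, bar: float = QUALITY_BAR) -> str:
    """Hypothetical three-way split; the labels are illustrative only."""
    if row.ci_low >= bar:
        return "CI low clears bar"
    if row.quality >= bar:
        return "point estimate only"
    return "below bar"


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.model}: {classify(row)}")
```

Under these assumptions, only the rows whose CI low is at or above 7.79 land in the first group. Rows whose point estimate alone clears the bar correspond to what the legend describes as greyed.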

Typical call shape for this task: 1984 input tokens → 4033 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
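
The blended-cost formula can be sketched in a few lines. The per-token prices below are hypothetical placeholders, not any provider's price list. The `savings_pct` helper is an assumption: the page does not state how "Savings vs best" is computed, though rounding `1 - cost / anchor` to the nearest percent is consistent with the listed figures.

```python
TOKENS_IN = 1984   # typical input tokens for this task (from the page)
TOKENS_OUT = 4033  # typical output tokens for this task (from the page)


def blended_cost(tokens_in: int, tokens_out: int,
                 in_price: float, out_price: float) -> float:
    """Blended cost per call: in * in_price + out * out_price, 6 decimals.

    Prices are in dollars per token.
    """
    return round(tokens_in * in_price + tokens_out * out_price, 6)


def savings_pct(cost: float, anchor_cost: float) -> int:
    """Assumed savings formula: percent cheaper than the anchor model."""
    return round((1 - cost / anchor_cost) * 100)


if __name__ == "__main__":
    # Hypothetical prices: $1.00 per million input tokens,
    # $2.00 per million output tokens.
    cost = blended_cost(TOKENS_IN, TOKENS_OUT, 1.00e-6, 2.00e-6)
    print(f"blended cost: ${cost:.6f}/call")

    # Costs from the table, compared against the GPT-5.5 anchor.
    anchor = 0.065455
    for name, c in [("Qwen 3.5 Flash", 0.001108), ("Kimi K2.6", 0.010810)]:
        print(f"{name}: {savings_pct(c, anchor)}% cheaper")
```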