
Category: Infrastructure & Utility · Rail: absolute · Typical I/O: 2062 input → 231 output tokens

Models

Frontier on this task: Qwen 3.6 Plus at 8.08 / 10. Quality bar (95% of the frontier score): 7.68.
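The bar is consistent with 95% of the frontier score: 8.08 × 0.95 = 7.676, shown as 7.68. A minimal sketch of that arithmetic, assuming the page rounds to two decimals for display (`quality_bar` is a hypothetical helper name, not part of any published API):

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Quality bar as a fraction of the frontier score, rounded to 2 decimals for display."""
    return round(frontier_score * fraction, 2)


# Frontier score from this page: Qwen 3.6 Plus at 8.08 / 10.
print(quality_bar(8.08))  # 7.68
```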

Chart: quality score per model (axis 0 to 10, 95% bar marked), with blended cost per call and savings relative to Qwen 3.6 Plus:
Gemini 3 Flash Preview · $0.000862/call · 23% cheaper
Qwen 3.6 Plus · $0.001121/call · 0% (anchor)
Qwen 3.5 Flash · $0.000122/call · 89% cheaper
Haiku 4.5 · $0.001608/call · 43% more expensive
Claude Opus 4.7 · $0.008042/call · 617% more expensive
Claude Sonnet 4.6 · $0.004826/call · 331% more expensive
DeepSeek V4 Flash · $0.000353/call · 69% cheaper
DeepSeek V4 Pro · $0.004392/call · 292% more expensive
Gemini 3.1 Flash Lite · $0.000431/call · 62% cheaper
Gemini 3.1 Pro Preview · $0.003448/call · 208% more expensive
MiniMax M2.5 · $0.000896/call · 20% cheaper
Kimi K2.6 · $0.001730/call · 54% more expensive
GPT-5.4 mini · $0.001293/call · 15% more expensive
GPT-5.4 nano · $0.000351/call · 69% cheaper
GPT-5.5 · $0.008620/call · 669% more expensive

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
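The legend implies three bands relative to the quality bar: the CI low clears it, only the point estimate clears it, or neither does. The sketch below encodes that reading. Two things are assumptions: that any row clearing on at least the point estimate counts as a qualifier for ordering, and that ties are broken by cost alone. The MEDIUM+ confidence tier that decides greying is not modeled. `Row`, `band`, and `chart_order` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    score: float   # point estimate on the 0-10 scale
    ci_low: float  # lower end of the confidence interval
    cost: float    # blended cost per call, USD


def band(row: Row, bar: float) -> str:
    """Place a row relative to the quality bar, following the legend's wording."""
    if row.ci_low >= bar:
        return "clears"       # even the CI low clears the bar
    if row.score >= bar:
        return "point_only"   # point estimate clears, CI low does not
    return "below"


def chart_order(rows: list[Row], bar: float) -> list[Row]:
    """Assumed ordering: rows at or above the bar (on the point estimate) first,
    each group sorted by blended cost, cheapest first."""
    return sorted(rows, key=lambda r: (band(r, bar) == "below", r.cost))


# The two rows from the cost breakdown, against the 7.68 bar.
rows = [
    Row("Qwen 3.6 Plus", 8.08, 7.70, 0.001121),
    Row("Gemini 3 Flash Preview", 7.73, 7.51, 0.000862),
]
for r in chart_order(rows, 7.68):
    print(r.model, band(r, 7.68))
```

With these inputs, Gemini 3 Flash Preview sorts first and lands in the point-estimate-only band (7.73 clears, CI low 7.51 does not), while Qwen 3.6 Plus clears even on its CI low of 7.70.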

Cost breakdown

Model | Provider | Quality | Sample | Blended cost / call | Savings vs best | Mode
Gemini 3 Flash Preview | Gemini | 7.73 / 10, CI [7.51, 7.95] | n=100 · ranked | $0.000862 | 23% cheaper | batch
Qwen 3.6 Plus (best) | Alibaba Cloud (DashScope) | 8.08 / 10, CI [7.70, 8.47] | n=100 · high | $0.001121 | (anchor) | sync
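The savings column appears to be the relative cost difference against the anchor (the best-quality model), rounded to a whole percent. For example, 1 − 0.000862 / 0.001121 ≈ 0.231, i.e. 23%. This formula is consistent with every percentage listed in the chart, though the page may compute from unrounded costs. A sketch with hypothetical helper names `savings_pct` and `describe`:

```python
def savings_pct(cost: float, anchor_cost: float) -> int:
    """Percent saved versus the anchor model; negative means more expensive."""
    return round((1 - cost / anchor_cost) * 100)


def describe(pct: int) -> str:
    """Render the percentage in words; negative savings read as 'more expensive'."""
    return f"{pct}% cheaper" if pct >= 0 else f"{-pct}% more expensive"


ANCHOR = 0.001121  # Qwen 3.6 Plus, blended cost per call
print(describe(savings_pct(0.000862, ANCHOR)))  # 23% cheaper
print(describe(savings_pct(0.001608, ANCHOR)))  # 43% more expensive
```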

Typical call shape for this task: 2062 input tokens → 231 output tokens, tracked as an exponential moving average (EMA) of production traffic. Blended cost = in × in_price + out × out_price, rounded to 6 decimal places.
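A minimal sketch of both pieces. It assumes prices are quoted in USD per million tokens, so the per-token formula is divided by 1,000,000, and it assumes a plain exponential moving average with a hypothetical smoothing factor `alpha = 0.1`. The page does not state its EMA parameters, and the $0.50 / $3.00 prices below are illustrative, not any listed model's rates.

```python
def blended_cost(in_tokens: float, out_tokens: float,
                 in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """in * in_price + out * out_price, with prices given per million tokens
    (an assumption), rounded to 6 decimals."""
    cost = (in_tokens * in_price_per_mtok + out_tokens * out_price_per_mtok) / 1_000_000
    return round(cost, 6)


def ema_update(previous: float, observation: float, alpha: float = 0.1) -> float:
    """One EMA step: weight the new observation by alpha, the history by 1 - alpha."""
    return alpha * observation + (1 - alpha) * previous


# Typical call shape from this page.
avg_in, avg_out = 2062.0, 231.0

# Hypothetical prices: $0.50 per million input tokens, $3.00 per million output tokens.
print(blended_cost(avg_in, avg_out, 0.50, 3.00))  # 0.001724

# Fold one hypothetical production call (3000 in, 200 out) into the tracked shape.
avg_in = ema_update(avg_in, 3000)
avg_out = ema_update(avg_out, 200)
```

Because the EMA weights recent calls more heavily, the typical shape, and with it the blended cost, drifts with production usage instead of staying fixed.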