Cost mode:

Category: Infrastructure & Utility · Rail: absolute · Typical I/O: 370→347 tokens

Models

Frontier on this task: Qwen 3.6 Plus at 8.58 / 10. Quality bar at 95%: 8.15.
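The quoted bar is consistent with taking 95% of the frontier score and rounding to two decimals. A minimal sketch under that assumption (the function name `quality_bar` is illustrative, not part of the source):

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Return the quality bar as a fraction of the frontier score.

    Assumption: the bar is fraction * frontier score, rounded to 2 decimals.
    """
    return round(frontier_score * fraction, 2)


# Frontier score for Qwen 3.6 Plus, as quoted above.
print(quality_bar(8.58))  # 8.15
```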

Chart (axis 0 to 10, 95% bar marked). Per-model labels, with negative "% cheaper" values meaning more expensive than Qwen 3.6 Plus:

| Model | Blended cost / call | % cheaper |
| --- | --- | --- |
| Qwen 3.6 Plus | $0.000797 | 0% |
| Qwen 3.5 Flash | $0.000101 | 87% |
| Haiku 4.5 | $0.001052 | -32% |
| Claude Opus 4.7 | $0.005262 | -560% |
| Claude Sonnet 4.6 | $0.003158 | -296% |
| DeepSeek V4 Flash | $0.000149 | 81% |
| DeepSeek V4 Pro | $0.001851 | -132% |
| Gemini 3 Flash Preview | $0.000613 | 23% |
| Gemini 3.1 Flash Lite | $0.000306 | 62% |
| Gemini 3.1 Pro Preview | $0.002452 | -208% |
| MiniMax M2.5 | $0.000527 | 34% |
| Kimi K2.6 | $0.001044 | -31% |
| GPT-5.4 mini | $0.000920 | -15% |
| GPT-5.4 nano | $0.000254 | 68% |
| GPT-5.5 | $0.006130 | -669% |

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
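The qualification rule in the legend can be sketched as follows. This is one reading of the legend, not the dashboard's actual logic: a model qualifies when its CI low clears the bar, is greyed when only its point estimate does, and the MEDIUM+ tier condition is ignored because the source does not define it. The `Candidate` type, the function names, and the three `hypothetical-*` rows are invented for illustration; only the Qwen 3.6 Plus figures come from the page.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    point: float   # point-estimate quality score
    ci_low: float  # lower bound of the confidence interval
    cost: float    # blended cost per call, in dollars


def status(c: Candidate, bar: float) -> str:
    """Classify a model against the quality bar (assumed rule)."""
    if c.ci_low >= bar:
        return "qualifies"
    if c.point >= bar:
        return "greyed"  # point estimate clears the bar, CI low does not
    return "below bar"


def ordered(cands: list[Candidate], bar: float) -> list[Candidate]:
    """Qualifiers first, then the rest; cheapest first within each group."""
    return sorted(cands, key=lambda c: (status(c, bar) != "qualifies", c.cost))


BAR = 8.15
candidates = [
    Candidate("Qwen 3.6 Plus", 8.58, 8.37, 0.000797),   # values from the page
    Candidate("hypothetical-a", 8.30, 8.05, 0.000300),  # made-up values
    Candidate("hypothetical-b", 8.40, 8.20, 0.000500),  # made-up values
    Candidate("hypothetical-c", 7.90, 7.60, 0.000100),  # made-up values
]

for c in ordered(candidates, BAR):
    print(f"{c.name}: {status(c, BAR)} at ${c.cost:.6f}/call")
```

Sorting on a `(not qualifier, cost)` key keeps the cheapest qualifying model at the top even when a non-qualifying model is cheaper overall.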

Cost breakdown

| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
| --- | --- | --- | --- | --- | --- |
| Qwen 3.6 Plus (best), Alibaba Cloud (DashScope) | 8.58 / 10, CI [8.37, 8.79] | n=100 · ranked | $0.000797 | (anchor) | sync |

Typical call shape for this task: 370 input tokens → 347 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
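As a worked illustration of the blended-cost formula, the sketch below uses made-up per-token prices, since the page does not list `in_price` or `out_price`. The `savings_pct` helper is also an assumption: the page does not state how savings are computed, but relative cost against the anchor, rounded to the nearest whole percent, reproduces the figures shown for the other models.

```python
def blended_cost(tokens_in: int, tokens_out: int,
                 in_price: float, out_price: float) -> float:
    """Blended cost per call in dollars, rounded to 6 decimals.

    Prices are dollars per single token.
    """
    return round(tokens_in * in_price + tokens_out * out_price, 6)


def savings_pct(cost: float, anchor_cost: float) -> int:
    """Percent cheaper than the anchor; negative means more expensive.

    Assumed formula, not stated on the page.
    """
    return round((1 - cost / anchor_cost) * 100)


# Hypothetical prices: $0.50 per million input tokens, $2.00 per million output.
IN_PRICE = 0.50 / 1_000_000
OUT_PRICE = 2.00 / 1_000_000

# Typical call shape from the page: 370 input tokens, 347 output tokens.
cost = blended_cost(370, 347, IN_PRICE, OUT_PRICE)
print(f"${cost:.6f}/call")

# Costs from the page: Qwen 3.5 Flash and GPT-5.5 against the Qwen 3.6 Plus anchor.
print(savings_pct(0.000101, 0.000797))  # 87
print(savings_pct(0.006130, 0.000797))  # -669
```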