Cost mode:

Category: Infrastructure & Utility · Rail: absolute · Typical I/O: 1978→4363 tokens

Models

Frontier on this task: Claude Sonnet 4.6 at 9.10 / 10. Quality bar at 95%: 8.64.

Chart: blended cost per call by model (axis ticks 0 to 10, with the 95% bar marked).

Qwen 3.6 Plus · $0.009151/call · 74% cheaper
Kimi K2.6 · $0.011599/call · 68% cheaper
DeepSeek V4 Pro · $0.018625/call · 48% cheaper
Claude Sonnet 4.6 · $0.035690/call · 0% cheaper
Claude Opus 4.7 · $0.059482/call · 67% more expensive
Qwen 3.5 Flash · $0.001194/call · 97% cheaper
Haiku 4.5 · $0.011896/call · 67% cheaper
DeepSeek V4 Flash · $0.001499/call · 96% cheaper
Gemini 3 Flash Preview · $0.007039/call · 80% cheaper
Gemini 3.1 Flash Lite · $0.003520/call · 90% cheaper
Gemini 3.1 Pro Preview · $0.028156/call · 21% cheaper
MiniMax M2.5 · $0.005829/call · 84% cheaper
GPT-5.4 mini · $0.010558/call · 70% cheaper
GPT-5.4 nano · $0.002925/call · 92% cheaper
GPT-5.5 · $0.070390/call · 97% more expensive

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
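The qualification rule described above can be sketched in a few lines. This is an illustration only: the function name classify, the label strings, and the use of >= for "clears the bar" are assumptions, and the MEDIUM+ tier check is left out because the page does not define it. The bar value 8.64 is taken from the page as stated.

```python
# Hypothetical sketch of the bar check; names and labels are illustrative.
QUALITY_BAR = 8.64  # "Quality bar at 95%" as shown on the page


def classify(point_estimate: float, ci_low: float, bar: float = QUALITY_BAR) -> str:
    """Label a model by how its quality estimate compares with the bar.

    Assumes "clears the bar" means >= bar. Ignores the MEDIUM+ tier
    condition, which the page mentions but does not define.
    """
    if ci_low >= bar:
        return "qualifies"  # even the CI floor clears the bar
    if point_estimate >= bar:
        return "greyed"     # point estimate clears, CI low does not
    return "below bar"


if __name__ == "__main__":
    # Quality and CI-low values copied from the cost breakdown rows.
    print(classify(8.89, 8.79))  # Qwen 3.6 Plus
    print(classify(8.73, 8.53))  # Kimi K2.6
```

Under these assumptions, a row whose CI low sits above 8.64 counts as a full qualifier, while a row such as 8.73 with CI low 8.53 would be shown greyed.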

Cost breakdown

Model | Provider | Quality | Sample | Blended cost / call | Savings vs best | Mode
Qwen 3.6 Plus | Alibaba Cloud (DashScope) | 8.89 / 10, CI [8.79, 8.98] | n=100 · ranked | $0.009151 | 74% cheaper | sync
Kimi K2.6 | Moonshot AI | 8.73 / 10, CI [8.53, 8.92] | n=100 · ranked | $0.011599 | 68% cheaper | batch
DeepSeek V4 Pro | DeepSeek | 8.73 / 10, CI [8.48, 8.97] | n=100 · ranked | $0.018625 | 48% cheaper | sync
Claude Sonnet 4.6 (best) | Anthropic | 9.10 / 10, CI [9.00, 9.19] | n=100 · ranked | $0.035690 | (anchor) | batch
Claude Opus 4.7 | Anthropic | 8.83 / 10, CI [8.67, 8.99] | n=100 · ranked | $0.059482 | | batch

Typical call shape for this task: 1978 input tokens → 4363 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
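As a worked illustration of the blended-cost formula, the sketch below computes the cost for the typical call shape and the savings figure relative to the anchor. The function names and the per-token prices in the usage lines are hypothetical, since the page does not list prices. The savings formula (1 - cost / anchor cost, as a whole percent) is an assumption, though it reproduces the percentages shown for the listed per-call costs.

```python
# Hypothetical sketch; function names and example prices are not from the page.


def blended_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Blended cost per call: in × in_price + out × out_price, rounded to 6 decimals.

    Prices are per token, in dollars.
    """
    return round(in_tokens * in_price + out_tokens * out_price, 6)


def savings_pct(cost: float, anchor_cost: float) -> int:
    """Percent cheaper than the anchor; negative means more expensive.

    Assumed formula: (1 - cost / anchor_cost) × 100, rounded to a whole percent.
    """
    return round((1 - cost / anchor_cost) * 100)


if __name__ == "__main__":
    # Typical call shape from the page, with made-up per-token prices.
    print(blended_cost(1978, 4363, in_price=1e-6, out_price=2e-6))

    # Per-call costs copied from the cost breakdown; Claude Sonnet 4.6 is the anchor.
    anchor = 0.035690
    print(savings_pct(0.009151, anchor))  # Qwen 3.6 Plus
    print(savings_pct(0.059482, anchor))  # Claude Opus 4.7
```

With the costs from the table, savings_pct gives 74 for Qwen 3.6 Plus and -67 for Claude Opus 4.7, matching the figures in the chart.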