Cost mode:

Category: Infrastructure & Utility · Rail: absolute · Typical I/O: 4466→28187 tokens

Models

Frontier on this task: GPT-5.5 at 9.18 / 10. Quality bar at 95%: 8.72.
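The quality bar can be reproduced from the frontier score with one line of arithmetic. The sketch below assumes the bar is the frontier score multiplied by 0.95 and rounded to two decimals; the function name `quality_bar` is illustrative and not taken from the page.

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Return the quality bar as a fraction of the frontier score.

    Assumption: the bar is rounded to two decimals, as displayed on the page.
    """
    return round(frontier_score * fraction, 2)


# Frontier score quoted on the page for GPT-5.5.
print(quality_bar(9.18))
```

With the frontier at 9.18, this gives the 8.72 quoted above.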

Chart (axis 0 to 10, 95% bar marked): blended cost per call and savings vs. GPT-5.5, by model

Kimi K2.6 · $0.070194/call · 84% cheaper
Claude Opus 4.7 · $0.363502/call · 16% cheaper
GPT-5.5 · $0.433970/call · 0% cheaper
Qwen 3.5 Flash · $0.007463/call · 98% cheaper
Qwen 3.6 Plus · $0.056416/call · 87% cheaper
Haiku 4.5 · $0.072700/call · 83% cheaper
Claude Sonnet 4.6 · $0.218102/call · 50% cheaper
DeepSeek V4 Flash · $0.008518/call · 98% cheaper
DeepSeek V4 Pro · $0.105862/call · 76% cheaper
Gemini 3 Flash Preview · $0.043397/call · 90% cheaper
Gemini 3.1 Flash Lite · $0.021698/call · 95% cheaper
Gemini 3.1 Pro Preview · $0.173588/call · 60% cheaper
MiniMax M2.5 · $0.035164/call · 92% cheaper
GPT-5.4 mini · $0.065096/call · 85% cheaper
GPT-5.4 nano · $0.018063/call · 96% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain)

Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
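The distinction between clearing the bar on the point estimate and clearing it on the CI low can be sketched as below. This is an illustration only: the `Candidate` type, the `classify` function, its label strings, and the use of `>=` for the comparison are assumptions, and the MEDIUM+ condition is not modelled because the page does not define it. The figures are the three rows of the cost breakdown table.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    quality: float  # point estimate, out of 10
    ci_low: float   # lower end of the confidence interval
    cost: float     # blended cost per call, USD


def classify(c: Candidate, bar: float) -> str:
    """Label a model by how it relates to the quality bar (assumed >= comparison)."""
    if c.ci_low >= bar:
        return "clears on CI low"
    if c.quality >= bar:
        return "clears on point estimate only"
    return "below bar"


BAR = 8.72  # quality bar at 95%, as quoted on the page

candidates = [
    Candidate("GPT-5.5", 9.18, 9.06, 0.433970),
    Candidate("Kimi K2.6", 8.75, 8.59, 0.070194),
    Candidate("Claude Opus 4.7", 8.98, 8.76, 0.363502),
]

# Sort by blended cost, cheapest first, and print each model's status.
for c in sorted(candidates, key=lambda c: c.cost):
    print(f"{c.name}: ${c.cost:.6f}/call, {classify(c, BAR)}")
```

Under these assumptions, Kimi K2.6 clears the bar on its point estimate (8.75) but not on its CI low (8.59), while the other two clear it on both.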

Cost breakdown

Model (provider) | Quality | Sample | Blended cost / call | Savings vs best | Mode
Kimi K2.6 (Moonshot AI) | 8.75 / 10, CI [8.59, 8.91] | n=86 · ranked | $0.070194 | 84% cheaper | batch
Claude Opus 4.7 (Anthropic) | 8.98 / 10, CI [8.76, 9.19] | n=75 · high | $0.363502 | 16% cheaper | batch
GPT-5.5 (OpenAI, best) | 9.18 / 10, CI [9.06, 9.31] | n=78 · ranked | $0.433970 | (anchor) | batch
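The "Savings vs best" figures appear consistent with a simple ratio against the anchor model's blended cost, rounded to a whole percent. The page does not state this formula, so the sketch below is an inference from the listed numbers; `savings_vs_anchor` is a hypothetical name.

```python
def savings_vs_anchor(cost: float, anchor_cost: float) -> int:
    """Percent cheaper than the anchor, rounded to a whole percent.

    Assumption: savings = (1 - cost / anchor_cost) * 100, inferred from
    the page's figures rather than documented by it.
    """
    return round((1 - cost / anchor_cost) * 100)


ANCHOR = 0.433970  # GPT-5.5 blended cost per call

print(savings_vs_anchor(0.070194, ANCHOR))  # Kimi K2.6
print(savings_vs_anchor(0.363502, ANCHOR))  # Claude Opus 4.7
```

These two calls give 84 and 16, matching the table.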

Typical call shape for this task: 4466 input tokens → 28187 output tokens, EMA-tracked from production traffic. Blended cost = in × in_price + out × out_price, rounded to 6 decimals, where in and out are the input and output token counts.
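The blended-cost formula can be sketched as follows. The per-token prices used here are hypothetical placeholders (the page does not list any model's prices), and the function name `blended_cost` is illustrative; only the token counts come from the page.

```python
def blended_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Blended cost per call: in * in_price + out * out_price, rounded to 6 decimals.

    Prices are expressed per token.
    """
    return round(in_tokens * in_price + out_tokens * out_price, 6)


# Hypothetical prices, NOT taken from the page: $1.00 and $2.00 per million tokens.
IN_PRICE = 1.00 / 1_000_000
OUT_PRICE = 2.00 / 1_000_000

# Typical call shape quoted on the page.
print(blended_cost(4466, 28187, IN_PRICE, OUT_PRICE))
```

Because the output side of this call shape is roughly six times larger than the input side, the output price dominates the blended cost for this task.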