Cost mode:

Category: Infrastructure & Utility · Rail: absolute · Typical I/O: 3459→926 tokens

Models

Frontier on this task: GPT-5.5 at 8.99 / 10. Quality bar at 95%: 8.54.
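The quality bar appears to be 95% of the frontier score (8.99 × 0.95 = 8.5405, shown as 8.54). A minimal Python sketch of that arithmetic follows. It assumes the bar is a simple fraction of the frontier score rounded to two decimals, which matches the figures above but is not confirmed by the page; the function name `quality_bar` is hypothetical.

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Return the quality threshold as a fraction of the frontier score.

    Assumption: the bar is frontier_score * fraction, rounded to 2 decimals.
    """
    return round(frontier_score * fraction, 2)


# Frontier score taken from the line above (GPT-5.5 at 8.99 / 10).
bar = quality_bar(8.99)
print(bar)
```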

[Chart: one bar per model on a 0 to 10 axis, with the 95% bar marked]

Kimi K2.6 · $0.004194/call · 81% cheaper
Gemini 3.1 Pro Preview · $0.009015/call · 60% cheaper
GPT-5.5 · $0.022538/call · 0% cheaper
Qwen 3.5 Flash · $0.000345/call · 98% cheaper
Qwen 3.6 Plus · $0.002930/call · 87% cheaper
DeepSeek V4 Flash · $0.000744/call · 97% cheaper
Gemini 3 Flash Preview · $0.002254/call · 90% cheaper
Gemini 3.1 Flash Lite · $0.001127/call · 95% cheaper
MiniMax M2.5 · $0.002149/call · 90% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, with the cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Cost breakdown

| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
|---|---|---|---|---|---|
| Kimi K2.6 (Moonshot AI) | 8.59 / 10, CI [8.39, 8.78] | n=83 · ranked | $0.004194 | 81% cheaper | batch |
| Gemini 3.1 Pro Preview (Gemini) | 8.67 / 10, CI [8.50, 8.84] | n=71 · ranked | $0.009015 | 60% cheaper | batch |
| GPT-5.5 (OpenAI, best) | 8.99 / 10, CI [8.65, 9.33] | n=71 · medium | $0.022538 | (anchor) | batch |
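The savings column is consistent with comparing each model's blended cost to the anchor (GPT-5.5) and rounding to a whole percent. The sketch below reproduces those figures and checks each row's point estimate and CI low against the 8.54 bar. The `Row` type and the helper names are hypothetical, and the rounding rule is an assumption inferred from the numbers shown, not a documented formula.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    quality: float  # point estimate, out of 10
    ci_low: float   # lower bound of the confidence interval
    cost: float     # blended cost per call, in dollars


# Values copied from the cost breakdown table.
ROWS = [
    Row("Kimi K2.6", 8.59, 8.39, 0.004194),
    Row("Gemini 3.1 Pro Preview", 8.67, 8.50, 0.009015),
    Row("GPT-5.5", 8.99, 8.65, 0.022538),
]
BAR = 8.54             # quality bar at 95% of the frontier score
ANCHOR_COST = 0.022538  # GPT-5.5, the anchor row


def savings_pct(cost: float, anchor_cost: float) -> int:
    """Percent cheaper than the anchor (assumed: rounded to a whole percent)."""
    return round((1 - cost / anchor_cost) * 100)


def clears_bar(row: Row, bar: float) -> tuple[bool, bool]:
    """Return (point estimate clears bar, CI low clears bar)."""
    return (row.quality >= bar, row.ci_low >= bar)


for row in ROWS:
    point_ok, ci_ok = clears_bar(row, BAR)
    print(row.model, savings_pct(row.cost, ANCHOR_COST), point_ok, ci_ok)
```

With these inputs the computed savings match the table (81, 60, and 0 for the anchor). All three point estimates clear the bar, but only GPT-5.5's CI low does.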

Typical call shape for this task: 3459 input tokens → 926 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
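The blended-cost formula above can be sketched as a small Python function. The page does not list per-token prices, so the prices in the usage line are purely hypothetical placeholders, and quoting them per million tokens is also an assumption; only the call shape (3459 input tokens, 926 output tokens) and the 6-decimal rounding come from the text.

```python
def blended_cost(in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Blended cost per call in dollars, rounded to 6 decimals.

    Assumption: prices are quoted in dollars per million tokens.
    """
    total = in_tokens * in_price_per_m + out_tokens * out_price_per_m
    return round(total / 1_000_000, 6)


# Hypothetical prices ($1.00 per million input tokens, $4.00 per million
# output tokens) applied to the typical call shape; not real model pricing.
example = blended_cost(3459, 926, 1.00, 4.00)
print(example)
```

Because output tokens are usually priced higher than input tokens, the 926 output tokens can contribute as much to the total as the 3459 input tokens, as they do under these placeholder prices.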