Cost mode:

Category: Infrastructure & Utility · Rail: absolute · Typical I/O: 723→566 tokens

Models

Frontier on this task: Qwen 3.5 Flash at 8.63 / 10. Quality bar at 95%: 8.20.

Chart: blended cost per call by model, shown against the 95% bar (axis 0 to 10).

Qwen 3.5 Flash · $0.000169/call · 0% cheaper
Gemini 3 Flash Preview · $0.001030/call · -509% cheaper
Qwen 3.6 Plus · $0.001339/call · -692% cheaper
Claude Sonnet 4.6 · $0.005330/call · -3054% cheaper
DeepSeek V4 Flash · $0.000260/call · -54% cheaper
Gemini 3.1 Flash Lite · $0.000515/call · -205% cheaper
Gemini 3.1 Pro Preview · $0.004119/call · -2337% cheaper
MiniMax M2.5 · $0.000896/call · -430% cheaper
Kimi K2.6 · $0.001771/call · -948% cheaper
GPT-5.5 · $0.010298/call · -5993% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
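The qualification rule described above can be sketched in a few lines of Python. This is an illustration rather than the dashboard's actual logic: it assumes the bar is 95% of the frontier score (which matches 8.63 → 8.20), the names `Estimate`, `quality_bar`, and `classify` are hypothetical, and the MEDIUM+ sample-confidence condition is left out.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Quality estimate for one model (hypothetical structure)."""
    model: str
    score: float    # point estimate, out of 10
    ci_low: float   # lower bound of the confidence interval
    ci_high: float  # upper bound of the confidence interval


def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Assumed rule: the bar is a fixed fraction of the frontier score."""
    return round(frontier_score * fraction, 2)


def classify(est: Estimate, bar: float) -> str:
    """Apply the legend's rule: CI low must clear the bar to qualify."""
    if est.ci_low >= bar:
        return "qualifies"
    if est.score >= bar:
        # Point estimate clears the bar, but the CI floor does not.
        return "greyed"
    return "below bar"


bar = quality_bar(8.63)
qwen = Estimate("Qwen 3.5 Flash", 8.63, 8.33, 8.93)
gemini = Estimate("Gemini 3 Flash Preview", 8.29, 8.05, 8.54)
print(bar, classify(qwen, bar), classify(gemini, bar))
```

Under these assumptions, a model whose point estimate is 8.29 with a CI low of 8.05 would fall into the greyed group, since only its point estimate clears 8.20.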

Cost breakdown

Model | Quality | Sample | Blended cost / call | Savings vs best | Mode
Qwen 3.5 Flash (best), Alibaba Cloud (DashScope) | 8.63 / 10, CI [8.33, 8.93] | n=100 · medium | $0.000169 | (anchor) | sync
Gemini 3 Flash Preview, Gemini | 8.29 / 10, CI [8.05, 8.54] | n=100 · high | $0.001030 | | batch

Typical call shape for this task: 723 input tokens → 566 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
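As a minimal sketch of the blended-cost formula, the Python below computes the cost of one call and the relative savings against the anchor model. The per-token prices in the usage lines are hypothetical placeholders, not any provider's real rates. The savings formula is also an assumption: it is inferred from the published per-call costs, which it reproduces after rounding (for example, $0.001030 against the $0.000169 anchor gives -509%).

```python
def blended_cost(tokens_in: int, tokens_out: int,
                 in_price: float, out_price: float) -> float:
    """Cost of one call in USD; prices are assumed to be per token."""
    return round(tokens_in * in_price + tokens_out * out_price, 6)


def savings_pct(cost: float, anchor_cost: float) -> int:
    """Percent cheaper than the anchor; negative means more expensive.

    Assumed formula, inferred from the published figures.
    """
    return round((1 - cost / anchor_cost) * 100)


# Typical call shape from the page: 723 input tokens, 566 output tokens.
# Hypothetical prices: $0.10 per million input, $0.40 per million output.
cost = blended_cost(723, 566, 0.10e-6, 0.40e-6)
print(cost)

# Relative cost of Gemini 3 Flash Preview against the Qwen 3.5 Flash anchor.
print(savings_pct(0.001030, 0.000169))
```

Because output tokens usually cost more than input tokens, the 566-token output side tends to dominate the blended figure even though the input is longer.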