Cost mode:

Category: Relevance, Classification & Matching · Rail: absolute · Typical I/O: 15496→29 tokens

Models

Frontier on this task: DeepSeek V4 Pro at 8.94 / 10. Quality bar at 95%: 8.49.
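The two numbers above are consistent with the bar being 95% of the frontier score. A minimal sketch of that arithmetic, assuming the bar is the frontier quality times 0.95 rounded to two decimals (the function and constant names are hypothetical):

```python
# Assumption: quality bar = frontier score * 0.95, rounded to 2 decimals.
FRONTIER_QUALITY = 8.94  # DeepSeek V4 Pro, from the line above
BAR_FRACTION = 0.95      # the "95%" in "Quality bar at 95%"


def quality_bar(frontier: float, fraction: float = BAR_FRACTION) -> float:
    """Return the quality bar derived from the frontier score."""
    return round(frontier * fraction, 2)


bar = quality_bar(FRONTIER_QUALITY)
print(f"Quality bar: {bar:.2f}")
```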

Chart: quality per model on a 0 to 10 axis, with the 95% bar marked. Each bar is labelled with blended cost per call and savings relative to DeepSeek V4 Pro:

| Model | Blended cost / call | Savings vs best |
|---|---|---|
| DeepSeek V4 Flash | $0.002178 | 92% cheaper |
| Kimi K2.6 | $0.008902 | 67% cheaper |
| Gemini 3.1 Pro Preview | $0.015670 | 42% cheaper |
| DeepSeek V4 Pro | $0.027064 | 0% cheaper |
| GPT-5.5 | $0.039175 | -45% cheaper |
| Qwen 3.5 Flash | $0.000472 | 98% cheaper |
| Qwen 3.6 Plus | $0.005093 | 81% cheaper |
| Haiku 4.5 | $0.007820 | 71% cheaper |
| Claude Opus 4.7 | $0.039102 | -44% cheaper |
| Claude Sonnet 4.6 | $0.023462 | 13% cheaper |
| Gemini 3 Flash Preview | $0.003918 | 86% cheaper |
| Gemini 3.1 Flash Lite | $0.001959 | 93% cheaper |
| MiniMax M2.5 | $0.004684 | 83% cheaper |
| GPT-5.4 mini | $0.005876 | 78% cheaper |
| GPT-5.4 nano | $0.001568 | 94% cheaper |

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
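The distinction behind the greyed rows can be sketched as a comparison of both the point estimate and the CI low against the bar. This is an illustration only: the function name and status labels are hypothetical, the sample values are the ones reported in the cost breakdown, and how the dashboard defines MEDIUM+ is not stated on this page.

```python
BAR = 8.49  # quality bar at 95%, as reported above


def bar_status(point: float, ci_low: float, bar: float = BAR) -> str:
    """Classify a model by which of its estimates reach the bar."""
    if ci_low >= bar:
        return "clears on CI low"
    if point >= bar:
        return "clears on point estimate only"
    return "below bar"


# (point estimate, CI low) pairs taken from the cost breakdown.
reported = {
    "DeepSeek V4 Flash": (8.63, 8.38),
    "Kimi K2.6": (8.81, 8.64),
    "Gemini 3.1 Pro Preview": (8.59, 8.42),
    "DeepSeek V4 Pro": (8.94, 8.74),
    "GPT-5.5": (8.57, 8.27),
}

for name, (point, ci_low) in reported.items():
    print(f"{name}: {bar_status(point, ci_low)}")
```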

Cost breakdown

| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
|---|---|---|---|---|---|
| DeepSeek V4 Flash (DeepSeek) | 8.63 / 10, CI [8.38, 8.89] | n=100 · ranked | $0.002178 | 92% cheaper | sync |
| Kimi K2.6 (Moonshot AI) | 8.81 / 10, CI [8.64, 8.99] | n=100 · ranked | $0.008902 | 67% cheaper | batch |
| Gemini 3.1 Pro Preview (Gemini) | 8.59 / 10, CI [8.42, 8.77] | n=100 · ranked | $0.015670 | 42% cheaper | batch |
| DeepSeek V4 Pro (DeepSeek), best | 8.94 / 10, CI [8.74, 9.14] | n=100 · ranked | $0.027064 | (anchor) | sync |
| GPT-5.5 (OpenAI) | 8.57 / 10, CI [8.27, 8.87] | n=100 · ranked | $0.039175 | | batch |

Typical call shape for this task: 15496 input tokens → 29 output tokens, EMA-tracked from production traffic. Blended cost = in × in_price + out × out_price, rounded to 6 decimals, where in and out are the token counts and in_price and out_price are the per-token prices.
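The formula and the savings figures can be sketched as follows. The per-million-token prices in the example are placeholders, not any provider's real pricing, and the function names are hypothetical. The savings calculation assumes "savings vs best" means 1 minus cost divided by the anchor cost, which reproduces the percentages reported on this page.

```python
IN_TOKENS = 15496  # typical input tokens for this task
OUT_TOKENS = 29    # typical output tokens for this task


def blended_cost(in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Blended cost per call in dollars, rounded to 6 decimals.

    Prices are expressed per million tokens (an assumption of this sketch).
    """
    cost = (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000
    return round(cost, 6)


def savings_vs_anchor(cost: float, anchor_cost: float) -> int:
    """Whole-percent savings relative to the anchor; negative means pricier."""
    return round((1 - cost / anchor_cost) * 100)


# Placeholder prices: $1.00 per million input, $2.00 per million output.
example_cost = blended_cost(IN_TOKENS, OUT_TOKENS, 1.00, 2.00)
print(f"Example blended cost: ${example_cost:.6f}")

# Reported costs: DeepSeek V4 Pro is the anchor at $0.027064 per call.
ANCHOR_COST = 0.027064
print(savings_vs_anchor(0.002178, ANCHOR_COST))  # DeepSeek V4 Flash
print(savings_vs_anchor(0.039175, ANCHOR_COST))  # GPT-5.5
```

Because the input side dominates at this call shape (15496 input tokens against 29 output tokens), the input price drives nearly all of the blended cost.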