Cost mode:

Category: Relevance, Classification & Matching · Rail: absolute · Typical I/O: 2968→52 tokens

Models

Frontier on this task: GPT-5.5 at 8.47 / 10. Quality bar at 95%: 8.05.

024681095% barQwen 3.5 Flash$0.000103/call99% cheaperDeepSeek V4 Flash$0.000430/call97% cheaperGemini 3.1 Flash Lite$0.000820/call95% cheaperQwen 3.6 Plus$0.001066/call94% cheaperGemini 3 Flash Preview$0.001640/call90% cheaperKimi K2.6$0.003028/call82% cheaperGemini 3.1 Pro Preview$0.006560/call60% cheaperClaude Sonnet 4.6$0.009684/call41% cheaperClaude Opus 4.7$0.016140/call2% cheaperGPT-5.5$0.016400/call0% cheaperHaiku 4.5$0.003228/call80% cheaperDeepSeek V4 Pro$0.005345/call67% cheaperMiniMax M2.5$0.000953/call94% cheaperGPT-5.4 mini$0.002460/call85% cheaperGPT-5.4 nano$0.000659/call96% cheaper

point-estimate floor (CI low) · upper CI (less certain) · Bars sorted by blended cost; cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Cost breakdown

ModelQualitySampleBlended cost / callSavings vs bestMode
Qwen 3.5 Flash Alibaba Cloud (DashScope)8.38 / 10 CI [8.19, 8.57]n=100 · ranked$0.00010399% cheapersync
DeepSeek V4 Flash DeepSeek8.43 / 10 CI [8.30, 8.55]n=100 · ranked$0.00043097% cheapersync
Gemini 3.1 Flash Lite Gemini8.22 / 10 CI [8.00, 8.43]n=100 · high$0.00082095% cheaperbatch
Qwen 3.6 Plus Alibaba Cloud (DashScope)8.37 / 10 CI [8.22, 8.51]n=100 · ranked$0.00106694% cheapersync
Gemini 3 Flash Preview Gemini8.32 / 10 CI [8.13, 8.51]n=100 · ranked$0.00164090% cheaperbatch
Kimi K2.6 Moonshot AI8.18 / 10 CI [8.03, 8.33]n=100 · ranked$0.00302882% cheaperbatch
Gemini 3.1 Pro Preview Gemini8.31 / 10 CI [8.12, 8.50]n=100 · ranked$0.00656060% cheaperbatch
Claude Sonnet 4.6 Anthropic8.41 / 10 CI [8.24, 8.59]n=100 · ranked$0.00968441% cheaperbatch
Claude Opus 4.7 Anthropic8.31 / 10 CI [8.10, 8.53]n=100 · high$0.0161402% cheaperbatch
GPT-5.5 best OpenAI8.47 / 10 CI [8.34, 8.61]n=99 · ranked$0.016400(anchor)batch

Typical call shape for this task: 2968 input tokens → 52 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.