Cost mode:

Category: Topic Organization & Clustering · Rail: absolute · Typical I/O: 26525→178 tokens

Models

Frontier on this task: GPT-5.5 at 8.59 / 10. Quality bar at 95%: 8.16.
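The bar figure above is consistent with taking 95% of the frontier score and rounding to two decimals. A minimal sketch of that arithmetic follows; the function name `quality_bar` and the two-decimal rounding are assumptions for illustration, not something the page states.

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Return the score a model must reach, as a fraction of the frontier score.

    Assumption: the bar is frontier_score * fraction, rounded to 2 decimals.
    """
    return round(frontier_score * fraction, 2)


# Frontier score on this task is 8.59 (GPT-5.5).
bar = quality_bar(8.59)
print(bar)  # 8.16
```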

[Chart: quality score per model on a 0 to 10 axis, with the 95% bar marked. Bar labels, in chart order:]
Qwen 3.6 Plus · $0.008968/call · 87% cheaper
Haiku 4.5 · $0.013708/call · 80% cheaper
Kimi K2.6 · $0.015546/call · 77% cheaper
Claude Sonnet 4.6 · $0.041122/call · 40% cheaper
DeepSeek V4 Pro · $0.046773/call · 32% cheaper
GPT-5.5 · $0.068982/call · 0% cheaper
Qwen 3.5 Flash · $0.000842/call · 99% cheaper
Claude Opus 4.7 · $0.068538/call · 1% cheaper
DeepSeek V4 Flash · $0.003763/call · 95% cheaper
Gemini 3 Flash Preview · $0.006898/call · 90% cheaper
Gemini 3.1 Flash Lite · $0.003449/call · 95% cheaper
Gemini 3.1 Pro Preview · $0.027593/call · 60% cheaper
MiniMax M2.5 · $0.008171/call · 88% cheaper
GPT-5.4 mini · $0.010347/call · 85% cheaper
GPT-5.4 nano · $0.002764/call · 96% cheaper

Chart legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Cost breakdown

| Model | Provider | Quality | Sample | Blended cost / call | Savings vs best | Mode |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen 3.6 Plus | Alibaba Cloud (DashScope) | 8.20 / 10, CI [7.99, 8.41] | n=100 · ranked | $0.008968 | 87% cheaper | sync |
| Haiku 4.5 | Anthropic | 8.22 / 10, CI [7.99, 8.46] | n=72 · high | $0.013708 | 80% cheaper | batch |
| Kimi K2.6 | Moonshot AI | 8.50 / 10, CI [8.35, 8.65] | n=100 · ranked | $0.015546 | 77% cheaper | batch |
| Claude Sonnet 4.6 | Anthropic | 8.20 / 10, CI [8.02, 8.38] | n=94 · ranked | $0.041122 | 40% cheaper | batch |
| DeepSeek V4 Pro | DeepSeek | 8.27 / 10, CI [8.09, 8.45] | n=77 · ranked | $0.046773 | 32% cheaper | sync |
| GPT-5.5 (best) | OpenAI | 8.59 / 10, CI [8.44, 8.74] | n=89 · ranked | $0.068982 | (anchor) | batch |
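The distinction between clearing the bar on the point estimate and clearing it on the CI low can be checked against the rows above. The sketch below is illustrative only: the `Row` type, the `status` helper, and the use of `>=` for "clears" are assumptions, and the MEDIUM+ tier condition is ignored because the page does not define it.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    score: float   # point estimate, out of 10
    ci_low: float  # lower bound of the confidence interval


BAR = 8.16  # quality bar at 95% of the frontier score

# Scores and CI lows copied from the cost breakdown table.
ROWS = [
    Row("Qwen 3.6 Plus", 8.20, 7.99),
    Row("Haiku 4.5", 8.22, 7.99),
    Row("Kimi K2.6", 8.50, 8.35),
    Row("Claude Sonnet 4.6", 8.20, 8.02),
    Row("DeepSeek V4 Pro", 8.27, 8.09),
    Row("GPT-5.5", 8.59, 8.44),
]


def status(row: Row, bar: float = BAR) -> str:
    """Classify a row by how it clears the bar (hypothetical labels)."""
    if row.ci_low >= bar:
        return "clears on CI low"
    if row.score >= bar:
        return "clears on point estimate only"
    return "below bar"


for row in ROWS:
    print(f"{row.model}: {status(row)}")
```

Under these assumptions, only Kimi K2.6 and GPT-5.5 clear the bar on the CI low; the other four rows clear it on the point estimate alone.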

Typical call shape for this task: 26525 input tokens → 178 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
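The blended-cost formula and the "savings vs best" column can be sketched as follows. The per-token prices in the example are hypothetical placeholders, not any provider's actual rates, and the page does not say whether the displayed costs include a batch discount. Rounding the savings to a whole percent is also an assumption, though it reproduces the displayed figures.

```python
def blended_cost(tokens_in: int, tokens_out: int,
                 in_price: float, out_price: float) -> float:
    """Blended cost per call in USD; prices are assumed to be USD per token."""
    return round(tokens_in * in_price + tokens_out * out_price, 6)


def savings_pct(cost: float, anchor_cost: float) -> int:
    """Percent cheaper than the anchor (best) model, rounded to a whole percent."""
    return round((1 - cost / anchor_cost) * 100)


# Typical call shape for this task: 26525 input tokens, 178 output tokens.
# Hypothetical prices: $1.00 per million input tokens, $5.00 per million output tokens.
example = blended_cost(26525, 178, in_price=1.00e-6, out_price=5.00e-6)
print(example)

# Savings computed from costs shown on the page (Qwen 3.6 Plus vs. GPT-5.5).
print(savings_pct(0.008968, 0.068982))  # 87
```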