Cost mode:

Category: Topic Organization & Clustering · Rail: absolute · Typical I/O: 3170→64 tokens

Models

Frontier on this task: GPT-5.5 at 8.73 / 10. Quality bar at 95%: 8.30.

[Chart: quality score per model on a 0 to 10 axis, with the 95% bar marked. Bar labels:]
Qwen 3.5 Flash · $0.000112/call · 99% cheaper
DeepSeek V4 Flash · $0.000462/call · 95% cheaper
Qwen 3.6 Plus · $0.001155/call · 87% cheaper
Kimi K2.6 · $0.001960/call · 78% cheaper
Gemini 3.1 Pro Preview · $0.003554/call · 60% cheaper
GPT-5.5 · $0.008885/call · 0% cheaper
Claude Sonnet 4.6 · $0.005235/call · 41% cheaper
DeepSeek V4 Pro · $0.005739/call · 35% cheaper
Gemini 3 Flash Preview · $0.000888/call · 90% cheaper
Gemini 3.1 Flash Lite · $0.000444/call · 95% cheaper
MiniMax M2.5 · $0.001028/call · 88% cheaper
GPT-5.4 mini · $0.001333/call · 85% cheaper
GPT-5.4 nano · $0.000357/call · 96% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
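The qualifying rule described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the page's actual logic: it assumes a model qualifies when its CI low is at or above the 8.30 bar, and is greyed when only its point estimate clears the bar. The MEDIUM+ condition is not modelled because the page does not define it. The names `Row`, `classify`, and `cheapest_qualifier` are hypothetical; the numbers are copied from the cost breakdown table.

```python
from dataclasses import dataclass
from typing import List, Optional

QUALITY_BAR = 8.30  # "Quality bar at 95%" as stated on the page


@dataclass(frozen=True)
class Row:
    model: str
    quality: float  # point estimate, out of 10
    ci_low: float   # lower bound of the confidence interval
    cost: float     # blended cost per call, in USD


# Values copied from the cost breakdown table.
ROWS = [
    Row("Qwen 3.5 Flash", 8.65, 8.41, 0.000112),
    Row("DeepSeek V4 Flash", 8.40, 8.15, 0.000462),
    Row("Qwen 3.6 Plus", 8.73, 8.46, 0.001155),
    Row("Kimi K2.6", 8.64, 8.33, 0.001960),
    Row("Gemini 3.1 Pro Preview", 8.61, 8.32, 0.003554),
    Row("GPT-5.5", 8.73, 8.50, 0.008885),
]


def classify(row: Row, bar: float = QUALITY_BAR) -> str:
    """Assumed rule: CI low decides qualification; point estimate alone greys."""
    if row.ci_low >= bar:
        return "qualifies"
    if row.quality >= bar:
        return "greyed"
    return "below bar"


def cheapest_qualifier(rows: List[Row], bar: float = QUALITY_BAR) -> Optional[Row]:
    """Return the lowest-cost row whose CI low clears the bar, if any."""
    qualifiers = [r for r in rows if classify(r, bar) == "qualifies"]
    return min(qualifiers, key=lambda r: r.cost) if qualifiers else None
```

Under this assumed rule, DeepSeek V4 Flash is the only greyed row among the six listed, since its point estimate (8.40) clears the bar while its CI low (8.15) does not.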

Cost breakdown

| Model | Provider | Quality | Sample | Blended cost / call | Savings vs best | Mode |
|---|---|---|---|---|---|---|
| Qwen 3.5 Flash | Alibaba Cloud (DashScope) | 8.65 / 10, CI [8.41, 8.89] | n=100 · ranked | $0.000112 | 99% cheaper | sync |
| DeepSeek V4 Flash | DeepSeek | 8.40 / 10, CI [8.15, 8.65] | n=100 · high | $0.000462 | 95% cheaper | sync |
| Qwen 3.6 Plus | Alibaba Cloud (DashScope) | 8.73 / 10, CI [8.46, 9.00] | n=100 · high | $0.001155 | 87% cheaper | sync |
| Kimi K2.6 | Moonshot AI | 8.64 / 10, CI [8.33, 8.95] | n=100 · high | $0.001960 | 78% cheaper | batch |
| Gemini 3.1 Pro Preview | Gemini | 8.61 / 10, CI [8.32, 8.90] | n=99 · high | $0.003554 | 60% cheaper | batch |
| GPT-5.5 (best) | OpenAI | 8.73 / 10, CI [8.50, 8.97] | n=100 · high | $0.008885 | (anchor) | batch |
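
The "Savings vs best" column appears to be each model's blended cost expressed relative to the anchor row (GPT-5.5). The sketch below reproduces the listed percentages under the assumption that savings are rounded to the nearest whole percent; the page does not state the rounding rule, and the function name `savings_vs_best` is hypothetical.

```python
ANCHOR_COST = 0.008885  # GPT-5.5 blended cost per call (USD), from the table


def savings_vs_best(cost: float, anchor: float = ANCHOR_COST) -> int:
    """Percent saved relative to the anchor model, as a whole number.

    Assumption: rounding to the nearest whole percent, inferred from the
    table values rather than stated by the page.
    """
    return round((1 - cost / anchor) * 100)


# Blended costs per call copied from the table.
for model, cost in [
    ("Qwen 3.5 Flash", 0.000112),
    ("Kimi K2.6", 0.001960),
    ("Gemini 3.1 Pro Preview", 0.003554),
]:
    print(f"{model}: {savings_vs_best(cost)}% cheaper")
```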

Typical call shape for this task: 3170 input tokens → 64 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
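
As an illustration of the blended cost formula above, here is a minimal sketch. The per-token prices are hypothetical placeholders, not values from this page, which does not list per-token prices; only the call shape (3170 input tokens, 64 output tokens) and the 6-decimal rounding come from the text. The function name `blended_cost` is also an assumption.

```python
def blended_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Blended cost per call: in × in_price + out × out_price, rounded to 6 decimals.

    Prices are expressed per token, in USD.
    """
    return round(in_tokens * in_price + out_tokens * out_price, 6)


# Hypothetical prices for illustration only (USD per token),
# equivalent to $0.10 and $0.40 per million tokens.
IN_PRICE = 0.10 / 1_000_000
OUT_PRICE = 0.40 / 1_000_000

# Typical call shape from the page: 3170 input tokens, 64 output tokens.
cost = blended_cost(3170, 64, IN_PRICE, OUT_PRICE)
print(f"${cost:.6f}/call")
```

Because input tokens outnumber output tokens by roughly 50 to 1 in this call shape, the input price dominates the blended cost unless the output price is far higher.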