Cost mode:

Category: Topic Organization & Clustering · Rail: absolute · Typical I/O: 61989→12016 tokens

Models

Frontier on this task: Claude Opus 4.7 at 8.77 / 10. Quality bar at 95%: 8.33.
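The bar value appears to be 95% of the frontier score. A minimal sketch of that arithmetic, assuming the bar is rounded to two decimals (the names `FRONTIER_SCORE`, `BAR_FRACTION`, and `quality_bar` are hypothetical, not taken from the page):

```python
# Assumption: the quality bar is the frontier score scaled by 0.95,
# rounded to two decimals as displayed on the page.
FRONTIER_SCORE = 8.77   # Claude Opus 4.7, from the page
BAR_FRACTION = 0.95     # "Quality bar at 95%"


def quality_bar(frontier_score: float, fraction: float = BAR_FRACTION) -> float:
    """Return the quality threshold a model must clear."""
    return round(frontier_score * fraction, 2)


bar = quality_bar(FRONTIER_SCORE)

# The anchor's own CI low (8.54, from the cost breakdown) clears the bar.
anchor_ci_low = 8.54
anchor_clears = anchor_ci_low >= bar
```

Under this reading, 8.77 × 0.95 = 8.3315, which is displayed as 8.33.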

Chart (axis 0 to 10, 95% bar marked). Bar labels:

Claude Opus 4.7 · $0.305172/call · 0% cheaper
Qwen 3.5 Flash · $0.004984/call · 98% cheaper
Qwen 3.6 Plus · $0.043578/call · 86% cheaper
Haiku 4.5 · $0.061034/call · 80% cheaper
Claude Sonnet 4.6 · $0.183104/call · 40% cheaper
DeepSeek V4 Flash · $0.012043/call · 96% cheaper
DeepSeek V4 Pro · $0.149677/call · 51% cheaper
Gemini 3 Flash Preview · $0.033521/call · 89% cheaper
Gemini 3.1 Flash Lite · $0.016761/call · 95% cheaper
Gemini 3.1 Pro Preview · $0.134085/call · 56% cheaper
MiniMax M2.5 · $0.033016/call · 89% cheaper
Kimi K2.6 · $0.064172/call · 79% cheaper
GPT-5.4 mini · $0.050282/call · 84% cheaper
GPT-5.4 nano · $0.013709/call · 96% cheaper
GPT-5.5 · $0.335212/call · -10% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
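The "% cheaper" figures on the chart look consistent with each model's cost taken relative to the anchor (Claude Opus 4.7 at $0.305172/call). A rough sketch under that assumption, with rounding to a whole percent also assumed (`percent_cheaper` and `ANCHOR_COST` are illustrative names):

```python
# Assumption: "% cheaper" = (1 - model_cost / anchor_cost) * 100,
# rounded to the nearest whole percent. Costs are copied from the chart.
ANCHOR_COST = 0.305172  # Claude Opus 4.7, $/call

costs = {
    "Qwen 3.5 Flash": 0.004984,
    "Haiku 4.5": 0.061034,
    "Claude Sonnet 4.6": 0.183104,
    "GPT-5.5": 0.335212,
}


def percent_cheaper(cost: float, anchor: float = ANCHOR_COST) -> int:
    """Savings relative to the anchor; negative means more expensive."""
    return round((1 - cost / anchor) * 100)


savings = {model: percent_cheaper(cost) for model, cost in costs.items()}
for model, pct in savings.items():
    print(f"{model}: {pct}% cheaper")
```

A negative value, as for GPT-5.5, means the model costs more per call than the anchor.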

Cost breakdown

Model | Quality | Sample | Blended cost / call | Savings vs best | Mode
Claude Opus 4.7 (best Anthropic) | 8.77 / 10, CI [8.54, 9.00] | n=98 · high | $0.305172 | (anchor) | batch

Typical call shape for this task: 61989 input tokens → 12016 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
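The blended-cost formula above can be sketched as follows. The page does not list per-token prices, so the prices below are placeholders, and quoting them in dollars per million tokens is an assumption; `blended_cost` is a hypothetical helper name.

```python
# Typical call shape, taken from the page.
TYPICAL_IN_TOKENS = 61_989
TYPICAL_OUT_TOKENS = 12_016


def blended_cost(in_tokens: int, out_tokens: int,
                 in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Blended cost = in * in_price + out * out_price, rounded to 6 decimals.

    Prices are assumed to be quoted in dollars per million tokens.
    """
    total = in_tokens * in_price_per_mtok + out_tokens * out_price_per_mtok
    return round(total / 1_000_000, 6)


# Placeholder prices for illustration only; they are not from the page.
example_cost = blended_cost(TYPICAL_IN_TOKENS, TYPICAL_OUT_TOKENS,
                            in_price_per_mtok=2.0, out_price_per_mtok=10.0)
print(f"${example_cost:.6f}/call")
```

Any mode-specific adjustment (the table lists the mode as batch) is not part of the stated formula and is left out of this sketch.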