Cost mode:

Category: Long-form Content Generation · Rail: absolute · Typical I/O: 15742→19002 tokens

Models

Frontier on this task: Claude Sonnet 4.6 at 8.98 / 10. Quality bar at 95%: 8.53.
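The two figures above are consistent with the quality bar being 95% of the frontier score, rounded to two decimals. A minimal sketch of that arithmetic, where the function name `quality_bar` and the rounding rule are assumptions and not taken from the dashboard:

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Return the quality bar as a fraction of the frontier score.

    Assumption: the bar is rounded to two decimals, as displayed.
    """
    return round(frontier_score * fraction, 2)


# Frontier score from the text above: Claude Sonnet 4.6 at 8.98 / 10.
bar = quality_bar(8.98)
print(bar)
```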

[Chart: blended cost per call by model; axis ticks 0 to 10, with the 95% bar marked.]

| Model | Blended cost / call | Savings vs best |
| --- | --- | --- |
| Kimi K2.6 | $0.054578 | 67% cheaper |
| DeepSeek V4 Pro | $0.093518 | 44% cheaper |
| Claude Sonnet 4.6 | $0.166128 | 0% cheaper |
| Claude Opus 4.7 | $0.276880 | -67% cheaper |
| GPT-5.5 | $0.324385 | -95% cheaper |
| Qwen 3.5 Flash | $0.005413 | 97% cheaper |
| Qwen 3.6 Plus | $0.042170 | 75% cheaper |
| Haiku 4.5 | $0.055376 | 67% cheaper |
| DeepSeek V4 Flash | $0.007524 | 95% cheaper |
| Gemini 3 Flash Preview | $0.032438 | 80% cheaper |
| Gemini 3.1 Flash Lite | $0.016219 | 90% cheaper |
| Gemini 3.1 Pro Preview | $0.129754 | 22% cheaper |
| MiniMax M2.5 | $0.027525 | 83% cheaper |
| GPT-5.4 mini | $0.048658 | 71% cheaper |
| GPT-5.4 nano | $0.013450 | 92% cheaper |

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
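The distinction between clearing the bar on the point estimate and clearing it on the CI low can be sketched as below. The scores and interval bounds are copied from the cost breakdown table. The `Estimate` type, the `classify` function, and the use of `>=` against the displayed bar of 8.53 are assumptions, and the MEDIUM+ tier condition is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    name: str
    point: float   # point-estimate quality score out of 10
    ci_low: float  # lower bound of the confidence interval


def classify(e: Estimate, bar: float) -> str:
    """Label a model by how it compares to the quality bar (assumed rule)."""
    if e.ci_low >= bar:
        return "clears on CI low"
    if e.point >= bar:
        return "clears on point estimate only"
    return "below bar"


BAR = 8.53  # quality bar at 95%, as stated on the page

models = [
    Estimate("Kimi K2.6", 8.77, 8.48),
    Estimate("DeepSeek V4 Pro", 8.69, 8.36),
    Estimate("Claude Sonnet 4.6", 8.98, 8.69),
    Estimate("Claude Opus 4.7", 8.82, 8.57),
    Estimate("GPT-5.5", 8.53, 8.29),
]

for m in models:
    print(f"{m.name}: {classify(m, BAR)}")
```

Under this assumed rule, a row whose CI low falls short of the bar would be one the chart greys out if the model also meets the MEDIUM+ condition.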

Cost breakdown

| Model | Provider | Quality | Sample | Blended cost / call | Savings vs best | Mode |
| --- | --- | --- | --- | --- | --- | --- |
| Kimi K2.6 | Moonshot AI | 8.77 / 10, CI [8.48, 9.06] | n=100 · high | $0.054578 | 67% cheaper | batch |
| DeepSeek V4 Pro | DeepSeek | 8.69 / 10, CI [8.36, 9.03] | n=63 · high | $0.093518 | 44% cheaper | sync |
| Claude Sonnet 4.6 (best) | Anthropic | 8.98 / 10, CI [8.69, 9.26] | n=84 · high | $0.166128 | (anchor) | batch |
| Claude Opus 4.7 | Anthropic | 8.82 / 10, CI [8.57, 9.07] | n=100 · high | $0.276880 | | batch |
| GPT-5.5 | OpenAI | 8.53 / 10, CI [8.29, 8.77] | n=90 · high | $0.324385 | | batch |
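The "Savings vs best" figures are consistent with one minus the ratio of a model's blended cost to the anchor's, rounded to a whole percent. A short sketch of that calculation, using costs from the page. The formula is inferred from the published numbers and the function name `savings_vs_best` is hypothetical.

```python
ANCHOR_COST = 0.166128  # Claude Sonnet 4.6 blended cost per call (the anchor)


def savings_vs_best(cost: float, anchor: float = ANCHOR_COST) -> int:
    """Percent saved relative to the anchor; negative means more expensive.

    Assumption: savings = (1 - cost / anchor) * 100, rounded to an integer.
    """
    return round((1 - cost / anchor) * 100)


costs = {
    "Kimi K2.6": 0.054578,
    "DeepSeek V4 Pro": 0.093518,
    "Claude Opus 4.7": 0.276880,
    "GPT-5.5": 0.324385,
}

for name, cost in costs.items():
    print(f"{name}: {savings_vs_best(cost)}% cheaper")
```

A negative result corresponds to the chart's "-67% cheaper" style of label, meaning the model costs more per call than the anchor.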

Typical call shape for this task: 15742 input tokens → 19002 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
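The blended-cost formula above can be sketched as follows. The page does not state per-token prices, so the prices below (USD per million tokens) are illustrative assumptions, picked only because they reproduce the anchor row's $0.166128. The function name `blended_cost` is likewise hypothetical.

```python
def blended_cost(in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Blended cost per call in USD, rounded to 6 decimals.

    Prices are expressed per million tokens (an assumption of this sketch).
    """
    cost = (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000
    return round(cost, 6)


# Typical call shape from the page: 15742 input tokens, 19002 output tokens.
# Illustrative prices only: 1.50 USD/M input, 7.50 USD/M output.
print(blended_cost(15742, 19002, 1.50, 7.50))
```

Because output tokens outnumber input tokens for this task, the output price dominates the blended figure.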