Cost mode:

Category: Long-form Content Generation · Rail: absolute · Typical I/O: 930→4606 tokens

Models

Frontier on this task: Claude Opus 4.7 at 9.53 / 10. Quality bar at 95%: 9.05.
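The quality bar appears to be 95% of the frontier score (9.53 × 0.95 = 9.0535, shown as 9.05). A minimal sketch of that arithmetic follows. The function name and the two-decimal rounding are assumptions made for illustration, not something the page states.

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Return the minimum acceptable score as a fraction of the frontier score."""
    # Assumption: the bar is rounded to two decimals, matching how it is displayed.
    return round(frontier_score * fraction, 2)


bar = quality_bar(9.53)
print(bar)
```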

[Chart: one bar per model on a 0 to 10 quality axis, with the 95% bar marked. Bar labels:]

Model | Blended cost / call | Savings vs best
DeepSeek V4 Flash | $0.001420 | 99% cheaper
DeepSeek V4 Pro | $0.017647 | 85% cheaper
Haiku 4.5 | $0.023960 | 80% cheaper
Claude Sonnet 4.6 | $0.071880 | 40% cheaper
Claude Opus 4.7 | $0.119800 | 0% cheaper
GPT-5.5 | $0.142830 | -19% cheaper
Qwen 3.5 Flash | $0.001225 | 99% cheaper
Qwen 3.6 Plus | $0.009284 | 92% cheaper
Gemini 3 Flash Preview | $0.014283 | 88% cheaper
Gemini 3.1 Flash Lite | $0.007142 | 94% cheaper
Gemini 3.1 Pro Preview | $0.057132 | 52% cheaper
MiniMax M2.5 | $0.005806 | 95% cheaper
Kimi K2.6 | $0.019308 | 84% cheaper
GPT-5.4 mini | $0.021424 | 82% cheaper
GPT-5.4 nano | $0.005944 | 95% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain).

Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
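The greying rule can be read as a three-way classification on the point estimate and the lower CI bound. The sketch below assumes that a model "qualifies" when its CI low is at or above the bar. It ignores the MEDIUM+ tier condition, which the page does not define. The `classify` function and its labels are hypothetical; the scores come from the cost breakdown table.

```python
def classify(point: float, ci_low: float, bar: float) -> str:
    """Classify a model against the quality bar (assumed rule, see lead-in)."""
    if ci_low >= bar:
        return "qualifies"   # even the pessimistic estimate clears the bar
    if point >= bar:
        return "greyed"      # point estimate clears the bar, CI low does not
    return "below bar"


BAR = 9.05  # quality bar at 95%, as stated on the page

print(classify(9.26, 9.18, BAR))  # DeepSeek V4 Flash, CI [9.18, 9.33]
print(classify(9.26, 9.04, BAR))  # GPT-5.5, CI [9.04, 9.48]
```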

Cost breakdown

Model | Provider | Quality | Sample | Blended cost / call | Savings vs best | Mode
DeepSeek V4 Flash | DeepSeek | 9.26 / 10, CI [9.18, 9.33] | n=97 · ranked | $0.001420 | 99% cheaper | sync
DeepSeek V4 Pro | DeepSeek | 9.16 / 10, CI [9.06, 9.26] | n=97 · ranked | $0.017647 | 85% cheaper | sync
Haiku 4.5 | Anthropic | 9.14 / 10, CI [9.03, 9.25] | n=84 · ranked | $0.023960 | 80% cheaper | batch
Claude Sonnet 4.6 | Anthropic | 9.18 / 10, CI [9.12, 9.25] | n=84 · ranked | $0.071880 | 40% cheaper | batch
Claude Opus 4.7 (best) | Anthropic | 9.53 / 10, CI [9.49, 9.57] | n=84 · ranked | $0.119800 | (anchor) | batch
GPT-5.5 | OpenAI | 9.26 / 10, CI [9.04, 9.48] | n=93 · high | $0.142830 | | batch
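The "Savings vs best" figures are consistent with one minus the ratio of a model's blended cost to the anchor's blended cost, rounded to a whole percent. This is an inference from the numbers, not a formula the page gives, and the function name below is hypothetical.

```python
ANCHOR_COST = 0.119800  # Claude Opus 4.7 blended cost per call, from the table


def savings_vs_anchor(cost: float, anchor_cost: float = ANCHOR_COST) -> int:
    """Percent saved relative to the anchor; negative means more expensive."""
    return round((1 - cost / anchor_cost) * 100)


print(savings_vs_anchor(0.001420))  # DeepSeek V4 Flash
print(savings_vs_anchor(0.017647))  # DeepSeek V4 Pro
print(savings_vs_anchor(0.142830))  # GPT-5.5
```

A negative result, as for GPT-5.5, means the model costs more per call than the anchor.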

Typical call shape for this task: 930 input tokens → 4606 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
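The blended-cost formula can be sketched as below. Prices are taken per million tokens here, which is an assumption about units. The page does not list per-token prices, so the $5 input / $25 output figures are illustrative values chosen because they reproduce the anchor row's $0.119800.

```python
def blended_cost(in_tokens: int, out_tokens: int,
                 in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Blended cost per call: in x in_price + out x out_price, rounded to 6 decimals."""
    cost = (in_tokens * in_price_per_mtok
            + out_tokens * out_price_per_mtok) / 1_000_000
    return round(cost, 6)


# Typical call shape from the page: 930 input tokens, 4606 output tokens.
# Assumed prices (not stated on the page): $5 in / $25 out per million tokens.
print(f"{blended_cost(930, 4606, 5.00, 25.00):.6f}")
```

Because output tokens outnumber input tokens roughly five to one for this task, the output price dominates the blended cost.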