Cost mode:

Category: Long-form Content Generation · Rail: absolute · Typical I/O: 535→2089 tokens

Models

Frontier on this task: Claude Opus 4.7 at 9.51 / 10. Quality bar at 95%: 9.04.

[Bar chart: one bar per model on a 0 to 10 axis, with the 95% bar marked. Bar labels, in chart order:]
Qwen 3.6 Plus: $0.004247/call, 92% cheaper
Kimi K2.6: $0.008864/call, 84% cheaper
Claude Opus 4.7: $0.054900/call, 0% cheaper
Haiku 4.5: $0.010980/call, 80% cheaper
Gemini 3 Flash Preview: $0.006534/call, 88% cheaper
Gemini 3.1 Pro Preview: $0.026138/call, 52% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Cost breakdown

Model                  | Provider                  | Quality                    | Sample       | Blended cost / call | Savings vs best | Mode
Qwen 3.6 Plus          | Alibaba Cloud (DashScope) | 9.09 / 10, CI [8.95, 9.23] | n=6 · ranked | $0.004247           | 92% cheaper     | sync
Kimi K2.6              | Moonshot AI               | 9.21 / 10, CI [9.01, 9.40] | n=6 · ranked | $0.008864           | 84% cheaper     | batch
Claude Opus 4.7 (best) | Anthropic                 | 9.51 / 10, CI [9.30, 9.72] | n=5 · ranked | $0.054900           | (anchor)        | batch
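
To make the relationship between the scores, the confidence intervals, and the 9.04 quality bar concrete, here is a minimal Python sketch that compares each table row against the bar. The function name `bar_status` and its three labels are hypothetical. The page does not show its actual qualification logic, so this only illustrates the point-estimate versus CI-low distinction that the chart legend describes.

```python
# Quality bar as stated on the page (95% of the frontier score of 9.51).
QUALITY_BAR = 9.04

# (model, point estimate, CI low, CI high), copied from the cost breakdown table.
ROWS = [
    ("Qwen 3.6 Plus", 9.09, 8.95, 9.23),
    ("Kimi K2.6", 9.21, 9.01, 9.40),
    ("Claude Opus 4.7", 9.51, 9.30, 9.72),
]


def bar_status(point, ci_low, bar=QUALITY_BAR):
    """Hypothetical helper: report how a score relates to the quality bar."""
    if ci_low >= bar:
        return "clears on CI low"
    if point >= bar:
        return "clears on point estimate only"
    return "below bar"


statuses = {name: bar_status(point, ci_low) for name, point, ci_low, _ci_high in ROWS}

for name, status in statuses.items():
    print(f"{name}: {status}")
```

With the table's numbers, only Claude Opus 4.7 has a CI low above 9.04. Qwen 3.6 Plus and Kimi K2.6 clear the bar on their point estimates alone.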

Typical call shape for this task: 535 input tokens → 2089 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
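
The blended-cost formula can be sketched in a few lines of Python. Two things here are assumptions rather than facts from the page. The per-million-token prices ($5 input, $25 output) are illustrative values chosen because they reproduce the $0.054900 anchor figure. The savings formula, 1 − cost / anchor cost rounded to a whole percent, is inferred because it matches the published percentages.

```python
def blended_cost(tokens_in, tokens_out, in_price_per_m, out_price_per_m):
    """Blended cost per call in dollars, rounded to 6 decimals.

    Prices are expressed in dollars per million tokens.
    """
    total = (tokens_in * in_price_per_m + tokens_out * out_price_per_m) / 1_000_000
    return round(total, 6)


def savings_pct(cost, anchor_cost):
    """Assumed definition of 'savings vs best': whole percent cheaper than the anchor."""
    return round((1 - cost / anchor_cost) * 100)


# Typical call shape from the page: 535 input tokens -> 2089 output tokens.
# The $5 / $25 per-million prices are ASSUMED for illustration only.
anchor = blended_cost(535, 2089, in_price_per_m=5.0, out_price_per_m=25.0)

# Published blended costs per call, copied from the page.
published = {
    "Qwen 3.6 Plus": 0.004247,
    "Kimi K2.6": 0.008864,
    "Haiku 4.5": 0.010980,
    "Gemini 3 Flash Preview": 0.006534,
    "Gemini 3.1 Pro Preview": 0.026138,
}
savings = {name: savings_pct(cost, anchor) for name, cost in published.items()}

print(f"anchor cost: ${anchor:.6f}")
for name, pct in savings.items():
    print(f"{name}: {pct}% cheaper")
```

Under these assumptions the sketch yields 92%, 84%, 80%, 88%, and 52% savings, the same figures shown next to each model.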