Cost mode:

Category: Long-form Content Generation · Rail: absolute · Typical I/O: 674→2365 tokens

Models

Frontier on this task: Qwen 3.6 Plus at 9.09 / 10. Quality bar at 95%: 8.64.

[Chart: per-model bars on a 0 to 10 quality axis with the 95% bar marked. Labels give blended cost per call and savings relative to Qwen 3.6 Plus; negative values mean the model costs more.]

Model                    Cost / call   Savings
Qwen 3.6 Plus            $0.004831     0% cheaper
Kimi K2.6                $0.006060     -25% cheaper
DeepSeek V4 Pro          $0.009403     -95% cheaper
Claude Sonnet 4.6        $0.018748     -288% cheaper
Claude Opus 4.7          $0.031248     -547% cheaper
GPT-5.5                  $0.037160     -669% cheaper
Qwen 3.5 Flash           $0.000635     87% cheaper
Haiku 4.5                $0.006250     -29% cheaper
DeepSeek V4 Flash        $0.000757     84% cheaper
Gemini 3 Flash Preview   $0.003716     23% cheaper
Gemini 3.1 Flash Lite    $0.001858     62% cheaper
Gemini 3.1 Pro Preview   $0.014864     -208% cheaper
GPT-5.4 mini             $0.005574     -15% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
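The qualification rule in the legend can be sketched in a few lines of Python. This is an illustration, not the page's actual implementation: the scores and CI lows are copied from the cost breakdown table on this page, the assumption that the bar is 95% of the frontier score is inferred from 0.95 × 9.09 ≈ 8.64, the MEDIUM+ sample filter is omitted, and the names `Row`, `quality_bar`, and `classify` are hypothetical.

```python
from dataclasses import dataclass

FRONTIER_SCORE = 9.09  # Qwen 3.6 Plus, as reported on the page
BAR_FRACTION = 0.95    # assumption: bar = 95% of the frontier score


@dataclass
class Row:
    model: str
    score: float   # point estimate, out of 10
    ci_low: float  # lower end of the reported CI


def quality_bar(frontier: float, fraction: float = BAR_FRACTION) -> float:
    """Return the quality bar, rounded to 2 decimals like the page."""
    return round(frontier * fraction, 2)


def classify(row: Row, bar: float) -> str:
    """Apply the legend's rule: CI low decides, point estimate only greys."""
    if row.ci_low >= bar:
        return "qualifies"
    if row.score >= bar:
        return "greyed"  # point estimate clears the bar, CI low does not
    return "below bar"


# Values copied from the cost breakdown table.
ROWS = [
    Row("Qwen 3.6 Plus", 9.09, 9.00),
    Row("Kimi K2.6", 9.05, 8.92),
    Row("DeepSeek V4 Pro", 8.81, 8.51),
    Row("Claude Sonnet 4.6", 8.79, 8.30),
    Row("Claude Opus 4.7", 9.06, 8.81),
    Row("GPT-5.5", 8.91, 8.73),
]

if __name__ == "__main__":
    bar = quality_bar(FRONTIER_SCORE)
    print(f"bar = {bar}")
    for row in ROWS:
        print(f"{row.model}: {classify(row, bar)}")
```

Under these assumptions, DeepSeek V4 Pro and Claude Sonnet 4.6 fall into the greyed group, since their CI lows (8.51 and 8.30) sit below the bar even though their point estimates clear it.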

Cost breakdown

Model                 Provider                   Quality                    Sample         Blended cost / call  Savings vs best  Mode
Qwen 3.6 Plus (best)  Alibaba Cloud (DashScope)  9.09 / 10 CI [9.00, 9.19]  n=8 · ranked   $0.004831            (anchor)         sync
Kimi K2.6             Moonshot AI                9.05 / 10 CI [8.92, 9.19]  n=8 · ranked   $0.006060                             batch
DeepSeek V4 Pro       DeepSeek                   8.81 / 10 CI [8.51, 9.11]  n=7 · high     $0.009403                             sync
Claude Sonnet 4.6     Anthropic                  8.79 / 10 CI [8.30, 9.27]  n=6 · medium   $0.018748                             batch
Claude Opus 4.7       Anthropic                  9.06 / 10 CI [8.81, 9.32]  n=6 · high     $0.031248                             batch
GPT-5.5               OpenAI                     8.91 / 10 CI [8.73, 9.10]  n=6 · ranked   $0.037160                             batch

Typical call shape for this task: 674 input tokens → 2365 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
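As a rough illustration of the blended-cost formula and the "% cheaper" figures, here is a minimal Python sketch. The per-million-token prices in the usage example are hypothetical placeholders, since the page does not list per-token prices, and the function names are made up for this example. The savings formula (1 − cost / anchor cost, as a whole percentage) is an assumption, though it matches the figures shown on this page.

```python
def blended_cost(tokens_in: int, tokens_out: int,
                 in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Blended cost per call in dollars, rounded to 6 decimals.

    Prices are assumed to be quoted in dollars per million tokens.
    """
    cost = (tokens_in * in_price_per_mtok + tokens_out * out_price_per_mtok) / 1e6
    return round(cost, 6)


def savings_vs_anchor(cost: float, anchor_cost: float) -> int:
    """Percent cheaper than the anchor; negative means more expensive."""
    return round((1 - cost / anchor_cost) * 100)


if __name__ == "__main__":
    # Typical call shape from the page: 674 input tokens, 2365 output tokens.
    # The prices below ($0.50 in, $2.00 out per million tokens) are hypothetical.
    print(blended_cost(674, 2365, 0.50, 2.00))

    # Per-call costs copied from the page; the anchor is Qwen 3.6 Plus.
    anchor = 0.004831
    print(savings_vs_anchor(0.006060, anchor))  # Kimi K2.6
    print(savings_vs_anchor(0.000635, anchor))  # Qwen 3.5 Flash
```

Applied to the per-call costs on this page, the savings function gives -25 for Kimi K2.6 and 87 for Qwen 3.5 Flash, in line with the chart labels.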