Cost mode:

Category: Financial Analysis & Trading Decisions · Rail: absolute · Typical I/O: 34660→1774 tokens

Models

Frontier on this task: Claude Sonnet 4.6 at 8.95 / 10. Quality bar at 95%: 8.50.
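
The 8.50 figure is consistent with taking 95% of the frontier score (0.95 × 8.95 = 8.5025). A minimal sketch of that arithmetic, assuming the bar is simply a fixed fraction of the frontier's point estimate; the function name `quality_bar` is illustrative and not part of the dashboard:

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Return the quality bar as a fraction of the frontier score (assumed rule)."""
    return round(frontier_score * fraction, 2)


# Frontier score quoted above: Claude Sonnet 4.6 at 8.95 / 10.
print(f"{quality_bar(8.95):.2f}")  # 8.50
```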

Chart: quality score per model on a 0 to 10 axis, with the 95% bar marked. Bars in displayed order, with blended cost per call and savings relative to Claude Sonnet 4.6:

Qwen 3.6 Plus · $0.014724/call · 77% cheaper
Claude Sonnet 4.6 · $0.065295/call · 0% cheaper
Claude Opus 4.7 · $0.108825/call · -67% cheaper
Qwen 3.5 Flash · $0.001501/call · 98% cheaper
Haiku 4.5 · $0.021765/call · 67% cheaper
DeepSeek V4 Flash · $0.005349/call · 92% cheaper
DeepSeek V4 Pro · $0.066482/call · -2% cheaper
Gemini 3 Flash Preview · $0.011326/call · 83% cheaper
Gemini 3.1 Flash Lite · $0.005663/call · 91% cheaper
Gemini 3.1 Pro Preview · $0.045304/call · 31% cheaper
MiniMax M2.5 · $0.012527/call · 81% cheaper
Kimi K2.6 · $0.024014/call · 63% cheaper
GPT-5.5 · $0.113260/call · -73% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain).

Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
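
The qualification and sorting rule described here can be sketched as follows. This is an illustration under stated assumptions, not the dashboard's actual implementation: it assumes a model qualifies when its CI low is at or above the bar (whether the comparison is strict is not stated), and the `Candidate` type, the `rank_qualifiers` function, and the "Example Model X" row are hypothetical. The other three rows use the figures from the cost breakdown table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float   # point estimate, out of 10
    ci_low: float  # lower bound of the confidence interval
    cost: float    # blended cost per call, USD


def rank_qualifiers(candidates: list[Candidate], bar: float) -> list[Candidate]:
    """Keep models whose CI low clears the bar, cheapest first (assumed rule)."""
    qualified = [c for c in candidates if c.ci_low >= bar]
    return sorted(qualified, key=lambda c: c.cost)


CANDIDATES = [
    Candidate("Claude Opus 4.7", 8.77, 8.51, 0.108825),
    Candidate("Claude Sonnet 4.6", 8.95, 8.74, 0.065295),
    Candidate("Qwen 3.6 Plus", 8.79, 8.69, 0.014724),
    # Hypothetical row: point estimate clears 8.50, CI low does not ("greyed").
    Candidate("Example Model X", 8.60, 8.40, 0.010000),
]

for c in rank_qualifiers(CANDIDATES, bar=8.50):
    print(f"{c.name}: ${c.cost:.6f}/call")
```

The hypothetical row is the cheapest of the four, yet it is filtered out because only its point estimate, not its CI low, reaches the bar.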

Cost breakdown

Model | Provider | Quality | Sample | Blended cost / call | Savings vs best | Mode
Qwen 3.6 Plus | Alibaba Cloud (DashScope) | 8.79 / 10, CI [8.69, 8.90] | n=40 · ranked | $0.014724 | 77% cheaper | sync
Claude Sonnet 4.6 (best) | Anthropic | 8.95 / 10, CI [8.74, 9.15] | n=29 · high | $0.065295 | (anchor) | batch
Claude Opus 4.7 | Anthropic | 8.77 / 10, CI [8.51, 9.04] | n=25 · high | $0.108825 | | batch
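
The savings percentages on this page are consistent with comparing each model's blended cost to the anchor's (Claude Sonnet 4.6, $0.065295/call). A minimal sketch, assuming savings = 1 − cost / anchor cost, rounded to a whole percent; the formula is inferred from the published figures rather than stated by the source, and `savings_vs_anchor` is an illustrative name:

```python
ANCHOR_COST = 0.065295  # Claude Sonnet 4.6 blended cost per call, from the table


def savings_vs_anchor(cost: float, anchor_cost: float = ANCHOR_COST) -> int:
    """Percent cheaper than the anchor; negative means more expensive."""
    return round((1 - cost / anchor_cost) * 100)


print(savings_vs_anchor(0.014724))  # Qwen 3.6 Plus   -> 77
print(savings_vs_anchor(0.108825))  # Claude Opus 4.7 -> -67
```

Under this reading, a label such as "-67% cheaper" means the model costs about 67% more per call than the anchor.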

Typical call shape for this task: 34660 input tokens → 1774 output tokens, tracked as an exponential moving average (EMA) over production traffic. Blended cost = in × in_price + out × out_price, rounded to 6 decimals.
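
The blended-cost formula can be sketched as below. The per-token prices are not given on this page, so the prices in the example ($3 and $15 per million tokens) are hypothetical placeholders. The `multiplier` parameter is likewise an assumption, added to model a possible discount for batch mode. It is not part of the stated formula and defaults to 1.0, which reproduces the formula as written.

```python
def blended_cost(
    in_tokens: int,
    out_tokens: int,
    in_price_per_m: float,   # USD per million input tokens (hypothetical input)
    out_price_per_m: float,  # USD per million output tokens (hypothetical input)
    multiplier: float = 1.0, # assumed discount factor, e.g. for batch mode
) -> float:
    """Blended cost per call in USD, rounded to 6 decimals."""
    raw = (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000
    return round(raw * multiplier, 6)


# Typical call shape from this page, with placeholder prices.
print(blended_cost(34660, 1774, 3.0, 15.0))                  # 0.13059
print(blended_cost(34660, 1774, 3.0, 15.0, multiplier=0.5))  # 0.065295
```

Prices are kept per million tokens because that is how providers usually quote them. Dividing once at the end avoids carrying very small per-token floats through the calculation.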