
Category: Financial Analysis & Trading Decisions · Rail: absolute · Typical I/O: 5873→17263 tokens

Models

Frontier on this task: Claude Sonnet 4.6 at 8.79 / 10. Quality bar (95% of the frontier score): 8.35.
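The bar value is consistent with taking 95% of the frontier score (0.95 × 8.79 = 8.3505, shown as 8.35). A minimal sketch, assuming the bar is computed from the frontier's point estimate and rounded to two decimals; the function name `quality_bar` is hypothetical:

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Quality bar as a fraction of the frontier score, rounded to 2 decimals (assumed)."""
    return round(frontier_score * fraction, 2)


# Frontier score from the text: Claude Sonnet 4.6 at 8.79 / 10.
print(quality_bar(8.79))
```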

Chart: quality score per model (0 to 10 scale) against the 95% bar; labels give blended cost per call and savings vs. Claude Sonnet 4.6.
Kimi K2.6: $0.044779/call, 68% cheaper
Claude Sonnet 4.6: $0.138282/call, 0% (anchor)
Claude Opus 4.7: $0.230470/call, 67% more expensive
GPT-5.5: $0.273628/call, 98% more expensive
Qwen 3.5 Flash: $0.004665/call, 97% cheaper
Qwen 3.6 Plus: $0.035572/call, 74% cheaper
Haiku 4.5: $0.046094/call, 67% cheaper
DeepSeek V4 Pro: $0.070294/call, 49% cheaper
Gemini 3 Flash Preview: $0.027363/call, 80% cheaper
Gemini 3.1 Flash Lite: $0.013681/call, 90% cheaper
Gemini 3.1 Pro Preview: $0.109451/call, 21% cheaper
MiniMax M2.5: $0.022478/call, 84% cheaper
GPT-5.4 mini: $0.041044/call, 70% cheaper
GPT-5.4 nano: $0.011377/call, 92% cheaper

Each bar shows the point-estimate floor (CI low) and the upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
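One plausible reading of the qualification and ordering rules, sketched in Python. Assumptions: a model qualifies when its CI low is at or above the bar, the MEDIUM+ tier is ignored, and qualifiers are listed ahead of all other models, each group by ascending cost. `ModelRow`, `status`, and `sort_for_chart` are hypothetical names. The chart's ordering of non-qualifying models does not strictly follow cost, so the real rule may differ.

```python
from dataclasses import dataclass


@dataclass
class ModelRow:
    name: str
    score: float   # quality point estimate, 0-10
    ci_low: float  # lower end of the confidence interval
    cost: float    # blended cost per call, USD


def status(row: ModelRow, bar: float) -> str:
    """Qualifies if the CI low clears the bar; greyed if only the point estimate does."""
    if row.ci_low >= bar:
        return "qualifies"
    if row.score >= bar:
        return "greyed"
    return "below bar"


def sort_for_chart(rows: list[ModelRow], bar: float) -> list[ModelRow]:
    """Qualifiers first, then everything else; ascending blended cost within each group."""
    return sorted(rows, key=lambda r: (status(r, bar) != "qualifies", r.cost))


# Rows taken from the cost breakdown table.
rows = [
    ModelRow("Claude Opus 4.7", 8.75, 8.63, 0.230470),
    ModelRow("GPT-5.5", 8.73, 8.58, 0.273628),
    ModelRow("Kimi K2.6", 8.51, 8.37, 0.044779),
    ModelRow("Claude Sonnet 4.6", 8.79, 8.57, 0.138282),
]
for r in sort_for_chart(rows, bar=8.35):
    print(f"{r.name:<18} {status(r, 8.35):<10} ${r.cost:.6f}")
```

With this key, Kimi K2.6 comes first as the cheapest qualifier, matching the chart.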

Cost breakdown

Model | Provider | Quality | Sample | Blended cost / call | Savings vs best | Mode
Kimi K2.6 | Moonshot AI | 8.51 / 10, CI [8.37, 8.64] | n=100 · ranked | $0.044779 | 68% cheaper | batch
Claude Sonnet 4.6 (best) | Anthropic | 8.79 / 10, CI [8.57, 9.00] | n=86 · high | $0.138282 | (anchor) | batch
Claude Opus 4.7 | Anthropic | 8.75 / 10, CI [8.63, 8.87] | n=86 · ranked | $0.230470 | 67% more expensive | batch
GPT-5.5 | OpenAI | 8.73 / 10, CI [8.58, 8.88] | n=80 · ranked | $0.273628 | 98% more expensive | batch
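The savings figures are consistent with 1 - cost / anchor_cost, expressed as a whole percentage, with Claude Sonnet 4.6 as the anchor; a negative value means the model costs more than the anchor. A minimal sketch under that assumption (the function names are hypothetical):

```python
def savings_vs_anchor(cost: float, anchor_cost: float) -> int:
    """Whole-percent savings relative to the anchor; negative means more expensive."""
    return round((1 - cost / anchor_cost) * 100)


def describe(pct: int) -> str:
    """Render a savings percentage as text."""
    return f"{pct}% cheaper" if pct >= 0 else f"{-pct}% more expensive"


ANCHOR_COST = 0.138282  # Claude Sonnet 4.6, the best-scoring model

costs = {
    "Kimi K2.6": 0.044779,
    "Claude Opus 4.7": 0.230470,
    "GPT-5.5": 0.273628,
}
for name, cost in costs.items():
    print(f"{name}: {describe(savings_vs_anchor(cost, ANCHOR_COST))}")
```

The same function reproduces the chart labels for the other models, for example 97% for Qwen 3.5 Flash at $0.004665 per call.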

Typical call shape for this task: 5873 input tokens → 17263 output tokens, tracked as an exponential moving average (EMA) of production traffic. Blended cost = in × in_price + out × out_price (prices per token), rounded to 6 decimal places.
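A sketch of the blended-cost formula and of one EMA update for the tracked token counts. Assumptions: prices are supplied per million tokens and converted to per-token, the smoothing factor `alpha = 0.1` is a hypothetical choice, and `blended_cost` and `ema_update` are illustrative names. As an arithmetic check only, assumed batch prices of $1.50 (input) and $7.50 (output) per million tokens reproduce Sonnet 4.6's $0.138282 per call; they are not quoted rates.

```python
def blended_cost(tokens_in: float, tokens_out: float,
                 in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Blended cost = in x in_price + out x out_price, rounded to 6 decimals.

    Prices are given per million tokens (an assumption) and converted to per-token.
    """
    in_price = in_price_per_mtok / 1_000_000
    out_price = out_price_per_mtok / 1_000_000
    return round(tokens_in * in_price + tokens_out * out_price, 6)


def ema_update(current: float, observed: float, alpha: float = 0.1) -> float:
    """One exponential-moving-average step; alpha (hypothetical) weights the new observation."""
    return alpha * observed + (1 - alpha) * current


# Typical call shape from the text.
tokens_in, tokens_out = 5873, 17263

# Assumed batch prices per million tokens, chosen only because they reproduce $0.138282.
print(blended_cost(tokens_in, tokens_out, 1.50, 7.50))

# Folding a hypothetical new request with 6000 input tokens into the tracked average.
print(ema_update(tokens_in, 6000))
```

Because output tokens outnumber input tokens roughly 3:1 for this task, the output price dominates the blended cost.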