Cost mode:

Category: Financial Analysis & Trading Decisions · Rail: absolute · Typical I/O: 12106→5513 tokens

Models

Frontier on this task: GPT-5.5 at 8.84 / 10. Quality bar at 95%: 8.40.
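The 8.40 bar matches 95% of the frontier score (0.95 × 8.84 = 8.398, shown as 8.40). The page does not spell out the rule, so the sketch below is an inference from those two numbers. The function name `quality_bar` is hypothetical.

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Return the quality bar as a fraction of the frontier score.

    Assumption: the bar is fraction * frontier_score, rounded to 2 decimals
    for display. This is inferred from the figures on the page.
    """
    return round(frontier_score * fraction, 2)


# Frontier score for GPT-5.5 as reported on this page.
print(quality_bar(8.84))  # 8.4
```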

Chart: axis from 0 to 10 with the 95% bar marked. Bar labels (model · blended cost per call · savings vs GPT-5.5):

Kimi K2.6 · $0.020132/call · 82% cheaper
GPT-5.5 · $0.112960/call · 0% cheaper
Qwen 3.5 Flash · $0.001797/call · 98% cheaper
Qwen 3.6 Plus · $0.014685/call · 87% cheaper
Haiku 4.5 · $0.019836/call · 82% cheaper
Claude Opus 4.7 · $0.099178/call · 12% cheaper
Claude Sonnet 4.6 · $0.059506/call · 47% cheaper
DeepSeek V4 Flash · $0.003238/call · 97% cheaper
DeepSeek V4 Pro · $0.040250/call · 64% cheaper
Gemini 3 Flash Preview · $0.011296/call · 90% cheaper
Gemini 3.1 Flash Lite · $0.005648/call · 95% cheaper
Gemini 3.1 Pro Preview · $0.045184/call · 60% cheaper
MiniMax M2.5 · $0.010247/call · 91% cheaper
GPT-5.4 mini · $0.016944/call · 85% cheaper
GPT-5.4 nano · $0.004656/call · 96% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
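One possible reading of the legend, as a rough sketch: a row qualifies when its CI low clears the bar, is greyed when only its point estimate does, and qualifiers are listed first by ascending cost. The page does not define the MEDIUM+ tier, so it is ignored here. The rows, the names `Row`, `status`, and `order_rows`, and the ordering of non-qualifiers are all assumptions for illustration, not data from this page.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    point: float   # point-estimate quality score
    ci_low: float  # lower end of the confidence interval
    cost: float    # blended cost per call, in dollars


def status(row: Row, bar: float) -> str:
    """Classify a row against the quality bar (assumed rule, see lead-in)."""
    if row.ci_low >= bar:
        return "qualifies"
    if row.point >= bar:
        return "greyed"  # point estimate clears the bar, CI low does not
    return "below"


def order_rows(rows: list[Row], bar: float) -> list[Row]:
    """Qualifiers first, then everything else; cheapest first within each group."""
    return sorted(rows, key=lambda r: (status(r, bar) != "qualifies", r.cost))


# Hypothetical rows, not taken from the page.
rows = [
    Row("model-a", point=8.90, ci_low=8.80, cost=0.100),
    Row("model-b", point=8.60, ci_low=8.45, cost=0.020),
    Row("model-c", point=8.45, ci_low=8.20, cost=0.005),
    Row("model-d", point=7.90, ci_low=7.70, cost=0.001),
]

for row in order_rows(rows, bar=8.40):
    print(row.name, status(row, 8.40), row.cost)
```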

Cost breakdown

| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
|---|---|---|---|---|---|
| Kimi K2.6 · Moonshot AI | 8.54 / 10, CI [8.39, 8.69] | n=100 · ranked | $0.020132 | 82% cheaper | batch |
| GPT-5.5 · best · OpenAI | 8.84 / 10, CI [8.74, 8.95] | n=84 · ranked | $0.112960 | (anchor) | batch |
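The "Savings vs best" figures are consistent with 1 − cost / anchor cost, rounded to the nearest whole percent, with GPT-5.5 as the anchor. That formula is inferred from the published numbers and is not stated on the page. The function name `savings_pct` is hypothetical.

```python
def savings_pct(cost: float, anchor_cost: float) -> int:
    """Percent saved relative to the anchor model, rounded to a whole number.

    Assumption: savings = (1 - cost / anchor_cost) * 100, inferred from the
    figures on this page.
    """
    return round((1 - cost / anchor_cost) * 100)


ANCHOR = 0.112960  # GPT-5.5 blended cost per call, from the table

print(savings_pct(0.020132, ANCHOR))  # Kimi K2.6
print(savings_pct(0.099178, ANCHOR))  # Claude Opus 4.7
print(savings_pct(0.004656, ANCHOR))  # GPT-5.4 nano
```

With the costs listed on this page, these calls reproduce the published 82%, 12%, and 96% figures.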

Typical call shape for this task: 12106 input tokens → 5513 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
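The blended-cost formula above can be sketched as follows. The page does not list per-token prices, so the prices in the example are placeholders chosen only to show the arithmetic. Quoting them per million tokens is also an assumption, as is the name `blended_cost`.

```python
def blended_cost(in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Blended cost per call: in × in_price + out × out_price, rounded to 6 decimals.

    Prices are given in dollars per million tokens (an assumption; the page
    does not state the unit).
    """
    cost = (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000
    return round(cost, 6)


# Call shape from this page; the two prices are hypothetical placeholders.
example = blended_cost(12106, 5513, in_price_per_m=1.00, out_price_per_m=4.00)
print(f"${example:.6f}/call")
```

Because output tokens are usually priced higher than input tokens, the 5513 output tokens can dominate the total even though the call has more than twice as many input tokens.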