Cost mode:

Category: Structured Data & Fact Extraction · Rail: absolute · Typical I/O: 2668→318 tokens

Models

Frontier on this task: DeepSeek V4 Pro at 7.45 / 10. Quality bar at 95%: 7.08.
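The two figures above are consistent with the quality bar being 95% of the frontier score, rounded to two decimals. A minimal sketch of that arithmetic follows; the function name `quality_bar` and the rounding step are assumptions for illustration, not something the page states.

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Return the quality bar as a fraction of the frontier score.

    Assumption: the bar is frontier_score * fraction, rounded to 2 decimals.
    """
    return round(frontier_score * fraction, 2)


# Frontier score reported on this page: 7.45 / 10.
print(quality_bar(7.45))
```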

Quality chart (axis 0 to 10, 95% bar marked). Per-model annotations:

| Model | Blended cost / call | Cheaper than frontier |
| --- | --- | --- |
| DeepSeek V4 Pro | $0.005749 | 0% |
| Qwen 3.5 Flash | $0.000163 | 97% |
| Qwen 3.6 Plus | $0.001487 | 74% |
| Haiku 4.5 | $0.002129 | 63% |
| Claude Opus 4.7 | $0.010645 | -85% |
| Claude Sonnet 4.6 | $0.006387 | -11% |
| Gemini 3 Flash Preview | $0.001144 | 80% |
| Gemini 3.1 Flash Lite | $0.000572 | 90% |
| Gemini 3.1 Pro Preview | $0.004576 | 20% |
| MiniMax M2.5 | $0.001182 | 79% |
| Kimi K2.6 | $0.002284 | 60% |
| GPT-5.4 mini | $0.001716 | 70% |
| GPT-5.4 nano | $0.000466 | 92% |
| GPT-5.5 | $0.011440 | -99% |

A negative percentage means the model costs more per call than the frontier model.
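The "% cheaper" figures appear to be each model's per-call cost measured against the DeepSeek V4 Pro anchor cost, rounded to a whole percent. The sketch below reproduces a few of them under that assumption; `savings_vs_anchor` is a hypothetical helper name, and the costs are copied from the chart annotations.

```python
ANCHOR_COST = 0.005749  # DeepSeek V4 Pro, USD per call (from the chart)


def savings_vs_anchor(cost: float, anchor_cost: float = ANCHOR_COST) -> int:
    """Percent cheaper than the anchor; negative means more expensive.

    Assumption: savings = (1 - cost / anchor_cost) * 100, rounded to an integer.
    """
    return round((1 - cost / anchor_cost) * 100)


# Per-call costs copied from the chart annotations.
costs = {
    "Qwen 3.5 Flash": 0.000163,
    "Haiku 4.5": 0.002129,
    "Claude Opus 4.7": 0.010645,
    "GPT-5.5": 0.011440,
}

for model, cost in costs.items():
    print(f"{model}: {savings_vs_anchor(cost)}% cheaper")
```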

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Cost breakdown

| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
| --- | --- | --- | --- | --- | --- |
| DeepSeek V4 Pro (best DeepSeek) | 7.45 / 10, CI [7.00, 7.91] | n=100 · high | $0.005749 | (anchor) | sync |

Typical call shape for this task: 2668 input tokens → 318 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
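The blended-cost formula can be sketched as below. The page does not state the price units, so this sketch assumes prices are quoted in USD per million tokens; the prices in the example ($1.00 in, $2.00 out) are placeholders chosen for illustration and are not any model's real pricing.

```python
TOKENS_PER_MILLION = 1_000_000


def blended_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Blended cost per call in USD, rounded to 6 decimals.

    Assumption: in_price and out_price are USD per million tokens.
    """
    total = in_tokens * in_price + out_tokens * out_price
    return round(total / TOKENS_PER_MILLION, 6)


# Typical call shape from this page: 2668 input tokens, 318 output tokens.
# The prices below are hypothetical placeholders.
print(blended_cost(2668, 318, in_price=1.00, out_price=2.00))
```

Because the input side is more than eight times larger than the output side for this call shape, the input price dominates the blended cost unless the output price is much higher.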