Category: Structured Data & Fact Extraction · Rail: absolute · Typical I/O: 844→1503 tokens

Models

Frontier on this task: Kimi K2.6 at 9.25 / 10. Quality bar at 95%: 8.79.
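The two figures above appear to be related by simple arithmetic. The sketch below assumes the quality bar is 95% of the frontier score, rounded to two decimals; the function and constant names are illustrative and do not come from the page.

```python
# Assumption: quality bar = 95% of the frontier score, rounded to 2 decimals.
FRONTIER_SCORE = 9.25  # Kimi K2.6, as quoted above
BAR_FRACTION = 0.95    # the "95%" in "Quality bar at 95%"


def quality_bar(frontier_score: float, fraction: float = BAR_FRACTION) -> float:
    """Return the minimum score needed to count as near-frontier."""
    return round(frontier_score * fraction, 2)


bar = quality_bar(FRONTIER_SCORE)
print(bar)  # 9.25 * 0.95 = 8.7875, which rounds to 8.79
```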

Chart (axis 0 to 10, 95% bar marked). Per-model labels:

| Model | Blended cost / call | % cheaper |
| --- | --- | --- |
| Kimi K2.6 | $0.004088 | 0% |
| DeepSeek V4 Pro | $0.006699 | -64% |
| Claude Sonnet 4.6 | $0.012538 | -207% |
| Claude Opus 4.7 | $0.020898 | -411% |
| Gemini 3 Flash Preview | $0.002466 | 40% |
| Gemini 3.1 Flash Lite | $0.001233 | 70% |
| Gemini 3.1 Pro Preview | $0.009862 | -141% |
| GPT-5.5 | $0.024655 | -503% |
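The "% cheaper" labels in the chart are consistent with savings measured against Kimi K2.6's blended cost, where a negative value means the model costs more per call. The sketch below assumes the formula is (1 − cost / anchor cost) × 100, rounded to a whole percent. That formula is inferred from the numbers and is not stated on the page.

```python
# Assumption: "% cheaper" = (1 - cost / anchor_cost) * 100, rounded to an integer.
ANCHOR_COST = 0.004088  # Kimi K2.6 blended cost per call, from the chart

COSTS = {
    "DeepSeek V4 Pro": 0.006699,
    "Claude Sonnet 4.6": 0.012538,
    "Claude Opus 4.7": 0.020898,
    "Gemini 3 Flash Preview": 0.002466,
    "Gemini 3.1 Flash Lite": 0.001233,
    "Gemini 3.1 Pro Preview": 0.009862,
    "GPT-5.5": 0.024655,
}


def pct_cheaper(cost: float, anchor_cost: float = ANCHOR_COST) -> int:
    """Percent saved relative to the anchor; negative means more expensive."""
    return round((1 - cost / anchor_cost) * 100)


savings = {model: pct_cheaper(cost) for model, cost in COSTS.items()}
for model, pct in savings.items():
    print(f"{model}: {pct}% cheaper")
```

Under this assumption, a label such as "-64% cheaper" means DeepSeek V4 Pro costs about 64% more per call than the anchor.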

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Cost breakdown

| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
| --- | --- | --- | --- | --- | --- |
| Kimi K2.6 (Moonshot AI), best | 9.25 / 10, CI [9.07, 9.44] | n=33 · ranked | $0.004088 | (anchor) | batch |
| DeepSeek V4 Pro (DeepSeek) | 9.10 / 10, CI [8.80, 9.40] | n=27 · high | $0.006699 | | sync |
| Claude Sonnet 4.6 (Anthropic) | 9.22 / 10, CI [8.95, 9.49] | n=25 · high | $0.012538 | | batch |
| Claude Opus 4.7 (Anthropic) | 9.15 / 10, CI [8.98, 9.32] | n=25 · ranked | $0.020898 | | batch |
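The chart legend suggests that a model qualifies when the low end of its confidence interval clears the quality bar, and that qualifiers are then ordered by blended cost. The sketch below applies that reading to the four rows in the table. It assumes "clears" means greater than or equal to the bar, and it does not model the MEDIUM+ sample tier. The `Row` type and its field names are illustrative.

```python
from typing import NamedTuple

QUALITY_BAR = 8.79  # "Quality bar at 95%" from the page


class Row(NamedTuple):
    model: str
    score: float   # point estimate, out of 10
    ci_low: float  # lower bound of the confidence interval
    cost: float    # blended cost per call, USD


ROWS = [
    Row("Claude Opus 4.7", 9.15, 8.98, 0.020898),
    Row("Kimi K2.6", 9.25, 9.07, 0.004088),
    Row("Claude Sonnet 4.6", 9.22, 8.95, 0.012538),
    Row("DeepSeek V4 Pro", 9.10, 8.80, 0.006699),
]

# Assumption: a qualifier is a model whose CI low is at or above the bar.
qualifiers = sorted(
    (r for r in ROWS if r.ci_low >= QUALITY_BAR),
    key=lambda r: r.cost,
)

# Assumption: a "greyed" row clears the bar on its point estimate only.
greyed = [r for r in ROWS if r.score >= QUALITY_BAR > r.ci_low]

for r in qualifiers:
    print(f"{r.model}: CI low {r.ci_low}, ${r.cost:.6f}/call")
```

With the table's values, all four models qualify and none is greyed. DeepSeek V4 Pro is the closest case, with a CI low of 8.80 against a bar of 8.79.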

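Typical call shape for this task: 844 input tokens → 1503 output tokens, tracked as an exponential moving average (EMA) of production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals, where in and out are the input and output token counts and in_price and out_price are the corresponding per-token prices.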
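The blended-cost formula can be sketched as below. The page does not list per-token prices, so the prices used here are hypothetical placeholders, not the rates of any model in the table. The sketch also assumes prices are quoted in USD per million tokens and converted to per-token values before applying the formula.

```python
TOKENS_PER_MILLION = 1_000_000

# Typical call shape from the page.
INPUT_TOKENS = 844
OUTPUT_TOKENS = 1503


def blended_cost(
    input_tokens: int,
    output_tokens: int,
    in_price_per_mtok: float,
    out_price_per_mtok: float,
) -> float:
    """Blended cost per call in USD, rounded to 6 decimals."""
    in_price = in_price_per_mtok / TOKENS_PER_MILLION    # USD per input token
    out_price = out_price_per_mtok / TOKENS_PER_MILLION  # USD per output token
    return round(input_tokens * in_price + output_tokens * out_price, 6)


# Hypothetical prices for illustration only: $1.00 in, $2.00 out per million tokens.
example_cost = blended_cost(INPUT_TOKENS, OUTPUT_TOKENS, 1.00, 2.00)
print(f"${example_cost:.6f}/call")  # (844 * 1.00 + 1503 * 2.00) / 1e6 = $0.003850
```

Because this call shape has more output tokens than input tokens, the output price dominates the blended cost.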