Cost mode

Category: Content Summarization & Synthesis · Rail: absolute · Typical I/O: 2859→1812 tokens

Models

Frontier on this task: Kimi K2.6 at 8.68 / 10. Quality bar at 95%: 8.25.

Chart: quality axis 0 to 10 with the 95% bar marked. Each row lists blended cost per call and cost relative to Kimi K2.6 (negative values mean more expensive).

Kimi K2.6 · $0.005978/call · 0% cheaper
Gemini 3.1 Pro Preview · $0.013731/call · -130% cheaper
Qwen 3.5 Flash · $0.000557/call · 91% cheaper
Qwen 3.6 Plus · $0.004463/call · 25% cheaper
Haiku 4.5 · $0.005960/call · 0% cheaper
Claude Opus 4.7 · $0.029798/call · -398% cheaper
Claude Sonnet 4.6 · $0.017878/call · -199% cheaper
DeepSeek V4 Flash · $0.000908/call · 85% cheaper
DeepSeek V4 Pro · $0.011280/call · -89% cheaper
Gemini 3 Flash Preview · $0.003433/call · 43% cheaper
Gemini 3.1 Flash Lite · $0.001716/call · 71% cheaper
MiniMax M2.5 · $0.003032/call · 49% cheaper
GPT-5.4 mini · $0.005149/call · 14% cheaper
GPT-5.4 nano · $0.001418/call · 76% cheaper
GPT-5.5 · $0.034328/call · -474% cheaper

Legend: point-estimate floor (CI low) · upper CI (less certain).

Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
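The bar and the greyed-row rule can be sketched in a few lines. This is a minimal illustration, assuming the bar is 95% of the frontier score and that a model fully qualifies only when its CI low clears the bar. The function names and the "qualifies" / "greyed" / "below" labels are hypothetical, and the MEDIUM+ tier filter is not modeled because the page does not define it.

```python
def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Quality bar as a fraction of the frontier score (assumed rule)."""
    return frontier_score * fraction


def classify(point: float, ci_low: float, bar: float) -> str:
    """Label a model relative to the bar (labels are illustrative)."""
    if ci_low >= bar:
        return "qualifies"   # even the CI low clears the bar
    if point >= bar:
        return "greyed"      # point estimate clears, CI low does not
    return "below"


bar = quality_bar(8.68)                 # frontier score from this page
print(round(bar, 2))                    # 8.25
print(classify(8.68, 8.55, bar))        # Kimi K2.6 -> qualifies
print(classify(8.25, 8.13, bar))        # Gemini 3.1 Pro Preview -> greyed
```

Under these assumptions, Gemini 3.1 Pro Preview lands in the greyed group: its point estimate of 8.25 clears the unrounded bar, but its CI low of 8.13 does not.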

Cost breakdown

| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
| Kimi K2.6 (best) · Moonshot AI | 8.68 / 10, CI [8.55, 8.81] | n=90 · ranked | $0.005978 | (anchor) | batch |
| Gemini 3.1 Pro Preview · Gemini | 8.25 / 10, CI [8.13, 8.38] | n=69 · ranked | $0.013731 | | batch |

Typical call shape for this task: 2859 input tokens → 1812 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals, where in and out are the input and output token counts.
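The blended-cost formula can be written out as a short sketch. The per-token prices below are hypothetical placeholders, since the page does not list per-model prices. The `percent_cheaper` helper is an assumption inferred from the listed figures (it reproduces the 91% and -130% values), not a documented formula.

```python
def blended_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Blended cost per call; prices are in dollars per token."""
    return round(in_tokens * in_price + out_tokens * out_price, 6)


def percent_cheaper(cost: float, anchor_cost: float) -> int:
    """Savings relative to the anchor; negative means more expensive."""
    return round((1 - cost / anchor_cost) * 100)


# Hypothetical prices: $1 per million input tokens, $2 per million output.
IN_PRICE = 1e-6
OUT_PRICE = 2e-6
print(blended_cost(2859, 1812, IN_PRICE, OUT_PRICE))

# Per-call costs taken from this page, with Kimi K2.6 as the anchor.
ANCHOR = 0.005978
print(percent_cheaper(0.000557, ANCHOR))   # Qwen 3.5 Flash -> 91
print(percent_cheaper(0.013731, ANCHOR))   # Gemini 3.1 Pro Preview -> -130
```

Keeping the rounding inside `blended_cost` mirrors the 6-decimal figures shown on the page, so values computed this way can be compared directly against the listed costs.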