Cost mode:

Category: Content Summarization & Synthesis · Rail: absolute · Typical I/O: 1666→1260 tokens

Models

Frontier on this task: GPT-5.4 nano at 8.92 / 10. Quality bar at 95%: 8.47.

[Chart: quality score per model on a 0 to 10 axis, with the 95% bar marked]

Negative "% cheaper" values mean the model costs more than GPT-5.4 nano.

| Model | Blended cost / call | % cheaper |
|---|---|---|
| GPT-5.4 nano | $0.000954 | 0% |
| Kimi K2.6 | $0.003974 | -317% |
| Qwen 3.5 Flash | $0.000378 | 60% |
| Haiku 4.5 | $0.003983 | -318% |
| Claude Opus 4.7 | $0.019915 | -1988% |
| Claude Sonnet 4.6 | $0.011949 | -1153% |
| DeepSeek V4 Flash | $0.000586 | 39% |
| DeepSeek V4 Pro | $0.007284 | -664% |
| Gemini 3 Flash Preview | $0.002306 | -142% |
| Gemini 3.1 Flash Lite | $0.001153 | -21% |
| Gemini 3.1 Pro Preview | $0.009226 | -867% |
| MiniMax M2.5 | $0.002012 | -111% |
| GPT-5.4 mini | $0.003460 | -263% |
| GPT-5.5 | $0.023065 | -2318% |

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
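The qualification rule described in the legend can be sketched roughly as follows. This is an illustration under assumptions, not the dashboard's actual logic: the `Model` class, the `classify` function, and the status labels are hypothetical names, the bar is assumed to be 95% of the frontier score (which matches 8.92 → 8.47), and the MEDIUM+ tier condition is ignored because it is not defined on this page. The scores, CI lows, and costs are the values this page reports for GPT-5.4 nano and Kimi K2.6.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    score: float   # point-estimate quality, out of 10
    ci_low: float  # lower bound of the confidence interval
    cost: float    # blended cost per call, USD


def classify(model: Model, bar: float) -> str:
    """Hypothetical status labels; the page itself only says 'qualifier' and 'greyed'."""
    if model.ci_low >= bar:
        return "qualifies"   # even the pessimistic estimate clears the bar
    if model.score >= bar:
        return "greyed"      # point estimate clears the bar, CI low does not
    return "below bar"


FRONTIER_SCORE = 8.92
# Assumption: the bar is 95% of the frontier score, rounded to 2 decimals.
bar = round(0.95 * FRONTIER_SCORE, 2)

models = [
    Model("Kimi K2.6", 8.61, 8.45, 0.003974),
    Model("GPT-5.4 nano", 8.92, 8.79, 0.000954),
]

# Qualifiers first, then everything else; cheapest first within each group.
ranked = sorted(models, key=lambda m: (classify(m, bar) != "qualifies", m.cost))

for m in ranked:
    print(f"{m.name}: {classify(m, bar)} at ${m.cost:.6f}/call")
```

Classifying on the CI low rather than the point estimate is the conservative choice: a model only counts as a qualifier if its score would still clear the bar at the pessimistic end of the interval.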

Cost breakdown

| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
|---|---|---|---|---|---|
| GPT-5.4 nano (best) · OpenAI | 8.92 / 10, CI [8.79, 9.04] | n=100 · ranked | $0.000954 | (anchor) | batch |
| Kimi K2.6 · Moonshot AI | 8.61 / 10, CI [8.45, 8.78] | n=100 · ranked | $0.003974 | | batch |
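The page does not state how savings relative to the anchor are computed. The "% cheaper" figures it reports are, however, consistent with a simple relative difference against the anchor's blended cost, rounded to a whole percent. A minimal sketch under that assumption (`pct_cheaper` is a hypothetical helper name):

```python
def pct_cheaper(cost: float, anchor_cost: float) -> int:
    """Percent saved relative to the anchor; negative means more expensive."""
    return round((1 - cost / anchor_cost) * 100)


ANCHOR_COST = 0.000954  # GPT-5.4 nano, blended cost per call

print(pct_cheaper(0.000378, ANCHOR_COST))  # Qwen 3.5 Flash
print(pct_cheaper(0.003974, ANCHOR_COST))  # Kimi K2.6
```

Under this formula a model that costs about 4.17 times the anchor comes out at roughly -317%, which is why the negative values read as "more expensive" rather than as savings.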

Typical call shape for this task: 1666 input tokens → 1260 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.
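As a worked illustration of the blended-cost formula, the sketch below applies it to the typical call shape. The per-token prices are hypothetical placeholders, not any listed model's real pricing, and the conversion from a per-million-token price is an assumption about how prices are quoted; `blended_cost` is an illustrative name.

```python
def blended_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Blended cost per call in USD; prices are per token."""
    return round(in_tokens * in_price + out_tokens * out_price, 6)


PER_MILLION = 1_000_000

# Hypothetical prices in USD per million tokens, converted to per-token.
in_price = 0.20 / PER_MILLION
out_price = 1.25 / PER_MILLION

# Typical call shape from this page: 1666 input tokens, 1260 output tokens.
cost = blended_cost(1666, 1260, in_price, out_price)
print(f"${cost:.6f}/call")
```

Because output tokens are usually priced higher than input tokens, the 1260-token output side tends to dominate the blended figure for this call shape.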