Best LLMs for Language Detection — DTP Benchmark
Configuration for Language Detection.
Models
Frontier on this task: Gemini 3 Flash Preview at 10.01 / 10. Quality bar at 95%: 9.51.
point-estimate floor (CI low) · upper CI (less certain) · Bars sorted by blended cost; cheapest qualifier first.
Cost breakdown
| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
|---|---|---|---|---|---|
| GPT-5.4 nano OpenAI | 9.98 / 10 CI [9.80, 10.00] | n=85 · ranked | $0.000040 | 60% cheaper | batch |
| DeepSeek V4 Flash DeepSeek | 9.94 / 10 CI [9.67, 10.00] | n=91 · high | $0.000042 | 58% cheaper | sync |
| Gemini 3.1 Flash Lite Gemini | 9.99 / 10 CI [9.76, 10.00] | n=84 · high | $0.000050 | 50% cheaper | batch |
| Gemini 3 Flash Preview best Gemini | 10.01 / 10 CI [9.98, 10.00] | n=84 · ranked | $0.000100 | (anchor) | batch |
| MiniMax M2.5 MiniMax | 10.00 / 10 CI [9.80, 10.00] | n=100 · ranked | $0.000105 | — | sync |
| GPT-5.4 mini OpenAI | 9.99 / 10 CI [9.96, 10.00] | n=85 · ranked | $0.000149 | — | batch |
| Haiku 4.5 Anthropic | 9.99 / 10 CI [9.96, 10.00] | n=82 · ranked | $0.000187 | — | batch |
| Kimi K2.6 Moonshot AI | 9.98 / 10 CI [9.84, 10.00] | n=100 · ranked | $0.000202 | — | batch |
| Gemini 3.1 Pro Preview Gemini | 10.01 / 10 CI [9.98, 10.00] | n=84 · ranked | $0.000398 | — | batch |
| DeepSeek V4 Pro DeepSeek | 9.95 / 10 CI [9.71, 10.00] | n=89 · high | $0.000525 | — | sync |
| Claude Sonnet 4.6 Anthropic | 9.99 / 10 CI [9.96, 10.00] | n=82 · ranked | $0.000561 | — | batch |
| GPT-5.5 OpenAI | 9.99 / 10 CI [9.96, 10.00] | n=85 · ranked | $0.000995 | — | batch |
Typical call shape for this task: 254 input tokens → 24 output tokens, EMA-tracked from production traffic. Blended cost = (in × in_price + out × out_price), rounded to 6 decimals.