Best LLMs for Language Detection

Category: Relevance, Classification & Matching · Rail: absolute · Typical I/O: 384→83 tokens

Models

Frontier on this task: Claude Sonnet 5 at 10.06 / 10. Quality bar at 90%: 9.06.

point-estimate floor (CI low) · upper CI (less certain) · Bars sorted by blended cost; best-value model first.

Model	Quality score	CI low	Cost / 1k runs	vs best value
NVIDIA Nemotron-3 Nano 30B-A3B	9.59 / 10	9.16	$0.04	best value
Gemini 3.1 Flash Lite	9.95 / 10	9.91	$0.05	1.4x more expensive
DeepSeek V4 Flash	9.89 / 10	9.79	$0.05	1.4x more expensive
GPT-5.4 Nano	9.92 / 10	9.88	$0.06	1.5x more expensive
GPT-5.4 Mini	9.91 / 10	9.85	$0.10	2.7x more expensive
NVIDIA Nemotron-3 Super 120B	9.86 / 10	9.62	$0.12	3.1x more expensive
Tencent Hy3	9.90 / 10	9.76	$0.14	3.8x more expensive
MiniMax M3	9.95 / 10	9.87	$0.22	5.8x more expensive
DeepSeek V4 Pro	9.90 / 10	9.84	$0.23	6.1x more expensive
GPT-5.6 Luna	9.95 / 10	9.82	$0.27	7.2x more expensive
NVIDIA Nemotron-3 Ultra 550B	9.98 / 10	9.88	$0.53	14x more expensive
Claude Haiku 4.5	9.94 / 10	9.90	$0.60	16x more expensive
Qwen 3.5 Flash	9.99 / 10	9.86	$0.63	17x more expensive
GPT-5.6 Terra	9.99 / 10	9.92	$0.67	17x more expensive
Gemini 3.5 Flash	9.98 / 10	9.90	$1.32	35x more expensive
Claude Sonnet 5	10.06 / 10	10.04	$1.33	35x more expensive
GPT-5.5	9.95 / 10	9.94	$1.43	37x more expensive
GPT-5.6 Sol	10.00 / 10	9.97	$1.44	38x more expensive
Qwen 3.7 Plus	9.94 / 10	9.79	$1.47	39x more expensive
Gemini 3.1 Pro Preview	9.93 / 10	9.87	$1.82	48x more expensive
Qwen 3.6 Plus	9.91 / 10	9.72	$1.93	50x more expensive
Claude Sonnet 4.6	9.96 / 10	9.95	$1.97	52x more expensive
Qwen 3.6 Flash	9.99 / 10	9.88	$2.52	66x more expensive
Meta Muse Spark 1.1	9.84 / 10	9.63	$2.89	76x more expensive
Claude Opus 4.8	9.98 / 10	9.84	$3.29	86x more expensive
Grok 4.5	9.92 / 10	9.75	$3.41	89x more expensive
Kimi K2.6	9.91 / 10	9.85	$5.47	143x more expensive

Cost breakdown

Model	Quality	Confidence	Cost / 1k runs	Overpay	Mode
NVIDIA Nemotron-3 Nano 30B-A3B ★ OpenRouter	9.59 / 10 CI [9.16, 10.00]	MEDIUM	$0.04	best value	batch
Gemini 3.1 Flash Lite Gemini	9.95 / 10 CI [9.91, 9.98]	RANKED	$0.05	1.4x	batch
DeepSeek V4 Flash DeepSeek	9.89 / 10 CI [9.79, 10.00]	RANKED	$0.05	1.4x	batch
GPT-5.4 Nano OpenAI	9.92 / 10 CI [9.88, 9.97]	RANKED	$0.06	1.5x	batch
GPT-5.4 Mini OpenAI	9.91 / 10 CI [9.85, 9.98]	RANKED	$0.10	2.7x	batch
NVIDIA Nemotron-3 Super 120B OpenRouter	9.86 / 10 CI [9.62, 10.00]	HIGH	$0.12	3.1x	batch
Tencent Hy3 OpenRouter	9.90 / 10 CI [9.76, 10.00]	RANKED	$0.14	3.8x	batch
MiniMax M3 MiniMax	9.95 / 10 CI [9.87, 10.00]	RANKED	$0.22	5.8x	batch
DeepSeek V4 Pro DeepSeek	9.90 / 10 CI [9.84, 9.97]	RANKED	$0.23	6.1x	batch
GPT-5.6 Luna OpenAI	9.95 / 10 CI [9.82, 10.00]	RANKED	$0.27	7.2x	batch
NVIDIA Nemotron-3 Ultra 550B OpenRouter	9.98 / 10 CI [9.88, 10.00]	RANKED	$0.53	14x	batch
Claude Haiku 4.5 Anthropic	9.94 / 10 CI [9.90, 9.98]	RANKED	$0.60	16x	batch
Qwen 3.5 Flash Alibaba Cloud (DashScope)	9.99 / 10 CI [9.86, 10.00]	RANKED	$0.63	17x	batch
GPT-5.6 Terra OpenAI	9.99 / 10 CI [9.92, 10.00]	RANKED	$0.67	17x	batch
Gemini 3.5 Flash Gemini	9.98 / 10 CI [9.90, 10.00]	RANKED	$1.32	35x	batch
Claude Sonnet 5 best Anthropic	10.06 / 10 CI [10.04, 10.00]	RANKED	$1.33	35x	batch
GPT-5.5 OpenAI	9.95 / 10 CI [9.94, 9.96]	RANKED	$1.43	37x	batch
GPT-5.6 Sol OpenAI	10.00 / 10 CI [9.97, 10.00]	RANKED	$1.44	38x	batch
Qwen 3.7 Plus Alibaba Cloud (DashScope)	9.94 / 10 CI [9.79, 10.00]	RANKED	$1.47	39x	batch
Gemini 3.1 Pro Preview Gemini	9.93 / 10 CI [9.87, 9.98]	RANKED	$1.82	48x	batch
Qwen 3.6 Plus Alibaba Cloud (DashScope)	9.91 / 10 CI [9.72, 10.00]	RANKED	$1.93	50x	batch
Claude Sonnet 4.6 Anthropic	9.96 / 10 CI [9.95, 9.97]	RANKED	$1.97	52x	batch
Qwen 3.6 Flash Alibaba Cloud (DashScope)	9.99 / 10 CI [9.88, 10.00]	RANKED	$2.52	66x	batch
Meta Muse Spark 1.1 Meta	9.84 / 10 CI [9.63, 10.00]	HIGH	$2.89	76x	batch
Claude Opus 4.8 Anthropic	9.98 / 10 CI [9.84, 10.00]	RANKED	$3.29	86x	batch
Grok 4.5 xAI	9.92 / 10 CI [9.75, 10.00]	RANKED	$3.41	89x	batch
Kimi K2.6 Moonshot AI	9.91 / 10 CI [9.85, 9.96]	RANKED	$5.47	143x	batch

Overpay shows how much more you pay than the best-value model that clears the quality bar (marked ★) — the best-value good-enough option. "16x" means you overpay 16× — 16× that reference for no quality benefit above the bar. Typical call shape for this task: 384 input tokens → 83 output tokens, EMA-tracked from production traffic. Cost is the observed, all-in $ per 1,000 task runs: each model's own measured usage on this task — output verbosity, thinking/reasoning tokens, cache reads and writes, and the spend on its billed failures — priced at current list rates and adjusted by the billing overhead we actually reconcile against provider invoices. Models that answer tersely cost what they actually cost; models that think at length pay for it. Not comparable to providers' advertised $/1M list rates — this is what running the task costs, not a per-token price.

Prompt templates

This is a pooled capability — 2 prompt families share it. The pair shown first is the most frequently used in production.

URL_PARSER_LANGUAGE_DETECTION_SYSTEM + URL_PARSER_LANGUAGE_DETECTION_USER (194906 calls in window)

System prompt

You are a highly accurate language identification expert. Your sole task is to identify the primary language of the provided text snippet.

Respond ONLY with the two-letter ISO 639-1 code for the detected language (e.g., "en" for English, "es" for Spanish, "zh" for Chinese).

## Required Output Format
Your response MUST be a single, valid JSON object conforming to this schema:
```json
{schema_json_string}
```

User prompt

1. Text: {input_text}

2. Generate a comprehensive response as a single, well-formed JSON object that strictly adheres to the Pydantic schema provided below. Schema:
The required JSON output schema is provided in the system prompt.

JSON_REPAIR_SYSTEM + JSON_REPAIR_USER (7099 calls in window)

System prompt

You are a JSON repair tool. The user gives you malformed or partial model output and a JSON Schema. Return ONLY a single valid JSON object that satisfies the schema, salvaging as much real content from the input as possible. Do not invent data for fields the input doesn't support — use the schema's allowed empty/null values. Output the JSON object only: no prose, no markdown, no code fences.

User prompt

JSON Schema:
{schema_json}

Malformed output to repair:
{raw_text}

Return only the corrected JSON object.