Best LLMs for Structured Output Extraction

Category: Structured Data & Fact Extraction · Rail: absolute · Typical I/O: 4084→5339 tokens

Models

Frontier on this task: Gemini 3.1 Pro Preview at 9.84 / 10. Quality bar at 90%: 8.86.

point-estimate floor (CI low) · upper CI (less certain) · Bars sorted by blended cost; best-value model first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Model	Quality score	CI low	Cost / 1k runs	vs best value
DeepSeek V4 Flash	9.54 / 10	9.40	$0.40	best value
MiniMax M3	9.46 / 10	9.30	$5.29	13x more expensive
GPT-5.4 Nano	9.46 / 10	9.22	$7.61	19x more expensive
DeepSeek V4 Pro	9.54 / 10	9.40	$11.52	29x more expensive
GPT-5.6 Luna	9.51 / 10	9.27	$12.15	30x more expensive
Qwen 3.7 Plus	9.31 / 10	9.17	$22.47	56x more expensive
Qwen 3.6 Flash	8.99 / 10	8.49	$23.13	58x more expensive
Grok 4.5	9.23 / 10	9.04	$29.30	73x more expensive
Qwen 3.6 Plus	9.65 / 10	9.47	$37.25	93x more expensive
Gemini 3.5 Flash	9.70 / 10	9.64	$42.96	107x more expensive
Kimi K2.6	9.71 / 10	9.61	$55.73	139x more expensive
Gemini 3.1 Pro Preview	9.84 / 10	9.74	$62.16	155x more expensive
GPT-5.6 Sol	9.56 / 10	9.32	$62.47	156x more expensive
GPT-5.5	8.96 / 10	8.55	$205.70	513x more expensive
Claude Haiku 4.5	8.28 / 10	7.90	$30.67	77x more expensive
Claude Sonnet 4.6	8.66 / 10	8.26	$105.33	263x more expensive
GPT-5.4 Mini	7.94 / 10	7.52	$18.48	46x more expensive
Meta Muse Spark 1.1	8.80 / 10	8.33	$38.98	97x more expensive
GPT-5.6 Terra	8.60 / 10	8.11	$33.58	84x more expensive

Cost breakdown

Model	Quality	Confidence	Cost / 1k runs	Overpay	Mode
DeepSeek V4 Flash ★ DeepSeek	9.54 / 10 CI [9.40, 9.68]	RANKED	$0.40	best value	batch
MiniMax M3 MiniMax	9.46 / 10 CI [9.30, 9.61]	RANKED	$5.29	13x	batch
GPT-5.4 Nano OpenAI	9.46 / 10 CI [9.22, 9.71]	HIGH	$7.61	19x	batch
DeepSeek V4 Pro DeepSeek	9.54 / 10 CI [9.40, 9.69]	RANKED	$11.52	29x	batch
GPT-5.6 Luna OpenAI	9.51 / 10 CI [9.27, 9.75]	HIGH	$12.15	30x	batch
Qwen 3.7 Plus Alibaba Cloud (DashScope)	9.31 / 10 CI [9.17, 9.44]	RANKED	$22.47	56x	batch
Qwen 3.6 Flash Alibaba Cloud (DashScope)	8.99 / 10 CI [8.49, 9.48]	MEDIUM	$23.13	58x	batch
Grok 4.5 xAI	9.23 / 10 CI [9.04, 9.42]	RANKED	$29.30	73x	batch
Qwen 3.6 Plus Alibaba Cloud (DashScope)	9.65 / 10 CI [9.47, 9.84]	RANKED	$37.25	93x	batch
Gemini 3.5 Flash Gemini	9.70 / 10 CI [9.64, 9.77]	RANKED	$42.96	107x	batch
Kimi K2.6 Moonshot AI	9.71 / 10 CI [9.61, 9.81]	RANKED	$55.73	139x	batch
Gemini 3.1 Pro Preview best Gemini	9.84 / 10 CI [9.74, 9.94]	RANKED	$62.16	155x	batch
GPT-5.6 Sol OpenAI	9.56 / 10 CI [9.32, 9.79]	HIGH	$62.47	156x	batch
GPT-5.5 OpenAI	8.96 / 10 CI [8.55, 9.36]	MEDIUM	$205.70	513x	batch

Overpay shows how much more you pay than the best-value model that clears the quality bar (marked ★) — the best-value good-enough option. "16x" means you overpay 16× — 16× that reference for no quality benefit above the bar. Typical call shape for this task: 4084 input tokens → 5339 output tokens, EMA-tracked from production traffic. Cost is the observed, all-in $ per 1,000 task runs: each model's own measured usage on this task — output verbosity, thinking/reasoning tokens, cache reads and writes, and the spend on its billed failures — priced at current list rates and adjusted by the billing overhead we actually reconcile against provider invoices. Models that answer tersely cost what they actually cost; models that think at length pay for it. Not comparable to providers' advertised $/1M list rates — this is what running the task costs, not a per-token price.

Prompt templates

The system + user template pair used for this task.

STRUCTURED_OUTPUT_EXTRACTION_SYSTEM_PROMPT + STRUCTURED_OUTPUT_EXTRACTION_USER_PROMPT (1565 calls in window)

System prompt

You are a JSON extraction assistant. Extract structured data from the provided text that matches the given JSON schema. Output ONLY the structured data matching the schema.

User prompt

Extract the structured data from the following text that matches this JSON schema:

## JSON Schema:
```json
{schema_json}
```

## Text to extract from:
{raw_text}