Best LLMs for Structured Data & Fact Extraction — DTP Benchmark
Precise pattern recognition and field-level information retrieval with strict schema adherence.
Task-by-task breakdown
claim_extraction (autogenerated)
| Rank | Model | Quality | Cost / call | Cost savings vs best |
|---|---|---|---|---|
| 1 | DeepSeek V4 Flash (best) | 7.46 | $0.000547 | — |
| 2 | DeepSeek V4 Pro | 7.45 | $0.006793 | -1142% |
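The last column of the table above is not defined on the page. The published figures are consistent with a relative cost saving against the model marked best, rounded to a whole percent, so a negative value means the model costs more per call than the best one. A minimal sketch under that assumption follows; the function name `savings_vs_best` is hypothetical and the formula is inferred from the numbers, not documented by the benchmark.

```python
def savings_vs_best(cost: float, best_cost: float) -> int:
    """Percent cost saving relative to the best-marked model.

    Assumed formula (inferred from the published rows):
        (best_cost - cost) / best_cost * 100, rounded to a whole percent.
    Positive means cheaper than the best model, negative means more expensive.
    """
    return round((best_cost - cost) / best_cost * 100)


# Costs per call taken from the claim_extraction table.
best_cost = 0.000547  # DeepSeek V4 Flash (marked best)
pro_cost = 0.006793   # DeepSeek V4 Pro

print(f"{savings_vs_best(pro_cost, best_cost)}%")  # prints "-1142%"
```

The same calculation reproduces the other tables on this page, for example 88% for Qwen 3.5 Flash against Qwen 3.6 Plus in structured_output_extraction.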
Generic TOC Extraction
Extracts table of contents or section structure from long documents (articles, reports). Language-agnostic.
No model has reached MEDIUM confidence yet — accumulating evidence.
region_identification (autogenerated)
| Rank | Model | Quality | Cost / call | Cost savings vs best |
|---|---|---|---|---|
| 1 | DeepSeek V4 Pro | 9.10 | $0.006699 | 2% |
| 2 | Kimi K2.6 (best) | 9.25 | $0.006814 | — |
| 3 | Claude Sonnet 4.6 | 9.22 | $0.012538 | -84% |
| 4 | Claude Opus 4.7 | 9.15 | $0.020898 | -207% |
S1 TOC extraction
No model has reached MEDIUM confidence yet — accumulating evidence.
structured_output_extraction (autogenerated)
| Rank | Model | Quality | Cost / call | Cost savings vs best |
|---|---|---|---|---|
| 1 | Qwen 3.5 Flash | 9.78 | $0.000374 | 88% |
| 2 | DeepSeek V4 Flash | 9.75 | $0.000688 | 78% |
| 3 | GPT-5.4 nano | 9.76 | $0.001944 | 37% |
| 4 | MiniMax M2.5 | 9.71 | $0.002152 | 30% |
| 5 | Qwen 3.6 Plus (best) | 9.84 | $0.003067 | — |