Best LLMs for Topic Discovery Clustering (pooled) — DTP Benchmark
Pooled TT for thematic topic discovery from a batch of items. It combines two variants, topic_clustering_batch (content-driven) and topic_clustering_claims (claim-driven), which share the same output schema and test the same capability.
Models
Frontier on this task: Claude Opus 4.7 at 8.77 / 10. Quality bar (95% of the frontier score): 8.33.
Chart legend: point-estimate floor (CI low); upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.
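The qualification rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own code. It assumes the bar is 95% of the frontier point estimate (consistent with 8.77 and 8.33), that comparisons are inclusive, and it ignores the MEDIUM+ sample-confidence filter. The names `ModelResult`, `quality_bar`, and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    score: float    # point estimate on the 0-10 scale
    ci_low: float   # lower bound of the confidence interval
    ci_high: float  # upper bound of the confidence interval


def quality_bar(frontier_score: float, fraction: float = 0.95) -> float:
    """Bar as a fraction of the frontier score (assumed rule), to 2 decimals."""
    return round(frontier_score * fraction, 2)


def classify(result: ModelResult, bar: float) -> str:
    """Label a model by how its score and CI low compare with the bar."""
    if result.ci_low >= bar:
        return "qualifies"   # even the CI floor clears the bar
    if result.score >= bar:
        return "greyed"      # point estimate clears it, CI low does not
    return "below bar"


opus = ModelResult("Claude Opus 4.7", 8.77, 8.54, 9.00)
bar = quality_bar(opus.score)
print(bar, classify(opus, bar))
```

With the figures on this page, the frontier model's CI low of 8.54 sits above the bar, so it qualifies under this sketch.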
Cost breakdown
| Model | Quality | Sample | Blended cost / call | Savings vs best | Mode |
|---|---|---|---|---|---|
| Claude Opus 4.7 (best, Anthropic) | 8.77 / 10, CI [8.54, 9.00] | n=98 · high | $0.305172 | (anchor) | batch |
Typical call shape for this task: 61989 input tokens → 12016 output tokens, tracked as an exponential moving average (EMA) of production traffic. Blended cost = in × in_price + out × out_price, where in and out are the token counts and the prices are per token, rounded to 6 decimals.
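As a rough sketch of the blended-cost formula, the function below applies it to the call shape given here. The per-token prices are placeholder assumptions chosen only for illustration. They are not the prices behind the table, and any batch-mode pricing adjustment is not modelled because the page does not describe one. The name `blended_cost` is hypothetical.

```python
def blended_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost of one call in dollars; prices are dollars per token."""
    return round(in_tokens * in_price + out_tokens * out_price, 6)


# Placeholder prices (assumptions, not taken from the benchmark):
# $1.00 per million input tokens, $2.00 per million output tokens.
IN_PRICE = 1.00 / 1_000_000
OUT_PRICE = 2.00 / 1_000_000

cost = blended_cost(61989, 12016, IN_PRICE, OUT_PRICE)
print(f"${cost:.6f}")
```

Substituting a model's real input and output prices for the placeholders gives its blended cost per call for this task shape.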