Best LLMs for Substack Newsletter (pooled) — DTP Benchmark
Pooled TT for Substack newsletter generation, covering both the opener and the summary. Both use the same role and voice. The opener is a shorter announcement, and the summary is a longer recap with per-article links.
Models
Frontier on this task: Claude Opus 4.7 at 9.51 / 10. Quality bar (95% of the frontier score): 9.04.
Chart legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, cheapest qualifier first. Greyed rows are MEDIUM+ models whose point estimate clears the quality bar but whose CI low does not.
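The qualification rule in the legend can be sketched as follows. This is a minimal sketch, not the benchmark's own code: `Result` and `classify` are hypothetical names, a model is assumed to qualify when its point estimate reaches the bar, and the MEDIUM+ tier condition for greying is not modeled because the table does not list tiers. Scores, CI lows, and costs are copied from the cost breakdown table. Note that 0.95 × 9.51 = 9.0345, so the published bar of 9.04 presumably comes from an unrounded frontier score. The sketch therefore uses the published value directly.

```python
from dataclasses import dataclass

# Published quality bar for this task (95% of the frontier score).
# 0.95 * 9.51 = 9.0345; the page shows 9.04, presumably from an unrounded
# frontier score, so the published value is used as-is.
QUALITY_BAR = 9.04


@dataclass
class Result:
    model: str
    score: float   # point estimate, out of 10
    ci_low: float  # lower bound of the reported CI
    cost: float    # blended cost per call, USD


def classify(r: Result, bar: float = QUALITY_BAR) -> str:
    """Hypothetical classification mirroring the chart legend."""
    if r.score < bar:
        return "below bar"
    if r.ci_low >= bar:
        return "clears bar"
    # Point estimate clears the bar but the CI low does not; the page greys
    # such rows for MEDIUM+ models (tier is not modeled here).
    return "point estimate only"


results = [
    Result("Claude Opus 4.7", 9.51, 9.30, 0.054900),
    Result("Kimi K2.6", 9.21, 9.01, 0.008864),
    Result("Qwen 3.6 Plus", 9.09, 8.95, 0.004247),
]

# Bars are ordered by blended cost, cheapest first.
for r in sorted(results, key=lambda r: r.cost):
    print(f"{r.model:<16} {r.score:.2f}  CI low {r.ci_low:.2f}  {classify(r)}")
```

With the table's numbers, Qwen 3.6 Plus and Kimi K2.6 fall in the point-estimate-only case (CI lows of 8.95 and 9.01 are below 9.04), and only Claude Opus 4.7 clears the bar at its CI low.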
Cost breakdown
| Model | Provider | Quality (/10) | CI | Sample (n) | Blended cost / call (USD) | Savings vs best | Mode |
|---|---|---|---|---|---|---|---|
| Qwen 3.6 Plus | Alibaba Cloud (DashScope) | 9.09 | [8.95, 9.23] | 6, ranked | $0.004247 | 92% cheaper | sync |
| Kimi K2.6 | Moonshot AI | 9.21 | [9.01, 9.40] | 6, ranked | $0.008864 | 84% cheaper | batch |
| Claude Opus 4.7 (best) | Anthropic | 9.51 | [9.30, 9.72] | 5, ranked | $0.054900 | (anchor) | batch |
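The "Savings vs best" column appears to be the relative cost reduction against the anchor row (Claude Opus 4.7). The sketch below assumes savings = 1 − cost / anchor_cost, rounded to a whole percent. That rule reproduces the 92% and 84% figures in the table, but it is inferred, not documented on this page. The function name `savings_vs_anchor` is hypothetical.

```python
ANCHOR_COST = 0.054900  # Claude Opus 4.7, the "best" anchor row

costs = {
    "Qwen 3.6 Plus": 0.004247,
    "Kimi K2.6": 0.008864,
}


def savings_vs_anchor(cost: float, anchor: float = ANCHOR_COST) -> int:
    """Percent cheaper than the anchor, rounded to a whole percent (assumed rule)."""
    return round(100 * (1 - cost / anchor))


for model, cost in costs.items():
    print(f"{model}: {savings_vs_anchor(cost)}% cheaper")
# prints:
# Qwen 3.6 Plus: 92% cheaper
# Kimi K2.6: 84% cheaper
```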
Typical call shape for this task: 535 input tokens → 2089 output tokens, tracked as an exponential moving average (EMA) over production traffic. Blended cost = (in × in_price + out × out_price), where in_price and out_price are per-token prices in USD, rounded to 6 decimal places.
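The cost formula and the EMA tracking can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's implementation. The smoothing factor `alpha = 0.1`, the sample call of 600 input and 1900 output tokens, and the prices of $5 and $25 per million tokens are all hypothetical values chosen for illustration and are not published on this page. Prices are taken per million tokens and divided by 1,000,000, which is equivalent to the per-token form above.

```python
# Typical call shape reported on the page (EMA-tracked token counts).
IN_TOKENS = 535
OUT_TOKENS = 2089


def ema(prev: float, sample: float, alpha: float = 0.1) -> float:
    """One exponential moving average update; alpha = 0.1 is a hypothetical choice."""
    return alpha * sample + (1 - alpha) * prev


def blended_cost(in_tokens: float, out_tokens: float,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Blended cost per call in USD, rounded to 6 decimals.

    Prices are USD per million tokens, so the sum is divided by 1,000,000.
    """
    cost = (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000
    return round(cost, 6)


# Fold one hypothetical production call (600 in, 1900 out) into the averages.
in_avg = ema(IN_TOKENS, 600)
out_avg = ema(OUT_TOKENS, 1900)
print(f"updated shape: {in_avg:.1f} in -> {out_avg:.1f} out")

# Illustrative prices only (USD per million tokens), not taken from the page.
print(blended_cost(IN_TOKENS, OUT_TOKENS, 5.00, 25.00))
```

Because the call shape is an EMA, a single unusually long or short call shifts the typical shape, and therefore the blended cost, only gradually. A lower `alpha` smooths more.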