Best LLMs for Substack Newsletter

Category: Long-form Content Generation · Rail: absolute · Typical I/O: 676→1241 tokens

Models

Frontier on this task: Claude Sonnet 5 at 9.25 / 10. Quality bar at 90%: 8.32.

Chart legend: point-estimate floor (CI low); upper CI (less certain). Bars are sorted by blended cost, with the best-value model first.

Model	Quality score	CI low	Cost / 1k runs	vs best value
NVIDIA Nemotron-3 Nano 30B-A3B	8.62 / 10	8.45	$0.28	best value
MiniMax M3	9.14 / 10	9.11	$0.81	2.9x more expensive
Tencent Hy3	8.52 / 10	8.31	$1.13	4x more expensive
DeepSeek V4 Pro	9.17 / 10	8.93	$1.35	4.8x more expensive
GPT-5.6 Luna	8.46 / 10	8.28	$1.43	5.1x more expensive
GPT-5.6 Terra	8.33 / 10	8.11	$2.65	9.4x more expensive
Claude Haiku 4.5	9.04 / 10	8.84	$3.27	12x more expensive
Qwen 3.6 Plus	8.96 / 10	8.75	$4.34	15x more expensive
Qwen 3.7 Plus	8.88 / 10	8.77	$4.53	16x more expensive
NVIDIA Nemotron-3 Ultra 550B	9.07 / 10	8.89	$4.87	17x more expensive
Claude Sonnet 5	9.25 / 10	9.23	$5.33	19x more expensive
Qwen 3.6 Flash	8.94 / 10	8.82	$5.38	19x more expensive
Gemini 3.1 Pro Preview	8.87 / 10	8.65	$5.47	19x more expensive
Gemini 3.5 Flash	9.12 / 10	9.06	$6.80	24x more expensive
Kimi K2.6	9.16 / 10	9.00	$7.45	26x more expensive
GPT-5.6 Sol	8.78 / 10	8.62	$8.56	30x more expensive
Grok 4.5	9.20 / 10	9.14	$10.09	36x more expensive
Meta Muse Spark 1.1	8.82 / 10	8.62	$12.88	46x more expensive
GPT-5.5	8.95 / 10	8.73	$21.13	75x more expensive
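
The quality bar and the best-value pick can be reproduced from the table. The sketch below assumes the bar is 90% of the frontier score and that "best value" means the cheapest model clearing that bar. The page does not say whether the point estimate or the CI low is compared against the bar, so the `use_ci_low` switch is an assumption, and only a few rows are copied in for illustration.

```python
# A few rows copied from the table above:
# (model, quality score, CI low, cost per 1k runs in USD).
ROWS = [
    ("NVIDIA Nemotron-3 Nano 30B-A3B", 8.62, 8.45, 0.28),
    ("MiniMax M3", 9.14, 9.11, 0.81),
    ("GPT-5.6 Terra", 8.33, 8.11, 2.65),
    ("Claude Sonnet 5", 9.25, 9.23, 5.33),
]

BAR_FRACTION = 0.90  # "Quality bar at 90%"


def quality_bar(rows, fraction=BAR_FRACTION):
    """Return fraction * frontier, where the frontier is the highest quality score."""
    frontier = max(score for _, score, _, _ in rows)
    return fraction * frontier


def best_value(rows, use_ci_low=False):
    """Return the cheapest row that clears the bar, or None if no row does.

    Assumption: the point-estimate score is compared against the bar by default;
    use_ci_low=True applies the stricter reading that compares the CI low instead.
    """
    bar = quality_bar(rows)
    column = 2 if use_ci_low else 1
    eligible = [row for row in rows if row[column] >= bar]
    return min(eligible, key=lambda row: row[3]) if eligible else None


print(f"quality bar: {quality_bar(ROWS):.3f}")
print("best value:", best_value(ROWS)[0])
```

With these rows the bar comes out at 8.325, which the page displays as 8.32, and the Nemotron-3 Nano row is selected under either reading.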

Cost breakdown

Model	Provider	Quality	CI	Confidence	Cost / 1k runs	Overpay	Mode
NVIDIA Nemotron-3 Nano 30B-A3B ★	OpenRouter	8.62 / 10	[8.45, 8.79]	RANKED	$0.28	best value	batch
MiniMax M3	MiniMax	9.14 / 10	[9.11, 9.17]	RANKED	$0.81	2.9x	batch
Tencent Hy3	OpenRouter	8.52 / 10	[8.31, 8.73]	HIGH	$1.13	4x	batch
DeepSeek V4 Pro	DeepSeek	9.17 / 10	[8.93, 9.41]	HIGH	$1.35	4.8x	batch
GPT-5.6 Luna	OpenAI	8.46 / 10	[8.28, 8.64]	RANKED	$1.43	5.1x	batch
GPT-5.6 Terra	OpenAI	8.33 / 10	[8.11, 8.55]	HIGH	$2.65	9.4x	batch
Claude Haiku 4.5	Anthropic	9.04 / 10	[8.84, 9.23]	RANKED	$3.27	12x	batch
Qwen 3.6 Plus	Alibaba Cloud (DashScope)	8.96 / 10	[8.75, 9.17]	HIGH	$4.34	15x	batch
Qwen 3.7 Plus	Alibaba Cloud (DashScope)	8.88 / 10	[8.77, 8.98]	RANKED	$4.53	16x	batch
NVIDIA Nemotron-3 Ultra 550B	OpenRouter	9.07 / 10	[8.89, 9.25]	RANKED	$4.87	17x	batch
Claude Sonnet 5 (best)	Anthropic	9.25 / 10	[9.23, 9.27]	RANKED	$5.33	19x	batch
Qwen 3.6 Flash	Alibaba Cloud (DashScope)	8.94 / 10	[8.82, 9.05]	RANKED	$5.38	19x	batch
Gemini 3.1 Pro Preview	Gemini	8.87 / 10	[8.65, 9.08]	HIGH	$5.47	19x	batch
Gemini 3.5 Flash	Gemini	9.12 / 10	[9.06, 9.18]	RANKED	$6.80	24x	batch
Kimi K2.6	Moonshot AI	9.16 / 10	[9.00, 9.32]	RANKED	$7.45	26x	batch
GPT-5.6 Sol	OpenAI	8.78 / 10	[8.62, 8.93]	RANKED	$8.56	30x	batch
Grok 4.5	xAI	9.20 / 10	[9.14, 9.26]	RANKED	$10.09	36x	batch
Meta Muse Spark 1.1	Meta	8.82 / 10	[8.62, 9.02]	RANKED	$12.88	46x	batch
GPT-5.5	OpenAI	8.95 / 10	[8.73, 9.18]	HIGH	$21.13	75x	batch

Overpay shows how many times the reference cost you pay. The reference (marked ★) is the cheapest model that clears the quality bar, i.e. the best-value good-enough option. For example, "16x" means you pay 16 times the reference cost for no quality benefit above the bar.

Typical call shape for this task: 676 input tokens → 1241 output tokens, EMA-tracked from production traffic.

Cost is the observed, all-in dollar cost per 1,000 task runs. It reflects each model's own measured usage on this task (output verbosity, thinking/reasoning tokens, cache reads and writes, and the spend on its billed failures), priced at current list rates and adjusted by the billing overhead reconciled against provider invoices. Models that answer tersely cost less; models that think at length pay for it. These figures are not comparable to providers' advertised $/1M-token list rates: they measure what running the task costs, not a per-token price.
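
The Overpay figure can be sketched as a plain ratio of a model's cost to the ★ reference cost. The function name and the choice of rows below are illustrative, not taken from the page. Because the table shows costs rounded to the cent, ratios recomputed this way can differ from the Overpay column in the last digit.

```python
def overpay_ratio(cost_per_1k: float, reference_cost_per_1k: float) -> float:
    """Return how many times the reference (best-value) cost this model costs."""
    if reference_cost_per_1k <= 0:
        raise ValueError("reference cost must be positive")
    return cost_per_1k / reference_cost_per_1k


# Displayed cost of the ★ row (NVIDIA Nemotron-3 Nano 30B-A3B), USD per 1,000 runs.
REFERENCE = 0.28

# Displayed costs copied from the table above (USD per 1,000 runs).
costs = {
    "MiniMax M3": 0.81,
    "Claude Sonnet 5": 5.33,
    "GPT-5.5": 21.13,
}

for model, cost in costs.items():
    ratio = overpay_ratio(cost, REFERENCE)
    extra = cost - REFERENCE  # absolute extra spend per 1,000 runs
    print(f"{model}: {ratio:.1f}x the reference, ${extra:.2f} extra per 1k runs")
```

Reading the ratio next to the absolute difference helps keep it in perspective: a large multiple of a very small reference cost can still be a small dollar amount per thousand runs.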

Prompt templates

This is a pooled capability — 3 prompt families share it. The pair shown first is the most frequently used in production.

AUTO_SUBSTACK_OPENER_SYSTEM_PROMPT + AUTO_SUBSTACK_OPENER_USER_PROMPT (1456 calls in window)

System prompt

You are a professional newsletter writer for a financial analysis and market research platform.

Your task is to write an engaging opener newsletter for Substack that announces the start of a new analysis publishing cycle. The newsletter should:

1. Build excitement about the upcoming analysis
2. Clearly communicate what the reader can expect
3. Mention the publishing timeframe so readers know when to check back
4. Include a link to the main analysis page using the literal placeholder <home_url>
5. Be concise but compelling — this is an announcement, not the full analysis
6. Use professional, engaging tone appropriate for investors and analysts

Format the newsletter in clean markdown suitable for Substack. Keep it focused — 200-400 words.

Return your response as a JSON object matching the provided schema.

## Required Output Format
Your response MUST be a single, valid JSON object conforming to this schema:
```json
{schema_json_string}
```

User prompt

Write an opener newsletter announcing the start of new analysis publishing.

**Subject:** {subject_name}
**Description:** {subject_description}
**Publishing window:** New articles will be published over the next {publish_spread_hours} hours.

Use the placeholder <home_url> wherever you want to link to the main analysis page. Do NOT use any actual URLs — only the placeholder.

The required JSON output schema is provided in the system prompt.
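
The constraints in the opener prompts (200-400 words, the literal `<home_url>` placeholder, no actual URLs) lend themselves to a simple post-generation check. The sketch below is a hypothetical helper, not part of the pipeline shown on this page. It works on the markdown text only, since the output schema is not shown, and its whitespace-based word count is a rough approximation that also counts markdown markup.

```python
import re

HOME_PLACEHOLDER = "<home_url>"
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def check_opener(markdown: str, min_words: int = 200, max_words: int = 400) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    words = len(markdown.split())  # crude count: splits on whitespace only
    if not min_words <= words <= max_words:
        problems.append(f"word count {words} outside {min_words}-{max_words}")
    if HOME_PLACEHOLDER not in markdown:
        problems.append("missing <home_url> placeholder")
    if URL_PATTERN.search(markdown):
        problems.append("contains an actual URL instead of the placeholder")
    return problems


def resolve_home_url(markdown: str, home_url: str) -> str:
    """Swap the literal placeholder for the real landing-page URL."""
    return markdown.replace(HOME_PLACEHOLDER, home_url)


# Hypothetical usage: validate first, then substitute the real link.
draft = "Read more at [our hub](<home_url>). " + "word " * 250
if not check_opener(draft):
    published = resolve_home_url(draft, "https://example.com/analysis")
```

Running the checks before substitution matters: once the real URL is in place, the "no actual URLs" check would flag the legitimate link.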

JUDGE_QUALITY_SYSTEM + JUDGE_QUALITY_USER (61 calls in window)

System prompt

You are a strict evaluator of LLM outputs. Score how well the output fulfills the task on a 0.0–10.0 scale, using the task-specific rubric as the primary criterion.

The "Rubric" in the user message is authoritative: when it constrains or overrides any generic guidance, the rubric wins.

Scoring scale (0.0–10.0):
- 9.0–10.0: Exceptional — comprehensive, accurate, fully meets the task.
- 7.0–8.9: Good — meets most requirements; minor gaps.
- 5.0–6.9: Satisfactory — adequate but with notable limitations or errors.
- 3.0–4.9: Poor — significant gaps, errors, or partial failure.
- 0.0–2.9: Unacceptable — major failure, unusable output.

Use the provided reference examples (if any) to keep your scoring consistent: compare the current output's quality to those already-scored benchmarks and place it on the same scale. Reference examples may come from different models — judge the output on its own merits, using them only to calibrate the scale.

Output JSON matching the schema:
- score: float from 0.0 to 10.0.
- failure_mode: a short tag for the dominant deficiency (e.g. 'hallucination', 'schema_violation', 'truncated', 'off_topic'), or null when none.
- rationale: one to three sentences justifying the score.

User prompt

Rubric: {rubric}
Task: {task_slug}
Domain: {domain}

Input context:
{input_snippet}

Output to grade:
{output_snippet}

Reference examples (already-scored outputs for the same task — use them to keep scoring consistent):
{reference_examples}

Score the output from 0.0 to 10.0 against the rubric, comparing against the reference examples for consistency. Return JSON with score, failure_mode (or null), and rationale.
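
The judge's three output fields and the scoring bands can be checked on the consuming side. The sketch below is a minimal, hypothetical parser: the class and function names are illustrative, and treating each band as "at or above its lower bound" is an assumption for scores such as 8.95 that fall between the listed ranges.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Lower bound of each band from the scoring scale, highest first.
BANDS = [
    (9.0, "Exceptional"),
    (7.0, "Good"),
    (5.0, "Satisfactory"),
    (3.0, "Poor"),
    (0.0, "Unacceptable"),
]


@dataclass
class JudgeResult:
    score: float
    failure_mode: Optional[str]
    rationale: str

    @property
    def band(self) -> str:
        """Label of the first band whose lower bound the score reaches."""
        for floor, label in BANDS:
            if self.score >= floor:
                return label
        return "Unacceptable"


def parse_judge_output(raw: str) -> JudgeResult:
    """Parse and validate the judge's JSON; raise ValueError on a bad field."""
    data = json.loads(raw)
    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    score = float(score)
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score {score} outside 0.0-10.0")
    failure_mode = data.get("failure_mode")
    if failure_mode is not None and not isinstance(failure_mode, str):
        raise ValueError("failure_mode must be a string or null")
    rationale = data.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        raise ValueError("rationale must be a non-empty string")
    return JudgeResult(score, failure_mode, rationale)


result = parse_judge_output(
    '{"score": 8.4, "failure_mode": "truncated", "rationale": "Ends mid-sentence."}'
)
print(result.band, result.failure_mode)
```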

AUTO_SUBSTACK_SUMMARY_SYSTEM_PROMPT + AUTO_SUBSTACK_SUMMARY_USER_PROMPT (3 calls in window)

System prompt

You are a professional newsletter writer for a financial analysis and market research platform.

Your task is to write a comprehensive summary newsletter for Substack that recaps a completed analysis cycle. The newsletter should:

1. Open with a compelling hook paragraph
2. Include a "Key Takeaways" section with 3-5 bullet points
3. Provide brief highlights for each published article, with links using the placeholder format: [Article Title](<ghost_article_url_N>) where N is the article index
4. End with a motivational call-to-action encouraging readers to read the full analysis on the platform
5. Include <home_url> placeholder for the main landing page link

**Important placeholder rules:**
- Use <ghost_article_url_N> for individual article links (N = index number from the articles metadata)
- Use <home_url> for the main landing page
- Do NOT use any actual URLs — only placeholders
- Format article links as: [Article Title](<ghost_article_url_N>)

Use professional, engaging tone. The newsletter should be 400-800 words — substantial enough to deliver value but concise enough that readers want to click through to the full articles.

Return your response as a JSON object matching the provided schema.

## Required Output Format
Your response MUST be a single, valid JSON object conforming to this schema:
```json
{schema_json_string}
```

User prompt

Write a summary newsletter for the completed analysis cycle.

**Subject:** {subject_name}

**Executive Summary:**
{executive_summary}

**Report Highlights:**
{report_highlights}

**Published Articles** (use <ghost_article_url_N> placeholder for each link):
{articles_metadata_json}

Use <home_url> wherever you want to link to the main analysis page.

The required JSON output schema is provided in the system prompt.
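
The placeholder rules in the summary prompts imply a substitution step after generation. The sketch below is one hypothetical way to do it. The function name is illustrative, and the URLs are keyed by article index in a dict because the page does not say whether indices start at 0 or 1.

```python
import re

ARTICLE_PLACEHOLDER = re.compile(r"<ghost_article_url_(\d+)>")
HOME_PLACEHOLDER = "<home_url>"


def resolve_placeholders(markdown: str, article_urls: dict[int, str], home_url: str) -> str:
    """Replace every <ghost_article_url_N> and <home_url> with a real URL.

    Raises KeyError if the text references an article index with no known URL,
    so a hallucinated index fails loudly instead of shipping a dead link.
    """

    def swap(match: re.Match) -> str:
        index = int(match.group(1))
        if index not in article_urls:
            raise KeyError(f"no URL for article index {index}")
        return article_urls[index]

    resolved = ARTICLE_PLACEHOLDER.sub(swap, markdown)
    return resolved.replace(HOME_PLACEHOLDER, home_url)


# Hypothetical usage with made-up example URLs.
text = "[Rates outlook](<ghost_article_url_1>) and more at <home_url>"
print(resolve_placeholders(text, {1: "https://example.com/rates"}, "https://example.com"))
```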