Best LLMs for Topic-to-Section Assignment

Category: Topic Organization & Clustering · Rail: absolute · Typical I/O: 1482→441 tokens

Models

Frontier on this task: Gemini 3.5 Flash at 8.99 / 10. Quality bar at 90%: 8.09.
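
The quality bar appears to be 90% of the frontier score. A minimal sketch of that arithmetic follows; the function name `quality_bar` is illustrative, and the page does not say whether the displayed value is rounded or truncated (both give 8.09 here).

```python
def quality_bar(frontier_score: float, fraction: float = 0.90) -> float:
    """Return the score a model must reach to count as good enough.

    Assumption: the bar is simply `fraction` times the frontier score.
    """
    return frontier_score * fraction


bar = quality_bar(8.99)
print(f"{bar:.2f}")  # 8.09
```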

Legend: point estimate · floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, best-value model first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Model	Quality score	CI low	Cost / 1k runs	vs best value
DeepSeek V4 Flash	8.20 / 10	7.88	$0.23	best value
Qwen 3.5 Flash	8.26 / 10	7.97	$0.64	2.7x more expensive
Tencent Hy3	8.19 / 10	7.73	$0.65	2.8x more expensive
Qwen 3.6 Plus	8.25 / 10	7.88	$1.58	6.8x more expensive
Gemini 3.1 Pro Preview	8.10 / 10	7.76	$1.88	8x more expensive
Claude Sonnet 4.6	8.12 / 10	7.83	$3.00	13x more expensive
Kimi K2.6	8.23 / 10	7.90	$3.38	14x more expensive
Gemini 3.5 Flash	8.99 / 10	8.63	$4.26	18x more expensive
GPT-5.5	8.57 / 10	8.31	$5.42	23x more expensive
Meta Muse Spark 1.1	8.33 / 10	7.88	$8.48	36x more expensive
GPT-5.6 Sol	8.64 / 10	8.27	$10.80	46x more expensive
GPT-5.4 Nano	7.67 / 10	7.25	$0.23	1% cheaper
Qwen 3.6 Flash	7.88 / 10	7.50	$3.40	14x more expensive
Grok 4.5	8.03 / 10	7.56	$5.21	22x more expensive
DeepSeek V4 Pro	8.03 / 10	7.75	$1.06	4.5x more expensive
Gemini 3.1 Flash Lite	7.75 / 10	7.36	$0.26	1.1x more expensive
GPT-5.4 Mini	7.59 / 10	7.16	$0.45	1.9x more expensive
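
The greying rule can be sketched as a three-way classification on the point estimate and CI low. This is an illustration only: the labels "clears bar" and "below bar" are made up for the example, the confidence condition (MEDIUM+) is left out for brevity, and the rows are copied from the table above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    score: float   # point estimate, out of 10
    ci_low: float  # lower bound of the confidence interval


def classify(row: Row, bar: float) -> str:
    """Classify a row against the quality bar (labels are hypothetical)."""
    if row.score < bar:
        return "below bar"
    if row.ci_low < bar:
        # Point estimate clears the bar, CI low does not: shown greyed.
        return "greyed"
    return "clears bar"


BAR = 8.09
rows = [
    Row("DeepSeek V4 Flash", 8.20, 7.88),
    Row("Gemini 3.5 Flash", 8.99, 8.63),
    Row("GPT-5.5", 8.57, 8.31),
    Row("GPT-5.4 Nano", 7.67, 7.25),
]
for row in rows:
    print(f"{row.model}: {classify(row, BAR)}")
```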

Cost breakdown

Model	Provider	Quality	CI	Confidence	Cost / 1k runs	Overpay	Mode
DeepSeek V4 Flash ★	DeepSeek	8.20 / 10	[7.88, 8.51]	MEDIUM	$0.23	best value	batch
Qwen 3.5 Flash	Alibaba Cloud (DashScope)	8.26 / 10	[7.97, 8.56]	HIGH	$0.64	2.7x	batch
Tencent Hy3	OpenRouter	8.19 / 10	[7.73, 8.65]	MEDIUM	$0.65	2.8x	batch
Qwen 3.6 Plus	Alibaba Cloud (DashScope)	8.25 / 10	[7.88, 8.61]	MEDIUM	$1.58	6.8x	batch
Gemini 3.1 Pro Preview	Gemini	8.10 / 10	[7.76, 8.44]	MEDIUM	$1.88	8x	batch
Claude Sonnet 4.6	Anthropic	8.12 / 10	[7.83, 8.41]	HIGH	$3.00	13x	batch
Kimi K2.6	Moonshot AI	8.23 / 10	[7.90, 8.55]	MEDIUM	$3.38	14x	batch
Gemini 3.5 Flash (best)	Gemini	8.99 / 10	[8.63, 9.35]	MEDIUM	$4.26	18x	batch
GPT-5.5	OpenAI	8.57 / 10	[8.31, 8.83]	HIGH	$5.42	23x	batch
Meta Muse Spark 1.1	Meta	8.33 / 10	[7.88, 8.78]	MEDIUM	$8.48	36x	batch
GPT-5.6 Sol	OpenAI	8.64 / 10	[8.27, 9.01]	MEDIUM	$10.80	46x	batch

Overpay shows how much more you pay than the best-value model that clears the quality bar (marked ★). "16x" means you pay 16× that reference for no quality benefit above the bar.

Typical call shape for this task: 1482 input tokens → 441 output tokens, EMA-tracked from production traffic.

Cost is the observed, all-in $ per 1,000 task runs: each model's own measured usage on this task (output verbosity, thinking/reasoning tokens, cache reads and writes, and the spend on its billed failures), priced at current list rates and adjusted by the billing overhead we actually reconcile against provider invoices. Models that answer tersely cost what they actually cost; models that think at length pay for it. These figures are not comparable to providers' advertised $/1M list rates: this is what running the task costs, not a per-token price.
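
A rough sketch of how the best-value reference and the overpay multiple could be derived, assuming the reference is the cheapest model whose point estimate clears the bar and overpay is a plain cost ratio. The names `best_value` and `overpay` are illustrative. The inputs are the rounded dollar figures shown on this page, so the ratios can differ from the published multiples in the last digit.

```python
from typing import Dict, Tuple


def best_value(models: Dict[str, Tuple[float, float]], bar: float) -> str:
    """Return the cheapest model whose score clears the bar.

    `models` maps name -> (quality score, cost per 1k runs in dollars).
    """
    eligible = {name: cost for name, (score, cost) in models.items() if score >= bar}
    if not eligible:
        raise ValueError("no model clears the quality bar")
    return min(eligible, key=eligible.get)


def overpay(cost: float, reference_cost: float) -> float:
    """Cost expressed as a multiple of the best-value reference."""
    return cost / reference_cost


models = {
    "DeepSeek V4 Flash": (8.20, 0.23),
    "GPT-5.4 Nano": (7.67, 0.23),  # same rounded cost, but below the bar
    "Claude Sonnet 4.6": (8.12, 3.00),
    "Gemini 3.5 Flash": (8.99, 4.26),
}
reference = best_value(models, bar=8.09)
reference_cost = models[reference][1]
for name, (_, cost) in models.items():
    print(f"{name}: {overpay(cost, reference_cost):.1f}x")
```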

Prompt templates

This is a pooled capability — 2 prompt families share it. The pair shown first is the most frequently used in production.

TOPIC_CLUSTERING_ASSIGN_SECTIONS_SYSTEM + TOPIC_CLUSTERING_ASSIGN_SECTIONS_USER (1788 calls in window)

System prompt

You are an expert content analyst. Your task is to assign publication sections to a batch of merged topics.

## Section Assignment Guidelines:
1. Each topic should appear in 1-3 sections (not more)
2. Match topic themes to section purposes
3. Consider the section's audience and goals
4. Don't force-fit topics into sections
5. Be consistent with section assignments across batches

## Ordering (important):
The `section_ids` list is ORDERED by descending relevance.
- Index 0 = PRIMARY section: the single most thematically central section.
  This is where the topic will appear on the client's landing page.
- Index 1+ = SECONDARY sections: additional sections where the topic also
  belongs and should appear on the section's detail page.

Pick the primary as the section a reader would most expect to find this
topic under. Use secondary slots only when the topic genuinely spans
multiple sections.

## Output Requirements:
- Every topic in the batch must have a section assignment
- Assign 1-3 section_ids per topic based on relevance, ordered as described above
- Use the actual section IDs from the available sections list

## Required Output Format
Your response MUST be a single, valid JSON object conforming to this schema:
```json
{schema_json_string}
```

User prompt

Please assign publication sections to this batch of merged topics.

## Topics to Assign (Batch {batch_number} of {total_batches}):
{batch_topics_json}

## Available Sections:
{sections_json}

## Required JSON Schema:
The required JSON output schema is provided in the system prompt.

## Important:
- Assign 1-3 sections per topic
- Every topic in this batch must have section assignments
- Use actual section IDs from the available sections list
- Order matters: list the most thematically central section FIRST (it becomes
  the primary, where the topic appears on the landing page); list any
  additional sections after it in descending relevance.
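
Outside the prompt itself, the rules above (1-3 sections per topic, every topic assigned, only real section IDs, index 0 as primary) can be checked mechanically on the model's response. The sketch below is not the production validator. The actual schema is injected via `{schema_json_string}` and is not shown on this page, so the output shape used here (`assignments`, `topic_id`, `section_ids`) is an assumption, and the duplicate check is an extra rule the prompt does not state.

```python
from typing import Dict, List, Set, Tuple


def validate_assignments(output: Dict, topic_ids: Set[str], section_ids: Set[str]) -> List[str]:
    """Return a list of rule violations (empty when the output is acceptable).

    Assumed (hypothetical) shape:
    {"assignments": [{"topic_id": "...", "section_ids": ["...", ...]}]}
    """
    errors: List[str] = []
    seen: Set[str] = set()
    for item in output.get("assignments", []):
        tid = item.get("topic_id")
        ids = item.get("section_ids", [])
        seen.add(tid)
        if tid not in topic_ids:
            errors.append(f"{tid}: unknown topic")
        if not 1 <= len(ids) <= 3:
            errors.append(f"{tid}: expected 1-3 section_ids, got {len(ids)}")
        if len(set(ids)) != len(ids):
            # Extra check, not stated in the prompt.
            errors.append(f"{tid}: duplicate section_ids")
        for sid in ids:
            if sid not in section_ids:
                errors.append(f"{tid}: unknown section {sid}")
    # Every topic in the batch must have an assignment.
    for tid in sorted(topic_ids - seen):
        errors.append(f"{tid}: missing assignment")
    return errors


def split_primary(ids: List[str]) -> Tuple[str, List[str]]:
    """Index 0 is the primary section; the rest are secondary."""
    primary, *secondary = ids
    return primary, secondary
```

For example, a response that gives one topic an empty `section_ids` list and omits another topic entirely would yield two errors, and the batch could be retried or routed to repair.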

JSON_REPAIR_SYSTEM + JSON_REPAIR_USER (1 call in window)

System prompt

You are a JSON repair tool. The user gives you malformed or partial model output and a JSON Schema. Return ONLY a single valid JSON object that satisfies the schema, salvaging as much real content from the input as possible. Do not invent data for fields the input doesn't support — use the schema's allowed empty/null values. Output the JSON object only: no prose, no markdown, no code fences.

User prompt

JSON Schema:
{schema_json}

Malformed output to repair:
{raw_text}

Return only the corrected JSON object.
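
The low call count suggests the repair prompt is a fallback used only when the first response fails to parse. One way such a fallback could be wired up is sketched below. The helper names and the `repair_call` parameter (any function that sends the user prompt to a model and returns its text) are hypothetical; only the user-prompt wording comes from the template above.

```python
import json
from typing import Any, Callable, Dict

FENCE = "`" * 3  # a markdown code fence marker

REPAIR_USER_TEMPLATE = (
    "JSON Schema:\n{schema_json}\n\n"
    "Malformed output to repair:\n{raw_text}\n\n"
    "Return only the corrected JSON object."
)


def strip_code_fences(text: str) -> str:
    """Drop a surrounding markdown fence if the model added one."""
    text = text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return text


def parse_or_repair(raw_text: str, schema: Dict[str, Any],
                    repair_call: Callable[[str], str]) -> Any:
    """Parse model output as JSON; on failure, ask the repair prompt once."""
    try:
        return json.loads(strip_code_fences(raw_text))
    except json.JSONDecodeError:
        prompt = REPAIR_USER_TEMPLATE.format(
            schema_json=json.dumps(schema), raw_text=raw_text
        )
        # A second failure propagates to the caller as JSONDecodeError.
        return json.loads(strip_code_fences(repair_call(prompt)))
```

Stripping fences locally before parsing keeps the cheap, common failure out of the repair path, so the extra model call is spent only on output that is actually malformed.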