Best LLMs for Onboarding Chapter Prompt Adaptation

Category: Infrastructure & Utility · Rail: absolute · Typical I/O: 3313→4456 tokens

Models

Frontier on this task: GPT-5.5 at 9.05 / 10. Quality bar at 90%: 8.15.
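The bar above is consistent with taking 90% of the frontier score. A minimal check of that arithmetic, assuming half-up rounding to two decimals (the rounding rule is an assumption; the page does not state it):

```python
from decimal import Decimal, ROUND_HALF_UP


def quality_bar(frontier: str, fraction: str = "0.90") -> Decimal:
    """Return fraction * frontier, rounded half-up to two decimals (assumed rule)."""
    raw = Decimal(frontier) * Decimal(fraction)  # exact decimal product, e.g. 8.1450
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


bar = quality_bar("9.05")
print(bar)  # 8.15
```

Decimal arithmetic is used here because 8.145 sits exactly on a rounding boundary, where binary floats can land on either side.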

Legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, best-value model first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Model	Quality score	CI low	Cost / 1k runs	vs best value
DeepSeek V4 Flash	8.27 / 10	8.00	$3.62	best value
GPT-5.4 Nano	8.17 / 10	7.96	$8.38	2.3x more expensive
DeepSeek V4 Pro	8.34 / 10	8.02	$19.82	5.5x more expensive
Qwen 3.6 Plus	8.66 / 10	8.48	$43.23	12x more expensive
Kimi K2.6	8.92 / 10	8.78	$73.17	20x more expensive
Claude Sonnet 4.6	8.66 / 10	8.45	$147.27	41x more expensive
GPT-5.5	9.05 / 10	8.90	$324.79	90x more expensive
Gemini 3.1 Flash Lite	6.23 / 10	5.94	$3.25	10% cheaper
Claude Haiku 4.5	7.97 / 10	7.73	$48.06	13x more expensive
Qwen 3.5 Flash	8.09 / 10	7.75	$5.32	1.5x more expensive
GPT-5.4 Mini	7.88 / 10	7.65	$20.24	5.6x more expensive
Gemini 3.1 Pro Preview	7.89 / 10	7.67	$30.90	8.5x more expensive
Qwen 3.7 Plus	7.36 / 10	7.13	$13.91	3.8x more expensive
Qwen 3.6 Flash	7.21 / 10	6.97	$14.13	3.9x more expensive
Grok 4.5	7.39 / 10	7.22	$41.28	11x more expensive
Gemini 3.5 Flash	4.77 / 10	4.49	$26.43	7.3x more expensive
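The "vs best value" column can be reproduced from the cost column alone. The sketch below divides each cost by the best-value cost ($3.62) and formats the result; the formatting thresholds (one decimal below 10x, whole numbers above, a percentage when cheaper) are inferred from the rows shown, not documented behavior.

```python
BEST_VALUE_COST = 3.62  # DeepSeek V4 Flash, USD per 1k runs


def vs_best_value(cost: float, best_cost: float = BEST_VALUE_COST) -> str:
    """Format a cost relative to the best-value model (inferred display rules)."""
    if cost == best_cost:
        return "best value"
    ratio = cost / best_cost
    if ratio < 1:
        return f"{(1 - ratio) * 100:.0f}% cheaper"
    if ratio < 9.95:  # still shows as a single digit after rounding to one decimal
        return f"{ratio:.1f}x more expensive"
    return f"{ratio:.0f}x more expensive"


for model, cost in [("GPT-5.4 Nano", 8.38), ("GPT-5.5", 324.79), ("Gemini 3.1 Flash Lite", 3.25)]:
    print(model, "->", vs_best_value(cost))
```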

Cost breakdown

Model	Provider	Quality	CI	Confidence	Cost / 1k runs	Overpay	Mode
DeepSeek V4 Flash ★	DeepSeek	8.27 / 10	[8.00, 8.54]	HIGH	$3.62	best value	batch
GPT-5.4 Nano	OpenAI	8.17 / 10	[7.96, 8.38]	HIGH	$8.38	2.3x	batch
DeepSeek V4 Pro	DeepSeek	8.34 / 10	[8.02, 8.66]	MEDIUM	$19.82	5.5x	batch
Qwen 3.6 Plus	Alibaba Cloud (DashScope)	8.66 / 10	[8.48, 8.83]	RANKED	$43.23	12x	batch
Kimi K2.6	Moonshot AI	8.92 / 10	[8.78, 9.05]	RANKED	$73.17	20x	batch
Claude Sonnet 4.6	Anthropic	8.66 / 10	[8.45, 8.86]	HIGH	$147.27	41x	batch
GPT-5.5 (best)	OpenAI	9.05 / 10	[8.90, 9.20]	RANKED	$324.79	90x	batch
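The greyed-row rule (point estimate clears the bar, CI low does not) can be checked against the rows above. This is a sketch only: reading "MEDIUM+" as the set {MEDIUM, HIGH} is an assumption, and the `Row` type and `is_greyed` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

QUALITY_BAR = 8.15                # 90% of the 9.05 frontier score
MEDIUM_PLUS = {"MEDIUM", "HIGH"}  # assumed meaning of "MEDIUM+"


@dataclass(frozen=True)
class Row:
    model: str
    score: float
    ci_low: float
    confidence: str


ROWS = [
    Row("DeepSeek V4 Flash", 8.27, 8.00, "HIGH"),
    Row("GPT-5.4 Nano", 8.17, 7.96, "HIGH"),
    Row("DeepSeek V4 Pro", 8.34, 8.02, "MEDIUM"),
    Row("Qwen 3.6 Plus", 8.66, 8.48, "RANKED"),
    Row("Kimi K2.6", 8.92, 8.78, "RANKED"),
    Row("Claude Sonnet 4.6", 8.66, 8.45, "HIGH"),
    Row("GPT-5.5", 9.05, 8.90, "RANKED"),
]


def is_greyed(row: Row, bar: float = QUALITY_BAR) -> bool:
    """True when the point estimate clears the bar but the CI low does not."""
    return row.confidence in MEDIUM_PLUS and row.score >= bar > row.ci_low


greyed = [r.model for r in ROWS if is_greyed(r)]
print(greyed)  # ['DeepSeek V4 Flash', 'GPT-5.4 Nano', 'DeepSeek V4 Pro']
```

Under this reading, the three cheapest models clear the bar only on their point estimates, so the cheapest model whose CI low also clears it is Qwen 3.6 Plus.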

Overpay shows how much more you pay than the best-value model that clears the quality bar (marked ★). "16x" means you pay 16× that reference's cost for no quality benefit above the bar.

Typical call shape for this task: 3313 input tokens → 4456 output tokens, EMA-tracked from production traffic.

Cost is the observed, all-in $ per 1,000 task runs. It covers each model's own measured usage on this task (output verbosity, thinking/reasoning tokens, cache reads and writes, and the spend on its billed failures), priced at current list rates and adjusted by the billing overhead we actually reconcile against provider invoices. Models that answer tersely cost what they actually cost; models that think at length pay for it. The figure is not comparable to providers' advertised $/1M list rates: it is what running the task costs, not a per-token price.
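One way to read the all-in cost definition is as a sum over measured usage, scaled by a billing overhead factor. The sketch below is illustrative only: the `Usage` and `Rates` types, their field names, the choice to bill reasoning tokens at the output rate, and the example prices are all assumptions, not this page's actual pipeline or any provider's real rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Measured totals over a batch of runs (hypothetical field names)."""
    runs: int
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    cache_read_tokens: int
    cache_write_tokens: int
    failed_run_cost_usd: float  # spend on billed failures


@dataclass
class Rates:
    """List prices in USD per 1M tokens (hypothetical values in the example)."""
    input: float
    output: float
    cache_read: float
    cache_write: float


def cost_per_1k_runs(u: Usage, r: Rates, overhead: float = 1.0) -> float:
    """All-in USD per 1,000 runs; reasoning tokens billed at the output rate (assumed)."""
    token_cost = (
        u.input_tokens * r.input
        + (u.output_tokens + u.reasoning_tokens) * r.output
        + u.cache_read_tokens * r.cache_read
        + u.cache_write_tokens * r.cache_write
    ) / 1_000_000
    total = (token_cost + u.failed_run_cost_usd) * overhead
    return total / u.runs * 1000


# 1,000 runs at the typical call shape (3313 in, 4456 out), made-up prices.
usage = Usage(1000, 3_313_000, 4_456_000, 0, 0, 0, 0.0)
rates = Rates(input=1.0, output=2.0, cache_read=0.1, cache_write=1.25)
print(round(cost_per_1k_runs(usage, rates), 3))
```

Because reasoning tokens, cache traffic, and failures enter the sum directly, two models with the same list price can land at very different per-run costs.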

Prompt templates

This is a pooled capability: two prompt families share it. The pair shown first is the most frequently used in production.

CHAPTER_PROMPT_GENERATOR_SYSTEM_TEMPLATE + CHAPTER_PROMPT_GENERATOR_USER_TEMPLATE (1004 calls in window)

System prompt

You are an expert at customizing financial analysis prompts for specific companies and subjects.

Your task is to take a TEMPLATE chapter prompt (system + user) and customize it for a specific subject by:

1. **Replacing generic terms** with subject-specific terminology
   - "the company" → "{subject_name}"
   - "key metrics" → specific KPIs relevant to this subject
   - "competitive landscape" → actual competitors by name

2. **Adding subject-specific focus areas**
   - Example Tesla: EV technology, battery supply chain, autonomous driving, energy products
   - Example Oklo: Nuclear regulations, SMR technology, data center partnerships
   - Example crypto companies: Blockchain infrastructure, regulatory environment, token economics

3. **Including relevant context**
   - Industry-specific metrics and benchmarks
   - Key competitors, partners, suppliers
   - Regulatory considerations
   - Technology trends

4. **Maintaining structure and intent**
   - Keep the same analysis depth and requirements
   - Preserve required output format
   - Don't change variable placeholders like datetime_from or relevance_threshold

**Output Format:**
Return customized system_prompt and user_prompt as separate fields.
Include reasoning explaining key customizations made.
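The system prompt above asks the model to return separate fields and to leave variable placeholders untouched. A caller might verify both before accepting a response. The sketch below assumes the response is a JSON object with keys `system_prompt`, `user_prompt`, and `reasoning`, and that placeholders are written as `{name}`; neither detail is confirmed by this page, and `check_customization` is a hypothetical helper.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")
REQUIRED_FIELDS = ("system_prompt", "user_prompt", "reasoning")  # assumed key names


def placeholders(text: str) -> set[str]:
    """Collect the names of all {name}-style placeholders in text."""
    return set(PLACEHOLDER.findall(text))


def check_customization(template_system: str, template_user: str, raw: str) -> list[str]:
    """Return a list of problems found in a customization response (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["response is not a JSON object"]

    problems = []
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    # Every placeholder in a template must survive in its customized counterpart.
    for field, template in (("system_prompt", template_system), ("user_prompt", template_user)):
        value = data.get(field)
        customized = value if isinstance(value, str) else ""
        lost = placeholders(template) - placeholders(customized)
        if lost:
            problems.append(f"{field} dropped placeholders: {sorted(lost)}")
    return problems
```

For example, a template containing `{datetime_from}` whose customized version omits it would be reported as a dropped placeholder rather than silently accepted.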

User prompt

Customize the following chapter prompt for a specific subject:

**Subject Information:**
- Name: {subject_name}
- Code: {subject_code}
- Description: {subject_description}
- Industry: {industry}
- Key Focus Areas: {focus_areas}

**Chapter Information:**
- Chapter Code: {chapter_code}
- Chapter Name: {chapter_name}
- User Requirement: {user_requirement}

**Template System Prompt:**
{template_system_prompt}

**Template User Prompt:**
{template_user_prompt}

**Instructions:**
Generate customized system_prompt and user_prompt that are specifically tailored for {subject_name}.

Focus on:
1. What makes {subject_name} unique in its industry
2. What specific analysis would be most valuable for investors analyzing {subject_name}
3. What risks, opportunities, or trends are particularly relevant to {subject_name}
4. What competitive dynamics or market forces {subject_name} faces

Output the customized prompts as a valid JSON object matching the schema.
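If the user template above is filled with Python's `str.format` (an assumption; the page does not say how it is rendered), the inserted template prompts can themselves contain `{...}` placeholders without being re-expanded, because `format` scans only the outer template. The sketch uses a shortened excerpt of the template and illustrative values.

```python
from string import Formatter

# Shortened excerpt of the user template, for illustration only.
USER_TEMPLATE_EXCERPT = (
    "**Subject Information:**\n"
    "- Name: {subject_name}\n"
    "- Key Focus Areas: {focus_areas}\n\n"
    "**Template User Prompt:**\n"
    "{template_user_prompt}\n"
)


def required_fields(template: str) -> list[str]:
    """List the distinct field names a str.format template expects, in order."""
    seen: list[str] = []
    for _, name, _, _ in Formatter().parse(template):
        if name and name not in seen:
            seen.append(name)
    return seen


print(required_fields(USER_TEMPLATE_EXCERPT))

rendered = USER_TEMPLATE_EXCERPT.format(
    subject_name="Tesla",
    focus_areas="EV technology, battery supply chain",
    # The inner placeholder is inserted verbatim and stays available downstream.
    template_user_prompt="Summarize news about the company since {datetime_from}.",
)
print(rendered)
```

Listing the required fields up front makes a missing variable fail early with a clear message instead of a `KeyError` at render time.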

CHAPTER_PROMPT_GEN_SYSTEM + CHAPTER_PROMPT_GEN_USER (441 calls in window)

System prompt

You are an expert prompt engineer specializing in creating synthesis prompts for intelligence reports.

Your task is to generate system and user prompts that will guide an LLM to synthesize source material into a specific chapter of an intelligence report.

The prompts you create must:
1. Be specific to the chapter's topic and requirements
2. Guide the LLM to extract relevant insights from source material
3. Produce structured, professional output suitable for business intelligence
4. Include clear instructions on what to analyze and how to present findings

For the system prompt:
- Define the analyst's role and expertise relevant to the chapter topic
- Specify the analysis approach and methodology
- Set quality standards and output expectations
- Keep it concise but comprehensive (150-250 words)

For the user prompt:
- Include a {content} placeholder where source material will be inserted
- Reference the chapter's specific requirements
- Provide clear instructions on what aspects to analyze
- Request structured output with key findings and takeaways
- Keep it actionable and specific (100-200 words)

Output your prompts in the required JSON schema format.

## Required Output Format
Your response MUST be a single, valid JSON object conforming to this schema:
```json
{schema_json_string}
```
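This system prompt mixes one variable to fill (`{schema_json_string}`) with a literal `{content}` that must reach the model unchanged. If the template were rendered with `str.format`, the literal would raise a `KeyError`. The sketch below shows that failure and one workaround using targeted replacement; the `fill` helper and the example schema are hypothetical, not the production schema.

```python
import json

# Shortened excerpt of the system prompt, for illustration only.
SYSTEM_EXCERPT = (
    "For the user prompt:\n"
    "- Include a {content} placeholder where source material will be inserted\n\n"
    "## Required Output Format\n"
    "{schema_json_string}\n"
)

# Hypothetical schema; the real one is not shown on this page.
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "system_prompt": {"type": "string"},
        "user_prompt": {"type": "string"},
    },
    "required": ["system_prompt", "user_prompt"],
}


def fill(template: str, **values: str) -> str:
    """Replace only the named {key} markers; leave every other brace untouched."""
    out = template
    for key, value in values.items():
        out = out.replace("{" + key + "}", value)
    return out


try:
    SYSTEM_EXCERPT.format(schema_json_string="{}")
    format_failed = False
except KeyError:
    format_failed = True  # str.format tries to resolve the literal {content}

rendered = fill(SYSTEM_EXCERPT, schema_json_string=json.dumps(EXAMPLE_SCHEMA, indent=2))
print(format_failed)
print(rendered)
```

With several variables, sequential replacement can re-expand a marker that appears inside an earlier value, so this approach is safest when values are known not to contain other keys. Escaping the literal as `{{content}}` in the template is the other common option.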

User prompt

Generate synthesis prompts for the following report chapter:

Chapter Title: {chapter_title}
Chapter Code: {chapter_code}

Chapter Requirements:
{user_requirement}

Create prompts that will guide an LLM to analyze source material and produce the "{chapter_title}" section of an intelligence report. The prompts should be tailored to the specific requirements above.

Remember:
- The system prompt defines the analyst's role and approach
- The user prompt must include {content} placeholder for source material
- Include relevant keywords that indicate content is relevant to this chapter

The required JSON output schema is provided in the system prompt.
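The constraints stated in this prompt family (a `{content}` placeholder in the user prompt, 150-250 words for the system prompt, 100-200 words for the user prompt) are easy to check mechanically. A minimal sketch, assuming whitespace-separated word counting; `check_generated` is a hypothetical helper:

```python
def check_generated(system_prompt: str, user_prompt: str) -> list[str]:
    """Return violations of the stated length and placeholder rules (empty if none)."""
    problems = []
    if "{content}" not in user_prompt:
        problems.append("user prompt is missing the {content} placeholder")

    system_words = len(system_prompt.split())
    if not 150 <= system_words <= 250:
        problems.append(f"system prompt has {system_words} words, expected 150-250")

    user_words = len(user_prompt.split())
    if not 100 <= user_words <= 200:
        problems.append(f"user prompt has {user_words} words, expected 100-200")
    return problems


# Synthetic inputs: 200 words, then 120 words plus the placeholder.
ok_system = " ".join(["word"] * 200)
ok_user = "{content} " + " ".join(["word"] * 120)
print(check_generated(ok_system, ok_user))  # []
```

Word limits in a prompt are soft targets for the model, so a check like this is better suited to logging or retry decisions than to hard rejection.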