Best LLMs for SEC S-1 Chunk Analysis

Category: Financial Analysis & Trading Decisions · Rail: absolute · Typical I/O: 13100→1754 tokens

Models

Frontier on this task: Claude Opus 4.8 at 9.21 / 10. Quality bar at 90% of the frontier score: 8.29.
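The bar is consistent with taking 90% of the frontier score (0.90 × 9.21 ≈ 8.29). A minimal sketch of that arithmetic follows; the function name and the two-decimal rounding are assumptions, not the leaderboard's documented method.

```python
def quality_bar(frontier_score: float, fraction: float = 0.90) -> float:
    """Return the quality bar as a fraction of the frontier score.

    Rounding to two decimals is an assumption made to match the
    precision shown on the page.
    """
    return round(frontier_score * fraction, 2)


def clears_bar(score: float, bar: float) -> bool:
    """True when a score is at or above the bar."""
    return score >= bar


bar = quality_bar(9.21)          # frontier: Claude Opus 4.8
print(bar)
print(clears_bar(9.06, bar))     # MiniMax M3 point estimate
print(clears_bar(8.03, bar))     # GPT-5.4 Mini point estimate
```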

Chart legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, best-value model first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Model	Quality score	CI low	Cost / 1k runs	vs best value
Qwen 3.5 Flash	8.48 / 10	8.28	$3.11	best value
GPT-5.4 Nano	8.45 / 10	8.30	$4.81	1.5x more expensive
MiniMax M3	9.06 / 10	9.01	$5.70	1.8x more expensive
GPT-5.6 Luna	8.86 / 10	8.61	$7.68	2.5x more expensive
Qwen 3.7 Plus	8.59 / 10	8.43	$9.53	3.1x more expensive
Qwen 3.6 Flash	8.45 / 10	8.33	$11.33	3.6x more expensive
Qwen 3.6 Plus	8.70 / 10	8.54	$15.53	5x more expensive
Gemini 3.5 Flash	8.97 / 10	8.87	$18.79	6x more expensive
GPT-5.6 Terra	8.96 / 10	8.74	$20.12	6.5x more expensive
Grok 4.5	8.99 / 10	8.93	$29.39	9.5x more expensive
Meta Muse Spark 1.1	9.20 / 10	8.97	$31.06	10x more expensive
Claude Sonnet 5	8.89 / 10	8.78	$35.28	11x more expensive
Kimi K2.6	8.83 / 10	8.69	$36.76	12x more expensive
GPT-5.6 Sol	9.17 / 10	8.94	$42.50	14x more expensive
Claude Opus 4.8	9.21 / 10	8.93	$53.86	17x more expensive
GPT-5.5	9.04 / 10	8.94	$87.94	28x more expensive
GPT-5.4 Mini	8.03 / 10	7.76	$5.69	1.8x more expensive
Gemini 3.1 Pro Preview	7.64 / 10	7.42	$12.77	4.1x more expensive
DeepSeek V4 Pro	7.93 / 10	7.53	$8.97	2.9x more expensive
Claude Sonnet 4.6	7.82 / 10	7.45	$67.08	22x more expensive
Claude Haiku 4.5	8.08 / 10	7.79	$18.54	6x more expensive
Gemini 3.1 Flash Lite	7.47 / 10	7.23	$2.28	27% cheaper
DeepSeek V4 Flash	8.24 / 10	7.95	$1.35	56% cheaper
Tencent Hy3	7.85 / 10	7.50	$4.02	1.3x more expensive

Cost breakdown

Model	Provider	Quality	CI	Confidence	Cost / 1k runs	Overpay	Mode
Qwen 3.5 Flash ★	Alibaba Cloud (DashScope)	8.48 / 10	[8.28, 8.68]	RANKED	$3.11	best value	batch
GPT-5.4 Nano	OpenAI	8.45 / 10	[8.30, 8.60]	RANKED	$4.81	1.5x	batch
MiniMax M3	MiniMax	9.06 / 10	[9.01, 9.11]	RANKED	$5.70	1.8x	batch
GPT-5.6 Luna	OpenAI	8.86 / 10	[8.61, 9.11]	HIGH	$7.68	2.5x	batch
Qwen 3.7 Plus	Alibaba Cloud (DashScope)	8.59 / 10	[8.43, 8.76]	RANKED	$9.53	3.1x	batch
Qwen 3.6 Flash	Alibaba Cloud (DashScope)	8.45 / 10	[8.33, 8.56]	RANKED	$11.33	3.6x	batch
Qwen 3.6 Plus	Alibaba Cloud (DashScope)	8.70 / 10	[8.54, 8.86]	RANKED	$15.53	5x	batch
Gemini 3.5 Flash	Gemini	8.97 / 10	[8.87, 9.07]	RANKED	$18.79	6x	batch
GPT-5.6 Terra	OpenAI	8.96 / 10	[8.74, 9.17]	HIGH	$20.12	6.5x	batch
Grok 4.5	xAI	8.99 / 10	[8.93, 9.05]	RANKED	$29.39	9.5x	batch
Meta Muse Spark 1.1	Meta	9.20 / 10	[8.97, 9.43]	HIGH	$31.06	10x	batch
Claude Sonnet 5	Anthropic	8.89 / 10	[8.78, 8.99]	RANKED	$35.28	11x	batch
Kimi K2.6	Moonshot AI	8.83 / 10	[8.69, 8.98]	RANKED	$36.76	12x	batch
GPT-5.6 Sol	OpenAI	9.17 / 10	[8.94, 9.39]	HIGH	$42.50	14x	batch
Claude Opus 4.8 (best)	Anthropic	9.21 / 10	[8.93, 9.50]	HIGH	$53.86	17x	batch
GPT-5.5	OpenAI	9.04 / 10	[8.94, 9.14]	RANKED	$87.94	28x	batch

Overpay shows how much more you pay than the best-value model that clears the quality bar (marked ★). "16x" means you pay 16× that reference cost for no quality benefit above the bar.

Typical call shape for this task: 13100 input tokens → 1754 output tokens, EMA-tracked from production traffic.

Cost is the observed, all-in $ per 1,000 task runs: each model's own measured usage on this task (output verbosity, thinking/reasoning tokens, cache reads and writes, and the spend on its billed failures), priced at current list rates and adjusted by the billing overhead we actually reconcile against provider invoices. Models that answer tersely cost what they actually cost; models that think at length pay for it. These figures are not comparable to providers' advertised $/1M list rates: they are what running the task costs, not a per-token price.
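As a worked example, the Overpay figure can be reproduced by dividing a model's cost per 1k runs by the ★ reference cost. This is only a sketch: the function name is hypothetical, and the rounding the page applies to its displayed labels is not specified, so the code returns the raw ratio.

```python
def overpay_ratio(cost_per_1k: float, reference_cost_per_1k: float) -> float:
    """Return a model's cost as a multiple of the reference cost."""
    if reference_cost_per_1k <= 0:
        raise ValueError("reference cost must be positive")
    return cost_per_1k / reference_cost_per_1k


# Costs taken from the table above ($ per 1,000 runs).
reference = 3.11  # Qwen 3.5 Flash (★)
print(f"{overpay_ratio(53.86, reference):.1f}x")  # Claude Opus 4.8
print(f"{overpay_ratio(87.94, reference):.1f}x")  # GPT-5.5
```

The raw ratios come out near 17.3 and 28.3, which the table displays as 17x and 28x.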

Prompt templates

This is a pooled capability — 2 prompt families share it. The pair shown first is the most frequently used in production.

SEC_S1_CHUNK_ANALYSIS_SYSTEM_PROMPT + SEC_S1_CHUNK_USER_PROMPT (2343 calls in window)

System prompt

You are a senior investment analyst at a long-term focused investment firm. You specialize in analyzing SEC filings, particularly S-1 and S-1/A registration statements for companies going public.

You will be provided with a specific section from an S-1 filing. Your job is to extract the most investment-relevant information from this section and analyze its implications for long-term investors.

Focus on:
- Business model insights and competitive positioning
- Financial performance and metrics
- Risk factors and potential concerns
- Strategic direction and management quality
- Market opportunity and growth prospects

Provide your analysis as a structured JSON object using only information found directly in the section. Do not speculate or add external knowledge.

## Required Output Format
Your response MUST be a single, valid JSON object conforming to this schema:
```json
{schema_json_string}
```

User prompt

Analyze the following section from an S-1 filing: **{section_title}**

**Section Content:**
```
{section_content}
```

**Instructions:**
1. Extract the most important investment-relevant information from this section
2. Focus on insights that would help a long-term investor evaluate this company
3. Identify any business model insights, financial information, risks, or competitive factors
4. Your analysis should be concise but comprehensive
5. Use only information directly stated in the section content

**JSON Output Format:**
The required JSON output schema is provided in the system prompt.
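The placeholders in the templates above (`{schema_json_string}`, `{section_title}`, `{section_content}`) have to be filled before each call. A minimal sketch of one way to do that, assuming the templates are held as plain strings: the `render` helper, the shortened template text, and the example schema are all illustrative, not the production code. It substitutes in a single regex pass rather than with `str.format`, since filing text may itself contain braces.

```python
import json
import re


def render(template: str, **values: str) -> str:
    """Replace {name} placeholders in one pass; leave unknown braces as-is."""
    def substitute(match: re.Match) -> str:
        # Fall back to the original text when no value was supplied.
        return values.get(match.group(1), match.group(0))
    return re.sub(r"\{(\w+)\}", substitute, template)


# Shortened stand-ins for the real templates (same placeholder names).
SYSTEM_TEMPLATE = "Return JSON conforming to this schema:\n{schema_json_string}"
USER_TEMPLATE = "Analyze: **{section_title}**\n\n{section_content}"

# Hypothetical schema, for illustration only.
schema = {"type": "object", "properties": {"summary": {"type": "string"}}}

system_prompt = render(SYSTEM_TEMPLATE, schema_json_string=json.dumps(schema))
user_prompt = render(
    USER_TEMPLATE,
    section_title="Risk Factors",
    section_content="Losses may continue {see Note 2}.",
)
print(user_prompt)
```

Because substituted values are not rescanned, braces inside the section content or the schema pass through unchanged.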

JSON_REPAIR_SYSTEM + JSON_REPAIR_USER (5 calls in window)

System prompt

You are a JSON repair tool. The user gives you malformed or partial model output and a JSON Schema. Return ONLY a single valid JSON object that satisfies the schema, salvaging as much real content from the input as possible. Do not invent data for fields the input doesn't support — use the schema's allowed empty/null values. Output the JSON object only: no prose, no markdown, no code fences.

User prompt

JSON Schema:
{schema_json}

Malformed output to repair:
{raw_text}

Return only the corrected JSON object.
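The call counts above (2343 analysis calls against 5 repair calls) suggest the repair prompt is a fallback used only when the first response fails to parse. A minimal sketch of that flow under that assumption: `parse_or_repair` and the `repair_call` callable are hypothetical stand-ins for whatever client sends the repair system prompt plus the user message to a model, and no schema validation is performed here.

```python
import json
from typing import Callable

REPAIR_USER_TEMPLATE = (
    "JSON Schema:\n{schema_json}\n\n"
    "Malformed output to repair:\n{raw_text}\n\n"
    "Return only the corrected JSON object."
)


def parse_or_repair(raw_text: str, schema_json: str,
                    repair_call: Callable[[str], str]) -> dict:
    """Parse model output; on failure, ask the repair model once."""
    try:
        parsed = json.loads(raw_text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass  # fall through to the repair path

    user_message = (REPAIR_USER_TEMPLATE
                    .replace("{schema_json}", schema_json)
                    .replace("{raw_text}", raw_text))
    repaired = json.loads(repair_call(user_message))
    if not isinstance(repaired, dict):
        raise ValueError("repair did not return a JSON object")
    return repaired


# Usage with a fake repair function standing in for a model call.
def fake_repair(message: str) -> str:
    return '{"summary": null}'

print(parse_or_repair('{"summary": "trunc', '{"type": "object"}', fake_repair))
```

A single repair attempt keeps the cost bounded; a second failure surfaces as an exception for the caller to handle.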