GPT-5.5

Provider: OpenAI
Model name: gpt-5.5

Qualifies on: 30 / 52 tasks (at the 90%-of-best quality bar)
Best value on: 0 tasks

Per-task breakdown

Task	Category	Quality (% of best)	Confidence	Overpay
Topic Report Relevance Scoring	Relevance, Classification & Matching	91%	HIGH	4.3x
Author Matching	Relevance, Classification & Matching	93%	HIGH	6.4x
Investment Panel Voting	Financial Analysis & Trading Decisions	95%	MEDIUM	7.9x
Engagement Reply Draft	Social & Promotional Content	98%	HIGH	8.8x
Topic Cluster Naming	Topic Organization & Clustering	99%	RANKED	9.8x
Research Query Generation (best)	Infrastructure & Utility	100%	MEDIUM	9.9x
Geographic Region Identification	Structured Data & Fact Extraction	93%	RANKED	11x
Social Post Promotion	Social & Promotional Content	92%	HIGH	13x
Research Query Validation	Infrastructure & Utility	100%	HIGH	18x
Vetted News Site Selection	Relevance, Classification & Matching	99%	MEDIUM	21x
Topic Grouping and Client Matching	Relevance, Classification & Matching	94%	MEDIUM	21x
SEC Filing Analysis	Financial Analysis & Trading Decisions	92%	RANKED	21x
Theme Generation	Long-form Content Generation	93%	HIGH	21x
Activity Feed Blurb Generation	Social & Promotional Content	98%	RANKED	22x
Topic-to-Section Assignment	Topic Organization & Clustering	95%	HIGH	23x
X Post Selection (best)	Relevance, Classification & Matching	100%	RANKED	25x
S-1 TOC Extraction	Structured Data & Fact Extraction	90%	MEDIUM	25x
SEC S-1 Chunk Analysis	Financial Analysis & Trading Decisions	98%	RANKED	28x
Trading Recommendation	Financial Analysis & Trading Decisions	99%	RANKED	30x
Language Detection	Relevance, Classification & Matching	99%	RANKED	37x
Publication Title Generation	Content Summarization & Synthesis	90%	RANKED	47x
Engagement Triage	Relevance, Classification & Matching	95%	RANKED	55x
Executive Summary Generation (best)	Content Summarization & Synthesis	100%	MEDIUM	56x
Author Living-Person Safety Check	Relevance, Classification & Matching	98%	MEDIUM	59x
Substack Newsletter	Long-form Content Generation	97%	HIGH	75x
Onboarding Chapter Prompt Adaptation (best)	Infrastructure & Utility	100%	RANKED	90x
Image Prompt Generation	Infrastructure & Utility	92%	RANKED	90x
Prompt Adaptation	Infrastructure & Utility	93%	RANKED	104x
Author Voice Generation	Long-form Content Generation	95%	MEDIUM	117x
Structured Output Extraction	Structured Data & Fact Extraction	91%	MEDIUM	513x

Overpay: how much more you pay by running this model instead of the best-value model that clears the quality bar on that task (marked ★). "16x" means you pay 16× the price of the best-value, good-enough option for output that clears the same bar; ★ means this model is that option (no overpayment).

Confidence: how sure we are about the quality score (more judgments and more agreement mean higher confidence).
RANKED: many independent judges scored this model's outputs and their agreement is very high (most confident).
HIGH: many judges have scored it and they mostly agree (well pinned).
MEDIUM: enough judges have weighed in to publish, but they disagree more than we'd like (treat with a small grain of salt).
LOW-confidence cells are hidden everywhere on the site. See the methodology for the exact thresholds.
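The overpay arithmetic described above can be illustrated with a minimal Python sketch. It rests on assumptions not stated on this page: that "best value" means the cheapest model whose quality is at least 90% of the task's best score, and that each model has a single per-task cost in some consistent unit. The model names, scores, and costs are hypothetical and are not taken from the site's data; the site's actual method may differ.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str       # hypothetical model identifier
    quality: float  # raw quality score on one task (hypothetical scale)
    cost: float     # cost per run, any consistent unit (hypothetical)


def overpay_factors(results, bar=0.90):
    """Return {model name: overpay factor} for models that clear the bar.

    Assumption: the best-value model is the cheapest one whose quality,
    as a fraction of the best score on the task, is at least `bar`.
    """
    best_quality = max(r.quality for r in results)
    qualifying = [r for r in results if r.quality / best_quality >= bar]
    cheapest = min(qualifying, key=lambda r: r.cost)
    return {r.name: r.cost / cheapest.cost for r in qualifying}


# Hypothetical task with three models.
results = [
    ModelResult("model-a", quality=100.0, cost=8.0),  # best quality, expensive
    ModelResult("model-b", quality=95.0, cost=0.5),   # clears the bar, cheap
    ModelResult("model-c", quality=75.0, cost=0.1),   # below the 90% bar
]

factors = overpay_factors(results)
for name, factor in factors.items():
    label = "best value" if factor == 1.0 else f"{factor:g}x overpay"
    print(f"{name}: {label}")
```

In this made-up example, model-a clears the bar but costs 16 times as much as model-b, which is the cheapest qualifying option, so model-a would show "16x" and model-b would carry the ★. Model-c is cheaper still but falls below the bar, so it does not count as the best-value baseline.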

[Figure: GPT-5.5 at a glance. Cost vs quality across all tasks.]