Claude Sonnet 4.6

Provider: Anthropic
Model name: claude-sonnet-4-6

Qualifies on: 19 / 52 tasks (at 90% bar)
Best value on: 0 tasks

Cost vs quality across all tasks

[Scatter plot: cost vs quality, one point per task. Legend (overpay relative to the best-value model): within ~1.3× · 1.3–2× · >2× · ★ = this model is the best-value pick on that task. Top-right is the best quadrant. Only tasks where this model qualifies at the 90% bar are plotted.]

Per-task breakdown

Task	Category	Quality (% of best)	Confidence	Overpay
Author Matching	Relevance, Classification & Matching	91%	MEDIUM	3.6x
Investment Panel Voting (best)	Financial Analysis & Trading Decisions	100%	RANKED	5.1x
Claim-Referenced Analyst Writing	Long-form Content Generation	90%	HIGH	5.2x
Topic Cluster Naming	Topic Organization & Clustering	98%	HIGH	6.7x
Social Post Promotion	Social & Promotional Content	90%	HIGH	9.9x
Theme Generation	Long-form Content Generation	90%	HIGH	11x
Topic-to-Section Assignment	Topic Organization & Clustering	90%	HIGH	13x
Prompt Adaptation (best)	Infrastructure & Utility	100%	RANKED	13x
Geographic Region Identification	Structured Data & Fact Extraction	97%	HIGH	16x
Activity Feed Blurb Generation	Social & Promotional Content	93%	MEDIUM	18x
Trading Recommendation	Financial Analysis & Trading Decisions	99%	HIGH	19x
X Post Selection	Relevance, Classification & Matching	100%	HIGH	23x
Publication Title Generation	Content Summarization & Synthesis	92%	RANKED	24x
Executive Summary Generation	Content Summarization & Synthesis	98%	MEDIUM	30x
Engagement Triage (best)	Relevance, Classification & Matching	100%	RANKED	34x
Onboarding Chapter Prompt Adaptation	Infrastructure & Utility	96%	HIGH	41x
Image Prompt Generation	Infrastructure & Utility	90%	RANKED	51x
Language Detection	Relevance, Classification & Matching	99%	RANKED	52x
Author Voice Generation	Long-form Content Generation	94%	RANKED	58x

Overpay: how much more you pay by running this model instead of the best-value model that clears the quality bar on that task (marked ★). "16x" means you pay 16× the cost of the best-value good-enough option for output that clears the same bar; ★ means this model is that option (no overpayment).

Confidence: how sure we are about the quality score (more judgments and more agreement mean higher confidence).
RANKED: many independent judges scored this model's outputs and their agreement is very high (most confident).
HIGH: many judges have scored it and they mostly agree (well-pinned).
MEDIUM: enough judges have weighed in to publish, but they disagree more than we'd like (treat with a small grain of salt).
LOW-confidence cells are hidden everywhere on the site. See the methodology for the exact thresholds.
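As a rough illustration of the Overpay definition, the sketch below computes overpay multiples for one task. It is not the site's actual method: it assumes that "best-value" means the cheapest model whose quality is at least 90% of the best score, and the model names, quality scores, and costs are all made up for the example.

```python
from dataclasses import dataclass

QUALITY_BAR = 0.90  # the "90% bar" used on this page


@dataclass
class Result:
    model: str      # hypothetical model name
    quality: float  # raw quality score on one task (any consistent scale)
    cost: float     # cost of running the task (any consistent unit)


def overpay_table(results):
    """Return {model: overpay multiple} for models that clear the quality bar.

    Assumption: the best-value model is the cheapest qualifying model.
    """
    best_quality = max(r.quality for r in results)
    qualifying = [r for r in results if r.quality / best_quality >= QUALITY_BAR]
    best_value = min(qualifying, key=lambda r: r.cost)
    return {r.model: r.cost / best_value.cost for r in qualifying}


# Made-up numbers for a single task.
results = [
    Result("model-a", quality=8.0, cost=16.0),  # 100% of best, expensive
    Result("model-b", quality=7.4, cost=1.0),   # 92.5% of best, cheap
    Result("model-c", quality=6.0, cost=0.5),   # 75% of best, below the bar
]
table = overpay_table(results)
print(table)
```

Here model-b is the best-value option (the ★ case, overpay 1×), model-a would be listed at 16x, and model-c is excluded because it does not clear the bar.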