Haiku 4.5 on DTP Benchmark

At a glance

Qualifies on 3 of 35 tasks at the 90% bar. Cheapest qualifier on 0 tasks. Doesn't qualify on any task in these categories: Financial Analysis & Trading Decisions, Structured Data & Fact Extraction, Content Summarization & Synthesis, Social & Promotional Content, Infrastructure & Utility.

Provider: Anthropic
Model name: claude-haiku-4-5
Qualifies on: 3 / 35 tasks (at 90% bar)
Cheapest qualifier on: 0 tasks

Cost vs quality across all tasks

Figure legend: qualifies at 90% bar · doesn't qualify · ★ this model is the best on that task. Points lower and further to the right are cheaper and higher quality. The y-axis (cost) is log-scaled.

Per-task breakdown

Category	Task	Quality	Confidence	Cost / call	vs best	Qualifies @ 90%
Structured Data & Fact Extraction	claim_extraction autogenerated	6.63	high · n=100	$0.002858	-422%	no
Structured Data & Fact Extraction	Generic TOC Extraction	0.00	low · n=0	$0.001750	—	no
Structured Data & Fact Extraction	region_identification autogenerated	7.92	low · n=25	$0.004180	39%	no
Structured Data & Fact Extraction	S1 TOC extraction	0.00	low · n=0	$0.006824	—	no
Structured Data & Fact Extraction	structured_output_extraction autogenerated	8.38	medium · n=91	$0.008306	-171%	no
Financial Analysis & Trading Decisions	onboarding_prospect_analysis autogenerated	0.00	low · n=0	$0.004592	—	no
Financial Analysis & Trading Decisions	SEC Filing Analysis	5.98	medium · n=100	$0.012259	83%	no
Financial Analysis & Trading Decisions	sec-s1-chunk-analysis autogenerated	7.67	high · n=81	$0.020164	82%	no
Financial Analysis & Trading Decisions	synthesis_analysis autogenerated	7.64	medium · n=29	$0.021765	67%	no
Financial Analysis & Trading Decisions	Trading Recommendation	8.24	high · n=86	$0.046094	67%	no
Infrastructure & Utility	claim_refinement autogenerated	6.31	medium · n=100	$0.002920	-40%	no
Infrastructure & Utility	image_prompt_generation autogenerated	7.59	ranked · n=100	$0.005360	80%	no
Infrastructure & Utility	LLM Prompt Adaptation	7.38	high · n=100	$0.014782	67%	no
Infrastructure & Utility	markdown_newline_repair autogenerated	1.50	low · n=1	$0.112286	—	no
Infrastructure & Utility	metadata_paragraph_improvement autogenerated	5.17	medium · n=90	$0.001074	-32%	no
Infrastructure & Utility	onboarding_chapter_prompt_generation autogenerated	8.22	high · n=75	$0.072700	83%	no
Infrastructure & Utility	Translation	6.34	medium · n=93	$0.005402	83%	no
Long-form Content Generation	author_soul_generation autogenerated	9.14	ranked · n=84	$0.023960	80%	✓
Long-form Content Generation	Claim-Referenced Analyst Writing (pooled)	8.02	high · n=37	$0.017526	67%	no
Long-form Content Generation	onboarding_chapter_generation autogenerated	0.00	low · n=0	$0.032188	—	no
Long-form Content Generation	section_generation autogenerated	6.73	high · n=6	$0.006250	-29%	no
Long-form Content Generation	Substack Newsletter (pooled)	8.87	ranked · n=5	$0.011004	80%	no
Long-form Content Generation	theme_generation autogenerated	0.00	low · n=0	$0.010548	—	no
Relevance, Classification & Matching	at_content_domain_suggest autogenerated	0.00	low · n=0	$0.002906	—	no
Relevance, Classification & Matching	Author Matching	6.62	high · n=100	$0.007162	71%	no
Relevance, Classification & Matching	Language Detection	9.99	ranked · n=82	$0.000213	-90%	✓
Relevance, Classification & Matching	Relevance Scoring (POST)	3.28	high · n=100	$0.001750	56%	no
Relevance, Classification & Matching	Relevance Scoring (Topic Report)	7.19	medium · n=44	$0.001750	-994%	no
Relevance, Classification & Matching	Relevance Scoring (X Post)	6.09	low · n=11	$0.001750	—	no
Relevance, Classification & Matching	subreddit_selection autogenerated	4.60	high · n=63	$0.003952	81%	no
Relevance, Classification & Matching	subreddit_vetting autogenerated	6.50	low · n=1	$0.006025	—	no
Relevance, Classification & Matching	topic_client_matching autogenerated	5.09	medium · n=100	$0.019231	-523%	no
Relevance, Classification & Matching	vetted_site_selection autogenerated	6.88	low · n=2	$0.023179	—	no
Relevance, Classification & Matching	x_post_selection autogenerated	7.79	medium · n=100	$0.003228	80%	no
Social & Promotional Content	auto_reddit_post_generation autogenerated	7.49	high · n=100	$0.061540	-23%	no
Social & Promotional Content	Social Post Promo (pooled)	7.39	ranked · n=94	$0.003608	-2553%	no
Social & Promotional Content	x_com_messages_for_promotion autogenerated	0.00	low · n=0	$0.032237	—	no
Content Summarization & Synthesis	content-summarization autogenerated	6.47	ranked · n=100	$0.008622	-320%	no
Content Summarization & Synthesis	executive_summary_generation autogenerated	0.00	low · n=0	$0.035708	—	no
Content Summarization & Synthesis	synthesis_of_titles_for_publication autogenerated	8.01	high · n=66	$0.006009	40%	no
Topic Organization & Clustering	ps_section_reassignment autogenerated	0.00	low · n=0	$0.008722	—	no
Topic Organization & Clustering	Topic Discovery Clustering (pooled)	7.13	high · n=100	$0.061034	80%	no
Topic Organization & Clustering	topic_cluster_naming autogenerated	8.22	high · n=72	$0.006792	81%	✓
Topic Organization & Clustering	topic_sequence_determination autogenerated	0.00	low · n=0	$0.006870	—	no
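The "Doesn't qualify on any task" list in the summary is a per-category rollup of the table's last column. Below is a minimal Python sketch of that rollup. It assumes the table is available as tab-separated text with one header row, and it reads the verdict from the last column rather than by header name. The `categories_without_qualifier` helper is hypothetical. The sample contains only five rows copied verbatim from the table above.

```python
import csv
import io
from collections import defaultdict


def categories_without_qualifier(tsv_text: str) -> list[str]:
    """Categories in which no task has a ✓ in the last (qualification) column."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(rows)  # skip the header row
    has_qualifier: dict[str, bool] = defaultdict(bool)
    for row in rows:
        category, verdict = row[0], row[-1].strip()
        # A single ✓ is enough to take the category off the list.
        has_qualifier[category] |= verdict == "✓"
    return sorted(c for c, ok in has_qualifier.items() if not ok)


# Five rows copied from the table above, plus its header.
sample = "\n".join("\t".join(cells) for cells in [
    ["Category", "Task", "Quality", "Confidence", "Cost / call", "vs best", "Qualifies @ 90%"],
    ["Long-form Content Generation", "author_soul_generation autogenerated", "9.14", "ranked · n=84", "$0.023960", "80%", "✓"],
    ["Long-form Content Generation", "section_generation autogenerated", "6.73", "high · n=6", "$0.006250", "-29%", "no"],
    ["Social & Promotional Content", "auto_reddit_post_generation autogenerated", "7.49", "high · n=100", "$0.061540", "-23%", "no"],
    ["Social & Promotional Content", "Social Post Promo (pooled)", "7.39", "ranked · n=94", "$0.003608", "-2553%", "no"],
    ["Topic Organization & Clustering", "topic_cluster_naming autogenerated", "8.22", "high · n=72", "$0.006792", "81%", "✓"],
])

print(categories_without_qualifier(sample))  # → ['Social & Promotional Content']
```

Running the helper over the full table should return the same five categories named in the summary, in alphabetical order.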