Gemini 3 Flash Preview on DTP Benchmark

At a glance

Good enough on 5/35 tasks at the 90% bar. Cheapest qualifier on 1 task. Doesn't qualify on any task in these categories: Financial Analysis & Trading Decisions, Structured Data & Fact Extraction, Content Summarization & Synthesis, Long-form Content Generation, Social & Promotional Content, Topic Organization & Clustering.

Provider: Gemini
Model name: gemini-3-flash-preview
Qualifies on: 5 / 35 tasks (at 90% bar)
Cheapest qualifier on: 1 task
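
The report does not spell out how "qualifies" and "cheapest qualifier" are computed. One plausible reading, sketched below, is that a model qualifies on a task when its quality is at least 90% of the best quality any model reaches on that task, and that the cheapest qualifier is the qualifying model with the lowest cost per call. Both rules are assumptions; the `Result` type, the function names, and the example numbers are hypothetical and not taken from the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    quality: float  # mean quality score on one task
    cost: float     # cost per call in USD


def qualifiers(results, bar=0.90):
    """Return results whose quality is at least `bar` times the best quality.

    ASSUMPTION: the bar is relative to the best score on the task.
    """
    if not results:
        return []
    best = max(r.quality for r in results)
    return [r for r in results if r.quality >= bar * best]


def cheapest_qualifier(results, bar=0.90):
    """Return the lowest-cost qualifying result, or None if nothing qualifies."""
    return min(qualifiers(results, bar), key=lambda r: r.cost, default=None)


# Hypothetical numbers for illustration only.
example = [
    Result("model-a", 9.0, 0.0100),
    Result("model-b", 8.2, 0.0010),
    Result("model-c", 7.0, 0.0005),
]

if __name__ == "__main__":
    print([r.model for r in qualifiers(example)])
    print(cheapest_qualifier(example))
```

Under this reading, a very cheap model that misses the bar (like `model-c` here) is never the cheapest qualifier, which matches the report listing far fewer "cheapest qualifier" tasks than low-cost rows.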

Cost vs quality across all tasks

[Chart: cost per call (y-axis, log-scaled) against quality (x-axis). Legend: qualifies at 90% bar · doesn't qualify · ★ this model is the best on that task. Lower and further right means cheaper and higher quality.]

Per-task breakdown

Category	Task	Quality	Confidence	Cost / call	vs best	Qualifies @ 90%
Structured Data & Fact Extraction	claim_extraction autogenerated	6.71	high · n=100	$0.001580	-189%	no
Structured Data & Fact Extraction	Generic TOC Extraction	0.00	low · n=0	$0.001000	—	no
Structured Data & Fact Extraction	region_identification autogenerated	8.06	high · n=25	$0.002466	64%	no
Structured Data & Fact Extraction	S1 TOC extraction	0.00	low · n=0	$0.003965	—	no
Structured Data & Fact Extraction	structured_output_extraction autogenerated	8.46	medium · n=88	$0.004718	-54%	no
Financial Analysis & Trading Decisions	onboarding_prospect_analysis autogenerated	0.00	low · n=0	$0.002664	—	no
Financial Analysis & Trading Decisions	SEC Filing Analysis	7.02	ranked · n=100	$0.007270	90%	no
Financial Analysis & Trading Decisions	sec-s1-chunk-analysis autogenerated	7.51	high · n=83	$0.011504	90%	no
Financial Analysis & Trading Decisions	synthesis_analysis autogenerated	6.71	medium · n=28	$0.011326	83%	no
Financial Analysis & Trading Decisions	Trading Recommendation	6.54	ranked · n=78	$0.027363	80%	no
Infrastructure & Utility	claim_refinement autogenerated	7.73	ranked · n=100	$0.001599	23%	✓
Infrastructure & Utility	image_prompt_generation autogenerated	7.60	ranked · n=100	$0.003100	88%	no
Infrastructure & Utility	LLM Prompt Adaptation	7.54	ranked · n=100	$0.008759	80%	no
Infrastructure & Utility	markdown_newline_repair autogenerated	0.00	low · n=0	$0.066854	—	no
Infrastructure & Utility	metadata_paragraph_improvement autogenerated	7.71	high · n=93	$0.000626	23%	no
Infrastructure & Utility	onboarding_chapter_prompt_generation autogenerated	6.12	high · n=82	$0.043397	90%	no
Infrastructure & Utility	query_generation autogenerated	8.29	high · n=71	$0.002254	90%	no
Infrastructure & Utility	query_validation autogenerated	8.29	high · n=100	$0.001030	-509%	✓
Infrastructure & Utility	Translation	7.81	ranked · n=92	$0.003179	90%	✓
Long-form Content Generation	author_soul_generation autogenerated	8.26	ranked · n=89	$0.014283	88%	no
Long-form Content Generation	Claim-Referenced Analyst Writing (pooled)	6.45	high · n=55	$0.009648	82%	no
Long-form Content Generation	onboarding_chapter_generation autogenerated	0.00	low · n=0	$0.019231	—	no
Long-form Content Generation	section_generation autogenerated	7.90	high · n=5	$0.003716	23%	no
Long-form Content Generation	Substack Newsletter (pooled)	8.02	medium · n=5	$0.006550	88%	no
Long-form Content Generation	theme_generation autogenerated	0.00	low · n=0	$0.005774	—	no
Relevance, Classification & Matching	at_content_domain_suggest autogenerated	0.00	low · n=0	$0.001525	—	no
Relevance, Classification & Matching	Author Matching	8.34	ranked · n=100	$0.003595	85%	no
Relevance, Classification & Matching	author_living_check autogenerated	8.52	ranked · n=67	$0.005070	41%	no
Relevance, Classification & Matching	Language Detection	10.01	ranked · n=84	$0.000112	best	✓
Relevance, Classification & Matching	subreddit_selection autogenerated	5.00	high · n=60	$0.002126	90%	no
Relevance, Classification & Matching	subreddit_vetting autogenerated	0.00	low · n=0	$0.003496	—	no
Relevance, Classification & Matching	topic_client_matching autogenerated	7.38	ranked · n=100	$0.010984	-256%	no
Relevance, Classification & Matching	vetted_site_selection autogenerated	7.40	low · n=1	$0.012948	—	no
Relevance, Classification & Matching	x_post_selection autogenerated	8.32	ranked · n=100	$0.001640	90%	✓
Social & Promotional Content	activity_promo_generation autogenerated	8.16	ranked · n=61	$0.001256	89%	no
Social & Promotional Content	auto_reddit_post_generation autogenerated	6.21	ranked · n=100	$0.036308	28%	no
Social & Promotional Content	Social Post Promo (pooled)	7.42	ranked · n=97	$0.001930	-1319%	no
Social & Promotional Content	x_com_messages_for_promotion autogenerated	0.00	low · n=0	$0.017268	—	no
Content Summarization & Synthesis	content-summarization autogenerated	4.75	ranked · n=100	$0.004965	-142%	no
Content Summarization & Synthesis	Direct Browse Content Synthesis	3.50	low · n=1	$0.001386	—	no
Content Summarization & Synthesis	executive_summary_generation autogenerated	7.56	low · n=2	$0.020096	—	no
Content Summarization & Synthesis	synthesis_of_titles_for_publication autogenerated	7.84	ranked · n=69	$0.003465	65%	no
Topic Organization & Clustering	ps_section_reassignment autogenerated	0.00	low · n=0	$0.004840	—	no
Topic Organization & Clustering	Topic Discovery Clustering (pooled)	5.62	ranked · n=100	$0.033521	89%	no
Topic Organization & Clustering	topic_cluster_naming autogenerated	7.87	ranked · n=92	$0.003499	90%	no
Topic Organization & Clustering	topic_clustering_assign_sections autogenerated	7.93	high · n=99	$0.000666	90%	no
Topic Organization & Clustering	topic_sequence_determination autogenerated	0.00	low · n=0	$0.003938	—	no
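
To check the headline counts against the table, the rows can be tallied per category. The sketch below assumes the table is available as tab-separated text with the header row shown above; it embeds only four rows copied from the table, so its output covers those rows rather than the full totals. The `tally` function name is hypothetical.

```python
import csv
import io
from collections import Counter

# Four rows copied from the table above; paste the full table to tally everything.
TABLE = (
    "Category\tTask\tQuality\tConfidence\tCost / call\tvs best\tQualifies @ 90%\n"
    "Financial Analysis & Trading Decisions\tTrading Recommendation\t6.54\tranked · n=78\t$0.027363\t80%\tno\n"
    "Infrastructure & Utility\tTranslation\t7.81\tranked · n=92\t$0.003179\t90%\t✓\n"
    "Relevance, Classification & Matching\tAuthor Matching\t8.34\tranked · n=100\t$0.003595\t85%\tno\n"
    "Relevance, Classification & Matching\tLanguage Detection\t10.01\tranked · n=84\t$0.000112\tbest\t✓\n"
)


def tally(table_text):
    """Map each category to (qualifying tasks, total tasks)."""
    reader = csv.DictReader(
        io.StringIO(table_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    qualified = Counter()
    total = Counter()
    for row in reader:
        category = row["Category"]
        total[category] += 1
        # The table marks a qualifying task with a check mark and others with "no".
        if row["Qualifies @ 90%"].strip() == "✓":
            qualified[category] += 1
    return {category: (qualified[category], total[category]) for category in total}


if __name__ == "__main__":
    for category, (ok, n) in tally(TABLE).items():
        print(f"{category}: {ok}/{n}")
```

Categories that end up with zero qualifying tasks in the full tally should be the ones listed under "At a glance".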