Best LLMs for Infrastructure & Utility

Tasks that call for mechanical competence at format conversion, metadata manipulation, prompt rewriting, and translation; minimal domain expertise is required.

10 capabilities in this category.

Task-by-task breakdown

Research Query Generation

Model	Quality (% of best)	Confidence	Overpay
GPT-5.6 Luna ★	93%	HIGH	best value
Qwen 3.5 Flash	94%	MEDIUM	1x
GPT-5.6 Terra	96%	HIGH	2.7x
Gemini 3.1 Pro Preview	100%	HIGH	3.2x
Qwen 3.6 Plus	90%	MEDIUM	4.3x
Kimi K2.6	99%	HIGH	5.6x
GPT-5.6 Sol	98%	HIGH	5.7x
GPT-5.5 (best)	100%	MEDIUM	9.9x

Research Query Validation

Model	Quality (% of best)	Confidence	Overpay
MiniMax M3 ★ (best)	100%	RANKED	best value
GPT-5.6 Luna	92%	MEDIUM	1.8x
Gemini 3.5 Flash	97%	HIGH	5.5x
GPT-5.5	100%	HIGH	18x

Metadata Paragraph Rewriting

Model	Quality (% of best)	Confidence	Overpay
Claude Sonnet 5 ★ (best)	100%	RANKED	best value
Gemini 3.5 Flash	96%	HIGH	3.5x

Onboarding Chapter Prompt Adaptation

Adapts generic chapter outlines into subject-specific system and user prompts for the synthesis pipeline. Honours variable-substitution syntax for sources, subject_name, subject_code, and …
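One way to check that an adapted prompt still honours the substitution variables is to compare the placeholder sets before and after adaptation. The sketch below is illustrative only: the page does not show the actual placeholder syntax, so the `{name}` brace form and the helper names `placeholders` and `check_adaptation` are assumptions.

```python
import re

# Assumption: placeholders look like {subject_name}. The real pipeline's
# syntax is not shown on this page and may differ.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def placeholders(text: str) -> set[str]:
    """Return the set of placeholder names found in a prompt."""
    return set(PLACEHOLDER.findall(text))


def check_adaptation(generic: str, adapted: str) -> dict[str, set[str]]:
    """Report placeholders the adapted prompt dropped or invented."""
    before, after = placeholders(generic), placeholders(adapted)
    return {"dropped": before - after, "invented": after - before}


generic = "Summarise {sources} for {subject_name} ({subject_code})."
adapted = "Using {sources}, write an onboarding chapter on {subject_name}."
report = check_adaptation(generic, adapted)
print(report)  # subject_code is reported as dropped
```

An empty `dropped` and `invented` set suggests the adapted prompt can still be filled in by the same substitution step as the generic one.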

Model	Quality (% of best)	Confidence	Overpay
DeepSeek V4 Flash ★	91%	HIGH	best value
GPT-5.4 Nano	90%	HIGH	2.3x
DeepSeek V4 Pro	92%	MEDIUM	5.5x
Qwen 3.6 Plus	96%	RANKED	12x
Kimi K2.6	99%	RANKED	20x
Claude Sonnet 4.6	96%	HIGH	41x
GPT-5.5 (best)	100%	RANKED	90x


Report Image Generation

Model	Quality (% of best)	Confidence	Overpay
Gemini 3.1 Flash Image Preview ★	92%	HIGH	best value
GPT-image-2	96%	RANKED	1.5x
Gemini 3 Pro Image Preview (best)	100%	MEDIUM	1.6x

Markdown Newline Repair

Model	Quality (% of best)	Confidence	Overpay
MiniMax M3 ★	91%	MEDIUM	best value
Gemini 3.5 Flash (best)	100%	RANKED	32x

Translation

Model	Quality (% of best)	Confidence	Overpay
DeepSeek V4 Flash ★ (best)	100%	RANKED	best value
Gemini 3.5 Flash	94%	MEDIUM	21x
Claude Sonnet 5	90%	RANKED	30x

Prompt Adaptation

Model	Quality (% of best)	Confidence	Overpay
Tencent Hy3 ★	98%	RANKED	best value
DeepSeek V4 Flash	91%	MEDIUM	2.7x
Qwen 3.5 Flash	91%	RANKED	3.6x
GPT-5.6 Luna	95%	RANKED	4.3x
DeepSeek V4 Pro	96%	RANKED	4.5x
NVIDIA Nemotron-3 Ultra 550B	93%	MEDIUM	5.2x
GPT-5.6 Terra	93%	HIGH	8.2x
Claude Sonnet 4.6 (best)	100%	RANKED	13x
Gemini 3.5 Flash	98%	RANKED	16x
GPT-5.6 Sol	95%	HIGH	21x
Gemini 3.1 Pro Preview	98%	RANKED	24x
Qwen 3.6 Plus	99%	RANKED	26x
Meta Muse Spark 1.1	98%	HIGH	27x
Kimi K2.6	98%	HIGH	47x
GPT-5.5	93%	RANKED	104x

Claim Refinement

Model	Quality (% of best)	Confidence	Overpay
Gemini 3.1 Flash Lite ★	100%	RANKED	best value
MiniMax M3	90%	MEDIUM	2.5x
Tencent Hy3 (best)	100%	HIGH	3.2x
Qwen 3.5 Flash	92%	HIGH	5.1x
GPT-5.6 Luna	93%	MEDIUM	5.2x
Qwen 3.7 Plus	98%	RANKED	7.9x
Gemini 3.5 Flash	98%	HIGH	13x
Qwen 3.6 Flash	96%	MEDIUM	14x
Claude Sonnet 5	99%	HIGH	15x
Qwen 3.6 Plus	96%	HIGH	16x
Claude Opus 4.8	94%	MEDIUM	35x

Image Prompt Generation

Transforms report content into an image-generation prompt for DALL-E / Imagen / similar. Specifies visual style, subject focus, colour palette, composition, mood, and technical details suitable for a …
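The six elements named in the description can be pictured as fields of a small structure that is flattened into one prompt string. This is a minimal sketch, not the pipeline's actual schema: the class name `ImagePromptSpec`, the field names, and the comma-joined output format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImagePromptSpec:
    """Hypothetical container for the elements listed in the task description."""
    visual_style: str
    subject_focus: str
    colour_palette: str
    composition: str
    mood: str
    technical_details: str

    def render(self) -> str:
        # Join the non-empty parts into a single comma-separated prompt.
        parts = [
            self.visual_style,
            self.subject_focus,
            self.colour_palette,
            self.composition,
            self.mood,
            self.technical_details,
        ]
        return ", ".join(p.strip() for p in parts if p.strip())


spec = ImagePromptSpec(
    visual_style="flat vector illustration",
    subject_focus="a wind farm at dawn",
    colour_palette="teal and amber",
    composition="wide shot, horizon on the lower third",
    mood="calm",
    technical_details="16:9, no text",
)
prompt = spec.render()
print(prompt)
```

Keeping the elements as separate fields makes it easy to see which one a model omitted when comparing outputs.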

Model	Quality (% of best)	Confidence	Overpay
DeepSeek V4 Flash ★	93%	RANKED	best value
Tencent Hy3	91%	HIGH	2.9x
MiniMax M3	95%	MEDIUM	5x
Qwen 3.5 Flash	92%	RANKED	5.3x
GPT-5.6 Luna	98%	RANKED	6.5x
GPT-5.4 Mini	91%	RANKED	7.2x
DeepSeek V4 Pro	94%	RANKED	7.6x
NVIDIA Nemotron-3 Ultra 550B	93%	HIGH	11x
Qwen 3.7 Plus	91%	RANKED	13x
GPT-5.6 Terra (best)	100%	HIGH	14x
Qwen 3.6 Flash	92%	RANKED	16x
Qwen 3.6 Plus	94%	RANKED	18x
Grok 4.5	92%	RANKED	19x
Claude Sonnet 5	94%	RANKED	21x
Meta Muse Spark 1.1	97%	HIGH	30x
Gemini 3.5 Flash	96%	RANKED	30x
Kimi K2.6	92%	RANKED	33x
GPT-5.6 Sol	99%	HIGH	34x
Claude Sonnet 4.6	90%	RANKED	51x
GPT-5.5	92%	RANKED	90x


Confidence: how sure we are about the quality score (more judgments and more agreement mean higher confidence).

RANKED: many independent judges scored this model's outputs and their agreement is very high (most confident).
HIGH: many judges have scored it and they mostly agree (well-pinned).
MEDIUM: enough judges have weighed in to publish, but they disagree more than we'd like (treat with a small grain of salt).

LOW-confidence cells are hidden everywhere on the site. See the methodology for the exact thresholds.
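To make the three columns concrete, here is a minimal sketch of shortlisting a model from one table: keep rows that meet a quality floor and a minimum confidence tier, then take the lowest overpay. The rows are copied from the first table on this page. Treating "best value" as an overpay of 1.0, and the names `Row`, `CONFIDENCE_ORDER`, and `pick`, are assumptions made for illustration; the site's own selection logic is not described here.

```python
from dataclasses import dataclass

# Tiers ordered from least to most certain, per the confidence note above.
CONFIDENCE_ORDER = {"MEDIUM": 0, "HIGH": 1, "RANKED": 2}


@dataclass(frozen=True)
class Row:
    model: str
    quality: int      # quality as % of the best model
    confidence: str   # MEDIUM, HIGH or RANKED
    overpay: float    # cost multiple; 1.0 is assumed for the "best value" row


# Rows copied from the first table on this page.
ROWS = [
    Row("GPT-5.6 Luna", 93, "HIGH", 1.0),
    Row("Qwen 3.5 Flash", 94, "MEDIUM", 1.0),
    Row("GPT-5.6 Terra", 96, "HIGH", 2.7),
    Row("Gemini 3.1 Pro Preview", 100, "HIGH", 3.2),
    Row("Qwen 3.6 Plus", 90, "MEDIUM", 4.3),
    Row("Kimi K2.6", 99, "HIGH", 5.6),
    Row("GPT-5.6 Sol", 98, "HIGH", 5.7),
    Row("GPT-5.5", 100, "MEDIUM", 9.9),
]


def pick(rows, min_quality, min_confidence="HIGH"):
    """Return the lowest-overpay row meeting both floors, or None."""
    floor = CONFIDENCE_ORDER[min_confidence]
    eligible = [
        r for r in rows
        if r.quality >= min_quality and CONFIDENCE_ORDER[r.confidence] >= floor
    ]
    return min(eligible, key=lambda r: r.overpay, default=None)


choice = pick(ROWS, min_quality=95)
print(choice.model if choice else "no eligible model")
```

Raising `min_quality` or `min_confidence` shrinks the eligible set, and `pick` returns `None` when no row qualifies.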

Capabilities in this category:

Research Query Generation
Research Query Validation
Metadata Paragraph Rewriting
Onboarding Chapter Prompt Adaptation
Report Image Generation
Markdown Newline Repair
Translation
Prompt Adaptation
Claim Refinement
Image Prompt Generation