Models — DTP LLM Benchmark
Per-model performance across all task types.
One page per model in the benchmark, showing its performance across every task, where it is the cheapest qualifier at the default quality bar, and which categories it does and doesn't qualify on.
- Claude Opus 4.7 on DTP Benchmark
- Claude Sonnet 4.6 on DTP Benchmark
- DeepSeek V4 Flash on DTP Benchmark
- DeepSeek V4 Pro on DTP Benchmark
- Gemini 3 Flash Preview on DTP Benchmark
- Gemini 3 Pro Image Preview on DTP Benchmark
- Gemini 3.1 Flash Image Preview on DTP Benchmark
- Gemini 3.1 Flash Lite on DTP Benchmark
- Gemini 3.1 Pro Preview on DTP Benchmark
- GPT-5.4 mini on DTP Benchmark
- GPT-5.4 nano on DTP Benchmark
- GPT-5.5 on DTP Benchmark
- GPT-image-1.5 on DTP Benchmark
- Haiku 4.5 on DTP Benchmark
- Imagen 4.0 Fast on DTP Benchmark
- Imagen 4.0 on DTP Benchmark
- Imagen 4.0 Ultra on DTP Benchmark
- Kimi K2.6 on DTP Benchmark
- MiniMax M2.5 on DTP Benchmark
- Qwen 3.5 Flash on DTP Benchmark
- Qwen 3.6 Plus on DTP Benchmark