Best LLMs for Social & Promotional Content — DTP Benchmark
Conciseness, platform-native conventions, engagement under tight character limits.
Task-by-task breakdown
activity_promo_generation (autogenerated)
| Rank | Model | Quality | Cost / call | vs best |
|---|---|---|---|---|
| 1 | Qwen 3.5 Flash | 8.46 | $0.000092 | 99% |
| 2 | Qwen 3.6 Plus | 8.48 | $0.000816 | 93% |
| 3 | GPT-5.4 mini | 8.62 | $0.001884 | 84% |
| 4 | Kimi K2.6 | 8.64 | $0.002040 | 82% |
| 5 | DeepSeek V4 Pro | 8.58 | $0.002951 | 74% |
auto_reddit_post_generation (autogenerated)
| Rank | Model | Quality | Cost / call | vs best |
|---|---|---|---|---|
| 1 | Kimi K2.6 (best) | 8.26 | $0.050155 | — |
| 2 | Claude Sonnet 4.6 | 7.85 | $0.184620 | -268% |
| 3 | Claude Opus 4.7 | 8.18 | $0.307700 | -513% |
Social Post Promo (pooled)
Pooled TT for single-platform article promo posts (X, Bluesky, Mastodon, Threads). Every platform shares the same prompt skeleton, with a per-platform style addendum and character limit.
| Rank | Model | Quality | Cost / call | vs best |
|---|---|---|---|---|
| 1 | Qwen 3.5 Flash (best) | 8.35 | $0.000136 | — |
| 2 | Qwen 3.6 Plus | 8.20 | $0.001254 | -822% |
| 3 | Kimi K2.6 | 8.21 | $0.003239 | -2282% |
| 4 | DeepSeek V4 Pro | 8.23 | $0.004962 | -3549% |
| 5 | GPT-5.5 | 7.94 | $0.019300 | -14091% |
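
The "vs best" figures in the table above appear to be the relative difference in cost per call against the model marked best, rounded to a whole percent, with negative values meaning "more expensive than best". This is inferred from the numbers, not from a documented formula. A minimal Python sketch under that assumption follows; the function name `vs_best_percent` is hypothetical.

```python
def vs_best_percent(cost: float, best_cost: float) -> int:
    """Signed cost difference versus the best model, as a whole percent.

    Assumed formula (inferred from the table, not documented):
    (best_cost - cost) / best_cost * 100, rounded to the nearest integer.
    A negative result means the model costs more per call than the best model.
    """
    return round((best_cost - cost) / best_cost * 100)


# Cost per call in USD, copied from the pooled Social Post Promo table.
BEST_COST = 0.000136  # Qwen 3.5 Flash
pooled_costs = {
    "Qwen 3.6 Plus": 0.001254,
    "Kimi K2.6": 0.003239,
    "DeepSeek V4 Pro": 0.004962,
    "GPT-5.5": 0.019300,
}

pooled_vs_best = {
    model: vs_best_percent(cost, BEST_COST)
    for model, cost in pooled_costs.items()
}

for model, pct in pooled_vs_best.items():
    print(f"{model}: {pct}%")
```

Under the same assumption, the function also reproduces the -268% and -513% entries of the auto_reddit_post_generation table from that table's costs. The positive percentages in the activity_promo_generation table do not match any baseline among the rows listed there, so that table's baseline is presumably a model that is not shown.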
x_com_messages_for_promotion (autogenerated)
No model has reached MEDIUM confidence yet; evidence is still accumulating.