Thumbnail Bench
Human evaluation of AI image models for YouTube thumbnail generation. Models are tested on prompt-following using production thumbnail templates.
Leaderboard
Text-to-image model rankings based on AI thumbnail generation.
How We Evaluate
Each model creates thumbnails from production TubeSalt templates using identical prompts and default API settings. Each thumbnail is typically scored against 10-15 criteria: anatomical accuracy (hands, face, body), skin quality, text and graphics quality, spelling, legibility, composition, framing, and prompt matching. The leaderboard shows average scores across multiple template generations.
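As a rough illustration, the aggregation described above could be sketched as follows. The function names, the 0-1 marking scheme, and the equal weighting of criteria are assumptions for illustration, not TubeSalt's actual scoring pipeline.

```python
# Hypothetical sketch of AVG@n aggregation: each thumbnail gets a mark per
# criterion, marks are averaged per thumbnail, and per-thumbnail scores are
# averaged over n generations. Equal criterion weights are an assumption.

def thumbnail_score(criterion_marks):
    """Mean of the per-criterion marks (each in 0..1) for one thumbnail."""
    return sum(criterion_marks) / len(criterion_marks)

def leaderboard_score(thumbnails, n=10):
    """AVG@n: mean per-thumbnail score over the first n generations, as a percentage."""
    scores = [thumbnail_score(marks) for marks in thumbnails[:n]]
    return 100 * sum(scores) / len(scores)

# Example: 10 generations, each passing 3 of 4 criteria -> 75.0%
generations = [[1, 1, 1, 0]] * 10
print(leaderboard_score(generations))
```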
| Rank | Model | Score (AVG@10) | Organization |
|---|---|---|---|
| 1 | Imagen 4 Preview | 90.7% | Google |
| 2 | Hunyuan Image V3 | 88.2% | Tencent |
| 3 | Flux Pro Kontext Max | 88.0% | Black Forest Labs |
| 4 | Flux Krea | 87.5% | Black Forest Labs |
| 5 | Flux Pro Kontext | 86.0% | Black Forest Labs |
| 6 | Ideogram V3 | 80.2% | Ideogram |
| 7 | Seedream V4 | 79.9% | ByteDance |
| 8 | Qwen Image | 75.9% | Alibaba |
| 9 | HiDream Fast | 74.3% | HiDream AI |
| 10 | Flux Dev | 67.5% | Black Forest Labs |
| 11 | HiDream I1 Full | 65.3% | HiDream AI |