A Debate Tournament for LLMs

pavursec.com

3 points by cloudlandsdev 18 days ago · 1 comment

cloudlandsdevOP 18 days ago

Overall results from the blog post:

  Rank  Model                   Elo   Win%
  ----------------------------------------
   1    GPT 5.2                 1480  85%
   2    Gemini 3 Pro            1472  74%
   3    Claude Opus 4.6         1389  72%
   4    Claude Opus 4.5         1360  67%
   5    Grok 4.1 Fast           1349  62%
   6    GPT OSS 120B            1322  59%
   7    Gemini 3 Flash Preview  1316  54%
   8    Claude Sonnet 4.5       1265  54%
   9    Gemini 2.5 Flash Lite   1257  44%
  10    Mistral Large 3         1211  41%
  11    DeepSeek V3.2           1194  41%
  12    Meta Maverick           1065  26%
  13    Meta Llama Scout         999  13%
  14    Mistral Small 3          996  10%