We Tested 6 AI Models on 3 Common Security Exploits

blog.kilocode.ai

2 points by heymax054 2 months ago · 1 comment

viraptor 2 months ago

> No self-judging: GPT-5 scored the other 5 models but couldn’t evaluate its own output - I used Claude Opus 4.1 as judge for GPT-5’s submissions to avoid bias.

That's a bit silly, especially since all OpenAI models will share some elements. The scores lose their meaning there, because GPT-5's submissions were graded by a different judge than everyone else's. They could, for example, use GLM for all judging instead. Or go all the way and do a full matrix of everything judging everything else.
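
A rough sketch of what the full-matrix idea could look like, assuming each model scores every model's output and self-judgments are simply dropped before averaging. The function name, the model names, and all the numbers are made up for illustration; none of this comes from the article.

```python
from statistics import mean


def cross_judge_scores(scores):
    """Average each candidate's score over all judges except itself.

    `scores[judge][candidate]` is the numeric score that `judge`
    gave to `candidate` (hypothetical layout, not from the article).
    """
    candidates = {c for row in scores.values() for c in row}
    result = {}
    for cand in sorted(candidates):
        # Skip the diagonal: a model's opinion of its own output is ignored.
        votes = [
            row[cand]
            for judge, row in scores.items()
            if judge != cand and cand in row
        ]
        if votes:
            result[cand] = mean(votes)
    return result


# Made-up placeholder scores, only to show the shape of the matrix.
example = {
    "model_a": {"model_a": 10, "model_b": 6, "model_c": 8},
    "model_b": {"model_a": 7, "model_b": 10, "model_c": 9},
    "model_c": {"model_a": 9, "model_b": 8, "model_c": 10},
}

print(cross_judge_scores(example))
```

With this setup every model is graded by the same panel minus itself, so no single submission ends up with a one-off judge.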
