GPT-5.2, Grok 4.1, and DeepSeek v3.2 compared as Santa agents

veris.ai

4 points by _josh_meyer_ 14 days ago · 2 comments

_josh_meyer_ (OP) 14 days ago

SantaBench is a fun benchmark with a serious methodology. The task: play a cheeky Santa agent who researches users online and roasts them based on their social media.

_josh_meyer_ (OP) 14 days ago

OP here -- I work at Veris and built this. Happy to answer questions about the methodology!
