Claude, GPT, Gemini Play Doom II and 19 More (VideoGameBench)

vgbench.com

4 points by foddiangames 8 months ago · 1 comment

foddiangames (OP) 8 months ago

Researchers from Princeton introduce a research preview of VideoGameBench, a benchmark that challenges vision-language models to complete, in real time, a suite of 20 popular video games from handheld consoles and PC.

GPT-4o, Claude Sonnet 3.7, Gemini 2.5 Pro, and Gemini 2.0 Flash play Doom II (default difficulty) on VideoGameBench-Lite with the same input prompt. The models achieve varying levels of success, but none is able to pass even the first level.
