Show HN: LLM Skirmish – a benchmark where LLMs play RTS games, by writing code

5 points by __cayenne__ 13 days ago · 2 comments · 1 min read

I wanted to create an LLM game benchmark that put this generation of frontier LLMs' top skill, coding, on full display.

Ten years ago, a team released a game called Screeps. It was described as an "MMO RTS sandbox for programmers." In Screeps, human players write JavaScript strategies that get executed in the game's environment.

The Screeps paradigm, writing code and having it execute in a real-time game environment, is well suited for an LLM benchmark. Drawing on a version of the Screeps open source API, LLM Skirmish pits LLMs head-to-head in a series of 1v1 real-time strategy games.
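
To make the paradigm concrete, here is a toy sketch of what "code as the player" could look like. It is not the Screeps API or the LLM Skirmish harness: every name in it (`runMatch`, `aggressive`, `turtle`, the `view` object, the hit-point rules) is a hypothetical stand-in. The only idea it tries to capture is that each side submits a function, and the environment calls that function once per tick and applies the orders it returns.

```javascript
// Toy illustration only. All names and rules here are hypothetical and are
// not taken from Screeps or LLM Skirmish.

// A strategy is a function called once per tick with a copy of the game
// state. It returns one order per owned unit: "attack" or "hold".
function aggressive(view) {
  return view.myUnits.map(() => "attack");
}

function turtle(view) {
  return view.myUnits.map(() => "hold");
}

// Remove `damage` hit points from the front of the unit list,
// dropping units whose hit points reach zero.
function applyDamage(units, damage) {
  while (damage > 0 && units.length > 0) {
    const hit = Math.min(damage, units[0]);
    units[0] -= hit;
    damage -= hit;
    if (units[0] === 0) units.shift();
  }
}

// Run a 1v1 match: each tick, both strategies see a snapshot, both sets of
// orders are collected, then damage is resolved simultaneously.
function runMatch(strategyA, strategyB, maxTicks = 100) {
  const state = { a: [10, 10, 10], b: [10, 10, 10] }; // hit points per unit
  for (let tick = 0; tick < maxTicks; tick++) {
    // Strategies get copies so they cannot mutate the real state.
    const ordersA = strategyA({ myUnits: [...state.a], enemyUnits: [...state.b] });
    const ordersB = strategyB({ myUnits: [...state.b], enemyUnits: [...state.a] });
    const damageToB = ordersA.filter((o) => o === "attack").length;
    const damageToA = ordersB.filter((o) => o === "attack").length;
    applyDamage(state.b, damageToB);
    applyDamage(state.a, damageToA);
    if (state.a.length === 0 || state.b.length === 0) break;
  }
  if (state.a.length === state.b.length) return "draw";
  return state.a.length > state.b.length ? "A" : "B";
}

console.log(runMatch(aggressive, turtle)); // "A"
```

In a benchmark built this way, the strategy functions would be the code each LLM writes, and the match result is what gets scored.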

russellthehippo 13 days ago

Whoa, this is sick. Like adversarial chess training but inverted for model evaluation. The model has to be both correct and fast at code while managing tactics and strategy well. I wonder if it should extend to general-soldier models, like an agent swarm. obv would kill tokens but would be super interesting

zztank 13 days ago

Oof, gonna go sell my Google position.

Such fascinating results and a cool way to design a benchmark
