Cyber Model Arena
wiz.io
I'm wondering why they decided to air-gap the models inside Docker containers. IMO, this would have been a better comparison if the models had been allowed to make tool calls.
General-Purpose Cyber Benchmark for AI Agents and their Models