| # | Model | Vendor | Round: average final round reached across all runs (± std. dev.) | Responses with valid tool calls that can be executed in the current game state | Responses with valid tool calls that cannot be executed in the current game state | Responses without valid tool calls | In / | Out / | / [s] | / [m$] |
|---|---|---|---|---|---|---|---|---|---|---|