OpenGameEval: Eval Framework to Benchmark Agentic AI Assistants
corp.roblox.com
OpenGameEval offers a unique testing ground for evaluating core model capabilities related to agentic reasoning and long-horizon task solving.