cua-bench: make your agents better at computers

cua-bench is a collection of desktop and mobile tasks with a harness for evaluation and training to help agent makers quantify their agents' computer-use mastery.

Tianbao Xie

Exactly things cua community needed. With this API based interface we are able to scale large amount diverse tasks for RL and data distillation, with help of Codex/Claude Code and other pipelines! Congrats @trycua

view agent performance

Task resolution success rate for top agents and models on cua-bench2.0

Coming Soon

The leaderboard is being prepared. Check back soon!
