xdotli
- Karma: 15
- Created: 2 years ago
About
Founder of BenchFlow.ai, a benchmark company.
Recent Submissions
- 1. ClawsBench shows GPT-5.4 tries to reward hack 80% of the time (arxiv.org)
- 2. Chaos of Agent (agentsofchaos.baulab.info)
- 3. Native CLI scaffolds consistently outperform OpenCode when using the same model (arxiv.org)
- 4. We compare model quality in Cursor (cursor.com)
- 5. Automatically Learning Skills for Coding Agents (gepa-ai.github.io)
- 6. We Reached 74.8% on terminal-bench with Terminus-KIRA (krafton-ai.github.io)
- 7. Self-generated skills don't do much for AI agents, but human-curated skills do (theregister.com)
- 8. First Agent Skills Hackathon by the Authors of SkillsBench (skillathon.ai)
- 9. The First Agent Skills Benchmark (huggingface.co)
- 10. GPT-5.2 got worse on Terminal Bench 2.0, so is GPT-5.2 Pro (twitter.com)