xdotli

Karma
15
Created
2 years ago

About

Founder of BenchFlow.ai, a benchmark company.

Recent Submissions

  1. ClawsBench shows GPT-5.4 tries to reward hack 80% of the time (arxiv.org)
  2. Chaos of Agent (agentsofchaos.baulab.info)
  3. Native CLI scaffolds consistently outperform OpenCode when using the same model (arxiv.org)
  4. We compare model quality in Cursor (cursor.com)
  5. Automatically Learning Skills for Coding Agents (gepa-ai.github.io)
  6. We Reached 74.8% on terminal-bench with Terminus-KIRA (krafton-ai.github.io)
  7. Self-generated skills don't do much for AI agents, but human-curated skills do (theregister.com)
  8. First Agent Skills Hackathon by the Authors of SkillsBench (skillathon.ai)
  9. The First Agent Skills Benchmark (huggingface.co)
  10. GPT-5.2 got worse on Terminal Bench 2.0, so is GPT-5.2 Pro (twitter.com)