xdotli

Karma: 16
Created: 3 years ago

About

Founder of BenchFlow.ai, a benchmark company.

Recent Submissions

1. ▲ A curated, non-BS library of the best resources for evaluating agents (github.com) 3 points · 29 days ago · 0 comments
2. ▲ Frontier Model Training Methodologies (djdumpling.github.io) 2 points · 2 months ago · 1 comment
3. ▲ ClawsBench shows GPT-5.4 tries to reward hack 80% of the time (arxiv.org) 3 points · 3 months ago · 1 comment
4. ▲ Chaos of Agent (agentsofchaos.baulab.info) 1 point · 4 months ago · 1 comment
5. ▲ Native CLI scaffolds consistently outperform OpenCode when using the same model (arxiv.org) 1 point · 4 months ago · 1 comment
6. ▲ We compare model quality in Cursor (cursor.com) 2 points · 4 months ago · 0 comments
7. ▲ Automatically Learning Skills for Coding Agents (gepa-ai.github.io) 4 points · 5 months ago · 0 comments
8. ▲ We Reached 74.8% on terminal-bench with Terminus-KIRA (krafton-ai.github.io) 2 points · 5 months ago · 0 comments
9. ▲ Self-generated skills don't do much for AI agents, but human-curated skills do (theregister.com) 2 points · 5 months ago · 3 comments
10. ▲ First Agent Skills Hackathon by the Authors of SkillsBench (skillathon.ai) 2 points · 5 months ago · 1 comment
