SWE-bench will hit 90% this year.

Research insights and updates from the Fabraix team on AI agent security, adversarial testing, and RL safety.