AI Agent Reliability Tracker
hal.cs.princeton.edu
> recent capability gains have yielded only small improvements in reliability.
Have I missed something? Why would one expect capability gains to yield any such improvement?