AI agents fail tasks 70% of the time

arxiv.org

23 points by JTbane 4 months ago · 8 comments

rogerkirkness 4 months ago

Agents went from 10% to 30% reliable this year, which is still a big deal.

drannex 4 months ago

Yes! But when they work, they only kinda work, sort of.

thebigspacefuck 4 months ago

This is from Dec 2024, which feels like a while ago.

bsallthewaydown 4 months ago

AI is going to be the next bubble. It can't even figure out who the real author of a sculpture is. It's really all BS made up to play with markets and geopolitics. Enjoy it while it lasts.

JTbane (OP) 4 months ago

"We test baseline agents powered by both closed API-based and open-weights language models (LMs), and find that the most competitive agent can complete 30% of tasks autonomously."

gavinray 4 months ago

So you ask it to try every task 3.33 times for guaranteed success?
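
For what it's worth, 3.33 is 1/0.30, which is the expected number of attempts until the first success, not a guarantee. Here is a rough sketch of the arithmetic, assuming each attempt is independent and succeeds with probability 0.30 (the figure quoted from the paper). Independence is an assumption: retries on the same task are probably correlated in practice, so real numbers would likely be worse. The function names are made up for illustration.

```python
import math


def p_at_least_one_success(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def attempts_needed(p: float, target: float) -> int:
    """Smallest number of independent tries whose combined success chance reaches `target`."""
    if not (0.0 < p < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.30  # assumed per-attempt success rate
    for n in (1, 3, 4, 10):
        print(f"{n:>2} attempts: {p_at_least_one_success(p, n):.3f}")
    print("attempts for a 99% chance:", attempts_needed(p, 0.99))
```

Under these assumptions, three attempts give about a 66% chance of at least one success, four give about 76%, and it takes 13 to reach 99%. No finite number of retries gets to 100%.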
