"Devin" AI automates Upwork job, making inferences on a computer vision model

twitter.com

3 points by drubio 2 years ago · 1 comment

ben_w 2 years ago

The SWE-bench graph (Devin: 13.85%, Claude 2: 4.8%, GPT-4: 1.74%) looks surprising, perhaps even a bit suspicious, since Devin is so far above the next best. Can someone elaborate further?

Perhaps it really is this good, perhaps it's just my anxiety looking for a reason to doubt. It was only 9 hours ago that I was writing:

> It's certainly still possible today that some random individual has a crucial insight that gives them an edge over the big names

- https://news.ycombinator.com/item?id=39679669
