Why do LLMs fail at the basic visual tests

twitter.com

1 points by chaitimes 8 months ago · 2 comments


chaitimes OP 8 months ago

I would assume there is now enough training data for a model to extrapolate the answers to these basic tests from the visuals. Why do they fail so badly on such trivial questions while appearing to perform very well on complicated tasks like 3D object generation?

  • yorwba 8 months ago

    There are unlikely to be many six-fingered hands in the training data, so there's little reason for the model to develop the ability to recognize one when it encounters it. The result might improve if you break the task into two steps: first list the bounding boxes of all fingers in the image, then count the bounding boxes.
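The two-step approach could be wired up roughly as follows. This is a minimal sketch, assuming the vision model is asked to reply with JSON. The prompt text, the JSON schema, and the names `LOCALIZE_PROMPT`, `Box`, `parse_boxes`, and `count_fingers` are all hypothetical. The actual model call is omitted because it depends on which API you use.

```python
import json
from dataclasses import dataclass

# Step 1 prompt (hypothetical wording and schema): ask the model only to
# localize each finger, not to report a count.
LOCALIZE_PROMPT = (
    "List a bounding box for every individual finger (including thumbs) "
    "visible in the image. Respond with JSON only, in the form "
    '[{"x_min": int, "y_min": int, "x_max": int, "y_max": int}, ...].'
)


@dataclass(frozen=True)
class Box:
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def parse_boxes(model_reply: str) -> list[Box]:
    """Parse the model's JSON reply into boxes, skipping zero-area ones."""
    boxes = []
    for item in json.loads(model_reply):
        box = Box(int(item["x_min"]), int(item["y_min"]),
                  int(item["x_max"]), int(item["y_max"]))
        if box.x_max > box.x_min and box.y_max > box.y_min:
            boxes.append(box)
    return boxes


def count_fingers(model_reply: str) -> int:
    """Step 2: do the counting in ordinary code, not in the model."""
    return len(parse_boxes(model_reply))


# Hypothetical reply for a six-fingered hand (made-up coordinates).
sample_reply = json.dumps([
    {"x_min": 10 + 30 * i, "y_min": 5, "x_max": 30 + 30 * i, "y_max": 80}
    for i in range(6)
])
print(count_fingers(sample_reply))  # 6
```

Moving the count into plain code means the model only has to localize each finger. Whether that actually helps on six-fingered hands is not tested here. If the model returns overlapping duplicate boxes, you would also need to deduplicate them before counting.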
