Why do LLMs fail at basic visual tests?

1 point by chaitimes a year ago · 2 comments

chaitimesOP a year ago

I would assume there's enough training data by now for models to extrapolate the answers to these basic tests from the visuals. Why do they fail miserably on such trivial questions while appearing to perform very well on complicated tasks like 3D object generation?

yorwba a year ago

There are unlikely to be many six-fingered hands in the training data, so there's little reason for the model to develop the ability to recognize one when it encounters it. Maybe the result improves if you break the task down into two steps: first listing the bounding boxes of all fingers in the image, then counting the bounding boxes.
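
A rough sketch of that two-step idea, in case it helps. Everything here is an assumption for illustration: `ask_model` is a hypothetical stand-in for whatever vision model call you use, the prompt wording and the JSON reply format are made up, and there's no guarantee a real model returns accurate boxes. The point is only that the model does the localization and ordinary code does the counting.

```python
import json
from typing import Callable, List, Tuple

# (x_min, y_min, x_max, y_max); the coordinate convention is an assumption.
Box = Tuple[float, float, float, float]

# Hypothetical prompt: ask only for localization, never for a count.
BOX_PROMPT = (
    "List every finger visible in the image as a JSON array of "
    "[x_min, y_min, x_max, y_max] bounding boxes. Output only the JSON."
)


def parse_boxes(reply: str) -> List[Box]:
    """Parse the model's reply, dropping malformed or empty boxes."""
    boxes: List[Box] = []
    for item in json.loads(reply):
        if isinstance(item, list) and len(item) == 4:
            x0, y0, x1, y1 = (float(v) for v in item)
            if x1 > x0 and y1 > y0:  # keep only boxes with positive area
                boxes.append((x0, y0, x1, y1))
    return boxes


def count_fingers(image: bytes, ask_model: Callable[[str, bytes], str]) -> int:
    """Step 1: have the model list boxes. Step 2: count them in code."""
    reply = ask_model(BOX_PROMPT, image)
    return len(parse_boxes(reply))
```

You'd call it as `count_fingers(image_bytes, ask_model)` with your own wrapper around the model. Whether this actually beats asking for the count directly would need to be tested on real images.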
