The Platonic Representation Hypothesis


Conventionally, different AI systems represent the world in different ways: a vision system might represent shapes and colors, while a language model might focus on syntax and semantics. In recent years, however, the architectures and objectives used to model images, text, and many other signals have become remarkably alike. Are the internal representations in these systems also converging?

We argue that they are, and put forth the following hypothesis:

Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces.

The intuition behind our hypothesis is that all the data we consume -- images, text, sounds, etc. -- are projections of some underlying reality. A concept like an apple can be viewed in many different ways -- as an image, as text, as a sound -- but the meaning, what is represented, is roughly* the same. Representation learning algorithms might recover this shared meaning.

* Not exactly the same. The text "apple" does not say whether the fruit is red or green, but an image can; sufficiently descriptive text is therefore necessary. See the limitations section of our paper for a discussion of this point.
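
One way to make the hypothesis empirically testable is to check whether two independently trained models induce similar neighborhood structure over the same set of inputs. The snippet below is a minimal sketch of one common way to quantify this, a mutual nearest-neighbor alignment score; it is an illustrative measurement under our own assumptions, not necessarily the exact metric used in the paper, and the feature arrays are random placeholders standing in for real model embeddings.

```python
import numpy as np

def mutual_knn_alignment(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Fraction of shared k-nearest neighbors between two representation spaces.

    feats_a, feats_b: embeddings of the SAME n inputs under two different
    models, with shapes (n, dim_a) and (n, dim_b). The dimensions may differ;
    only the neighborhood structure is compared.
    """
    def knn_indices(feats: np.ndarray) -> np.ndarray:
        # Cosine-similarity nearest neighbors, excluding each point itself.
        normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sims = normed @ normed.T
        np.fill_diagonal(sims, -np.inf)
        return np.argsort(-sims, axis=1)[:, :k]

    nn_a = knn_indices(feats_a)
    nn_b = knn_indices(feats_b)

    # For each input, count how many of its k neighbors agree across the two models.
    overlap = [
        len(set(nn_a[i]).intersection(nn_b[i])) / k
        for i in range(feats_a.shape[0])
    ]
    return float(np.mean(overlap))

# Hypothetical usage: embed the same 1,000 inputs with two different models and
# compare their neighborhood structure. A score near 1 means the two spaces agree
# on which inputs are similar; random, unrelated features score near chance.
rng = np.random.default_rng(0)
feats_vision = rng.normal(size=(1000, 768))    # placeholder for vision-model features
feats_language = rng.normal(size=(1000, 512))  # placeholder for language-model features
print(mutual_knn_alignment(feats_vision, feats_language, k=10))
```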