All AI models might be the same

blog.jxmo.io

294 points by jxmorris12 a day ago


d_burfoot - 12 hours ago

For those who don't know about Plato's Theory of Forms, the remarkable empirical fact is that all humans learn roughly the same concepts representing words like "dog", "house", "person", "boat", and so on, even if those people grew up in different places and never had any overlap between their observational experience. You and I may never have observed the same dog, but we still agree on what a "dog" is.

This phenomenon appears to occur for LLM learning as well, though it is less remarkable due to the fact that LLMs likely have significant overlap in their training data.

I believe this is good news for Alignment, because, as Plato pointed out, one of the most important forms is the Form of the Good - a (theoretical) universal human ideal containing notions of justice, virtue, compassion, excellence, etc. If this Form truly exists, and LLMs can learn it, it may be possible to train them to pursue it (or refuse requests that are opposed to it).

lsy - a day ago

The example given for inverting an embedding back to text doesn't help the idea that this effect is reflecting some "shared statistical model of reality": What would be the plausible whalesong mapping of "Mage (foaled April 18, 2020) is an American Thoroughbred racehorse who won the 2023 Kentucky Derby"?

There isn't anything core to reality about Kentucky, its Derby, the Gregorian calendar, America, horse breeds, etc. These are all cultural inventions that happen to have particular importance in global human culture because of accidents of history, and are well-attested in training sets. At best we are seeing some statistical convergence on training sets because everyone is training on the same pile and scraping the barrel for any differences.

coffeecoders - a day ago

I think “we might decode whale speech or ancient languages” is a huge stretch. Context is the most important part of what makes language useful.

There are billions of human-written texts, grounded in shared experience, that make our AI good at language. We don't have that for whales.

TheSaifurRahman - a day ago

This only works when different sources share similar feature distributions and semantic relationships.

The M or B game breaks down when you play with someone who knows obscure people you've never heard of. Either you can't recognize their references, or your sense of "semantic distance" differs from theirs. The solution is to match knowledge levels: experts play with experts, generalists with generalists.

The same applies to decoding ancient texts: if ancient civilizations focused on completely different concepts than we do today, our modern semantic models won't help us understand their writing.
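
For what it's worth, the "semantic distance" both players rely on can be made concrete with sentence embeddings. Below is a minimal sketch of the comparison step, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model (my choice, not something from the article); two players, or two encoders, will only agree to the extent that their embedding spaces encode similar relationships, which is the knowledge-matching problem described above.

    # Minimal sketch of the "closer to A or B?" comparison via cosine similarity.
    # Assumes the sentence-transformers package; the model name is an arbitrary choice.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def closer_to(secret: str, a: str, b: str) -> str:
        # Embed all three phrases as unit vectors, then compare cosine similarities.
        vecs = model.encode([secret, a, b], normalize_embeddings=True)
        return a if vecs[0] @ vecs[1] >= vecs[0] @ vecs[2] else b

    print(closer_to("Claude Shannon", "Mussolini", "David Beckham"))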

streptomycin - a day ago

> Is it closer to Mussolini or bread? Mussolini.

> Is it closer to Mussolini or David Beckham? Uhh, I guess Mussolini. (Ok, they’re definitely thinking of a person.)

That reasoning doesn't follow. Many things besides people would have the same answers, for instance any animal that seems more like Mussolini than Beckham.

IAmNotACellist - a day ago

I agree LLMs are converging on a current representation of reality based on the collective works of humanity. What we need to do is provide AIs with realtime sensory input, simulated hormones each with their own half-lives based on metabolic conditions and energy usage, a constant thinking loop, and discover a synthetic psilocybin that's capable of causing creative, cross-neural connections similar to human brains. We have the stoned ape theory, we need the stoned AI theory.

antonvs - 12 hours ago

> One explanation for why this game works is that there is only one way in which things are related, and this comes from the underlying world we live in.

I think what this might be trying to say is something more like: ...there are many ways in which things can be related, but those relationships come from the underlying world we live in.

I.e. there are obviously many ways in which things can be related, but if we assume the quote is not entirely counterfactual, then it must be getting at something else. I suppose "way" here is being used in a different sense, but it isn't clear.

amoss - 20 hours ago

When I read that "Ilya gave a famously incomprehensible talk about the connections between intelligence and compression", it makes me wonder if Marcus Hutter has now been forgotten. If so, more people should take a look at http://prize.hutter1.net/

kindkang2024 - a day ago

The Dao can be spoken of, yet what is spoken is not the eternal Dao.

So, what is the Dao? Personally, I see it as will — something we humans could express through words. For any given will, even though we use different words in different languages — Chinese, Japanese, English — these are simply different representations of the same will.

Large language models learn from word tokens and begin to grasp these wills — and in doing so, they become the Dao.

In that sense, I agree: “All AI models might be the same.”

throwpoaster - 14 hours ago

I asked Grok, o3-pro, and Claude a question about piezoelectric effects.

They all got it "right", but Claude called out a second order effect that arose from the use case that the other two missed.

I get it, they might all be exploring the same space, but Claude went an extra, important, hop.

stillpointlab - a day ago

I have to be careful of confirmation bias when I read stuff like this because I have the intuition that we are uncovering a single intelligence with each of the different LLMs. I even feel, when switching between the big three (OpenAI, Google, Anthropic) that there is a lot of similarity in how they speak and think - but I am aware of my bias so I try not to let it cloud my judgement.

On the topic of compression, I am reminded of an anecdote about Heidegger. Apparently he had a bias towards German and Greek, claiming that these languages were the only suitable forms for philosophy. His claim was based on the "puns" in language, or homonyms. He had some intuition that deep truths about reality were hidden in these over-loaded words, and that the particular puns in German and Greek were essential to understand the most fundamental philosophical ideas. This feels similar to the idea of shared embeddings being a critical aspect of LLM emergent intelligence.

This "superposition" of meaning in representation space again aligns with my intuitions. I'm glad there are people seriously studying this.

stevage - a day ago

I tried playing Mussolini or Bread with ChatGPT, but it didn't go very well. It seemed to have trouble grasping the rules, and kept getting overly specific when we were miles from the right concept.

somethingsome - a day ago

Mmmh I'm deeply skeptical of some parts.

> One explanation for why this game works is that there is only one way in which things are related

There is not; this is a completely non-transitive relationship.

On another point: suppose you keep the same vocabulary but permute the meanings of the words. The neural network will still learn relationships, completely different ones, and its representation may converge toward a better compression for that set of words, but I'm dubious that this new compression scheme will resemble the previous one (?)

I would say that given an optimal encoding of the relationships, we can achieve an extreme compression, but not all encodings lead to the same compression at the end.

If I add 'bla' between every word in a text, that is easy to compress. But if I add an increasing sequence of words between each word, the meaning is still there, yet the compression will not be the same, as the network will try to generate the words in between.

(thinking out loud)
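
A generic compressor illustrates the padding point concretely; this uses zlib as a stand-in for the model, so the numbers are only suggestive:

    # Compare compressed sizes: plain text, constant 'bla' filler, ever-changing filler.
    # zlib is a stand-in for the model's "compression"; purely illustrative.
    import zlib

    words = ("the cat sat on the mat and watched the dog " * 200).split()

    plain = " ".join(words)
    bla_padded = " bla ".join(words)                                           # constant filler
    counting_padded = " ".join(f"{w} filler{i}" for i, w in enumerate(words))  # increasing filler

    for name, text in [("plain", plain), ("bla", bla_padded), ("counting", counting_padded)]:
        print(name, len(text), "->", len(zlib.compress(text.encode())))

    # The constant filler adds almost nothing to the compressed size, while the
    # ever-changing filler does, even though the underlying meaning is unchanged.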

pona-a - 17 hours ago

I think some of this effect is explained by different models being trained on almost identical corpora. Even if you try to erase certain works, contamination will still represent some of it, just in less obvious ways.

tgsovlerkhgsel - a day ago

I've noticed that many of the large, separately developed AIs often answer with remarkably similar wording to the same question.

megaloblasto - a day ago

Really cool idea that I hadn't considered yet. If true, seems like a big plus for open source and not having a few companies control all the models. If they all converge to the same intelligence, one open source model would make all proprietary models obsolete.

dr_dshiv - a day ago

What about the platonic bits? Any other articles that give more details there?

empath75 - a day ago

This is kind of fascinating because I just tried to play mussolini or bread with chatgpt and it is absolutely _awful_ at it, even with reasoning models.

It just assumes that your answers are going to be reasonably bread-like or reasonably mussolini-like, and doesn't think laterally at all.

It just kept asking me about varieties of baked goods.

edit: It did much better after I added some extra explanation -- that it could be anything, that it may be very unlike either choice, and not to try to narrow down too quickly.

cultofmetatron - 12 hours ago

someone just discovered the AI equivalent of "everything turns to crabs"

darepublic - a day ago

Why is claude shannon closer to mussolini than to david beckham?

robertclaus - a day ago

The board game DaDaDa is a great example of this.

TheSaifurRahman - a day ago

Has there been research on using this to make models smaller? If models converge on similar representations, we should be able to build more efficient architectures around those core features.
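
I don't know of a direct result, but the premise can at least be sanity-checked: if two models' embeddings of the same items really share a structure, a small linear map can translate between them, which is a prerequisite for reusing one model's features in another. A toy numpy sketch on synthetic data (not real model embeddings, and not a claim about how much real models could shrink):

    # Toy check: two "models" built from the same hidden features are linearly translatable.
    # Synthetic data only; says nothing about how much real models could be shrunk.
    import numpy as np

    rng = np.random.default_rng(0)
    n_items, d1, d2, k = 1000, 64, 48, 16
    shared = rng.normal(size=(n_items, k))        # stand-in for shared "platonic" features
    A = shared @ rng.normal(size=(k, d1))         # model A's embeddings of the items
    B = shared @ rng.normal(size=(k, d2))         # model B's embeddings of the same items

    W, *_ = np.linalg.lstsq(A, B, rcond=None)     # fit a linear translator A -> B
    rel_err = np.linalg.norm(A @ W - B) / np.linalg.norm(B)
    print(f"relative reconstruction error: {rel_err:.4f}")   # ~0 when a shared basis exists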

derbOac - 19 hours ago

Seemed to me they are just saying that typical LLMs are consistent:

https://en.m.wikipedia.org/wiki/Consistent_estimator
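
For reference, the property being linked is just convergence in probability of the estimator to the true parameter as the sample grows:

    \hat{\theta}_n \xrightarrow{\,p\,} \theta
    \quad\Longleftrightarrow\quad
    \lim_{n \to \infty} \Pr\big(\lvert \hat{\theta}_n - \theta \rvert > \varepsilon\big) = 0
    \ \text{ for every } \varepsilon > 0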

tyronehed - a day ago

Especially if they are all me-too copies of a Transformer.

When we arrive at AGI, you can be certain it will not contain a Transformer.

tempfile - 19 hours ago

I know it's only one tiny part, but I can't resist:

> Is it closer to Mussolini or David Beckham? Uhh, I guess Mussolini. (Ok, they’re definitely thinking of a person.)

This deduction is absurd! The only information you have at all is that it's more like a person than it is like bread (which could be almost anything).

foxes - a day ago

So in the limit the model's representation space has one dimension per "concept" or something, but making it couple things together is what actually makes it useful?

An infinite-dimensional model with just one dim per concept would be sorta useless, but you need things tied together?
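
The usual intuition (sketched here with random vectors, not real model features) is that in d dimensions you can fit far more than d nearly-orthogonal directions, so concepts can share dimensions with only small interference, i.e. superposition:

    # In 512 dimensions, a few thousand random unit vectors are all nearly orthogonal.
    # Random directions only; real model features are learned, not random.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_concepts = 512, 2048                       # four times more "concepts" than dims
    V = rng.normal(size=(n_concepts, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit-length concept directions

    cos = V @ V.T
    np.fill_diagonal(cos, 0.0)
    print("max |cos| between distinct concepts:", float(np.abs(cos).max()))  # roughly 0.2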

Xcelerate - a day ago

Edit: I wrote my comment a bit too early before finishing the whole article. I'll leave my comment below, but it's actually not very closely related to the topic at hand or the author's paper.

I agree with the gist of the article (which IMO is basically that universal computation is universal regardless of how you perform it), but there are two big issues that prevent this observation from helping us in a practical sense:

1. Not all models are equally efficient. We already have many methods to perform universal search (e.g., Levin's, Hutter's, and Schmidhuber's versions), but they are painfully slow despite being optimal in a narrow sense that doesn't extrapolate well to real world performance.

2. Solomonoff induction is only optimal for infinite data (i.e., it can be used to create a predictor that asymptotically dominates any other algorithmic predictor). As far as I can tell, the problem remains totally unsolved for finite data, due to the additive constant that results from the question: which universal model of computation should be applied to finite data? You can easily construct a Turing machine that is universal and perfectly reproduces the training data, yet nevertheless dramatically fails to generalize. No one has made a strong case for any specific natural prior over universal Turing machines (and if you try to define some measure to quantify the "size" of a Turing machine you realize this method starts to fail once the number of transition tables becomes large enough to start exhibiting redundancy).
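
For readers who want the formulas behind point 2, the standard statements are (standard notation, not from the article):

    % Solomonoff's prior for a fixed universal prefix machine U
    % (sum over programs whose output starts with x):
    M_U(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}

    % Invariance theorem: changing the reference machine shifts description
    % lengths only by an additive constant -- and that constant c_{U,V}
    % is exactly what dominates in the finite-data regime:
    K_U(x) \le K_V(x) + c_{U,V}, \qquad c_{U,V} \text{ independent of } x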

seeknotfind - a day ago

Platonism - not even once. Green is the smell of my grandmother's lawn on a hot summer day. Just because things are similar to a lot of people doesn't mean they're fundamentally the same.

ieie3366 - a day ago

LLMs are brute-force reverse-engineered human brains. Think about it. Any written text out there is written by human brains. The "function" to output this is whatever happens inside the brain, insanely complex.

LLM "training" is just brute forcing the same function into existence. "Human brain outputs X, LLM outputs Y, mutate it a billion times until X and Y start matching."
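
Strictly speaking, the "mutate a billion times" step is gradient descent on a next-token prediction loss rather than mutation, but the matching intuition is the same. A toy sketch (assumes PyTorch; random tokens stand in for human-written text):

    # Toy version of "adjust the model until its output Y matches human text X".
    # PyTorch assumed; a trivially small model and random tokens, for illustration only.
    import torch
    import torch.nn as nn

    vocab, dim = 100, 32
    model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    tokens = torch.randint(0, vocab, (512,))   # stand-in for human-written text X
    for step in range(200):
        logits = model(tokens[:-1])            # model's prediction Y for each next token
        loss = loss_fn(logits, tokens[1:])     # mismatch between Y and X
        opt.zero_grad()
        loss.backward()
        opt.step()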

gerdesj - a day ago

The devil is in the details.

I recently gave the "Veeam Intelligence" a spin.

Veeam is a backup system spanning quite a lot of IT systems with a lot of options - it is quite complicated but it is also a bounded domain - the app does as the app does. It is very mature and has extremely good technical documentation and a massive amount of technical information docs (TIDs) and a vibrant and very well informed set of web forums, staffed by ... staff and even the likes of Anton Gostev - https://www.veeam.com/company/management-team.html

Surely they have close to the perfect data set to train on?

I asked a question about moving existing VMware replicas from one datastore to another and how to keep my replication jobs working correctly. In this field, you may not be familiar with my particular requirements but this is not a niche issue.

The "VI" came up with a reasonable sounding answer involving a wizard. I hunted around the GUI looking for it (I had actually used that wizard a while back). So I asked where it was and was given directions. It wasn't there. The wizard was genuine but its usage here was a hallucination.

A human might have done the same thing with some half remembered knowledge but would soon fix that with the docs or the app itself.

I will stick to reading the docs. They are really well written and I am reasonably proficient in this field so actually - a decent index is all I need to get a job done. I might get some of my staff to play with this thing when given a few tasks that they are unfamiliar with and see what it comes up with.

I am sure that domain-specific LLMs are where it is at, but we need some sort of efficient "fact checker" system.