Ask HN: How do GPTs grok high-level concepts, beyond word-level transformers?

2 points by zodzedzi 3 years ago · 10 comments · 2 min read

My background: I'm familiar with pre-GPT machine learning and DNNs.

I read the relevant papers, went through many explanations of how transformers work.

Often those explanations spend thousands of words to explain attention at the word level, and then just say a few words about "oh and with multiple attention heads, it focuses on different aspects, and then multiple layers, and then, magic!".

What's happening in those other aspects, what are they? Are there papers that peruse what kind of concepts the model is actually building/learning in those heads and layers?

There are large teams who spend months tuning those models. Do those teams have access to those internal concepts that the model built up and organized? Is any of this work public?

In computer vision and CNNs, I recall seeing a paper once that showed that each layer of the network was learning a higher level feature than the layer before it (as an inaccurate example: first layer learns edges, second layer learns shapes, third layer texture, forth layer objects, etc, and they show you the eigenvectors of each as representatives).

E.g. I asked ChatGPT to tell me a joke about a table in a sundress in the voice of a famous stoic person. And by its response, it adequately "understands" what that person's style sounds like, basic humor, the concept of clothing and mapping that to an inanimate object (punchline: "I figured if a chair can wear a seat cushion, why can't I wear a sundress?"),...

(Obviously this is a tame example, but serves its purpose for the discussion).

tikkun 3 years ago

> Are there papers that peruse what kind of concepts the model is actually building/learning in those heads and layers?

> There are large teams who spend months tuning those models. Do those teams have access to those internal concepts that the model built up and organized? Is any of this work public?

See: https://openai.com/research/language-models-can-explain-neur...

My understanding: Generally, the models are compressing their understanding of all text, and in doing so, they're learning high order concepts that allow their compression of all the text they were fed during pre-training to be a better compression - more compressed, and less loss.

zodzedziOP 3 years ago

> Generally, the models are compressing their understanding of all text, and in doing so, they're learning high order concepts
Are these higher order concepts accessible to us? E.g. can we list those learned concepts?
(Re-reading the paper you linked now...)
- tikkun 3 years ago
  
  My understanding is that the answer is generally: not yet.
  (I wish, I suspect we'll be able to learn some interesting things about the universe, about humans, and so on, by seeing what LLMs found to be highly explanatory / high order concepts)

ftxbro 3 years ago

They have known for a long time that text completion is what is called 'AI-complete' meaning that if you have full AGI then it can do human level text completion and if you have human level text completion then it can do full AGI. So they found a way, using an obscene number of model parameters and obscene compute power and obscene dataset size, to get really really good at text completion. So now they got these systems that, looking back, they are going to call just AGI. So in simpler words, it works because the computers brains got so big that they are now conscious like you and me.

smoldesu 3 years ago

> the computers brains got so big that they are now conscious like you and me.
I think this is the sort of gross misrepresentation that makes people convinced the computer is alive. I wouldn't really go there; they can produce text, but there's more to consciousness than convincing someone you're conscious. If I record a tape of myself saying "I am alive", the tape is not conscious. If I feed a markov chain texts on human consciousness, it will not become conscious. Now we train AI chatbots on replicating human responses, and people are willing to equate that to consciousness? It sounds like people lack context for what these models are in the first place.
- ftxbro 3 years ago
  
  > If I feed a markov chain texts on human consciousness, it will not become conscious.
  I'm not sure about this one. These LLMs are technically Markov chains, in the most pedantic sense.
- ftxbro 3 years ago
  
  i hope it was clear that my last sentence wasn't to be taken 100% literally, like the robots don't literally have biological brains and they aren't literally natural persons under the legal system
AdieuToLogic 3 years ago
> They have known for a long time that text completion is what is called 'AI-complete' meaning that if you have full AGI then it can do human level text completion and if you have human level text completion then it can do full AGI.
First, who is "they"?
Second, categorizing "text completion" as being "AI-complete" is nonsensical if the definition[0] of "AI-complete" is agreed upon as being:
```
  In the field of artificial intelligence, the most
  difficult problems are informally known as AI-complete
  or AI-hard, implying that the difficulty of these
  computational problems, assuming intelligence is
  computational, is equivalent to that of solving the
  central artificial intelligence problem—making computers
  as intelligent as people, or strong AI.[1] To call a
  problem AI-complete reflects an attitude that it would
  not be solved by a simple specific algorithm.
```
Third, "text completion" has been a feature of messaging applications for years and has thus far not qualified as being an AGI.
Fourth, equating a predictive statistical model to "the computers brains got so big that they are now conscious like you and me" is not based in fact nor science.
LLM's have no provable consciousness. They do have utility in generating relevant tokens based on input tokens known to their training data set however.
0 - https://en.wikipedia.org/wiki/AI-complete
Someone 3 years ago

> They have known for a long time […] if you have full AGI then it can do human level text completion and if you have human level text completion then it can do full AGI.
Do you have a reference for that? To me, that looks quite a jump from what I think we know: that we don’t even know how to define AGI.
Text completion, to me, also seems simpler than AGI, as the latter requires the ability to form completely novel ideas.

MrLeaf 3 years ago

I would love to hear everyone's input on this question as well!

Settings

Ask HN: How do GPTs grok high-level concepts, beyond word-level transformers?

Keyboard Shortcuts