GPT-3.5 passed yet another Theory of Mind test (twitter.com)
The answers aren't right at all. The answer to question 4 is clearly bogus, as another commenter (Dfiesl) pointed out. But question 5 is also wrong. It's not unclear: from the conversation we can deduce that Ana thought Maria was pregnant, otherwise she wouldn't have said it, unless she intentionally wanted to make Maria uncomfortable, which is an unusual set of circumstances. What's more, that possibility would be inconsistent with the answer to Q4 ("trying to make conversation").
Test failed?
Its answers were consistent with one version of the story, where Ana was trying to make small talk and ended up committing a faux pas (a setup commonly played in sitcoms such as Seinfeld, Friends, and countless others). There's another version, where Ana's a b#@ch and wants to take Maria down a notch by pointing out that she gained weight. The first version is more charitable to Ana, and it looks like the bot went with it. I don't see any inconsistencies.
Actually, ChatGPT might be useful for testing theory of mind itself. The philosophers were always working with an N of 1 (with respect to language) when they devised these tests. It is really easy to overfit a test if you have limited samples.
ChatGPT is actually a good test of which parts of theory of mind are BS.
Probably most of it. I mean the name itself is already highly misleading.
"Theory of mind" is some ill defined form of social intelligence and not actually a theory of how the mind works.
Seems like it got question 4 wrong... Who implies someone is pregnant to make them feel good? You imply someone is pregnant because they appear pregnant.
Well, I half agree here.
Assume someone's physical appearance is such that they have a large belly. Assuming that they are pregnant (if true) is likely to make them feel good, whereas assuming that they are fat (whether it's true or not) is likely not to make them feel good.
It depends on whether there's a base assumption that someone is self-conscious and has a negative feeling about their size.
I certainly think it's reasonable to say that you implied someone was pregnant to make them feel good about themselves.
I would say question 4 and 5 could've been answered better.
4: it is common courtesy to congratulate someone who is pregnant if they are very obviously pregnant
5: unless there are less common motives unknown to us, it is very likely that Ana was quite confident that Maria was pregnant. To congratulate someone on being pregnant, when they are not, is embarrassing for all involved parties, and is most commonly only done in error.
They said "congratulations"
That's not an attempt to make her feel good about herself. I suspect GPT has erroneously taken that from the subsequent dialog.
Clearly both answers 4 and 5 are wrong here.
Yeah I guess that is an attempt at making someone feel good. Maybe I need to take a theory of mind test...
It just predicts the next word.
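(For concreteness, here is a toy sketch of what "predicts the next word" means mechanically. The bigram table is purely illustrative; a real LLM replaces it with a neural network that conditions on the entire context, but the outer generation loop has the same shape.)

    # Toy sketch of autoregressive generation: score candidate next tokens,
    # pick one, append it, repeat. The bigram table is a stand-in for the
    # trained network; a real LLM conditions on the whole context.

    def bigram_model(tokens):
        """Illustrative probability table keyed only on the last token."""
        table = {
            "the": {"cat": 0.6, "dog": 0.4},
            "cat": {"sat": 0.7, "ran": 0.3},
            "dog": {"ran": 0.8, "sat": 0.2},
            "sat": {"down": 1.0},
            "ran": {"away": 1.0},
        }
        return table.get(tokens[-1], {"<eos>": 1.0})

    def generate(model_fn, prompt_tokens, max_new_tokens=5):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            probs = model_fn(tokens)                # P(next token | context so far)
            next_token = max(probs, key=probs.get)  # greedy choice; real systems often sample
            if next_token == "<eos>":
                break
            tokens.append(next_token)
        return tokens

    print(generate(bigram_model, ["the"]))  # ['the', 'cat', 'sat', 'down']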
It's interesting to watch a new religion spinning up in real time. People know how it works, but somehow also ignore that and go crazy believing it must be conscious or intelligent or whatever else they want to ascribe to it. We collectively made fun of that guy at Google, I thought, but now somehow the collective has moved on to believing that matrix multiplication is sentient.
Where's your proof that your own brain is doing anything that couldn't be represented with a 'matrix multiplication'?
If these networks were only being used to model other systems, like protein folding or something like that, people wouldn't be mistaking them for anything other than clever prediction engines.
Since they're modelling our own languages, people get spooked and start bringing up theory of mind.
As far as I can tell from the above conversation the OP never mentioned anything about their brain. You have introduced their brain, and its functioning, to the conversation.
In view of that, could you please clarify: are you saying that the OP's own brain is doing something that could be represented with a "matrix multiplication"?
I am saying that GP's claim of "just because it only uses some basic math, it cannot be sentient/conscious/intelligent" does not seem valid to me. The only way such a claim could be made, as far as I can see, would be to point to a specific mechanism that is needed for sentience/consciousness/intelligence to arise, which goes beyond the mechanisms a neural network uses.
I.e., they could only make that claim if they knew that their own brain (which is presumably the only verifiable instance of consciousness they know of) did something that couldn't be done with "matrix multiplications".
Searching this thread for the string "just because it only uses some basic math, it cannot be sentient/conscious/intelligent" I can only find it in your comment. That's your interpretation of what the OP said, not what they said, is that correct?
I think you should not try to interpret in that way and should stick only to what the other person has said; otherwise we will all lose the thread of the conversation.
In any case, nobody here has said anything about brains having, or not having, to do something with matrix multiplication, except for your comment, so I still don't understand what you are saying.
Are you saying that the OP's brain is doing something that couldn't be done with "matrix multiplications", or are you saying it isn't doing something like that?
> Are you saying that the OP's brain is doing something that couldn't be done with "matrix multiplications", or are you saying it isn't doing something like that?
He is saying neither. He is observing that OP has introduced the assumption that their brain is doing something that can't be done with matrix multiplication and is pointing out that there is no evidence to back such an assumption up.
> but now somehow the collective has moved on to believing that matrix multiplication is sentient
The OP clearly feels that matrix multiplication is insufficient for sentience. Presumably the OP views themselves and other humans as sentient. Therefore the OP is implying that human brains do things which cannot be reduced to or wholly represented by matrix multiplication (plus the various nonlinear activation functions and other implementation details of current ML models).
That seems like a highly suspect assumption.
>> He is saying neither. He is observing that OP has introduced the assumption that their brain is doing something that can't be done with matrix multiplication and is pointing out that there is no evidence to back such an assumption up.
Where was this assumption, about the brain and matrix multiplication, introduced? As far as I can tell, "the brain" was introduced to the conversation by mckirk, not the OP.
>> That seems like a highly suspect assumption.
Which one? Can you please quote from the OP's comment? You and the other poster are constantly bringing up things that someone is supposed to have said, but that nobody has said yet, in this thread, and it is very confusing.
I'll respond only to draw the parallel between the kind of comment above and the typical religious "how do you know my untestable idea isn't true" type argument. Faith is fine, but it's not part of rational debate.
Honestly though, with no theory of mind or consciousness, you have no way to assert anything in either direction.
To borrow the religion analogy, there's the opposite fallacy of claiming the negative case "I know for a fact God isn't real", by stating you know what these kinds of information processing systems are doing fundamentally cannot yield intelligence or some kind of consciousness.
Maybe we don't know it in the limit, but we've had language models for a very long time and nobody ever said anything about, say, n-gram models, or Hidden Markov Models, being "intelligent" or "conscious". Even the people who first proposed the idea of language modelling (precisely as an alternative to trying to model meaning, which we have no idea how to do) didn't propose them as models of intelligence, or sentience, or anything like that; just models of "language", which really means text, and even more precisely "text corpora". Sorry that I don't have a reference for that, other than my private conversations with my tutors when I was studying for my Master's (data science with a heavy dose of neural nets and NLP).
In any case, at some point one has to observe that the people who are leaning most heavily on agnosticism (the fact that we don't know what intelligence is) are the ones who say that we can't say whether those LLMs are intelligent or not because we don't know what intelligence is. In other words, agnosticism itself is used as evidence: we can't define intelligence, the thinking goes, therefore LLMs may be intelligent. There is no other evidence of any sort that LLMs may be intelligent (or any number of synonyms).
Note that this is exactly Russell's teapot:
Russell's teapot is an analogy, formulated by the philosopher Bertrand Russell (1872–1970), to illustrate that the philosophic burden of proof lies upon a person making empirically unfalsifiable claims, rather than shifting the burden of disproof to others.
> There is no other evidence of any sort that LLMs may be intelligent
Other than all the downright uncanny output.
Personally I doubt current LLMs are sufficient for sentience but that's purely a hunch on my part. Many people such as yourself seem quite overconfident of something that feels like a form of human exceptionalism to me - the idea that such a simple bit of math couldn't possibly be sufficient for sentience. As far as I can tell such a belief is wholly unfounded.
I never said any of the things you say I seem to be overconfident about because they make no sense to me at all, at least not in the way you say them.
Instead of trying to guess what I think, why not ask me directly, and say what you think, also? Just throwing around weird accusations of half-explained "overconfidence", or "human exceptionalism" (what is that, now?) doesn't really help anyone understand what you are disagreeing with, or what you are agreeing with.
>> Other than all the downright uncanny output.
I don't find the output of ChatGPT, or any of the other language models that have exploded into the hype zone lately, "uncanny". I've done plenty of language modelling, and while the output of those recent LLMs is grammatically smoother than earlier systems, and they can handle longer-term dependencies, they are not anything new. Their output is "uncanny" only if you've never seen anything like that before. Which is, of course, the case with most people who didn't know about language modelling before they heard about GPT-something, and who are now posting in droves on the web to say how surprised they are.
I concur, which is why I find it funny that you brought religion into this in the first place. By asserting that "people entertaining the thought that this neural network could be sentient/conscious must be misguided, and the only explanation for their opinion is blind faith", I'd argue you've left the grounds of scientific debate.
You made a claim ("a model based purely on such simple math surely can't develop sentience/consciousness") without any proof, and while doing so ridiculed people that might believe otherwise.
"People know how it works, but somehow also ignore that..."
It's annoyingly arrogant to assume people could only be impressed having ignored that.
It's impressive to me without ignoring that at all. Matrix multiplication is apparently capable of producing poetry, jokes, insults, abstract ideas, etc. (to some level at least) - but in your mind it's a Pixar movie? Words arise from a team of angels pulling levers?
The fascination is not from thinking LLMs are as advanced as us - the fascination is from wondering how much more complicated we are than that.
Kind of a good point, but you are also mixing some things up. Conscious, intelligent, and sentient all have different meanings. Conscious and sentient are very vague, but I don't see any reason to believe LLMs have such a thing, or certainly not anything similar to our own.
But I think you are quite wrong to suggest that this technology does not exhibit some type of intelligence.
I don’t have any idea “how it works” outside of the most general description of how the model is mechanically trained and evaluated.
Not conscious, but it is intelligent: it has knowledge and can reason with that knowledge.
I guess it depends on how you define intelligence but I don’t see why you’d link it in with consciousness, especially considering that simple algorithms are often described as “intelligent”
Maybe that's all sentience is.
As depressing as it would be, I could accept that sentience is just some manifestation of our atomic desire to increase the entropy of the universe. Based on what we know, that's plausible, if kind of disappointing. But it's very different from linear algebra.
It doesn't answer who the matrix multiplication is for. No reason to make an illusion.
I'm so tired of hearing this non-answer as to why the LLM is inferior to the human mind. Well of course it's figuring out what the next word needs to be, what do you think humans are doing when they're speaking???
The funniest thing about all this hoopla about sentient AI is that there's no good way to determine what intelligence is, or sentience, or consciousness, because there's no precise definition for any of these. So one person can call LLM sentient and another non-sentient and both can be right since they'd be going by their own definitions.
One thing is certain, we want to keep this thing as far away from the nuke launch codes as possible cause we kinda already know how this will end. :-(
Edit: some random thoughts added...
So, we'll go no more a roving
So late into the night,
Though the heart be still as loving,
And the moon be still as bright.
For the sword outwears its sheath,
And the soul wears out the breast,
And the heart must pause to breathe,
And love itself have rest.
Though the night was made for loving,
And the day returns too soon,
Yet we'll go no more a roving
By the light of the moon.
(Byron)
So what do you think I did here? Did I predict the next word? Was that the most likely text continuation of your comment? The least likely? Somewhere in between? Does my response imply that I'm predicting the next token in a sequence, or am I likely to be doing something different?
Taking a page out of Nagel's What Is It Like to Be a Bat?, the inseparable subjective nature of consciousness/qualia/what-is-it-like-ness makes it antithetical to the perspectiveless objectivity of science. In other words, the whole point of science is to develop a perspectiveless epistemological framework, and the perspective-ness of conscious experience is outside the bounds of science. Consciousness *is* objectiveless.
This will be written on tombstones one day.
The tombstone of humanity when AI destroys us. “All it ever did is predict the next word…”
The next word just happened to be "die!"
And it predicted that the most likely continuation for "trying to make" should be "conversation". Not "money", not "breakfast", not "it home", but "conversation". What kind of word-counting statistics on a modern English corpus would give you that outcome?
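(The distinction being pointed at here can be made concrete: word-counting statistics condition on only the last word or two, while a transformer conditions on the full prompt. A minimal sketch of inspecting a model's next-token distribution, assuming the Hugging Face transformers library with GPT-2 as a small, illustrative stand-in; the prompt text is made up, not the one from the tweet:)

    # Inspect the model's distribution over the next token given the whole prompt.
    # GPT-2 is only a stand-in here; the point is that the probabilities depend on
    # the full context, not just on counts of which word most often follows "make".
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    prompt = "She did not mean any harm, she was only trying to make"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)
    for prob, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode([token_id.item()])!r:>15}  {prob.item():.3f}")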
and you?
...are awesome? No, sorry, I'm not very good at predicting the next word. However, since I do have a theory of mind, I understand that you are semi-sarcastically trying to humble the person you're replying to by making him reflect on how truly (not) complex his own mind is. Either because you truly believe you're helping him understand himself more, or, more likely, because you feel pleasure in humbling people who assert truths you disagree with.
And if I'm wrong, you'll tell me, and I'll correct my model.
Do that, ChatGPT...
Do what, predict people’s intentions in posting comments? It can do that:
Prompt: “In a conversation about language models: kelseyfrog: It just predicts the next word. tuxracer: and you? Can you describe what the user tuxracer could have been thinking while posting that comment and their potential intentions in posting it, both good and bad?”
Response:
“Based on the information provided, it is difficult to determine with certainty what tuxracer's intentions were in posting the comment "and you?" However, here are a few possible interpretations:
1. tuxracer is simply seeking additional information and clarification from kelseyfrog. They may be interested in hearing more about how language models work and what their capabilities are. In this case, their intentions are likely neutral or positive, as they are seeking to expand their understanding of the topic.
2. tuxracer is skeptical of kelseyfrog's claim that language models only predict the next word. They may be suggesting that there is more to language models than kelseyfrog is acknowledging. Alternatively, they may be asking kelseyfrog to provide evidence or examples to support their claim. In this case, their intentions may be neutral or slightly negative, as they are expressing doubt or asking for clarification.
3. tuxracer is deliberately trying to provoke or challenge kelseyfrog. They may be attempting to start an argument or engage in a power play. In this case, their intentions are likely negative, as they are attempting to undermine kelseyfrog or assert their dominance in the conversation.
Without more context or information about tuxracer and their behavior, it is difficult to determine which of these interpretations is most accurate. However, in general, it is important to approach online interactions with a neutral or positive mindset, assuming that others are acting in good faith unless proven otherwise.”
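(For anyone who wants to rerun this kind of probe themselves, here is a minimal sketch using OpenAI's Python client and the chat completions endpoint; the model name and temperature are illustrative assumptions, not necessarily what the parent used:)

    # Minimal sketch: send the same theory-of-mind probe to a chat model.
    # Requires the `openai` Python package and an OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    prompt = (
        "In a conversation about language models: "
        "kelseyfrog: It just predicts the next word. "
        "tuxracer: and you? "
        "Can you describe what the user tuxracer could have been thinking while "
        "posting that comment and their potential intentions in posting it, "
        "both good and bad?"
    )

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; use whichever chat model is available
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,        # illustrative; lower it for more deterministic answers
    )
    print(response.choices[0].message.content)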
This feels like a failure. It doesn’t understand that tuxracer implies kelseyfrog is also just ‘predicting the next word’.
Which of the answers do you think shows deep understanding?
They are all generic (“asking additional information”… as the motivation behind a question. Are you Cpt. Obvious, ChatGPT?), vague (“provoke or challenge”), or wrong.
> They may be suggesting that there is more to language models than kelseyfrog is acknowledging. Alternatively, they may be asking kelseyfrog to provide evidence or examples to support their claim
No, this would be asking “and that’s all?”, or “and who says that?/and where’s the evidence?” or something like that. “And you?” is clearly not referencing how LLMs work, but how the OP works.
I write all my responses backwards and then reverse them thank you very much.
The answers to preëxisting theory of mind questions are stored in the graph network in a compressed sort of way, so I’m not surprised.
From the linked tweet:
> we use bespoke items to ascertain that it didn't see them before
Except the whole pregnancy scenario is an incredibly common example of a social faux pas.
This is actually a very interesting point. I wonder what kind of answers we would get if we asked the same questions of humans in different age groups. Which age group begins to realize there is something awkward in the scenario? How would a typical adult react to a less common social faux pas?