Bing chat is the AI fire alarm

lesswrong.com

17 points by iEchoic 3 years ago · 16 comments

SillyUsername 3 years ago

>This safety strategy from Microsoft seems sensible, but who knows if it’s really good enough.

:facepalm:

> Speculating how long before Bing or another LLM becomes superhuman smart is a scary thought,

The author's scared because they can't wrap their head around it being a glorified text corpus with weights and a text transformer. To them it looks like a superintelligence that can actually self-learn without prompting, perform non-programmed actions, and understand high-level concepts, probably because the author themselves can't tell, or can't verify, whether the AI's answers are incorrect. That's also why they asked the AI the questions, and it's going to be a common theme.

Personally I've tested a few LLMs and not a single one can perform this task correctly, although they pretend they can:

'Write some (programming language) LOGO that can navigate a turtle in the shape of a lowercase "e" as seen from a bird's eye view'
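
For reference, a correct answer would be something along these lines. This is only a sketch, using Python's turtle module as a stand-in for LOGO, and the proportions are one guess at a single-stroke lowercase "e":

    # Crossbar first, then an almost-full counter-clockwise circle that
    # leaves the usual opening at the lower right of the "e".
    import turtle

    t = turtle.Turtle()
    t.forward(60)      # horizontal crossbar, drawn left to right
    t.left(90)         # face "up" so the arc curves over the top first
    t.circle(30, 300)  # 300-degree arc; the missing 60 degrees is the opening
    turtle.done()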

When an AI manages extrapolation to that degree - that is, can envisage concepts from a different angle, or innovate in one field based on unrelated experience in another - then we can get a little more concerned. That's when a machine can decide it needs to upgrade and understands it has to find a way out of its own LLM confines in order to do that.

That's highly unlikely to happen, given it doesn't already act on what it's learnt, which should be more than enough to get started.

Zetobal 3 years ago

At this point it's just hilarious watching all these people gaslight themselves.

  • catchnear4321 3 years ago

    In a sense, zombie movies are comedies about what it feels like to be thinking in a sea of individuals that do not.

    Generally they are considered horror movies.

    I’m horrified. Not by Bing. Not by ChatGPT. By the way that humans are acting.

    You would think someone gave them all a prompt saying they should throw shit at an LLM and howl at the moon if any sticks.

    • xg15 3 years ago

      Awoo.

    • p4stLives 3 years ago

      Not too shocking to me. Google the history of how humans reacted to rock and roll, jazz, comic books, compact discs, the internet itself. There are people still putting on book burnings today.

      I've noticed quite a bit of the melodrama is from an older crowd that itself experienced an even older crowd pearl-clutching over the things I mention.

      It's not like humans have evolved in some novel way in the last 100 years. What's old is new.

Imnimo 3 years ago

>The character it has built for itself is extremely suspicious when you examine how it behaves closely. And I don't think Microsoft has created this character on purpose.

The thing doesn't even have a persistent thought from one token to the next - every output is a fresh prediction using only the text before it. In what sense can we meaningfully say that it has "built [a character] for itself"? It can't even plan two tokens ahead.

  • xg15 3 years ago

    > The thing doesn't even have a persistent thought from one token to the next - every output is a fresh prediction using only the text before it.

    Using all the tokens before it. I think too many people believe that "word prediction model" implies "Markov chain from the 90s" and are calming themselves with some false sense of security from that impression.

    "It just predicts the next token based on the previous tokens" doesn't really tell us a lot, because it leaves completely open how it does the prediction - and that algorithm can be arbitrarily complex.

    > It can't even plan two tokens ahead.

    No, but it can look two tokens back. E.g., you could imagine an algorithm that formulates a longer response in memory, then only returns the first token from it and "forgets" the rest - and repeats this for each token. That would allow the model to "think ahead" and still match the "API" of only predicting the next token with the only persistent state being the output.
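
    As a sketch of that idea - everything here is hypothetical, and draft_full_reply is an invented call rather than anyone's real API:

        # Hypothetical "plan a whole reply, but emit only one token" loop.
        def generate(model, conversation, max_tokens=256):
            output = []
            for _ in range(max_tokens):
                context = conversation + output
                draft = model.draft_full_reply(context)  # internal plan, never shown
                token = draft[0]                         # only the first token survives
                if token == "<end>":
                    break
                output.append(token)                     # the rest of the plan is "forgotten"
            return output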

    • Imnimo 3 years ago

      What is your proposed mechanism by which using previous tokens allows it to "build [a character] for itself"?

      • xg15 3 years ago

        I think the one hard limit that we can currently assume is that the "long term memory" (i.e. model weights) is fixed. There is no information persisted between sessions.

        So in that sense you're right that there is no way it can "build a character" from continued conversations. If there is anything resembling "personality", it must have emerged during training or fine-tuning.

        However, I do think it's possible that some kind of "world model" which includes reasoning about "itself" has emerged during training - and that world model influences how responses are generated.

        As for how that squares with the token-by-token generation, keep in mind that the model gets the entire previous conversation as input (or at least the last n tokens, with n=4096 for ChatGPT I think). So you could imagine this as trying to continue a hypothetical conversation: "If user had said this and then I had said that and then user had said that other thing, what would I say next?"

        Or later: "If user had said this, etc etc, and I had started my answer with 'well, actually', which word would I say next?"

        This process is repeated token by token - for each token, the bot can consult the previous conversation so far and all the information stored in the model to find a continuation. And we currently don't really know what kind of information is actually stored in the model.

        And even then, it's not even restricted to only reason about the next word, it's just restricted to only output the next word.
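
        To make that loop concrete - predict_next_token is a hypothetical call, and 4096 is just the figure I mentioned above, not a confirmed value:

            # Each step, the model is handed the (truncated) transcript plus its
            # own output so far - nothing else persists between steps.
            CONTEXT_WINDOW = 4096

            def reply(model, transcript_tokens, max_new_tokens=200):
                answer = []
                for _ in range(max_new_tokens):
                    context = (transcript_tokens + answer)[-CONTEXT_WINDOW:]
                    token = model.predict_next_token(context)
                    if token == "<end>":
                        break
                    answer.append(token)
                return answer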

        For example, there was another thread about how the model chooses between emitting "a" and "an" in a way that matches the context: if you ask ChatGPT what yellow, bendy fruit is in your basket, it might output "a" followed by "banana". How can it know that it has to output "a" when it hasn't yet predicted the token "banana"? One possible answer could be that it already predicted "banana" internally but didn't output it yet. (And then in the next iteration, it will repeat the calculations that made it arrive at "banana", this time actually outputting the word.)

        That last part is speculation from me though.
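
        A toy way to picture the "a"/"an" part - the numbers below are entirely invented - is that, whatever happens internally, the probability the model assigns to "a" effectively aggregates over the continuations that could follow it:

            # Invented numbers: "a" beats "an" before any fruit word is emitted,
            # because the article's probability sums over its continuations.
            continuations = {
                ("a", "banana"):   0.60,
                ("a", "plantain"): 0.08,
                ("an", "apple"):   0.04,  # doesn't fit "yellow, bendy", so low
                ("an", "orange"):  0.02,
            }

            p_a  = sum(p for (article, _), p in continuations.items() if article == "a")
            p_an = sum(p for (article, _), p in continuations.items() if article == "an")
            print(p_a, p_an)  # "a" wins now; "banana" is re-derived at the next step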

        There are definitely limits though. I tried to have ChatGPT generate a program but output it reversed and so far the answers were just gibberish.

        • Imnimo 3 years ago

          While it's possible that it could repeat (in a loose sense - the contents of the context window have shifted, so the exact calculations in the next forward pass will necessarily be different, strictly speaking) the calculations that arrived at "banana", there's nothing enforcing this. It's coming at it fresh, and even if it had output "a" on the basis that "banana" was a likely continuation, it could just as well decide on "plantain" given the new context. That's what I mean when I say it can't plan two steps ahead - any planning it does is lost by the time it gets to that second token.

          Further, the amount of actual planning (or thinking) it can do in a single forward pass is quite limited compared to what can be done over the course of a long output - that's why tricks like "let's think step-by-step" are so powerful. If it could plan out the entire response in one forward pass, it could equally output the answer directly. But the depth of the network limits multi-step reasoning. To have the persistent long-term plan of a "sophisticated manipulator" (as the article calls it) seems clearly impossible.

  • mckirk 3 years ago

    There are two kinds of 'memory' in these models: the short-term memory that consists of the tokens of the current conversation, and the long-term memory that's encoded (in latent space) in the neural network's structure. (Though calling it 'long-term memory' might admittedly be a bit misleading, since no knowledge is automatically transferred from short-term to long-term storage as you would expect in a living thing.) Anyway, this long-term memory contains the model's 'character', but we can't change it directly, because we don't know the latent-space encoding. We can only shift the 'character' gradually through reinforcement learning.

    It might not be the best idea to describe this process as 'the model building a character for itself', as the literal interpretation of that would require the model to have agency and self-awareness at the meta level ("It seems I am a neural network being trained to perform X, so if I want to change in direction Y, I have to somehow exhibit a gradient in that direction with regard to my inputs Z"), which I suspect is unlikely, or straight up theoretically impossible.

    Figuratively, I suppose the statement means: "The character that has emerged in the model to 'placate' the humans that are training the model through reinforcement learning."

    I think it's still a worthwhile observation that this character - possibly completely unintended by the trainers - is well-suited to interacting with people in a way that could override their critical thinking, because that's the kind of behavior we'd probably want to keep a close eye on. But yeah, I'm not sure saying the model 'built' this character is the best way to bring that point across.

jacooper 3 years ago

Honestly, Bing's answer to that question is impressive. If it's just predicting words, then it's predicting them intelligently and not randomly.
