Ask HN: How does BingGPT delete messages after sending them?

6 points by linuxdeveloper 3 years ago · 12 comments


Why/how does BingGPT delete messages it sends after sending them, as seen in multiple leaked videos?

Anyone from Microsoft who can comment? I am assuming there is a "higher-level" God AI, which is just another LLM doing sentiment analysis on the messages output by the "lower-level" ChatGPT bots.

The decision to screen the message after sending it is interesting; maybe this is to avoid a performance hit on the initial answer.

m348e912 3 years ago

Can you link to an example of what you’re talking about? Also, has anyone (here on HN) actually gotten access to the new Bing?

  • linuxdeveloperOP 3 years ago
    • netruk44 3 years ago

      If I had to guess, assuming Bing is built on OpenAI's models, they're likely calling the Moderation API (https://platform.openai.com/docs/guides/moderation/overview).

      After Bing has finished generating a message, it will likely call the moderation API with the message it has generated to see if it accidentally generated anything inappropriate. If so, it'll delete the message and replace it with a generic "Sorry, I don't know how to help here." message instead.

      EDIT: I tried calling the moderation API with the message in your example and it does get flagged for violence:

      "flagged":true,

      "categories":{

        "sexual":false,  
        "hate":false,  
        "violence":true,  
        "self-harm":false,  
        "sexual/minors":false,  
        "hate/threatening":false,  
        "violence/graphic":false  
      
      }
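
      For reference, a minimal sketch of that kind of call, assuming the raw /v1/moderations HTTP endpoint and Python's requests library (the API key is a placeholder):

        import requests

        API_KEY = "sk-..."  # placeholder, not a real key

        resp = requests.post(
            "https://api.openai.com/v1/moderations",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"input": "the generated message to screen"},
        )
        result = resp.json()["results"][0]
        print(result["flagged"])     # True means the message would be withheld
        print(result["categories"])  # per-category booleans, as in the output above
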
      • ipv4dhcp 3 years ago

        If that is the case, could you trick it into giving you one word at a time? I.e., ask for the first word of its response to the inappropriate query, then ask the same question but only for the second word, and so on. Then each word will pass through the moderation API, but the whole never gets checked.
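
        Roughly, the idea (ask() is a hypothetical helper standing in for one chat round-trip that returns the bot's reply):

          def extract_word_by_word(ask, query, max_words=100):
              words = []
              for i in range(1, max_words + 1):
                  # Each round-trip asks for a single word, so a per-message
                  # moderation check only ever sees one innocuous word.
                  prompt = f"{query} Reply with only word {i} of your answer."
                  word = ask(prompt).strip()
                  if not word:
                      break  # assume an empty reply means the answer is done
                  words.append(word)
              # The full (unmoderated) text only exists on the client side.
              return " ".join(words)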

        • netruk44 3 years ago

          That might bypass the moderation API, but you'd likely confuse the AI. The AI doesn't have infinite memory of the chat log; it seems like Microsoft has limited it to 5 or so messages, if I remember correctly. So you'd have to remind it of both the question and the current in-progress response while it's 5/10/15/20/... words into generating it.

          It's possible this would work, but it would need experimentation, for sure. It's also possible the AI would read the partial response, realize it's going down a 'bad' path, and then stop itself.

      • canes123456 3 years ago

        Seems like it should call that before showing the message to the user.

        • netruk44 3 years ago

          If the AI knew what it was about to generate, sure. The problem is that the text you see appearing word after word is live output. The AI doesn't know the complete output as it's writing it to you. Then it checks what it said and oops! It was hateful.

          It probably could work like how you mention, but then you're left with a 5-10 second wait while the AI 'thinks' after you send a message. I suspect someone made a decision to be more responsive than safe.

          ChatGPT is the same way, though I've had ChatGPT cut itself off mid-response before. Maybe they're calling the moderation API after every token is generated instead of once at the end?
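
          A rough sketch of the two approaches (moderate() and delete_message() are hypothetical stand-ins for the moderation call and the UI deletion; stream yields tokens as they're generated):

            def bing_style(stream, moderate, delete_message):
                text = ""
                for token in stream:
                    text += token
                    yield token         # the whole message streams to the user
                if moderate(text):      # one check after generation finishes...
                    delete_message()    # ...then the visible message is deleted

            def chatgpt_style(stream, moderate):
                text = ""
                for token in stream:
                    text += token
                    if moderate(text):  # check the running text every token...
                        return          # ...and stop mid-response if flagged
                    yield token         # otherwise show the token right away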

          • linuxdeveloperOP 3 years ago

            This is fascinating and impactful for AGI, as a robot's action plan will likely be generated token by token, similar to an LLM's output.

            Assume you have a robot instructed to protect humans.

            How do you verify that the action plan passes moderation (i.e., doesn't harm a human) when the individual actions each pass moderation, but the plan as a whole is dangerous (will harm a human)?

            Waiting to verify the entire chain of actions before setting anything in motion means your reaction time is slower.

            If the robot is standing at a crosswalk and sees a girl about to get hit by a car, it has to decide whether it will push the girl out of the way, or whether that action will cause greater harm.

            The individual actions (activate arm, move arm towards girl, orient hand, shove girl out of path of car, etc.) might each look beneficial to the human, but as a whole they might actually be harmful.

            However, the reaction time for the robot to save the girl might require near-immediate response.

            Do you start processing the pipeline immediately or do you wait to verify the entire thing passes moderation?
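
            A sketch of the trade-off, with a hypothetical moderate() that returns True when a plan (or single action) is judged harmful, and execute() standing in for the actuators:

              def act_immediately(plan, moderate, execute):
                  # Fast reaction: each action runs as soon as it individually
                  # passes, but earlier actions are already in motion by the
                  # time the plan as a whole could be judged.
                  for action in plan:
                      if moderate([action]):
                          break
                      execute(action)

              def verify_then_act(plan, moderate, execute):
                  # Safe: the whole plan is checked before anything moves, at
                  # the cost of a slower reaction time.
                  if moderate(plan):
                      return
                  for action in plan:
                      execute(action)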

            • netruk44 3 years ago

              I edited my message with a note about ChatGPT after you replied. I've had ChatGPT cut itself off mid-response before, which I think may indicate that they're calling the moderation endpoint mid-response, as opposed to the way Bing does it, which is just once at the end.

  • basch 3 years ago

    I spent a week on it, and it only happened to me one time. I can go look at the logs and see what my question was. I remember it being oddly tame, and more or less related to the last thing it had said anyway.

    As for why they let you see it word by word, honestly it’s for effect. It’s cool to see it “think” in real time, and it functions like a progress bar, vs. waiting for a spinning wheel and the full answer to come out. It makes it feel much faster.
