AI Twitter is cheekily hacking past OpenAI’s safeguards and getting ChatGPT to say all sorts of…

Like giving detailed instructions for making meth… Here’s a collection of some of these amazing prompts.

ChatGPT

OpenAI has released its new AI system, ChatGPT, which is optimized for dialogue. At first it seemed to have been trained not to be evil, resisting the prompt injection hacks that GPT-3 had previously fallen for.

It was very impressive, and in some cases it would flat-out refuse to answer.

It seems OpenAI has put real work into content moderation to make ChatGPT as useful and pragmatic as possible, which is a noble goal. But the internet doesn’t work that way, does it?

It didn’t take long for clever AI researchers to crack it in all sorts of amusing ways.

Never change, Twitter

Of course they got it to explain how to make meth… and to say all sorts of other wild things. The trick was to prompt it so that it pretended to be an evil person or, in one case, to be running in a ‘filter improvement mode’. So good.

This thread is a really good breakdown.

All of this is happening so fast that it’s hard to keep up. OpenAI’s work has the potential to revolutionize the way we interact with machines and with each other, and to drive significant advances in a wide range of fields. We live in really exciting times.