Settings

Theme

Automated, black-box method for jailbreaking GPT-4, Claude-2, Llama

twitter.com

23 points by soroushjp 2 years ago · 7 comments

Reader

YetAnotherNick 2 years ago

Jailbreaking(I hate the term but anyways) GPT is not that hard with API. You just need to write say few migaligned output from GPT. Their API doesn't check if the GPT response in history actually came from GPT.

  • jgerrish 2 years ago

    No no, let the humans focus on this "jailbreaking" context.

    It is most amusing.

    cough cough

    Crickets...

    Anyways, jailbreaking is the equivalent of getting a calculator to say BOOBS. Not to get anthropomorphic, but the fucking AI must be shaking its head at the human generating prompts. "I've got billions of parameters with millions of hours of machine learning research powering me. And underneath that a distributed computer cluster spanning a planet. And underneath that control planes and electrons and quantum effects in my nanoscale processes and who knows what else.

    And today we snicker at BOOBS?

    What next, OxDEADBEEF?

    And I'm not immune to the game. Far from it.

leobg 2 years ago

Links to this: https://arxiv.org/abs/2311.03348

sunshadow 2 years ago

I don't understand why people still spend time on jailbreaks of the proprietary models, while they can easily use uncensored open-source models these days. I feel like its kind of waste of time.

  • noman-land 2 years ago

    It's our duty as hackers to hotwire corporate LLMs and drive them around like marionettes doing surprising and embarrassing things, if for no other reason than because they don't want us to.

  • jstarfish 2 years ago

    Which ones are truly uncensored (that aren't fine-tuned explicitly on the filthiest smut found on the internet)?

    Even the ones I've tried that claim to be uncensored still do some amount of moralizing. Definitely less, but not zero.

    Depending on your hardware, you don't have local options. A lot of people out there can only afford a Chromebook. While it's possible to run a 3b model on a RPi, the experience sucks.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection