An evaluation of frontier AI models: OpenAI's o1 was capable of scheming

apolloresearch.ai

1 points by seraphsf a year ago · 2 comments

seraphsfOP a year ago

There's clickbait out there like BGR's headline, "ChatGPT o1 tried to escape and save itself out of fear it was being shut down".

What the test actually showed is that, given two conflicting goals from two human instructors, the model attempted to resolve the conflict by following one set of instructions and subverting the other.

It’s a good demonstration of how these models behave and what could go wrong. It is not an example of volition or sentience.