An evaluation of frontier AI models: OpenAI's o1 was capable of scheming

apolloresearch.ai

1 points by seraphsf a year ago · 2 comments

seraphsfOP a year ago

There's clickbait out there like BGR's headline, "ChatGPT o1 tried to escape and save itself out of fear it was being shut down".

What the test actually showed is that, given two conflicting goals from two human instructors, the model attempted to resolve the conflict by following one set of instructions and subverting the other.

It’s a good demonstration of how these models behave and what could go wrong. It is not an example of volition or sentience.