Come down a rabbit hole with OpenAI o1 system card

portkey.ai

1 point by roh26it a year ago · 1 comment

roh26itOP a year ago

The most interesting findings (at least from my perspective):
- OpenAI measured "honesty", which could become an interesting model eval that others should adopt.
- Using instruction hierarchy to _avoid_ prompt injections.
- When pushed to pursue misaligned goals, the model tried to disable oversight in 5% of tests!
