A guess at how o1-preview works

davidmack.medium.com

3 points by edmack a year ago · 2 comments

343rwerfd a year ago

Regarding the hidden chain-of-thought: from the official statement about it, I infer (or suspect) that it uses an unhobbled mode of the model, a special mode where it can draw on its whole training and avoid the intrinsic bias towards aligned outcomes.

I think that, to put it in simple terms, "the sum of the good and the bad" is the secret sauce here: it pumps the "IQ" of the model (every output in the hidden chain) to levels apparently a lot better than it could probably reach with only aligned hidden internal outputs.

Another way of looking at the "sum of good and bad" idea is that the model would have a potentially much bigger set of choices (probability space?) to search for every given prompt.

edmackOP a year ago

Corrections and discussion very welcome! Happy to improve it!
