RLHF: Reinforcement Learning from Human Feedback

huyenchip.com

4 points by madisonmay 3 years ago · 1 comment

heliophobicdude 3 years ago

This is a very well-written article. It isn't covered in the article, but can we still call models like Alpaca RLHF? What do we call models that are fine-tuned on demonstrations created by other chatbots?
