I used RL fine-tuning to make an LLM generate ugly and unpythonic FizzBuzz code

seantey.github.io

4 points by seanrrr a month ago · 1 comment

seanrrr (OP) a month ago

I wrote up a blog post for a hackathon project where I used RL fine-tuning to make an LLM generate intentionally ugly and unpythonic FizzBuzz code. The post covers what I learned about reward shaping and GRPO. Feedback on the writing or content is welcome!
