t55

About

ML researcher

1. RL Speedrun (github.com) 2 points · 9 days ago · 0 comments
2. Target Policy Optimization (arxiv.org) 1 point · 2 months ago · 0 comments
3. Show HN: Kilroy – Knowledge base for teams using Claude Code (github.com) 5 points · 2 months ago · 0 comments
4. Procedural Reasoning Datasets (github.com) 1 point · 10 months ago · 0 comments
5. In Defence of Gary Marcus (reubenadams.substack.com) 3 points · 11 months ago · 0 comments
6. Reasoning Gym – Procedural RL reasoning datasets (github.com) 1 point · 11 months ago · 0 comments
7. ChatGPT Agent [video] (youtube.com) 3 points · 11 months ago · 0 comments
8. ReasoningGym: Reasoning Environments for RL with Verifiable Rewards (arxiv.org) 105 points · 1 year ago · 28 comments
9. Show HN: Rehearsal.so, Duolingo for Public Speaking (rehearsal.so) 3 points · 1 year ago · 1 comment
10. End-to-End Vision Tokenizer Tuning (arxiv.org) 3 points · 1 year ago · 0 comments