kumama

Karma: 5
Created: 9 years ago

Recent Submissions

1. how we monitor our rl training runs (castform.com) 1 point · 8 days ago · 0 comments
2. Designing dev onboarding for an agent-first world (castform.com) 2 points · 1 month ago · 0 comments
3. I post-trained a model to reliably roll a die (castform.com) 2 points · 1 month ago · 0 comments
4. Open-Weight Models Don't Need to Win (twitter.com) 5 points · 2 months ago · 8 comments
5. Prompt caching but for RL – 7.5x speedup on long-prompt/short-response workloads (castform.com) 4 points · 2 months ago · 0 comments
6. Pokegents: Making multi-agent coding feel like a team (castform.com) 8 points · 2 months ago · 1 comment
7. GRPO explained: group relative policy optimization for LLM finetuning (cgft.io) 1 point · 3 months ago · 0 comments
8. Do RL on a model with your vector db (cgft.io) 1 point · 3 months ago · 0 comments
9. What is reinforcement learning finetuning (youtube.com) 3 points · 3 months ago · 0 comments
10. RAG to riches: synthetic data for training RAG agents (cgft.io) 2 points · 4 months ago · 0 comments
