kumama
- Karma: 4
- Created: 9 years ago
Recent Submissions
- 1. Open-Weight Models Don't Need to Win (twitter.com)
- 2. Prompt caching but for RL – 7.5x speedup on long-prompt/short-response workloads (castform.com)
- 3. Pokegents: Making multi-agent coding feel like a team (castform.com)
- 4. GRPO explained: group relative policy optimization for LLM finetuning (cgft.io)
- 5. Do RL on a model with your vector db (cgft.io)
- 6. What is reinforcement learning finetuning (youtube.com)
- 7. RAG to riches: synthetic data for training RAG agents (cgft.io)
- 8. rag not lag: rl for fast agentic retrieval (cgft.io)
- 9. Show HN: Benchmax, a new open-source RL environment framework for LLM finetuning (github.com)
- 10. Beating o3/o4-mini with Codebase-specific Reinforcement Learning (cgft.io)