An online RL framework that empowers LLM agents to evolve — not just solve.
✅ Update: The code for RetroAgent with in-context and RL-trained self-reflection has been released — enjoy and have fun!
📖 Overview
RetroAgent is an online reinforcement learning framework that empowers LLM-based agents to master complex interactive environments through continuous self-improvement.
Standard RL paradigms favor static problem-solving over continuous adaptation: agents often converge to suboptimal strategies due to insufficient exploration, while learned knowledge remains implicit within parameters rather than explicitly retrievable. RetroAgent addresses these limitations through a hindsight self-reflection mechanism that produces dual intrinsic feedback:
- 🔢 Intrinsic Numerical Feedback — Tracks incremental subtask completion relative to prior attempts, rewarding promising explorations.
- 💬 Intrinsic Language Feedback — Distills reusable lessons into a memory buffer, retrieved via our proposed Similarity & Utility-Aware Upper Confidence Bound (SimUtil-UCB) strategy that balances relevance, utility, and exploration to effectively leverage past experiences.
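To make the two feedback signals concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the repository's implementation. The `Lesson` structure, the function names, the weights `w_sim` and `w_util`, the exploration coefficient `c`, and the exact form of the score (a weighted sum of similarity and utility plus a UCB1-style bonus) are all hypothetical. The paper defines the actual SimUtil-UCB formula and reward.

```python
import math
from dataclasses import dataclass


@dataclass
class Lesson:
    """One memory-buffer entry (hypothetical structure, not the repository's)."""
    text: str
    similarity: float    # relevance to the current task, assumed to lie in [0, 1]
    utility: float       # estimate of how much the lesson helped before, assumed in [0, 1]
    retrievals: int = 0  # how many times this lesson has been retrieved so far


def progress_reward(completed_now: int, best_before: int, total: int) -> float:
    """Numerical feedback sketch: reward only subtask progress beyond the prior best."""
    if total <= 0:
        raise ValueError("total must be positive")
    return max(0, completed_now - best_before) / total


def simutil_ucb_score(lesson: Lesson, total_retrievals: int,
                      w_sim: float = 0.5, w_util: float = 0.5, c: float = 1.0) -> float:
    """Assumed score: weighted relevance and utility plus a UCB1-style exploration bonus."""
    bonus = c * math.sqrt(math.log(total_retrievals + 1) / (lesson.retrievals + 1))
    return w_sim * lesson.similarity + w_util * lesson.utility + bonus


def retrieve(buffer: list[Lesson], k: int = 1, **weights: float) -> list[Lesson]:
    """Return the k highest-scoring lessons and update their retrieval counts."""
    total = sum(lesson.retrievals for lesson in buffer)
    ranked = sorted(buffer,
                    key=lambda lesson: simutil_ucb_score(lesson, total, **weights),
                    reverse=True)
    chosen = ranked[:k]
    for lesson in chosen:
        lesson.retrievals += 1
    return chosen


if __name__ == "__main__":
    # 3 of 4 subtasks done now versus a prior best of 1: reward is 2 / 4.
    print(progress_reward(completed_now=3, best_before=1, total=4))  # 0.5

    buffer = [
        Lesson("Check the fridge before the cabinet.", 0.9, 0.8, retrievals=10),
        Lesson("Push boxes away from corners.", 0.2, 0.1, retrievals=0),
    ]
    print(retrieve(buffer, k=1)[0].text)
```

With `c=0.0` the ranking reduces to pure relevance and utility, so frequently used, high-scoring lessons always win. A positive `c` gives rarely retrieved lessons a chance to be tried, which is the exploration side of the trade-off described above.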
RetroAgent - On-Policy RL Training Framework
🏆 Key Results
RetroAgent significantly outperforms existing methods across four challenging agentic tasks, achieving state-of-the-art results. Compared to GRPO-trained agents:
| Task | Improvement |
|---|---|
| ALFWorld | +18.3% |
| WebShop | +15.4% |
| Sokoban | +27.1% |
| MineSweeper | +8.9% |
RetroAgent also exhibits strong test-time adaptation and generalization to out-of-distribution scenarios, validated across two model families.
💡 All experiments were conducted on 4 × NVIDIA H200 GPUs.
🚀 Quick Start
1. Clone the Repository
git clone https://github.com/zhangxy-2019/RetroAgent.git
cd RetroAgent
2. Install veRL (Base Environment)
We recommend using Conda to manage your environment.
conda create -n verl-agent python==3.12 -y
conda activate verl-agent
pip3 install vllm==0.11.0
pip3 install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
pip install -e .
3. Install Supported Environments
Option A: w/ In-Context Self-Reflection
Navigate to the in-context directory and run the desired environment:
cd ./in_context_self_reflection
ALFWorld
conda env create -f agent-alfworld-env.yaml
bash ./examples/grpo_trainer/run_alfworld_retroagent_grpo_in_context_self_reflection.sh
WebShop
conda env create -f agent-webshop-env.yaml
bash ./examples/grpo_trainer/run_webshop_retroagent_grpo_in_context_self_reflection.sh
Sokoban
conda env create -f agent-sokoban-env.yaml
bash ./examples/grpo_trainer/run_sokoban_retroagent_grpo_in_context_self_reflection.sh
MineSweeper
conda env create -f agent-sokoban-env.yaml
bash ./examples/grpo_trainer/run_minesweeper_retroagent_grpo_in_context_self_reflection.sh
Option B: w/ RL-Trained Self-Reflection
Navigate to the RL-trained self-reflection directory and run the desired environment:
cd ./rl_trained_self_reflection
ALFWorld
conda env create -f agent-alfworld-env.yaml
bash ./examples/grpo_trainer/run_alfworld_retroagent_rl_trained_self_reflection.sh
WebShop
conda env create -f agent-webshop-env.yaml
bash ./examples/grpo_trainer/run_webshop_retroagent_rl_trained_self_reflection.sh
Sokoban
conda env create -f agent-sokoban-env.yaml
bash ./examples/grpo_trainer/run_sokoban_retroagent_rl_trained_self_reflection.sh
MineSweeper
conda env create -f agent-sokoban-env.yaml
bash ./examples/grpo_trainer/run_minesweeper_retroagent_rl_trained_self_reflection.sh
🙏 Acknowledgments
This code is adapted from and built upon several amazing open-source repositories. We sincerely thank the authors and contributors of these projects for sharing their valuable work: verl (https://github.com/verl-project/verl), verl-agent (https://github.com/langfengQ/verl-agent/tree/master), and LaMer (https://github.com/mlbio-epfl/LaMer).
📬 Contact
For questions or discussions, please reach out to Xiaoying Zhang at zhangxycuhk@gmail.com.
📝 Citation
If you find our work useful, please consider giving us a ⭐ and citing our paper:
@article{zhang2026retroagent,
  title={RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback},
  author={Zhang, Xiaoying and Liu, Zichen and Zhang, Yipeng and Hu, Xia and Shao, Wenqi},
  journal={arXiv preprint arXiv:2603.08561},
  year={2026}
}
