# Zeno
Verifiable RL rewards for LLMs that actually make sense.
## Why Zeno?
- Provides verifiable, interpretable reward functions for finetuning LLMs, currently for code, with other domains planned.
- Keeps all rewards transparent and hackable.
- No secret sauce. No LLM-as-a-judge. No magical "alignment". Just actual auditable rules.
If you care about trusting your rewards and being able to debug them, start with Zeno.
## Installation

```bash
git clone https://github.com/think-a-tron/zeno.git
cd zeno
uv add ./zeno
```

## What do you get?
Zeno currently ships with a set of plug-and-play reward functions for Python code completions. All rewards are stateless, verifiable, and require no extra config or setup.
| Reward | What it rewards or penalizes | Score Range |
|---|---|---|
| `reward_docstrings` | Proportion of functions/classes with docstrings | 0.0 to 1.0 |
| `reward_lint` | Fewer lint errors (via `ruff`), normalized per line | 0.0 to 1.0 |
| `reward_direct_recursion` | Presence/absence of direct recursion (configurable) | 1.0 / 0.0 / -1.0 |
| `reward_list_comprehension` | Use (or avoidance) of list comprehensions | 1.0 / 0.0 / -1.0 |
| `reward_type_hints` | Fraction of args/returns with type hints | 0.0 to 1.0 |
| `reward_exception_handling` | Has at least one try/except (reward) or none (penalize) | 1.0 / -1.0 |
| `reward_functional` | Prefers functional (no classes) over OOP style | 1.0 / -1.0 |
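
Outside of training, each reward can be called directly to sanity-check its scores. A minimal sketch, assuming the functions follow the usual trl reward-function convention of taking a batch of completions and returning one float per completion (the exact signature in Zeno may differ):

```python
from zeno.code.python import reward_docstrings, reward_lint

# Two hypothetical completions: one documented and type-hinted, one bare.
completions = [
    'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n    return a + b\n',
    "def add(a,b):\n  return a+b\n",
]

# Assumed call shape: a list of completions in, a list of floats out.
print(reward_docstrings(completions=completions))
print(reward_lint(completions=completions))
```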
## Usage with trl
You can plug them directly into trl as reward functions.
```python
from trl import GRPOTrainer

from zeno.code.python import reward_lint

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=[reward_lint],
)
```
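
Because `reward_funcs` takes a list, several Zeno rewards can be stacked in a single run and trl combines their per-completion scores into one reward signal. A sketch using rewards from the table above:

```python
from trl import GRPOTrainer

from zeno.code.python import (
    reward_docstrings,
    reward_lint,
    reward_type_hints,
)

# Every completion is scored by each reward function;
# trl folds the scores into a single training reward.
trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=[reward_lint, reward_docstrings, reward_type_hints],
    # plus your train_dataset and trainer args as usual
)
```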
## Contributing
Want to add more rewards? PRs are welcome, but keep them verifiable and explainable. No black-box models.
## Roadmap
- Stepwise rewards for math reasoning.
- Multi-turn rewards for tool use.