# Zeno
Verifiable RL rewards for LLMs that actually make sense.
## Why Zeno?
- Provides verifiable, interpretable reward functions for finetuning LLMs, currently for code, with other domains planned.
- Keeps all rewards transparent and hackable.
- No secret sauce. No LLM-as-a-judge. No magical "alignment". Just actual auditable rules.
If you care about trusting your rewards and being able to debug them, start with Zeno.
## Installation

```bash
git clone https://github.com/think-a-tron/zeno.git
cd zeno
uv add ./zeno
```

## What do you get?
Zeno currently ships with a set of plug-and-play reward functions for Python code completions. All rewards are stateless, verifiable, and require no extra config or setup.
| Reward | What it rewards or penalizes | Score Range |
|---|---|---|
| `reward_docstrings` | Proportion of functions/classes with docstrings | 0.0 to 1.0 |
| `reward_lint` | Fewer lint errors (via `ruff`), normalized per line | 0.0 to 1.0 |
| `reward_direct_recursion` | Presence/absence of direct recursion (configurable) | 1.0 / 0.0 / -1.0 |
| `reward_list_comprehension` | Use (or avoidance) of list comprehensions | 1.0 / 0.0 / -1.0 |
| `reward_type_hints` | Fraction of args/returns with type hints | 0.0 to 1.0 |
| `reward_exception_handling` | Has at least one try/except (reward) or none (penalize) | 1.0 / -1.0 |
| `reward_functional` | Prefers functional (no classes) over OOP style | 1.0 / -1.0 |
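
Outside of training, each reward can be called directly to sanity-check its scores. A minimal sketch, assuming the functions follow the usual trl reward-function convention of taking a batch of completions and returning one float per completion (the exact signature in Zeno may differ):

```python
from zeno.code.python import reward_docstrings, reward_lint

# Two hypothetical completions: one documented and type-hinted, one bare.
completions = [
    'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n    return a + b\n',
    "def add(a,b):\n  return a+b\n",
]

# Assumed call shape: a list of completions in, a list of floats out.
print(reward_docstrings(completions=completions))
print(reward_lint(completions=completions))
```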
## Usage with trl
You can plug them directly into trl as reward functions.
```python
from trl import GRPOTrainer

from zeno.code.python import reward_lint

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=[reward_lint],
)
```
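
Because `reward_funcs` takes a list, several Zeno rewards can be stacked in a single run and trl combines their per-completion scores into one reward signal. A sketch using rewards from the table above:

```python
from trl import GRPOTrainer

from zeno.code.python import (
    reward_docstrings,
    reward_lint,
    reward_type_hints,
)

# Every completion is scored by each reward function;
# trl folds the scores into a single training reward.
trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=[reward_lint, reward_docstrings, reward_type_hints],
    # plus your train_dataset and trainer args as usual
)
```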
## Contributing
Want to add more rewards? PRs are welcome, but keep them verifiable and explainable. No black-box models.
## Roadmap
- Stepwise rewards for math reasoning.
- Multi-turn rewards for tool use.