When Are Two RLHF Objectives the Same?

Abstract: The preference optimization literature contains many proposed objectives, often presented as distinct improvements. We introduce Opal, a canonicalization algorithm that determines whether two preference objectives are algebraically equivalent by producing either a canonical form or a concrete witness of non-equivalence. Applying Opal reveals that many widely used methods optimize the same underlying objective, while others are provably distinct. For example, batch normalization can cause the same response pair to receive different gradients depending on batch composition. We identify a small set of structural mechanisms that give rise to genuinely different objectives; most remaining differences are reparameterizations.
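
The batch normalization example in the abstract can be illustrated with a small numerical sketch. The abstract does not give the normalized objective, so the form used here is an assumption: preference margins are standardized over the batch before a log-sigmoid loss. The function names (`plain_loss`, `batchnorm_loss`, `grad_wrt_first`) and the margin values are hypothetical and are not taken from the paper or from Opal.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def plain_loss(margins, beta=1.0):
    # Pairwise log-sigmoid loss averaged over the batch; each pair's
    # term depends only on its own margin.
    return -sum(log_sigmoid(beta * m) for m in margins) / len(margins)


def batchnorm_loss(margins, eps=1e-6):
    # Assumed variant: standardize margins with batch statistics first,
    # so every pair's term depends on the whole batch.
    n = len(margins)
    mean = sum(margins) / n
    var = sum((m - mean) ** 2 for m in margins) / n
    std = math.sqrt(var + eps)
    return -sum(log_sigmoid((m - mean) / std) for m in margins) / n


def grad_wrt_first(loss_fn, margins, h=1e-6):
    # Central finite difference of the batch loss with respect to the
    # margin of the first pair in the batch.
    up = [margins[0] + h] + list(margins[1:])
    down = [margins[0] - h] + list(margins[1:])
    return (loss_fn(up) - loss_fn(down)) / (2 * h)


# The same response pair (margin 0.5) placed in two different batches.
batch_a = [0.5, 1.0, 2.0, -1.0]
batch_b = [0.5, 3.0, -2.0, 0.1]

plain_a = grad_wrt_first(plain_loss, batch_a)
plain_b = grad_wrt_first(plain_loss, batch_b)
bn_a = grad_wrt_first(batchnorm_loss, batch_a)
bn_b = grad_wrt_first(batchnorm_loss, batch_b)

print("plain gradient, batch A vs B:     ", plain_a, plain_b)
print("normalized gradient, batch A vs B:", bn_a, bn_b)
```

Under these assumptions the unnormalized gradient for the shared pair is the same in both batches, while the normalized gradient changes with the other pairs in the batch. This is the kind of concrete difference that would separate two objectives rather than a reparameterization.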

Submission history

From: Madhava Gaikwad
[v1] Sun, 14 Sep 2025 14:42:39 UTC (897 KB)
[v2] Thu, 5 Feb 2026 17:34:59 UTC (34 KB)