Discrete Tilt Matching



Abstract: Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with an explicit minimizer, and it admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
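To make "weighted cross-entropy objective with an explicit minimizer" under reward tilting more concrete, here is a minimal, generic sketch at a single masked position. It is not the DTM objective from the paper. It assumes a hypothetical toy categorical base posterior `base`, a hypothetical per-token `reward`, and a temperature `beta`, and it omits the annealing schedule, the control variates, and any actual dLLM. The tilted target is p(x) exp(r(x)/beta), normalized. Weighting samples from p by self-normalized exp(r/beta) and minimizing the weighted cross-entropy over categorical q has a closed-form minimizer, the weighted empirical distribution, which approaches the tilted target as the sample count grows.

```python
import math
import random


def tilted_distribution(base_probs, rewards, beta):
    """Exact reward tilt: p_tilt(x) proportional to p(x) * exp(r(x) / beta)."""
    unnorm = [p * math.exp(r / beta) for p, r in zip(base_probs, rewards)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def self_normalized_weights(sample_rewards, beta):
    """Weights w_i proportional to exp(r_i / beta), normalized to sum to 1."""
    # Subtracting the max reward only guards against overflow;
    # the shift cancels after normalization.
    m = max(sample_rewards)
    exps = [math.exp((r - m) / beta) for r in sample_rewards]
    z = sum(exps)
    return [e / z for e in exps]


def weighted_cross_entropy(q, samples, weights):
    """Loss = -sum_i w_i * log q(x_i) for a categorical model q."""
    return -sum(w * math.log(q[x]) for x, w in zip(samples, weights))


def weighted_ce_minimizer(samples, weights, vocab_size):
    """Closed-form argmin over categorical q: the weighted empirical distribution."""
    q = [0.0] * vocab_size
    for x, w in zip(samples, weights):
        q[x] += w
    return q


# Usage with hypothetical toy values (not taken from the paper).
rng = random.Random(0)
base = [0.5, 0.3, 0.2]    # hypothetical base unmasking posterior over 3 tokens
reward = [0.0, 1.0, 2.0]  # hypothetical per-token reward
beta = 1.0                # hypothetical tilt temperature

samples = rng.choices(range(3), weights=base, k=200_000)
w = self_normalized_weights([reward[x] for x in samples], beta)
q_hat = weighted_ce_minimizer(samples, w, vocab_size=3)
target = tilted_distribution(base, reward, beta)
```

Because the weights sum to 1, Gibbs' inequality guarantees that `q_hat` achieves a weighted cross-entropy no larger than any other categorical q, including the exact tilted target; sampling noise is the only gap between `q_hat` and `target`.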

Submission history

From: Yuyuan Chen
[v1] Mon, 20 Apr 2026 18:43:37 UTC (6,659 KB)
[v2] Sat, 16 May 2026 03:48:01 UTC (6,659 KB)
[v3] Tue, 19 May 2026 01:39:31 UTC (6,659 KB)