Abstract: In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output control, and dynamic perception, capabilities that were previously difficult to achieve with AR models. A growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10$\times$ acceleration in inference speed. These developments position discrete diffusion models as a promising alternative to intelligence based on the traditional autoregressive approach. We present a comprehensive overview of research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, list commonly used modeling methods, and categorize representative models. We further analyze key techniques for training, inference, and quantization. We also discuss trustworthiness issues and summarize emerging applications across language, vision-language, biological, and other domains. We conclude by discussing future directions for research and deployment. Related papers are collected at this https URL
Submission history
From: Runpeng Yu
[v1] Mon, 16 Jun 2025 17:59:08 UTC (2,385 KB)
[v2] Tue, 1 Jul 2025 15:08:58 UTC (2,435 KB)
[v3] Sat, 5 Jul 2025 14:01:12 UTC (2,435 KB)
[v4] Wed, 10 Sep 2025 02:11:26 UTC (2,454 KB)
[v5] Fri, 19 Sep 2025 07:18:31 UTC (2,448 KB)