ArXivIQ

ArXivIQ

ELF: Embedded Language Flows

May 18, 2026
∙ Paid

Authors: Keya Hu, Linlu Qiu, Yiyang Lu, Hanhong Zhao, Tianhong Li, Yoon Kim, Jacob Andreas, Kaiming He
Affiliation: MIT
Paper: https://arxiv.org/abs/2605.10938
Code: https://github.com/lillian039/ELF
Model: N/A

TL;DR

WHAT was done? The authors introduced Embedded Language Flows (ELF), a novel continuous diffusion language model utilizing continuous-time Flow Matching. ELF operates entirely within a high-dimensional continuous embedding space, using a shared-weight network to perform denoising, and applies discretization to map embeddings back to discrete tokens exclusively at the final generative time step.

WHY it matters? This framework successfully challenges the prevailing assumption that text generation requires inherently discrete diffusion algorithms. By demonstrating superior generation quality with significantly fewer sampling steps and requiring a magnitude fewer training tokens than leading discrete models, ELF paves the way for unifying the underlying architectures of text, image, and video generative systems.

Executive summary: For research scientists and technical leaders evaluating the next generation of multimodal architectures, the bifurcation between continuous diffusion for vision and discrete diffusion (or autoregression) for language has been a persistent architectural friction. ELF demonstrates that the historical underperformance of continuous diffusion language models stems from design choices—specifically, intermediate per-step discretization—rather than an innate incompatibility with language. By leveraging a single shared-weight network for both the continuous flow and the final discrete projection, ELF allows language models to inherit the scaling laws, training stability, and sampling techniques (like classifier-free guidance) that have recently propelled image generation models.

Details

The Discrete vs. Continuous Bottleneck

The divergence in modern generative modeling is stark: while visual domains have overwhelmingly adopted continuous-time Flow Matching and diffusion, language modeling has remained stubbornly tied to discrete spaces. Recent attempts to force language into continuous diffusion frameworks, such as CDCD, historically underperformed. To compensate, the field pivoted toward discrete diffusion language models (DLMs) operating directly over categorical variables. The core bottleneck in prior continuous attempts was the insistence on applying per-step discretization losses—such as rounding or token-level cross-entropy—throughout the intermediate denoising steps. This design tightly coupled the continuous trajectory to the vocabulary space, restricting the flow dynamics. ELF fundamentally alters this delta by abandoning intermediate discretization, allowing the diffusion trajectory to evolve unhindered in a continuous embedding space and translating to discrete text only at the absolute end of the process.

User's avatar

Continue reading this post for free, courtesy of Grigory Sapunov.

Or purchase a paid subscription.
© 2026 Grigory Sapunov · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture