ArXivIQ

ArXivIQ

World Models

Looped World Models

Jun 19, 2026
∙ Paid

Authors: Hongyuan Adam Lu, Z.L. Victor Wei, Qun Zhang, Jinrui Zeng, Bowen Cao, Lingwei Meng, Mocheng Li, Zezhong Wang, Haonan Yin, Naifu Xue, Minyu Chen, Cenyuan Zhang, Zefan Zhang, Hao Wei, Jiawei Zhou, Haoran Xu, Hao Yang, Ronglai Zuo, Tongda Xu, Yonghao Li, Jian Chen, Hebin Wang, Zeyu Gao, Yang Li, Wei Zhao, Qimin Zhong, Siqi Liu, Yumeng Zhang, Leyan Cui, Zhangyu Wang, Wai Lam
Paper: https://arxiv.org/abs/2606.18208
Code: N/A
Model: N/A

TL;DR

WHAT was done? The paper introduces Looped World Models (LoopWM), a novel recurrent-depth transformer architecture for world modeling. LoopWM leverages a parameter-shared transformer block that iteratively refines latent environmental representations alongside a mathematically guaranteed contractive state-retention mechanism and an adaptive early-exit strategy. This is further extended by a “Deferred Decoding” paradigm (LoopWM-DD) which rolls out action trajectories purely in the latent space, delaying decoding to the terminal step.

WHY it matters? Standard world models are severely limited by a fundamental trade-off: capturing long-horizon dynamics requires deep, parameter-dense architectures that are highly prone to compounding rollout errors and are expensive to deploy. LoopWM establishes iterative latent depth as an orthogonal scaling axis to physical model size, enabling up to 100× parameter efficiency. A 1-billion parameter LoopWM model outperforms massive closed-source systems, opening a practical pathway for deploying highly capable, stable physical simulators on power-constrained and real-time robotic edge platforms.

Details

The High-Horizon Bottleneck: Why Fixed-Depth World Models Diverge

The fundamental role of a world model in model-based reinforcement learning and embodied intelligence is to mimic physical environments. Architectures like PlaNet and Dreamer rely on recurrent state-space models to predict forward transitions entirely in the latent space. However, physical environments present non-uniform computational difficulties; a simple, predictable transition (such as free flight) requires minimal processing, whereas complex transitions (such as high-speed multi-body collisions) demand deeper representation mapping. Conventional fixed-depth models struggle here because they apply an identical allocation of compute to every transition. If the model is too shallow, it fails on complex steps, causing trajectory degradation over long rollouts due to compounding prediction errors. Conversely, scaling the model’s physical depth to resolve these errors leads to a proportional explosion in parameter counts, making real-time deployment on hardware targets prohibitively expensive. LoopWM addresses this structural tension by replacing deep, feed-forward stacks with a parameter-shared recursive transformer loop that dynamically scales its effective computational depth per step.

User's avatar

Continue reading this post for free, courtesy of Grigory Sapunov.

Or purchase a paid subscription.
© 2026 Grigory Sapunov · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture