Probabilistic Tiny Recursive Model
Authors: Amin Sghaier, Ali Parviz, Alexia Jolicoeur-Martineau
Paper: https://arxiv.org/abs/2605.19943
Code: N/A
Model: N/A
TL;DR
WHAT was done? The authors propose Probabilistic TRM (PTRM), a training-free framework for scaling test-time compute that adds stochastic exploration to pre-trained Tiny Recursive Models (TRMs) [review]. PTRM injects Gaussian noise into the latent state at each recursion step, runs multiple trajectories in parallel so that some can escape suboptimal local attractors, and selects the best candidate using the model’s existing, pre-trained classification head (the Q head).
WHY it matters? This work demonstrates that massive, expensive autoregressive large language models (LLMs) are not the only path to mastering complex logical reasoning. Highly compact, non-autoregressive recursive networks (with only 5M to 7M parameters) can outperform frontier LLMs on structured constraint-satisfaction puzzles at less than 0.0001× the inference cost, illustrating the viability of scaling test-time compute in continuous latent space rather than discrete token space.
Details
The Attractor Bottleneck in Recursive Reasoning
A long-standing challenge in AI has been developing compact architectures capable of deep reasoning without the staggering parameter counts of modern large language models. The Tiny Recursive Model (TRM) [review] emerged as a compelling alternative: it iteratively refines a low-dimensional latent state rather than generating text token by token. Despite its strong performance, TRM’s deterministic inference is prone to getting trapped in suboptimal regions of its latent space, which the authors formalize as bad basins (similar to a subsequent GRAM approach). Once a deterministic rollout falls into one of these attractors, it cannot escape, and the final output is incorrect. This limitation mirrors findings from concurrent mechanistic studies of the Hierarchical Reasoning Model (HRM) [review] by Ren and Liu, which required heavy, task-specific training-time augmentations to mitigate spurious fixed points. The key difference in PTRM is that it resolves this bottleneck stochastically at test time on a single pre-trained checkpoint, requiring no retraining, architectural changes, or hand-crafted input perturbations.
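To make the bad-basin picture concrete, here is a minimal toy sketch of the idea, not the authors' implementation. It assumes a one-dimensional latent state, a hypothetical `refine` step (a gradient step on a double-well potential with attractors at -1 and +1), and a hypothetical `q_head` that scores states near +1 as correct. The names `rollout` and `ptrm_select`, the noise scale `sigma`, and the choice to add noise before each refinement step are all illustrative assumptions.

```python
import random


def refine(z):
    """Hypothetical stand-in for one recursion step of a pre-trained model.

    Gradient step on the double-well potential V(z) = (z^2 - 1)^2,
    which has two attractors: z = -1 ("bad" basin) and z = +1 ("good" basin).
    """
    return z - 0.05 * 4.0 * z * (z * z - 1.0)


def q_head(z):
    """Hypothetical stand-in for the Q head: higher means 'more likely correct'."""
    return -abs(z - 1.0)


def rollout(z0, steps, sigma, rng):
    """Run one trajectory; sigma = 0 recovers deterministic inference."""
    z = z0
    for _ in range(steps):
        if sigma > 0.0:
            z += rng.gauss(0.0, sigma)  # Gaussian noise on the latent state
        z = refine(z)
    return z


def ptrm_select(z0, n_traj, steps, sigma, seed=0):
    """Sample several noisy trajectories and keep the one the Q head prefers."""
    rng = random.Random(seed)
    finals = [rollout(z0, steps, sigma, rng) for _ in range(n_traj)]
    return max(finals, key=q_head)


# The deterministic rollout starts on the wrong side of the barrier and stays there.
deterministic = rollout(-0.3, steps=50, sigma=0.0, rng=random.Random(0))

# Noisy rollouts can cross the barrier; the Q head picks one that did.
best = ptrm_select(-0.3, n_traj=64, steps=50, sigma=0.3)

print(f"deterministic final state: {deterministic:.3f}")
print(f"best-of-64 noisy final state: {best:.3f}")
```

In this toy setting the deterministic rollout converges to the bad attractor because its starting point lies in that basin, while a fraction of the noisy rollouts cross into the good basin and are picked out by the scoring function. No parameters are changed at any point, which is the sense in which the approach is training-free.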



