Most AI scaling conversations still default to one strategy: bigger models, more data, longer context. HRM and TRM added a different axis in 2025: more recursive computation at inference time without proportionally increasing parameter count.
This post summarizes the key ideas from recent HRM/TRM research and a Decoded discussion transcript, then maps those ideas to practical model-design choices.
TL;DR
| Question | Short answer |
|---|---|
| What changed? | HRM/TRM showed that small models can gain strong reasoning behavior via recursive latent refinement loops. |
| Why was it interesting? | Reported ARC-style results were strong relative to model size and training data. |
| Core mechanism | Reuse the same weights repeatedly over internal states (z, z_low, or equivalent) at inference and training. |
| HRM key idea | Two-timescale recursion (high-level + low-level modules). |
| TRM key idea | Simplify to one tiny shared network and keep recursive refinement + deep supervision. |
| Big takeaway | Inference-time recursion is a meaningful compute axis, not just parameter scaling. |
Primary sources:
- HRM paper: arXiv:2506.21734
- TRM paper (“Less is More”): arXiv:2510.04871
- ARC benchmark context: ARC Prize
Why Recursion Re-entered the Conversation
A transformer forward pass is highly parallel and efficient for training, but many reasoning tasks are effectively multi-step algorithms. If the task is hard to compress into one pass, performance can bottleneck even when the model is large.
In the transcript, this is framed as a gap between:
- Token-space iteration (chain-of-thought and tool calls)
- Latent-space iteration (internal recursive state updates)
That distinction matters. Token-space traces are useful, but they are constrained by discrete outputs and supervision artifacts. Latent recursion can keep iterative computation inside a continuous state space.
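As a toy analogy only (it models neither paper), the sketch below runs the same iterative refinement twice: once keeping the state continuous, and once snapping the state to a coarse grid after every step, the way a discrete output would. The function names, the Newton-style update, and the one-decimal grid are all assumptions chosen for illustration.

```python
import math

def refine(x):
    """One Newton-style refinement step toward sqrt(2)."""
    return (x + 2.0 / x) / 2.0

def iterate_continuous(x0, steps):
    """Keep the intermediate state as a full-precision float."""
    x = x0
    for _ in range(steps):
        x = refine(x)
    return x

def iterate_discretized(x0, steps, decimals=1):
    """Snap the state to a coarse grid after every step (a stand-in for discrete outputs)."""
    x = x0
    for _ in range(steps):
        x = round(refine(x), decimals)
    return x

target = math.sqrt(2.0)
latent_err = abs(iterate_continuous(1.0, 5) - target)
discrete_err = abs(iterate_discretized(1.0, 5) - target)
print(latent_err < 1e-9, discrete_err > 1e-3)  # True True
```

The discretized run stalls at 1.4 because every refinement smaller than the grid spacing is rounded away. That is only an intuition pump for why a continuous state can carry finer-grained intermediate progress, not evidence about any particular model.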
HRM in One Page
Hierarchical Reasoning Model (HRM) proposes two interacting recurrent modules:
- A high-level module for slower abstract updates
- A low-level module for faster local computation
At a high level, training repeatedly:
- Initializes internal states
- Runs nested recursion loops
- Applies a supervised objective
- Repeats refinement
Reported results in the paper include strong performance on reasoning-heavy tasks (including ARC-style settings) with a relatively small parameter budget and limited training samples.
Reference: Wang et al., 2025
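The nested loop structure described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `step` function is a fixed toy update standing in for a learned recurrent module, and the names `z_high`, `z_low`, `n_high`, and `n_low`, along with the fixed mixing weights, are assumptions made for this example.

```python
import math

def step(state, context, w_self=0.5, w_ctx=0.8):
    """Toy recurrent update: elementwise tanh of a weighted mix (stand-in for a learned module)."""
    return [math.tanh(w_self * s + w_ctx * c) for s, c in zip(state, context)]

def two_timescale_forward(x, n_high=2, n_low=3):
    """Run n_low fast updates per slow update, for n_high slow updates."""
    z_high = [0.0] * len(x)  # slow, abstract state
    z_low = [0.0] * len(x)   # fast, local state
    low_updates = 0
    for _ in range(n_high):
        for _ in range(n_low):
            # The fast module sees the input plus the current slow state.
            context = [xi + zh for xi, zh in zip(x, z_high)]
            z_low = step(z_low, context)
            low_updates += 1
        # The slow module updates once, reading the latest fast state.
        z_high = step(z_high, z_low)
    return z_high, low_updates

z, count = two_timescale_forward([0.2, -0.4, 0.9])
print(count)  # 6 fast updates: n_high * n_low
```

The point of the sketch is the loop shape: total computation grows with `n_high * n_low`, while the two update functions (and therefore the parameter count in a real model) stay the same.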
What TRM Kept, What TRM Removed
Tiny Recursive Model (TRM) keeps the core recursive refinement intuition but simplifies architecture and training design.
From the paper’s framing:
- Replace dual-network hierarchy with a single tiny shared network
- Keep recursive latent/output refinement
- Use deep supervision-style training across refinement iterations
The paper reports that this simplified setup outperformed HRM on key ARC-AGI metrics while using fewer parameters.
Reference: Jolicoeur-Martineau, 2025
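A rough sketch of the single-network idea follows, assuming a toy setup: one shared function `net` updates both a latent state `z` and a current answer `y`, and a loss is recorded after every refinement block. The function names, the two-weight update, and the step counts are hypothetical. No gradient step is implemented; a comment marks where one would go in real training.

```python
import math

def net(a, b, w):
    """Single shared toy update: elementwise tanh(w[0]*a + w[1]*b)."""
    return [math.tanh(w[0] * ai + w[1] * bi) for ai, bi in zip(a, b)]

def refine(x, y, z, w, n_latent=3):
    """One refinement block: update latent z several times, then the answer y once."""
    for _ in range(n_latent):
        context = [xi + yi for xi, yi in zip(x, y)]
        z = net(z, context, w)  # same weights reused for the latent update
    y = net(y, z, w)            # and again for the answer update
    return y, z

def mse(y, target):
    """Mean squared error between the current answer and the target."""
    return sum((a - b) ** 2 for a, b in zip(y, target)) / len(y)

def supervised_refinement(x, target, w, n_sup=4):
    """Record a loss after every refinement block (deep-supervision-style)."""
    y = [0.0] * len(x)
    z = [0.0] * len(x)
    losses = []
    for _ in range(n_sup):
        y, z = refine(x, y, z, w)
        losses.append(mse(y, target))
        # In real training, an optimizer step on this loss would go here,
        # with (y, z) carried into the next block as plain values.
    return y, losses

y_final, losses = supervised_refinement([0.3, -0.2], [0.5, -0.5], w=[0.6, 0.9])
print(len(losses))  # 4, one loss per refinement block
```

Every refinement block gets its own training signal instead of only the final output, and the same `w` is used throughout, so adding blocks adds compute without adding parameters.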
Chain-of-Thought vs Latent Recursion
A useful way to reason about the difference:
| Approach | Iteration medium | Typical failure mode |
|---|---|---|
| Chain-of-thought | Tokens (external text) | Verbose traces, brittle decomposition, inherited token errors |
| Tool-use loops | Tokens + external API calls | Bounded by tool availability and prior knowledge |
| HRM/TRM recursion | Continuous latent state | Training stability and optimization details become central |
This does not make chain-of-thought obsolete. It reframes it as one recursion interface, not the only one.
Why ARC-Style Tasks Fit This Direction
ARC-style problems emphasize abstraction and stepwise transformation. They are often hard to solve via a single direct mapping from input to output.
Recursive latent refinement is naturally aligned with these tasks because it allows:
- Iterative hypothesis updates
- Intermediate state correction before final output
- More compute depth without proportional parameter growth
That is the core reason these papers attracted attention: not just scoreboards, but a different compute strategy.
Engineering Implications
If you are building reasoning systems, these papers suggest a practical design checklist:
- Separate model capacity from compute depth. Capacity (parameters) and iterative depth (recursion steps) should be tuned independently.
- Treat recursion loops as first-class hyperparameters. Refinement steps, supervision depth, and state-reset behavior can matter as much as width/depth.
- Benchmark for algorithmic generalization, not only text fluency. Include tasks where single-pass pattern matching fails.
- Expect hybrid architectures. General-purpose pretrained models plus compact recursive reasoning heads/modules is a plausible near-term direction.
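To make the capacity-versus-depth point in the checklist concrete, here is a back-of-the-envelope comparison. It assumes a hypothetical block of square dense layers (weights plus biases only); the widths, layer counts, and step counts are illustrative and not taken from either paper.

```python
def block_params(width, layers):
    """Parameters in a stack of square dense layers: weights plus biases."""
    return layers * (width * width + width)

def effective_depth(layers, recursion_steps):
    """Layer applications per forward pass when the same block is reused."""
    return layers * recursion_steps

# A 2-layer block applied 16 times versus a 32-layer stack applied once.
recursive = {"params": block_params(256, 2), "depth": effective_depth(2, 16)}
unrolled = {"params": block_params(256, 32), "depth": effective_depth(32, 1)}

print(recursive)  # {'params': 131584, 'depth': 32}
print(unrolled)   # {'params': 2105344, 'depth': 32}
```

Both configurations apply 32 layers per forward pass, but the recursive one stores 16 times fewer parameters. Whether the reused block matches the unrolled stack in quality is an empirical question that this arithmetic does not answer.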
Limits and Open Questions
Important caveats:
- HRM/TRM are not drop-in replacements for broad conversational LLM products.
- Reported gains are strongest on specific reasoning benchmarks; transfer breadth remains an open question.
- Training dynamics (especially truncated backprop choices and recursion schedules) are still under active study.
- Benchmark-specific optimization risk always exists; cross-domain validation is essential.
Practical Positioning in 2026
The most realistic interpretation is not “small recursive models replace frontier LLMs.”
It is: recursive inference is a complementary scaling axis. The field can continue scaling pretrained world models while adding stronger latent recursive computation where algorithmic reasoning is the bottleneck.
That matches where many labs are heading across agent systems and reasoning stacks: combine broad priors with targeted iterative computation.
Source Notes
This article is based on:
- The Decoded discussion transcript
- HRM primary paper: arXiv:2506.21734
- TRM primary paper: arXiv:2510.04871
- ARC benchmark site: arcprize.org
Paper results and benchmark standings can change with revised evaluations, replications, and new benchmark versions. Verify against the latest arXiv revisions and ARC Prize updates.