
Recursive Reasoning in 2026: HRM, TRM, and Why Inference-Time Recursion Matters

A technical guide to Hierarchical Reasoning Models (HRM) and Tiny Recursive Models (TRM): architecture, training tricks, ARC-AGI results, and what recursive inference changes for reasoning systems.

5 min read · Yash Thakker
Tags: Recursive Reasoning, HRM, TRM, ARC-AGI, Inference-Time Compute, AI Research, Reasoning Models


Most AI scaling conversations still default to one strategy: bigger models, more data, longer context. HRM and TRM added a different axis in 2025: more recursive computation at inference time without proportionally increasing parameter count.

This post summarizes the key ideas from recent HRM/TRM research and a Decoded discussion transcript, then maps those ideas to practical model-design choices.


TL;DR

| Question | Short answer |
| --- | --- |
| What changed? | HRM/TRM showed that small models can gain strong reasoning behavior via recursive latent refinement loops. |
| Why was it interesting? | Reported ARC-style results were strong relative to model size and training data. |
| Core mechanism | Reuse the same weights repeatedly over internal states (z, z_low, or equivalent) at inference and training. |
| HRM key idea | Two-timescale recursion (high-level + low-level modules). |
| TRM key idea | Simplify to one tiny shared network and keep recursive refinement + deep supervision. |
| Big takeaway | Inference-time recursion is a meaningful compute axis, not just parameter scaling. |

Primary sources: Wang et al., 2025 (HRM) and Jolicoeur-Martineau, 2025 (TRM).


Why Recursion Re-entered the Conversation

A transformer forward pass is highly parallel and efficient for training, but many reasoning tasks are effectively multi-step algorithms. If the task is hard to compress into one pass, performance can bottleneck even when the model is large.

In the transcript, this is framed as a gap between:

  • Token-space iteration (chain-of-thought and tool calls)
  • Latent-space iteration (internal recursive state updates)

That distinction matters. Token-space traces are useful, but they are constrained by discrete outputs and supervision artifacts. Latent recursion can keep iterative computation inside a continuous state space.
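To make latent-space iteration concrete, here is a minimal sketch in plain Python. It is not code from either paper: the update rule `z <- tanh(W z + U x)`, the names `make_params`, `refine`, and `latent_recursion`, and the random weights are all illustrative assumptions. The point is only that the same weights are reused at every step and that no tokens are emitted between steps.

```python
import math
import random

def make_params(dim, seed=0):
    """Build one small set of weights that every refinement step reuses."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    W = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
    U = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
    return {"W": W, "U": U}

def matvec(M, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def refine(params, x, z):
    """One latent update (assumed form): z <- tanh(W z + U x)."""
    wz = matvec(params["W"], z)
    ux = matvec(params["U"], x)
    return [math.tanh(a + b) for a, b in zip(wz, ux)]

def latent_recursion(params, x, steps):
    """Apply the same update `steps` times, starting from a zero state."""
    z = [0.0] * len(x)
    for _ in range(steps):
        z = refine(params, x, z)
    return z

def count_params(params):
    """Total number of scalar weights across all matrices."""
    return sum(len(row) for M in params.values() for row in M)

params = make_params(dim=4)
x = [1.0, -0.5, 0.25, 0.0]
z_shallow = latent_recursion(params, x, steps=1)
z_deep = latent_recursion(params, x, steps=16)
print(count_params(params))  # 32 weights, whatever value `steps` takes
```

Changing `steps` changes how much sequential computation is spent on the input, while `count_params` stays at 32. A chain-of-thought loop would instead decode a token, append it, and re-encode at each step.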


HRM in One Page

Hierarchical Reasoning Model (HRM) proposes two interacting recurrent modules:

  • A high-level module for slower abstract updates
  • A low-level module for faster local computation

At a high level, training repeatedly:

  1. Initializes internal states
  2. Runs nested recursion loops
  3. Applies a supervised objective
  4. Repeats refinement
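The loop structure behind these steps can be sketched as follows. This is a toy under stated assumptions, not the HRM implementation: the states are scalars, `low_step` and `high_step` use fixed hand-picked coefficients in place of learned modules, and the squared-error loss stands in for whatever supervised objective is used. Only the two-timescale nesting is the point.

```python
import math

def low_step(z_low, z_high, x):
    """Fast module (hypothetical): runs every inner step, conditioned on the slow state."""
    return math.tanh(0.5 * z_low + 0.3 * z_high + 0.8 * x)

def high_step(z_high, z_low):
    """Slow module (hypothetical): runs once per cycle, after the fast loop finishes."""
    return math.tanh(0.6 * z_high + 0.7 * z_low)

def nested_forward(x, target, cycles, inner_steps):
    z_high, z_low = 0.0, 0.0                 # 1. initialize internal states
    low_updates = high_updates = 0
    for _ in range(cycles):                  # 2. nested recursion loops
        for _ in range(inner_steps):
            z_low = low_step(z_low, z_high, x)
            low_updates += 1
        z_high = high_step(z_high, z_low)
        high_updates += 1
    prediction = z_high                      # read the answer off the slow state
    loss = (prediction - target) ** 2        # 3. supervised objective (assumed form)
    return loss, low_updates, high_updates

loss, low_updates, high_updates = nested_forward(
    x=0.5, target=0.2, cycles=3, inner_steps=4
)
print(low_updates, high_updates)  # 12 3
```

With 3 cycles of 4 inner steps, the fast state is updated 12 times and the slow state 3 times. Step 4 in the list corresponds to calling this again from the resulting states rather than from zeros.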

Reported results in the paper include strong performance on reasoning-heavy tasks (including ARC-style settings) with a relatively small parameter budget and limited training samples.

Reference: Wang et al., 2025


What TRM Kept, What TRM Removed

Tiny Recursive Model (TRM) keeps the core recursive refinement intuition but simplifies architecture and training design.

From the paper’s framing:

  • Replace dual-network hierarchy with a single tiny shared network
  • Keep recursive latent/output refinement
  • Use deep supervision-style training across refinement iterations

The paper reports that this simplified setup outperformed HRM on key ARC-AGI metrics while using fewer parameters.
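A minimal sketch of how a single shared network and deep supervision can fit together is shown below. Everything in it is an assumption made for illustration: the "network" is one scalar tanh unit with four weights, the output update is distinguished from the latent update simply by omitting `x`, and central finite differences stand in for backpropagation so the example runs without an autograd library. It is not the TRM training code.

```python
import math

def net(params, u, y, z):
    """The single shared 'network' (toy): one scalar tanh unit with 4 weights."""
    a, b, c, d = params
    return math.tanh(a * u + b * y + c * z + d)

def segment(params, x, y, z, latent_steps):
    """One refinement segment: update z several times, then update y once."""
    for _ in range(latent_steps):
        z = net(params, x, y, z)      # latent update sees the input x
    y = net(params, 0.0, y, z)        # output update reuses the same weights, without x
    return y, z

def segment_loss(params, x, target, y, z, latent_steps):
    y_new, _ = segment(params, x, y, z, latent_steps)
    return (y_new - target) ** 2

def fd_grad(params, x, target, y, z, latent_steps, eps=1e-5):
    """Central finite differences. y and z are held fixed, i.e. treated as detached."""
    grad = []
    for i in range(len(params)):
        hi = list(params)
        lo = list(params)
        hi[i] += eps
        lo[i] -= eps
        diff = (segment_loss(hi, x, target, y, z, latent_steps)
                - segment_loss(lo, x, target, y, z, latent_steps))
        grad.append(diff / (2 * eps))
    return grad

def train_example(params, x, target, supervision_steps=4, latent_steps=3, lr=0.1):
    """Deep supervision: one loss and one weight update per refinement segment."""
    y, z = 0.0, 0.0
    losses = []
    for _ in range(supervision_steps):
        grad = fd_grad(params, x, target, y, z, latent_steps)
        y_next, z_next = segment(params, x, y, z, latent_steps)
        losses.append((y_next - target) ** 2)
        params = [p - lr * g for p, g in zip(params, grad)]
        y, z = y_next, z_next         # carry state forward as plain numbers
    return params, losses

params, losses = train_example([0.1, 0.1, 0.1, 0.0], x=0.8, target=0.4)
print(len(losses))  # 4: one supervised loss per refinement segment
```

The design choice to notice is that each segment gets its own loss and its own weight update, while the carried `(y, z)` state has no gradient history. That keeps the cost of each update bounded by one segment regardless of how many segments are chained.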

Reference: Jolicoeur-Martineau, 2025


Chain-of-Thought vs Latent Recursion

A useful way to reason about the difference:

| Approach | Iteration medium | Typical failure mode |
| --- | --- | --- |
| Chain-of-thought | Tokens (external text) | Verbose traces, brittle decomposition, inherited token errors |
| Tool-use loops | Tokens + external API calls | Bounded by tool availability and prior knowledge |
| HRM/TRM recursion | Continuous latent state | Training stability and optimization details become central |

This does not make chain-of-thought obsolete. It reframes it as one recursion interface, not the only one.


Why ARC-Style Tasks Fit This Direction

ARC-style problems emphasize abstraction and stepwise transformation. They are often hard to solve via a single direct mapping from input to output.

Recursive latent refinement is naturally aligned with these tasks because it allows:

  • Iterative hypothesis updates
  • Intermediate state correction before final output
  • More compute depth without proportional parameter growth

That is the core reason these papers attracted attention: not just scoreboards, but a different compute strategy.
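A small grid example illustrates why repeated application of one fixed rule suits this kind of task. The sketch below is an analogy, not a learned model and not taken from either paper: `propagate_once` is a hand-written local rule standing in for a learned update, and the grid is made up. A single application only reaches immediate neighbors; the full region is found by reapplying the same rule until the state stops changing.

```python
def propagate_once(grid, filled):
    """One application of a fixed local rule: fill open cells next to a filled cell."""
    rows, cols = len(grid), len(grid[0])
    new_filled = set(filled)
    for r, c in filled:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == ".":
                new_filled.add((nr, nc))
    return new_filled

def refine_until_stable(grid, start, max_steps):
    """Reapply the same rule until the state stops changing or the budget runs out."""
    filled = {start}
    for step in range(1, max_steps + 1):
        updated = propagate_once(grid, filled)
        if updated == filled:
            return filled, step - 1   # count only the steps that changed the state
        filled = updated
    return filled, max_steps

grid = [
    "..#.",
    ".##.",
    "....",
]
one_step = propagate_once(grid, {(0, 0)})
final, steps_used = refine_until_stable(grid, (0, 0), max_steps=20)
print(len(one_step), len(final), steps_used)  # 3 9 7
```

One pass fills 3 cells; seven passes of the same rule fill all 9 open cells by routing around the wall. The rule never grows, only the number of times it is applied, which is the sense in which recursion adds compute depth without adding parameters.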


Engineering Implications

If you are building reasoning systems, these papers suggest a practical design checklist:

  1. Separate model capacity from compute depth. Capacity (parameters) and iterative depth (recursion steps) should be tuned independently.

  2. Treat recursion loops as first-class hyperparameters. Refinement steps, supervision depth, and state-reset behavior can matter as much as width or layer count.

  3. Benchmark for algorithmic generalization, not only text fluency. Include tasks where single-pass pattern matching fails.

  4. Expect hybrid architectures. Pairing general-purpose pretrained models with compact recursive reasoning heads or modules is a plausible near-term direction.
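One way to act on the first two items is to keep capacity and recursion settings in separate config objects and sweep them independently. The sketch below is a hypothetical layout: the class names, field names, and the formula in `block_applications` (which assumes one output update after each run of latent updates) are illustrative choices, not settings from either paper.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Capacity:
    hidden_size: int
    num_layers: int

@dataclass(frozen=True)
class RecursionSchedule:
    latent_steps: int          # inner latent updates per refinement cycle
    refinement_cycles: int     # cycles per supervision step
    supervision_steps: int     # deep-supervision segments per example
    reset_state: bool = True   # re-initialize latent state for each new example

    def block_applications(self, num_layers: int) -> int:
        """Sequential layer applications for one example under this schedule."""
        per_cycle = self.latent_steps + 1   # assumed: latent updates plus one output update
        return num_layers * per_cycle * self.refinement_cycles * self.supervision_steps

def sweep(capacities, schedules):
    """Cross every capacity with every schedule so the two axes vary independently."""
    return [
        (cap, sched, sched.block_applications(cap.num_layers))
        for cap, sched in product(capacities, schedules)
    ]

capacities = [Capacity(hidden_size=64, num_layers=2), Capacity(hidden_size=128, num_layers=2)]
schedules = [
    RecursionSchedule(latent_steps=3, refinement_cycles=2, supervision_steps=4),
    RecursionSchedule(latent_steps=7, refinement_cycles=2, supervision_steps=4),
]
runs = sweep(capacities, schedules)
print(len(runs), runs[0][2], runs[1][2])  # 4 64 128
```

Keeping the schedule out of the capacity config makes it easy to hold parameters fixed while doubling sequential compute, which is the comparison these papers motivate.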


Limits and Open Questions

Important caveats:

  • HRM/TRM are not drop-in replacements for broad conversational LLM products.
  • Reported gains are strongest on specific reasoning benchmarks; transfer breadth remains an open question.
  • Training dynamics (especially truncated backprop choices and recursion schedules) are still under active study.
  • Benchmark-specific optimization risk always exists; cross-domain validation is essential.

Practical Positioning in 2026

The most realistic interpretation is not “small recursive models replace frontier LLMs.”

It is: recursive inference is a complementary scaling axis. The field can continue scaling pretrained world models while adding stronger latent recursive computation where algorithmic reasoning is the bottleneck.

That matches where many labs are heading across agent systems and reasoning stacks: combine broad priors with targeted iterative computation.


Source Notes

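This article is based on the HRM paper (Wang et al., 2025), the TRM paper (Jolicoeur-Martineau, 2025), and the Decoded discussion transcript.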

Paper results and benchmark standings can change with revised evaluations, replications, and new benchmark versions. Verify against the latest arXiv revisions and ARC Prize updates.
