arxiv:2512.14693

Universal Reasoning Model

Published on Dec 16, 2025 · Submitted by Zitian Gao on Dec 18, 2025

Abstract

The Universal Reasoning Model enhances Universal Transformers with short convolution and truncated backpropagation to improve reasoning performance on ARC-AGI tasks.

AI-generated summary

Universal transformers (UTs) have been widely used for complex reasoning tasks such as ARC-AGI and Sudoku, yet the specific sources of their performance gains remain underexplored. In this work, we systematically analyze UT variants and show that improvements on ARC-AGI primarily arise from the recurrent inductive bias and strong nonlinear components of the Transformer, rather than from elaborate architectural designs. Motivated by this finding, we propose the Universal Reasoning Model (URM), which enhances the UT with short convolution and truncated backpropagation. Our approach substantially improves reasoning performance, achieving state-of-the-art 53.8% pass@1 on ARC-AGI 1 and 16.0% pass@1 on ARC-AGI 2. Our code is available at https://github.com/zitian-gao/URM.
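The recipe named in the abstract, a weight-shared layer applied recurrently, a short convolution over nearby tokens, and backpropagation truncated to the last few recurrent steps, could be sketched roughly as below. This is a minimal toy, not the paper's implementation: the module name, hyperparameters, and the attention-free step function are all hypothetical simplifications.

```python
import torch
import torch.nn as nn


class RecurrentReasoningBlock(nn.Module):
    """Toy sketch of a Universal-Transformer-style recurrent block.

    One weight-shared layer is applied for `n_steps` iterations; a short
    depthwise convolution mixes nearby tokens, and gradients are truncated
    to the final `bptt_steps` iterations by detaching the hidden state.
    """

    def __init__(self, d_model: int = 32, conv_width: int = 3):
        super().__init__()
        # Short depthwise convolution over the sequence dimension.
        self.short_conv = nn.Conv1d(
            d_model, d_model, kernel_size=conv_width,
            padding=conv_width - 1, groups=d_model,
        )
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm = nn.LayerNorm(d_model)

    def step(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); Conv1d expects (batch, d_model, seq).
        seq_len = x.size(1)
        c = self.short_conv(x.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        return x + self.mlp(self.norm(x + c))

    def forward(self, x: torch.Tensor, n_steps: int = 8,
                bptt_steps: int = 2) -> torch.Tensor:
        for i in range(n_steps):
            if i == n_steps - bptt_steps:
                # Truncated backprop: no gradient through earlier steps.
                x = x.detach()
            x = self.step(x)
        return x


block = RecurrentReasoningBlock()
out = block(torch.randn(2, 10, 32))
```

The key point the paper's ablations suggest is that the recurrence and the strong nonlinearity do most of the work; the step function above is deliberately plain.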

Community


Please explain.

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/universal-reasoning-model-537-94f00915

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

Great stuff! Could you please release open weights? I would be very grateful.

Great work! Do you plan to submit this model to the official ARC-AGI leaderboard for verification?

Great paper! The finding that recurrent inductive bias and strong nonlinearity matter more than elaborate architecture is really compelling.

I forked the repo and reproduced the training pipeline on a single RTX 3090, but with 10×10 grids due to memory constraints. Logged the reproduction on SOTAVerified as a Tier 1 verification.

I also extended the codebase to experiment with replacing ACT with an energy-based stopping criterion, inspired by the Energy-Based Transformers paper (arxiv.org/abs/2410.09197). The idea is to learn an energy function E(input, output) and stop when energy converges, rather than using a learned halting head. Early-stage but the core pieces are working. Happy to share more if anyone's interested in this direction!
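For anyone curious about the stopping rule described above (iterate a refinement step and halt when the energy plateaus, instead of querying a learned halting head as in ACT), here is a minimal sketch. `energy_fn` and `update_fn` stand in for a learned energy model and refinement network; the names, tolerance, and toy demo are illustrative assumptions, not the actual extension.

```python
import numpy as np


def refine_with_energy_stopping(x, energy_fn, update_fn,
                                max_steps=50, tol=1e-4):
    """Iteratively refine an answer y for input x, halting when the
    energy E(x, y) stops decreasing by more than `tol`."""
    y = np.zeros_like(x)
    prev_e = energy_fn(x, y)
    for step in range(1, max_steps + 1):
        y = update_fn(x, y)
        e = energy_fn(x, y)
        if abs(prev_e - e) < tol:  # energy has converged; stop early
            break
        prev_e = e
    return y, step


# Toy demo: energy is squared error to the input; update moves halfway.
energy = lambda x, y: float(np.sum((x - y) ** 2))
update = lambda x, y: y + 0.5 * (x - y)
x = np.ones(4)
y, steps = refine_with_energy_stopping(x, energy, update)
```

In the toy demo the energy shrinks geometrically, so the loop halts well before `max_steps`; with a learned energy the same plateau test replaces the halting head's stop probability.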


Get this paper in your agent:

hf papers read 2512.14693
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 2

Datasets citing this paper 0


Spaces citing this paper 0


Collections including this paper 5