RISE: Self-Improving Robot Policy with Compositional World Model
The first study to leverage world models as an effective learning environment for challenging real-world manipulation, bootstrapping performance on tasks that demand high dynamics, dexterity, and precision.
Jiazhi Yang1,2*† Kunyang Lin2* Jinwei Li2,6* Wencong Zhang2* Tianwei Lin5 Longyan Wu4 Zhizhong Su5 Hao Zhao6 Ya-Qin Zhang6 Li Chen3 Ping Luo3 Xiangyu Yue1♮ Hongyang Li3♮
1 The Chinese University of Hong Kong 2 Kinetix AI 3 The University of Hong Kong
4 Shanghai Innovation Institute 5 Horizon Robotics 6 Tsinghua University
* Equal Contribution † Project Lead ♮ Equal Advising

At a Glance

System Overview

RISE shifts the RL environment to a Compositional World Model, which first imagines future observations for proposed actions and then evaluates the imagined states to derive advantages for policy improvement.
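The overview above can be sketched as two stand-in modules: a dynamics model that imagines the next observation for each candidate action, and a progress value model that scores imagined states so the difference in value becomes an advantage. A minimal illustrative sketch; `DynamicsModel`, `ProgressValueModel`, and all shapes are hypothetical toy stand-ins, not the paper's actual architectures.

```python
import numpy as np

class DynamicsModel:
    """Toy stand-in: predicts the imagined next observation for an action."""
    def predict(self, obs, action):
        # Imagined state drifts halfway toward the commanded target.
        return obs + 0.5 * (action - obs)

class ProgressValueModel:
    """Toy stand-in: scores task progress of an (imagined) state."""
    def __init__(self, goal):
        self.goal = goal
    def value(self, obs):
        # Closer to the goal means higher value.
        return -np.linalg.norm(obs - self.goal)

def advantage(dyn, val, obs, actions):
    """Advantage of each candidate action: imagined value minus current value."""
    v_now = val.value(obs)
    return [val.value(dyn.predict(obs, a)) - v_now for a in actions]
```

With a goal at `ones(3)`, an action toward the goal receives a positive advantage and an action away from it a negative one, which is the learning signal the policy improvement step consumes.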

Contributions

RISE, a principled framework for robotic reinforcement learning that enables autonomous self-improvement in a scalable, online manner. RISE overcomes the physical restrictions of prior art by shifting robotic interaction from the physical environment to an imagined space.
Compositional World Model, an online learning environment that builds reliable dynamics and value estimates for real-world tasks. We identify critical design choices that yield informative learning signals for policy improvement.
Through extensive experiments on dexterous manipulation tasks, we demonstrate that RISE achieves significantly higher performance than existing RL methods.

Real-World Autonomous Execution

Continuous Success achieved by RISE

Autonomous execution on three complex tasks.


Comparison: RISE vs. Baselines

Left: RISE (Ours). Right: Baselines (Composite video including RECAP, π0.5, π0.5+DSRL).

RISE (Backpack Packing)
Baselines (Backpack Packing)
RISE (Dynamic Brick Sorting)
Baselines (Dynamic Brick Sorting)
RISE (Box Closing)
Baselines (Box Closing)

Robustness & Generalization

Self-Correction & Robustness
Spatial Generalization

Methodology

Key Component #1: Compositional World Model

Inference pipeline that yields rewarded samples for policy optimization. Both modules are compatible with multi-view images.

Key Component #2: Self-Improving Loop

Top: Rollout stage. Prompted with an optimal advantage, the rollout policy interacts with the world model to produce rollout data.
Bottom: Training stage. The behavior policy is then trained to generate appropriate actions under an advantage-conditioning scheme.
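The rollout stage described above can be sketched end to end: a rollout policy, prompted with a target advantage, interacts with the world model, and each imagined transition is labeled with the advantage it actually achieved, ready for advantage-conditioned training. Everything here (`world_model`, `rollout_policy`, the 2-D toy task) is a hypothetical stand-in for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
GOAL = np.ones(2)  # toy task: reach the goal state

def value(obs):
    """Stand-in progress value model: closer to the goal is better."""
    return -np.linalg.norm(obs - GOAL)

def world_model(obs, action):
    """Stand-in dynamics model: small step in the action direction."""
    return obs + 0.1 * action

def rollout_policy(obs, target_adv):
    """Toy policy prompted with a target advantage: step toward the goal."""
    return target_adv * (GOAL - obs)

def collect_rollout(obs, steps=20):
    """Rollout stage: interact with the world model and label each
    transition with the advantage it achieved."""
    data = []
    for _ in range(steps):
        a = rollout_policy(obs, target_adv=1.0) + 0.05 * rng.normal(size=2)
        nxt = world_model(obs, a)
        data.append((obs, a, value(nxt) - value(obs)))  # (state, action, advantage)
        obs = nxt
    return data

# Training stage (schematic): fit the behavior policy to reproduce each action
# conditioned on its achieved advantage, so prompting with a high advantage at
# test time elicits high-quality actions.
```

Because the rollout policy mostly moves toward the goal, most stored transitions carry positive advantages; the advantage-conditioned training step then learns which actions earned them.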

Results

Main Results

RISE outperforms all baselines through world-model imagination, with absolute gains of +35% on dynamic brick sorting, +45% on backpack packing, and +35% on box closing.

Metric:

Dynamic Brick Sorting: Success (%)

Backpack Packing: Success (%)

Box Closing: Success (%)


Ablations

Ablation studies conducted on the Dynamic Brick Sorting task.

Left: Optimal offline ratio (0.6) balances demonstrations with online rollouts.
Right: Both online actions and states are essential; full RISE with all components achieves best performance.
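The offline-ratio ablation above amounts to a fixed mixing rule when sampling training batches: a 0.6 share of offline demonstrations and the remainder from online (imagined) rollouts. A minimal sketch of such a mixed sampler; the function name and batch layout are illustrative assumptions, not the paper's data pipeline.

```python
import random

def sample_batch(offline, online, batch_size=10, offline_ratio=0.6):
    """Mix offline demonstrations with online rollouts at a fixed ratio
    (0.6 is the best-performing ratio in the ablation)."""
    n_off = round(batch_size * offline_ratio)  # demonstrations per batch
    return (random.choices(offline, k=n_off)
            + random.choices(online, k=batch_size - n_off))
```

With `batch_size=10` and `offline_ratio=0.6`, each batch draws 6 demonstration samples and 4 rollout samples; sweeping `offline_ratio` reproduces the shape of the ablation.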

Ablation: Offline data ratio

Ablation: Online action/state integration

Sort Accuracy
Pick&Place Success Rate
Complete Success Rate

Dynamics Model Comparisons

Qualitative comparison: controllable future prediction under candidate actions.

Abstract

Despite sustained scaling of model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment resets. To bridge this gap, we present RISE, a scalable framework for robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view futures via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for policy improvement. This compositional design allows the dynamics and value components to be tailored with distinct, best-suited architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imagined rollouts, estimates advantages, and updates the policy in imagination space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvements over prior art, with absolute performance gains of more than +35% on dynamic brick sorting, +45% on backpack packing, and +35% on box closing.

BibTeX

Copy the citation:
@article{rise2026,
  title={RISE: Self-Improving Robot Policy with Compositional World Model},
  author={Yang, Jiazhi and Lin, Kunyang and Li, Jinwei and Zhang, Wencong and Lin, Tianwei and Wu, Longyan and Su, Zhizhong and Zhao, Hao and Zhang, Ya-Qin and Chen, Li and Luo, Ping and Yue, Xiangyu and Li, Hongyang},
  year={2026}
}