Learning to Drive via Real-World Simulation at Scale

Haochen Tian1,2,3    Tianyu Li2    Haochen Liu3    Jiazhi Yang2    Yihang Qiu2
Guang Li3    Junli Wang1,3    Yinfeng Gao1,3    Zhang Zhang1    Liang Wang1    Hangjun Ye3
Tieniu Tan1    Long Chen3    Hongyang Li2
1 MAIS, Institute of Automation, Chinese Academy of Sciences 
2 OpenDriveLab at The University of Hong Kong      3 Xiaomi EV
Work done while interning at Xiaomi Embodied Intelligence Team

Scaling Up End-to-End Planners by Simulation.
(a) We construct large-scale simulation data by perturbing ego trajectories, generating corresponding pseudo-expert demonstrations, and rendering multi-view observations in reactive environments. Combined with real-world data, this enables broad coverage of out-of-distribution states and supports sim–real co-training for any end-to-end planner.
(b) Across three representative planner families, including regression, diffusion, and vocabulary scoring, sim-real co-training consistently produces synergistic improvements in robustness and generalization, demonstrating clear and predictable simulation scaling trends.
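For illustration, a minimal sketch of such sim–real co-training is shown below, assuming a generic PyTorch data pipeline; the dataset contents, tensor shapes, and 1:1 per-batch mixing ratio are placeholder assumptions rather than the exact recipe used in the paper.

# Minimal sketch of sim-real co-training with a fixed per-batch mixing ratio.
# Dataset contents, tensor shapes, and the 1:1 ratio below are placeholder assumptions.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, WeightedRandomSampler

# Stand-ins for real driving logs and simulated rollouts with pseudo-expert labels.
real_data = TensorDataset(torch.randn(1_000, 8), torch.randn(1_000, 2))
sim_data = TensorDataset(torch.randn(4_000, 8), torch.randn(4_000, 2))
mixed = ConcatDataset([real_data, sim_data])

# Weight samples so each batch is roughly half real and half simulated,
# regardless of how many simulated samples are generated.
weights = torch.cat([
    torch.full((len(real_data),), 0.5 / len(real_data)),
    torch.full((len(sim_data),), 0.5 / len(sim_data)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(mixed), replacement=True)
loader = DataLoader(mixed, batch_size=64, sampler=sampler)

for obs, expert_traj in loader:
    # Any end-to-end planner (regression, diffusion, or vocabulary scoring) trains on
    # the mixed batch; simulated states are supervised by pseudo-expert trajectories.
    break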

Abstract

Achieving fully autonomous driving systems requires learning rational decisions in a wide span of scenarios, including safety-critical and out-of-distribution ones. However, such cases are underrepresented in real-world corpora collected by human experts. To compensate for this lack of data diversity, we introduce a novel and scalable simulation framework capable of synthesizing massive numbers of unseen states on top of existing driving logs. Our pipeline utilizes advanced neural rendering with a reactive environment to generate high-fidelity multi-view observations controlled by the perturbed ego trajectory. Furthermore, we develop a pseudo-expert trajectory generation mechanism that provides action supervision for these newly simulated states. On the synthesized data, we find that a simple co-training strategy over both real-world and simulated samples leads to significant improvements in both robustness and generalization for various planning methods on challenging real-world benchmarks, up to +6.8 EPDMS on navhard and +2.9 on navtest. More importantly, such policy improvement scales smoothly with increasing simulation data alone, even without extra real-world data streaming in. We further reveal several crucial findings of such a sim-real learning system, which we term SimScale, including the design of pseudo-experts and the scaling properties of different policy architectures.

Simulation Data Pipeline


Pseudo-Expert Scene Simulation. (a) Trajectory perturbation from T to T + H, (b) reactive environment rollout, and pseudo-expert trajectory generation from T + H to T + 2H under recovery-based and planner-based strategies.
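A rough sketch of the recovery-based strategy is given below, assuming simple 2D waypoint trajectories; the cosine ramp, lateral-offset perturbation, and linear blend back onto the log are illustrative choices, and the planner-based strategy is not shown.

# Illustrative sketch of trajectory perturbation and a recovery-based pseudo-expert
# (simplified 2D kinematics; the perturbation shape and blend weights are assumptions).
import numpy as np

def perturb_segment(log_xy: np.ndarray, lateral_offset: float) -> np.ndarray:
    """Shift the ego trajectory laterally over [T, T + H] with a smooth ramp."""
    n = len(log_xy)
    ramp = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, n)))   # ramps from 0 to 1
    heading = np.gradient(log_xy, axis=0)
    normal = np.stack([-heading[:, 1], heading[:, 0]], axis=1)
    normal /= np.linalg.norm(normal, axis=1, keepdims=True) + 1e-6
    return log_xy + (lateral_offset * ramp)[:, None] * normal

def recovery_pseudo_expert(perturbed_end: np.ndarray, log_future: np.ndarray) -> np.ndarray:
    """Blend from the perturbed state back onto the logged trajectory over [T + H, T + 2H]."""
    n = len(log_future)
    w = np.linspace(1.0, 0.0, n)[:, None]                      # fade out the offset
    offset = perturbed_end - log_future[0]
    return log_future + w * offset

log = np.stack([np.linspace(0, 40, 20), np.zeros(20)], axis=1)  # straight 40 m logged path
perturbed = perturb_segment(log[:10], lateral_offset=1.5)        # frames T .. T + H
expert = recovery_pseudo_expert(perturbed[-1], log[10:])         # frames T + H .. T + 2H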

Leaderboard Results

 NAVSIMv2 navhard


*: pseudo-expert supervision; †: reward scoring; S.: per-stage EPDM score.

 NAVSIMv2 navtest


*: pseudo-expert supervision; †: reward scoring.

Scaling Property Analysis


Scaling Dynamics with Different Planners and Pseudo-Expert Trajectories. Visualization of how simulation data scale and supervision signals influence the driving performance of various planners, where the inflection point indicates a learning plateau.
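As a hypothetical illustration of reading off such a plateau, one can fit a saturating curve to driving score versus simulation data size; all numbers below are made up and scipy's curve_fit is assumed, so this is only a sketch of the analysis, not the paper's procedure.

# Hypothetical sketch: fit a saturating curve to driving score vs. simulation data size
# and locate the inflection / plateau region (all numbers are illustrative, not paper results).
import numpy as np
from scipy.optimize import curve_fit

def scaling_curve(log_n, base, gain, k, x0):
    """Logistic in log10(data size): rises from `base` and plateaus at `base + gain`."""
    return base + gain / (1.0 + np.exp(-k * (log_n - x0)))

sim_sizes = np.array([1e4, 3e4, 1e5, 3e5, 1e6])           # simulated samples (made up)
scores = np.array([51.2, 52.6, 55.0, 57.2, 58.8])          # e.g. EPDMS (made up)

params, _ = curve_fit(scaling_curve, np.log10(sim_sizes), scores, p0=[50.0, 10.0, 2.0, 5.0])
base, gain, k, x0 = params
print(f"plateau score ~= {base + gain:.1f}, inflection near 10^{x0:.1f} simulated samples")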

Qualitative Results

 Pseudo-Expert Scene Simulation


Sim. 1


Sim. 2


Sim. 3


 Simulated OOD Scenes

More Results

Ecosystem

The code and simulation data will be open-sourced before 2026!

Please stay tuned for more work from OpenDriveLab: R2SE, ReSim, NEXUS, MTGS, Centaur.

BibTeX

If you find the project helpful for your research, please consider citing our paper:
@article{tian2025SimScale,
  title={SimScale: Learning to Drive via Real-World Simulation at Scale},
  author={xxxx},
  journal={arXiv preprint arXiv:xxx},
  year={2025}
}