
Decoupled Diffusion Sparks Adaptive Scene Generation

Yunsong Zhou1,2*    Naisheng Ye1,3*    William Ljungbergh4    Tianyu Li1    Jiazhi Yang1   
Zetong Yang5    Hongzi Zhu2    Christoffer Petersson4    Hongyang Li1   
1OpenDriveLab          2Shanghai Jiao Tong University          
3Zhejiang University          4Zenseact          5GAC R&D Center
arXiv 2025

Nexus is a noise-decoupled prediction pipeline designed for adaptive driving scene generation, ensuring both timely reaction and goal-directed control. Unlike prior approaches that use (a) full-sequence denoising or (b) next-token prediction, (c) Nexus introduces independent yet structured noise states, enabling more controlled and interactive scene generation. It leverages low-noise goals to steer generation while dynamically incorporating environmental updates, which are captured in subsequent denoising steps.

Abstract

Controllable scene generation could substantially reduce the cost of diverse data collection for autonomous driving. Prior works formulate traffic layout generation as a predictive process, either by denoising entire sequences at once or by iteratively predicting the next frame. However, full-sequence denoising hinders online reaction, while the latter's short-sighted next-frame prediction lacks precise goal-state guidance. Further, the learned model struggles to generate complex or challenging scenarios due to the prevalence of safe and ordinary driving behaviors in open datasets. To overcome these limitations, we introduce Nexus, a decoupled scene generation framework that improves reactivity and goal conditioning by simulating both ordinary and challenging scenarios from fine-grained tokens with independent noise states. At the core of the decoupled pipeline is the integration of a partial noise-masking training strategy and a noise-aware schedule that ensures timely environmental updates throughout the denoising process. To complement challenging scenario generation, we collect a dataset of complex corner cases. It covers 540 hours of simulated data, including high-risk interactions such as cut-in, sudden braking, and collision. Nexus achieves superior generation realism while preserving reactivity and goal orientation, with a 40% reduction in displacement error. We further demonstrate that Nexus improves closed-loop planning by 20% through data augmentation and showcase its capability in safety-critical data generation.

Model Overview


(a) Nexus learns from realistic and safety-critical driving logs and encodes agents and maps separately before feeding them into a diffusion transformer. The model is trained to restore sequences from partially masked agent tokens guided by low-noise ones. (b) Agent tokens are encoded with time and denoising steps, then interact with the maps and dynamics via attention. (c) Tokens with varying noise are scheduled within a chunk for a timely reaction. Each denoising step updates and pops zero-noise tokens, replacing them with next-frame tokens to iteratively generate the scene.
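The chunked schedule in (c) can be sketched in a few lines. This is a minimal toy simulation, not the released implementation: the names (`rollout`, `CHUNK`) and the one-level-per-step noise decrement are illustrative assumptions. It shows the key mechanism only: every token in the window carries its own noise level, each denoising step lowers all of them, and fully denoised frames are popped while fresh fully noisy frames are admitted, so the environment can be updated every step.

```python
from collections import deque

CHUNK = 4  # frames held in the rolling denoising window (illustrative value)

def rollout(num_steps):
    """Toy simulation of a noise-decoupled rolling schedule.

    Each entry is (frame_id, noise_level). The start is staggered: the
    oldest frame needs one more step, the newest needs CHUNK steps.
    """
    window = deque((f, f + 1) for f in range(CHUNK))
    emitted, next_f = [], CHUNK
    for _ in range(num_steps):
        # One denoising step lowers every token's noise level by one.
        window = deque((f, n - 1) for f, n in window)
        # Pop zero-noise (clean) frames and admit fresh fully noisy ones,
        # which is where environmental updates can be injected.
        while window and window[0][1] == 0:
            emitted.append(window.popleft()[0])
            window.append((next_f, CHUNK))
            next_f += 1
    return emitted, [n for _, n in window]

print(rollout(6))  # emits one clean frame per step; window stays staggered
```

With this stagger, the pipeline produces one fully denoised frame per step while always keeping a full window of partially denoised future frames in flight.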

Controllable World Generator







Safety-critical Scenario Creator

Cut-in

Cut-in

Rear-end Collision

Lane Changing

Rear-end Collision

Merging

Comparison to the State of the Art

Generation controllability, interactivity, and kinematics compared to nuPlan experts. The tasks predict 8-second futures from a 2-second history, with or without a goal.


Visualization

(a) Free exploration generates diverse future scenarios from initialized history, while conditioned generation synthesizes scenes based on predefined goal points. (b) Setting the attacker's goal as the ego's waypoint enables adversarial scenarios.
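Goal conditioning via partial noise masking can be sketched as follows. This is a hedged sketch under assumed names (`partial_noise_mask`, `goal_idx`) and a toy linear noise schedule, not the paper's exact formulation: goal tokens are kept at zero noise while the rest of the sequence is noised, so denoising is steered toward the goal state.

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_noise_mask(tokens, goal_idx, t):
    """Noise all agent tokens to step t, but keep goal tokens clean.

    t in [0, 1]; alpha = 1 - t is a toy linear schedule used purely
    for illustration.
    """
    noise = rng.standard_normal(tokens.shape)
    alpha = 1.0 - t
    noised = alpha * tokens + (1.0 - alpha) * noise
    noised[goal_idx] = tokens[goal_idx]  # goal stays at zero noise
    return noised
```

Under this view, the adversarial case in (b) amounts to writing the ego's waypoint into the attacker's goal token before denoising, so the attacker is steered toward the ego's future position.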


Analysis: Nexus as World Model and Data Engine

Evaluation of the generation model as a world generator. The scene generator serves as an interactive world model responding to the baseline planner's actions, with nuPlan closed-loop metrics reflecting its realism.
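The planner-generator interaction is a standard closed loop, sketched below with placeholder callables (`planner`, `generator` are assumptions standing in for the baseline planner and Nexus): at each step the planner acts on the current scene, and the generator rolls the scene forward in response.

```python
def closed_loop(scene, planner, generator, horizon):
    """Alternate planner actions and generator rollouts for `horizon` steps,
    returning the trace of scenes (including the initial one)."""
    trace = [scene]
    for _ in range(horizon):
        action = planner(scene)            # planner reacts to current scene
        scene = generator(scene, action)   # generator predicts the next scene
        trace.append(scene)
    return trace
```

For example, with a trivial planner that always outputs `1` and a generator that adds the action to the scene, `closed_loop(0, lambda s: 1, lambda s, a: s + a, 3)` returns `[0, 1, 2, 3]`.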


Comparison of data augmentation with synthetic data. Nexus serves as a data engine, expanding sampled scenes to train the planner at varying scales. The nuPlan closed-loop evaluation demonstrates the performance gains from data augmentation.


Ecosystem

The code has been open-sourced. You can now try generating new scenes from the original nuPlan data on GitHub!

Please stay tuned for more work from OpenDriveLab: MTGS, Centaur and Vista.

BibTeX

If you find the project helpful for your research, please consider citing our paper:
@article{zhou2024decoupled,
  title={Decoupled Diffusion Sparks Adaptive Scene Generation},
  author={Zhou, Yunsong and Ye, Naisheng and Ljungbergh, William and Li, Tianyu and Yang, Jiazhi and Yang, Zetong and Zhu, Hongzi and Petersson, Christoffer and Li, Hongyang},
  journal={arXiv preprint arXiv:2504.10485},
  year={2025}
}