OpenDriveLab

Publication

We position OpenDriveLab as one of the top research teams worldwide, built on talented people and work published at top venues.
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
RSS 2025
A unified vision-language-action framework that enables policy learning across different environments.
Planning-oriented Autonomous Driving
Yihan Hu
Keyu Li
Xizhou Zhu
Siqi Chai
Senyao Du
Tianwei Lin
Lewei Lu
Qiang Liu
Jifeng Dai
UniAD: The first comprehensive framework that incorporates full-stack driving tasks.
AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
IROS 2025
blog
huggingface
bilibili
A novel generalist policy that leverages latent action representations to maximize data utilization, demonstrating predictable performance scaling with increased data volume.
End-to-End Autonomous Driving: Challenges and Frontiers
IEEE-TPAMI 2024 Survey
In this survey, we provide a comprehensive analysis of more than 270 papers on the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
DriveLM: Driving with Graph Visual Question Answering
ECCV 2024 Oral
dataset
page
huggingface
Unlocking the future where autonomous driving meets the unlimited potential of language.
FreeTacMan: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation
arXiv 2025
page
blog
youtube
A human-centric and robot-free visuo-tactile data collection system for high-quality and efficient robot manipulation.
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
arXiv 2024
page
Our objective is to develop a synergistic dual-system framework that supplements the generalizability of a large-scale pre-trained generalist with the efficient, task-specific adaptation of a specialist.
Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation
NeurIPS 2024
bilibili
CLOVER employs a text-conditioned video diffusion model to generate visual plans as reference inputs; these sub-goals then guide a feedback-driven policy to generate actions via an error-measurement strategy.
Learning Manipulation by Predicting Interaction
Jia Zeng
Wenke Xia
Hao Dong
Haoming Song
Dong Wang
Di Hu
Heming Cui
Bin Zhao
Xuelong Li
RSS 2024
page
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI).
ReSim: Reliable World Simulation for Autonomous Driving
arXiv 2025
page
ReSim is a driving world model that enables Reliable Simulation of diverse open-world driving scenarios under various actions, including hazardous non-expert ones. A Video2Reward model estimates the reward from ReSim's simulated future.
Decoupled Diffusion Sparks Adaptive Scene Generation
ICCV 2025
page
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
NeurIPS 2024
dataset
page
huggingface
bilibili
blog
poster
A generalizable driving world model with high-fidelity open-world prediction, continuous long-horizon rollout, and zero-shot action controllability.
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
Daniel Dauner
Marcel Hallgarten
Xinshuo Weng
Zhiyu Huang
Igor Gilitschenski
Boris Ivanovic
Marco Pavone
NeurIPS 2024 Track Datasets and Benchmarks
Generalized Predictive Model for Autonomous Driving
CVPR 2024 Highlight
dataset
youtube
bilibili
blog
slides
We aim to establish a generalized video prediction paradigm for autonomous driving by presenting the largest multimodal driving video dataset to date, OpenDV-2K, and a generative model that predicts the future given past visual and textual input, GenAD.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
CVPR 2024 Highlight
A new self-supervised pre-training task for end-to-end autonomous driving that predicts future point clouds from historical visual inputs, jointly modeling 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.
Open-sourced Data Ecosystem in Autonomous Driving: The Present and Future
Yang Li
Jia Zeng
Huilin Xu
Pinlong Cai
Feng Xu
Lu Xiong
Jingdong Wang
Futang Zhu
Kai Yan
Chunjing Xu
Tiancai Wang
Fei Xia
Beipeng Mu
Zhihui Peng
Dahua Lin
SCIENCE CHINA Information Sciences Survey
arxiv
Embodied Understanding of Driving Scenarios
ECCV 2024
page
Revive driving scene understanding by delving into the embodiment philosophy.
DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
ICCV 2023 Oral
A new paradigm for end-to-end autonomous driving without causal confusion issue.
Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
CVPR 2023
A scalable decoder paradigm that generates the future trajectory and action of the ego vehicle for end-to-end autonomous driving.
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
arXiv 2023 Survey
A collection of research papers about LLM-for-Autonomous-Driving (LLM4AD).
Policy Pre-Training for End-to-End Autonomous Driving via Self-Supervised Geometric Modeling
ICLR 2023
slides
An intuitive and straightforward fully self-supervised framework curated for the policy pre-training in visuomotor driving.
Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline
NeurIPS 2022
zhihu
We take the initiative to explore combining a controller based on a planned trajectory with direct control prediction.
ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning
ECCV 2022
zhihu
A spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
MTGS: Multi-Traversal Gaussian Splatting
Zhenhua Wu
Carl Lindström
Peng Su
Matthias Nießner
arXiv 2025
LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving
ICLR 2024
We advocate Lane Segment as a map learning paradigm that seamlessly incorporates both map geometry and topology information.
Fully Sparse 3D Occupancy Prediction
Haisong Liu
Yang Chen
Haiguang Wang
Jia Zeng
Limin Wang
ECCV 2024
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
NeurIPS 2023
A unified framework that enhances 3D object detection by uniting a multi-modal expert model with a trajectory distillation module.
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
Yang Li
Zhenbo Liu
Peijin Jia
Yuting Wang
Shengyin Jiang
Feng Wen
Hang Xu
Wei Zhang
NeurIPS 2023 Track Datasets and Benchmarks
dataset
The world's first perception and reasoning benchmark for scene structure in autonomous driving.
Delving into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe
Jifeng Dai
Lewei Lu
Jia Zeng
Hanming Deng
Hao Tian
Enze Xie
Jiangwei Xie
Yang Li
Si Liu
Jianping Shi
Dahua Lin
IEEE-TPAMI 2023 Survey
We review the most recent work on BEV perception and provide analysis of different solutions.
Scene as Occupancy
Wenwen Tong
Tai Wang
Silei Wu
Hanming Deng
Yi Gu
Lewei Lu
Dahua Lin
ICCV 2023
Occupancy serves as a general representation of the scene and could facilitate perception and planning in the full-stack of autonomous driving.
Sparse Dense Fusion for 3D Object Detection
IROS 2023
We propose Sparse Dense Fusion (SDF), a complementary framework that incorporates both sparse-fusion and dense-fusion modules via the Transformer architecture.
HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding
IEEE-TPAMI 2023
HDGT formulates the driving scene as a heterogeneous graph with different types of nodes and edges.
Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection
CVPR 2023
We investigate how to distill knowledge from an imperfect expert and propose FD3D, a Focal Distiller for 3D object detection.
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
Chenyu Yang
Yuntao Chen
Hao Tian
Chenxin Tao
Xizhou Zhu
Zhaoxiang Zhang
Gao Huang
Lewei Lu
Jie Zhou
Jifeng Dai
CVPR 2023 Highlight
A novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.
Graph-based Topology Reasoning for Driving Scenes
Yang Li
Shengyin Jiang
Yuting Wang
Hang Xu
Chunjing Xu
arXiv 2023
A new baseline for scene topology reasoning, which unifies heterogeneous feature learning and enhances feature interactions via the graph neural network architecture and the knowledge graph design.
Geometric-aware Pretraining for Vision-centric 3D Object Detection
arXiv 2023
We propose GAPretrain, a plug-and-play framework that boosts 3D detection by pretraining with spatial-structural cues and BEV representation.
3D Data Augmentation for Driving Scenes on Camera
Wenwen Tong
Jiangwei Xie
Yang Li
Hanming Deng
Bo Dai
Lewei Lu
Hao Zhao
PRCV 2024
We propose a 3D data augmentation approach termed Drive-3DAug to augment the driving scenes on camera in the 3D space.
Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach
CoRL 2022
We find that a refinement module built on structures with temporal priors, taking coarse trajectories generated by an MLP as input, can boost prediction accuracy.
BEVFormer: Learning Bird’s-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers
IEEE-TPAMI 2025
zhihu
A paradigm for autonomous driving that applies both Transformer and temporal structures to generate BEV features.
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
ECCV 2022 Oral
dataset
blog
zhihu
PersFormer adopts a unified 2D/3D anchor design and an auxiliary task to detect 2D/3D lanes; we release one of the first large-scale real-world 3D lane datasets, OpenLane.
Detect Anything 3D in the Wild
Qingsong Yao
Yanan Sun
Renrui Zhang
Hao Zhao
Hongzi Zhu
ICCV 2025
DetAny3D is a promptable 3D detection foundation model that leverages 2D foundation knowledge to enable zero-shot 3D object detection from monocular images across novel categories and camera settings.
FastMAC: Stochastic Spectral Sampling of Correspondence Graph
CVPR 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Zhongjiang He
Peng Gao
AAAI 2024
Stare at What You See: Masked Image Modeling without Reconstruction
CVPR 2023
An efficient MIM paradigm, MaskAlign, with a Dynamic Alignment module that applies learnable alignment to tackle input inconsistency.
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Peng Gao
Renrui Zhang
Rongyao Fang
Ziyi Lin
Hongsheng Li
Qiao Yu
IJCV 2023
Introducing high-level and low-level representations to MAE without interference during pre-training.
Align Representations With Base: A New Approach to Self-Supervised Learning
Shaofeng Zhang
Lyn Qiu
Feng Zhu
Hengrui Zhang
Rui Zhao
Xiaokang Yang
CVPR 2022