Editor's Pick

Planning-oriented Autonomous Driving

Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li

CVPR 2023 Best Paper Award

A comprehensive framework up-to-date that incorporates full-stack driving tasks in one network.

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai

ECCV 2022 [nuScenes First Place] [Waymo Challenge 2022 First Place]

A paradigm for autonomous driving that applies both Transformer and Temporal structure to generate BEV features.

DriveLM: Driving with Graph Visual Question Answering

Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Ping Luo, Andreas Geiger, Hongyang Li

ECCV 2024

Learning Manipulation by Predicting Interaction

Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

RSS 2024

We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI).

自动驾驶开源数据体系：现状与未来

Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Kai Yan, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao

中国科学：信息科学

End-to-End Autonomous Driving

End-to-end Autonomous Driving: Challenges and Frontiers

Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li

TPAMI 2024

In this survey, we provide a comprehensive analysis of more than 250 papers on the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

Shenyuan Gao, Jiazhi Yang, Li Chen, Kashyap Chitta, Yihang Qiu, Andreas Geiger, Jun Zhang, Hongyang Li

arXiv 2024

Generalized Predictive Model for Autonomous Driving

Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li

CVPR 2024 (Highlight)

We aim to establish a generalized video prediction paradigm for autonomous driving by presenting the largest multimodal driving video dataset to date, OpenDV-2K, and a generative model that predicts the future given past visual and textual input, GenAD.

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun, Hongyang Li

CVPR 2024 (Highlight)

A new self-supervised pre-training task for end-to-end autonomous driving, predicting future point clouds from historical visual inputs, joint modeling the 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.

Embodied Understanding of Driving Scenarios

Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li

ECCV 2024

Revive driving scene understanding by delving into the embodiment philosophy.

DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving

Xiaosong Jia, Yulu Gao, Li Chen, Junchi Yan, Patrick Langechuan Liu, Hongyang Li

ICCV 2023 (Oral)

A new paradigm for end-to-end autonomous driving without causal confusion issue.

Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving

Xiaosong Jia, Penghao Wu, Li Chen, et al.

CVPR 2023

A scalable decoder paradigm that generates the future trajectory and action of the ego vehicle for end-to-end autonomous driving.

Policy Pre-Training for End-to-End Autonomous Driving via Self-Supervised Geometric Modeling

Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao

ICLR 2023

An intuitive and straightforward fully self-supervised framework curated for the policy pre-training in visuomotor driving.

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, Yu Qiao

NeurIPS 2022 [Carla First Place]

Take the initiative to explore the combination of controller based on a planned trajectory and perform control prediction.

ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, et al.

ECCV 2022

A spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.

AD Algorithm

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

Tianyu Li, Peijin Jia, Bangjun Wang, Li Chen, Kun Jiang, Junchi Yan, Hongyang Li

ICLR 2024

We advocate Lane Segment as a map learning paradigm that seamlessly incorporates both map geometry and topology information.

Fully Sparse 3D Occupancy Prediction

Haisong Liu, Yang Chen, Haiguang Wang, Zetong Yang, Tianyu Li, Jia Zeng, Li Chen, Hongyang Li, Limin Wang

ECCV 2024

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li

NeurIPS 2023

The unified framework to enhance 3D object detection by uniting a multi-modal expert model with a trajectory distillation module.

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, et al.

NeurIPS 2023 Track Datasets and Benchmarks

The world's first perception and reasoning benchmark for scene structure in autonomous driving.

Delving into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe

Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, et al.

TPAMI 2023 [Setup the Table]

We review the most recent work on BEV perception and provide analysis of different solutions.

LLM4Drive: A Survey of Large Language Models for Autonomous Driving

Zhenjie Yang, Xiaosong Jia, Hongyang Li, Junchi Yan

arXiv 2023

A collection of research papers about LLM-for-Autonomous-Driving (LLM4AD).

Scene as Occupancy

Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li

ICCV 2023

Occupancy serves as a general representation of the scene and could facilitate perception and planning in the full-stack of autonomous driving.

Sparse Dense Fusion for 3D Object Detection

Yulu Gao, Chonghao Sima, Shaoshuai Shi, Shangzhe Di, Si Liu, Hongyang Li

IROS 2023

We propose Sparse Dense Fusion (SDF), a complementary framework that incorporates both sparse-fusion and dense-fusion modules via the Transformer architecture.

HDGT: Heterogeneous driving graph transformer for multi-agent trajectory prediction via scene encoding

Xiaosong Jia, Penghao Wu, Li Chen, Hongyang Li, Yu Liu, Junchi Yan

TPAMI 2023

HDGT formulates the driving scene as a heterogeneous graph with different types of nodes and edges.

Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection

Jia Zeng, Li Chen, et al.

CVPR 2023

We investigate on how to distill the knowledge from an imperfect expert. We propose FD3D, a Focal Distiller for 3D object detection.

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Chenyu Yang, Xizhou Zhu, Hongyang Li, Jifeng Dai, et al.

CVPR 2023 Highlight

A novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.

Graph-based Topology Reasoning for Driving Scenes

Tianyu Li, Li Chen, Huijie Wang, Yang Li, Jiazhi Yang, Xiangwei Geng, et al.

arXiv 2023

A new baseline for scene topology reasoning, which unifies heterogeneous feature learning and enhances feature interactions via the graph neural network architecture and the knowledge graph design.

Geometric-aware Pretraining for Vision-centric 3D Object Detection

Linyan Huang, Huijie Wang, Jia Zeng, Shengchuan Zhang, Liujuan Cao, Rongrong Ji, Junchi Yan, Hongyang Li

arXiv 2023

We propose GAPretrain, a plug-and-play framework that boosts 3D detection by pretraining with spatial-structural cues and BEV representation.

3D Data Augmentation for Driving Scenes on Camera

Wenwen Tong, Jiangwei Xie, Tianyu Li, Hanming Deng, Xiangwei Geng, Ruoyi Zhou, Dingchen Yang, Bo Dai, Lewei Lu, Hongyang Li

arXiv 2023

We propose a 3D data augmentation approach termed Drive-3DAug to augment the driving scenes on camera in the 3D space.

Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach

Xiaosong Jia, Li Chen, Penghao Wu, Jia Zeng, et al.

CoRL 2022

We find taking scratch trajectories generated by MLP as input, a refinement module based on structures with temporal prior, could boost the accuracy.

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

Li Chen, Chonghao Sima, Yang Li, Xiangwei Geng, Junchi Yan, et al.

ECCV 2022 (Oral) [Redefine the Community]

PersFormer adopts a unified 2D/3D anchor design and an auxiliary task to detect 2D/3D lanes; we release one of the first large-scale real-world 3D lane datasets, OpenLane.

Embodied AI

Learning Manipulation by Predicting Interaction

Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

RSS 2024

We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI).

Computer Vision at Scale

FastMAC: Stochastic Spectral Sampling of Correspondence Grap

Yifei Zhang, Hao Zhao, Hongyang Li, Siheng Chen

CVPR 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Zhongjiang He, Peng Gao

AAAI 2024

Stare at What You See: Masked Image Modeling without Reconstruction

Hongwei Xue, Peng Gao, Hongyang Li, et al.

CVPR 2023

An efficient MIM paradigm MaskAlign and a Dynamic Alignment module to apply learnable alignment to tackle the problem of input inconsistency.

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

IJCV 2023

Introducing high-level and low-level representations to MAE without interference during pre-training.

Align Representations With Base: A New Approach to Self-Supervised Learning

Shaofeng Zhang, Lyn Qiu, Feng Zhu, Junchi Yan, Hengrui Zhang, Rui Zhao, Hongyang Li, Xiaokang Yang

CVPR 2022

Publication

We position OpenDriveLab as one of the most top research teams around globe, since we've got talented people and published work at top venues.

Editor's Pick

End-to-End Autonomous Driving

AD Algorithm

Embodied AI

Computer Vision at Scale