Editor's Pick



Generalized Predictive Model for Autonomous Driving

Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li
CVPR 2024
We aim to establish a generalized video prediction paradigm for autonomous driving by presenting OpenDV-2K, the largest multimodal driving video dataset to date, and GenAD, a generative model that predicts the future given past visual and textual input.

Planning-oriented Autonomous Driving

Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Hongyang Li, et al.
CVPR 2023 [Best Paper Award]
An up-to-date, comprehensive framework that incorporates full-stack driving tasks in one network.

Open-sourced Data Ecosystem in Autonomous Driving: The Present and Future

Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Kai Yan, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao
SCIENTIA SINICA Informationis

End-to-end Autonomous Driving: Challenges and Frontiers

Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li
arXiv 2023
In this survey, we provide a comprehensive analysis of more than 250 papers on the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.

Delving into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe

Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, et al.
TPAMI 2023 [Setup the Table]
We review the most recent work on BEV perception and provide an analysis of the different solutions.

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, et al.
NeurIPS 2023 Track Datasets and Benchmarks
The world's first perception and reasoning benchmark for scene structure in autonomous driving.

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, et al.
ECCV 2022 [nuScenes First Place] [Waymo Challenge 2022 First Place]
A paradigm for autonomous driving that applies both Transformer attention and temporal structure to generate BEV features from multi-camera images.
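
To make the idea concrete, here is a minimal PyTorch sketch of a BEVFormer-style layer: a grid of BEV queries first attends to the previous timestep's BEV (temporal self-attention), then to flattened multi-camera image features (spatial cross-attention). This is an illustration only; the class name and shapes are our own, and standard multi-head attention stands in for the deformable, camera-geometry-guided attention used in the paper.

    import torch
    import torch.nn as nn

    class ToyBEVFormerLayer(nn.Module):
        """Illustrative sketch: plain attention replaces deformable attention."""
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(3))

        def forward(self, bev_queries, prev_bev, cam_feats):
            # bev_queries, prev_bev: (B, H*W, C) BEV grids;
            # cam_feats: (B, N, C) flattened features from all cameras.
            q = self.norms[0](bev_queries + self.temporal_attn(bev_queries, prev_bev, prev_bev)[0])
            q = self.norms[1](q + self.spatial_attn(q, cam_feats, cam_feats)[0])
            return self.norms[2](q + self.ffn(q))

    layer = ToyBEVFormerLayer()
    bev = layer(torch.randn(2, 2500, 256),    # 50x50 BEV queries
                torch.randn(2, 2500, 256),    # previous BEV
                torch.randn(2, 6 * 400, 256)) # 6 cameras' flattened features
    print(bev.shape)  # torch.Size([2, 2500, 256])
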




End-to-End Autonomous Driving



Embodied Understanding of Driving Scenarios

Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li
arXiv 2024
We revive driving scene understanding by delving into the embodiment philosophy.

DriveLM: Driving with Graph Visual Question Answering

Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Ping Luo, Andreas Geiger, Hongyang Li
arXiv 2024

A Survey of Large Language Models for Autonomous Driving

Zhenjie Yang, Xiaosong Jia, Hongyang Li, Junchi Yan
arXiv 2023
A collection of research papers about LLM-for-Autonomous-Driving (LLM4AD).

DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving

Xiaosong Jia, Yulu Gao, Li Chen, Junchi Yan, Patrick Langechuan Liu, Hongyang Li
ICCV 2023 (Oral)
A new paradigm for end-to-end autonomous driving without the causal confusion issue.

Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving

Xiaosong Jia, Penghao Wu, Li Chen, et al.
CVPR 2023
A scalable decoder paradigm that generates the future trajectory and action of the ego vehicle for end-to-end autonomous driving.

Policy Pre-Training for End-to-End Autonomous Driving via Self-Supervised Geometric Modeling

Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao
ICLR 2023
An intuitive, straightforward, and fully self-supervised framework curated for policy pre-training in visuomotor driving.

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, Yu Qiao
NeurIPS 2022 [CARLA First Place]
We take the initiative to explore combining a controller based on a planned trajectory with direct control prediction.

ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, et al.
ECCV 2022
A spatial-temporal feature learning scheme that yields more representative features for perception, prediction, and planning tasks simultaneously.




Bird's-Eye-View Perception



Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun, Hongyang Li
CVPR 2024
A new self-supervised pre-training task for end-to-end autonomous driving: predicting future point clouds from historical visual inputs, jointly modeling 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

Tianyu Li, Peijin Jia, Bangjun Wang, Li Chen, Kun Jiang, Junchi Yan, Hongyang Li
ICLR 2024
We advocate Lane Segment as a map learning paradigm that seamlessly incorporates both map geometry and topology information.
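
As a rough illustration of what treating the lane segment as the map-learning primitive implies, the sketch below bundles geometry and topology in one record. All field names are our own assumptions, not the LaneSegNet or OpenLane-V2 schema.

    from dataclasses import dataclass, field

    @dataclass
    class LaneSegment:
        # Geometry: ordered 3D polylines for the centerline and both boundaries.
        centerline: list[tuple[float, float, float]]
        left_boundary: list[tuple[float, float, float]]
        right_boundary: list[tuple[float, float, float]]
        # Topology: ids of segments reachable from this one.
        successors: list[int] = field(default_factory=list)

    seg = LaneSegment(
        centerline=[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
        left_boundary=[(0.0, 1.75, 0.0), (10.0, 1.75, 0.0)],
        right_boundary=[(0.0, -1.75, 0.0), (10.0, -1.75, 0.0)],
        successors=[1],
    )
    print(len(seg.centerline), seg.successors)  # 2 [1]
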

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li
NeurIPS 2023
A unified framework that enhances 3D object detection by uniting a multi-modal expert model with a trajectory distillation module.

Scene as Occupancy

Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li
ICCV 2023
Occupancy serves as a general representation of the scene and can facilitate perception and planning across the full stack of autonomous driving.

Graph-based Topology Reasoning for Driving Scenes

Tianyu Li, Li Chen, Huijie Wang, Yang Li, Jiazhi Yang, Xiangwei Geng, et al.
arXiv 2023
A new baseline for scene topology reasoning that unifies heterogeneous feature learning and enhances feature interactions via a graph neural network architecture and a knowledge graph design.

Sparse Dense Fusion for 3D Object Detection

Yulu Gao, Chonghao Sima, Shaoshuai Shi, Shangzhe Di, Si Liu, Hongyang Li
IROS 2023
We propose Sparse Dense Fusion (SDF), a complementary framework that incorporates both sparse-fusion and dense-fusion modules via the Transformer architecture.

3D Data Augmentation for Driving Scenes on Camera

Wenwen Tong, Jiangwei Xie, Tianyu Li, Hanming Deng, Xiangwei Geng, Ruoyi Zhou, Dingchen Yang, Bo Dai, Lewei Lu, Hongyang Li
arXiv 2023
We propose a 3D data augmentation approach, termed Drive-3DAug, that augments camera-based driving scenes in 3D space.

Geometric-aware Pretraining for Vision-centric 3D Object Detection

Linyan Huang, Huijie Wang, Jia Zeng, Shengchuan Zhang, Liujuan Cao, Rongrong Ji, Junchi Yan, Hongyang Li
arXiv 2023
We propose GAPretrain, a plug-and-play framework that boosts 3D detection by pretraining with spatial-structural cues and BEV representation.

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Chenyu Yang, Xizhou Zhu, Hongyang Li, Jifeng Dai, et al.
CVPR 2023 Highlight
A novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.

Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection

Jia Zeng, Li Chen, et al.
CVPR 2023
We investigate how to distill knowledge from an imperfect expert and propose FD3D, a Focal Distiller for 3D object detection.

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

Li Chen, Chonghao Sima, Yang Li, Xiangwei Geng, Junchi Yan, et al.
ECCV 2022 (Oral) [Redefine the Community]
PersFormer adopts a unified 2D/3D anchor design and an auxiliary task to detect 2D/3D lanes; we release one of the first large-scale real-world 3D lane datasets, OpenLane.




Prediction and Planning



Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach

Xiaosong Jia, Li Chen, Penghao Wu, Jia Zeng, et al.
CoRL 2022
We find that a refinement module based on structures with a temporal prior, taking coarse trajectories generated by an MLP as input, can boost accuracy.

HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding

Xiaosong Jia, Penghao Wu, Li Chen, Hongyang Li, Yu Liu, Junchi Yan
TPAMI 2023
HDGT formulates the driving scene as a heterogeneous graph with different types of nodes and edges.
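
A minimal sketch of the heterogeneous-graph formulation (our illustration, not the paper's code): typed nodes such as agents and lanes each get their own encoder, and messages are passed along typed edges. Type names and the aggregation scheme here are assumptions for clarity.

    import torch
    import torch.nn as nn

    class ToyHeteroGraphLayer(nn.Module):
        """Illustrative sketch: per-type encoders plus typed message passing."""
        def __init__(self, dim=64):
            super().__init__()
            self.node_enc = nn.ModuleDict({t: nn.Linear(dim, dim) for t in ("agent", "lane")})
            self.edge_msg = nn.ModuleDict({t: nn.Linear(2 * dim, dim) for t in ("agent_agent", "lane_agent")})

        def forward(self, x, edges):
            # x: {node_type: (N_t, dim)} features;
            # edges: list of (edge_type, src_type, src_idx, dst_type, dst_idx).
            h = {t: torch.relu(enc(x[t])) for t, enc in self.node_enc.items()}
            out = {t: v.clone() for t, v in h.items()}
            for etype, st, si, dt, di in edges:
                msg = self.edge_msg[etype](torch.cat([h[st][si], h[dt][di]], dim=-1))
                out[dt][di] = out[dt][di] + msg  # aggregate typed messages at the destination
            return out

    layer = ToyHeteroGraphLayer()
    x = {"agent": torch.randn(3, 64), "lane": torch.randn(5, 64)}
    edges = [("agent_agent", "agent", 0, "agent", 1), ("lane_agent", "lane", 2, "agent", 0)]
    print(layer(x, edges)["agent"].shape)  # torch.Size([3, 64])
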




Computer Vision at Scale



Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Zhongjiang He, Peng Gao
AAAI 2024

Stare at What You See: Masked Image Modeling without Reconstruction

Hongwei Xue, Peng Gao, Hongyang Li, et al.
CVPR 2023
An efficient masked image modeling (MIM) paradigm, MaskAlign, with a Dynamic Alignment module that applies learnable alignment to tackle the problem of input inconsistency.

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu
IJCV 2023
Introducing high-level and low-level representations to MAE without interference during pre-training.

Align Representations With Base: A New Approach to Self-Supervised Learning

Shaofeng Zhang, Lyn Qiu, Feng Zhu, Junchi Yan, Hengrui Zhang, Rui Zhao, Hongyang Li, Xiaokang Yang
CVPR 2022