


Editor's Pick



Planning-oriented Autonomous Driving

Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li
CVPR 2023 (Best Paper)
An up-to-date, comprehensive framework that incorporates full-stack driving tasks in one network.

End-to-end Autonomous Driving: Challenges and Frontiers

Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li
TPAMI 2024
In this survey, we provide a comprehensive analysis of more than 270 papers on the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.

Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

Qingwen Bu, Jia Zeng, Li Chen, Yanchao Yang, Guyue Zhou, Junchi Yan, Ping Luo, Heming Cui, Yi Ma, Hongyang Li
NeurIPS 2024
CLOVER employs a text-conditioned video diffusion model to generate visual plans as reference inputs; these sub-goals then guide a feedback-driven policy that generates actions under an error measurement strategy.

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

Shenyuan Gao, Jiazhi Yang, Li Chen, Kashyap Chitta, Yihang Qiu, Andreas Geiger, Jun Zhang, Hongyang Li
NeurIPS 2024
A generalizable driving world model with high-fidelity open-world prediction, continuous long-horizon rollout, and zero-shot action controllability.

DriveLM: Driving with Graph Visual Question Answering

Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Ping Luo, Andreas Geiger, Hongyang Li
ECCV 2024 (Oral)




End-to-End Autonomous Driving



Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving

Haochen Liu, Li Chen, Yu Qiao, Chen Lv, Hongyang Li
NeurIPS 2024

NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking

Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, Kashyap Chitta
NeurIPS 2024 Datasets and Benchmarks Track
A middle ground between open-loop and closed-loop evaluation: benchmarking driving policies on real-world data with non-reactive, simulation-based metrics.

Generalized Predictive Model for Autonomous Driving

Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li
CVPR 2024 (Highlight)
We aim to establish a generalized video prediction paradigm for autonomous driving by presenting OpenDV-2K, the largest multimodal driving video dataset to date, and GenAD, a generative model that predicts the future given past visual and textual input.

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun, Hongyang Li
CVPR 2024 (Highlight)
A new self-supervised pre-training task for end-to-end autonomous driving: predicting future point clouds from historical visual inputs, jointly modeling 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.

Open-sourced Data Ecosystem in Autonomous Driving: The Present and Future

Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Kai Yan, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao
SCIENTIA SINICA Informationis

Embodied Understanding of Driving Scenarios

Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li
ECCV 2024
Revive driving scene understanding by delving into the embodiment philosophy.

DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving

Xiaosong Jia, Yulu Gao, Li Chen, Junchi Yan, Patrick Langechuan Liu, Hongyang Li
ICCV 2023 (Oral)
A new paradigm for end-to-end autonomous driving without the causal confusion issue.

Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving

Xiaosong Jia, Penghao Wu, Li Chen, Jiangwei Xie, Conghui He, Junchi Yan, Hongyang Li
CVPR 2023
A scalable decoder paradigm that generates the future trajectory and action of the ego vehicle for end-to-end autonomous driving.

LLM4Drive: A Survey of Large Language Models for Autonomous Driving

Zhenjie Yang, Xiaosong Jia, Hongyang Li, Junchi Yan
arXiv 2023
A collection of research papers about LLM-for-Autonomous-Driving (LLM4AD).

Policy Pre-Training for End-to-End Autonomous Driving via Self-Supervised Geometric Modeling

Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao
ICLR 2023
An intuitive and straightforward, fully self-supervised framework for policy pre-training in visuomotor driving.

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, Yu Qiao
NeurIPS 2022 [CARLA First Place]
We take the initiative to explore combining a controller based on a planned trajectory with direct control prediction.

ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, Junchi Yan, Dacheng Tao
ECCV 2022
A spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.




AD Algorithm



LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

Tianyu Li, Peijin Jia, Bangjun Wang, Li Chen, Kun Jiang, Junchi Yan, Hongyang Li
ICLR 2024
We advocate Lane Segment as a map learning paradigm that seamlessly incorporates both map geometry and topology information.

Fully Sparse 3D Occupancy Prediction

Haisong Liu, Yang Chen, Haiguang Wang, Zetong Yang, Tianyu Li, Jia Zeng, Li Chen, Hongyang Li, Limin Wang
ECCV 2024

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li
NeurIPS 2023
A unified framework that enhances 3D object detection by uniting a multi-modal expert model with a trajectory distillation module.

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, Feng Wen, Hang Xu, Ping Luo, Junchi Yan, Wei Zhang, Hongyang Li
NeurIPS 2023 Datasets and Benchmarks Track
The world's first perception and reasoning benchmark for scene structure in autonomous driving.

Delving into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe

Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao
TPAMI 2023 [Setup the Table]
We review the most recent work on BEV perception and provide an analysis of different solutions.

Scene as Occupancy

Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li
ICCV 2023
Occupancy serves as a general representation of the scene and can facilitate perception and planning across the full stack of autonomous driving.

Sparse Dense Fusion for 3D Object Detection

Yulu Gao, Chonghao Sima, Shaoshuai Shi, Shangzhe Di, Si Liu, Hongyang Li
IROS 2023
We propose Sparse Dense Fusion (SDF), a complementary framework that incorporates both sparse-fusion and dense-fusion modules via the Transformer architecture.

HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding

Xiaosong Jia, Penghao Wu, Li Chen, Hongyang Li, Yu Liu, Junchi Yan
TPAMI 2023
HDGT formulates the driving scene as a heterogeneous graph with different types of nodes and edges.

Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection

Jia Zeng, Li Chen, Hanming Deng, Lewei Lu, Junchi Yan, Yu Qiao, Hongyang Li
CVPR 2023
We investigate how to distill knowledge from an imperfect expert and propose FD3D, a Focal Distiller for 3D object detection.

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai
CVPR 2023 (Highlight)
A novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.

Graph-based Topology Reasoning for Driving Scenes

Tianyu Li, Li Chen, Huijie Wang, Yang Li, Jiazhi Yang, Xiangwei Geng, Shengyin Jiang, Yuting Wang, Hang Xu, Chunjing Xu, Junchi Yan, Ping Luo, Hongyang Li
arXiv 2023
A new baseline for scene topology reasoning, which unifies heterogeneous feature learning and enhances feature interactions via the graph neural network architecture and the knowledge graph design.

Geometric-aware Pretraining for Vision-centric 3D Object Detection

Linyan Huang, Huijie Wang, Jia Zeng, Shengchuan Zhang, Liujuan Cao, Rongrong Ji, Junchi Yan, Hongyang Li
arXiv 2023
We propose GAPretrain, a plug-and-play framework that boosts 3D detection by pretraining with spatial-structural cues and BEV representation.

3D Data Augmentation for Driving Scenes on Camera

Wenwen Tong, Jiangwei Xie, Tianyu Li, Hanming Deng, Xiangwei Geng, Ruoyi Zhou, Dingchen Yang, Bo Dai, Lewei Lu, Hongyang Li
arXiv 2023
We propose a 3D data augmentation approach termed Drive-3DAug to augment the driving scenes on camera in the 3D space.

Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach

Xiaosong Jia, Li Chen, Penghao Wu, Jia Zeng, Junchi Yan, Hongyang Li, Yu Qiao
CoRL 2022
We find that, taking coarse trajectories generated by an MLP as input, a refinement module built on structures with a temporal prior can boost accuracy.

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai
ECCV 2022 [nuScenes First Place] [Waymo Challenge 2022 First Place]
A paradigm for autonomous driving that applies both Transformer and temporal structures to generate BEV features.

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

Li Chen, Chonghao Sima, Yang Li, Zehan Zheng, Jiajie Xu, Xiangwei Geng, Hongyang Li, Conghui He, Jianping Shi, Yu Qiao, Junchi Yan
ECCV 2022 (Oral) [Redefine the Community]
PersFormer adopts a unified 2D/3D anchor design and an auxiliary task to detect 2D/3D lanes; we release one of the first large-scale real-world 3D lane datasets, OpenLane.




Embodied AI



Learning Manipulation by Predicting Interaction

Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li
RSS 2024
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI).




Computer Vision at Scale



FastMAC: Stochastic Spectral Sampling of Correspondence Graph

Yifei Zhang, Hao Zhao, Hongyang Li, Siheng Chen
CVPR 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Zhongjiang He, Peng Gao
AAAI 2024

Stare at What You See: Masked Image Modeling without Reconstruction

Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo
CVPR 2023
An efficient MIM paradigm, MaskAlign, with a Dynamic Alignment module that applies learnable alignment to tackle the problem of input inconsistency.

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Yu Qiao
IJCV 2023
Introducing high-level and low-level representations to MAE without interference during pre-training.

Align Representations With Base: A New Approach to Self-Supervised Learning

Shaofeng Zhang, Lyn Qiu, Feng Zhu, Junchi Yan, Hengrui Zhang, Rui Zhao, Hongyang Li, Xiaokang Yang
CVPR 2022