OpenDriveLab

Publication

We position OpenDriveLab as one of the top research teams worldwide, built on talented people and work published at top venues.
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
RSS 2025
A unified vision-language-action framework that enables policy learning across different environments.
Planning-oriented Autonomous Driving
Yihan Hu
Keyu Li
Xizhou Zhu
Siqi Chai
Senyao Du
Tianwei Lin
Lewei Lu
Qiang Liu
Jifeng Dai
UniAD: The first comprehensive framework that incorporates full-stack driving tasks.
AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
IROS 2025
blog
huggingface
bilibili
A novel generalist policy that leverages latent action representations to maximize data utilization, demonstrating predictable performance scaling with increased data volume.
End-to-End Autonomous Driving: Challenges and Frontiers
IEEE-TPAMI 2024 Survey
In this survey, we provide a comprehensive analysis of more than 270 papers on the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
DriveLM: Driving with Graph Visual Question Answering
ECCV 2024 Oral
dataset
page
huggingface
Unlocking the future where autonomous driving meets the unlimited potential of language.
FreeTacMan: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation
arXiv 2025
page
blog
youtube
A human-centric and robot-free visuo-tactile data collection system for high-quality and efficient robot manipulation.
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
arXiv 2024
page
Our objective is to develop a synergistic dual-system framework that supplements the generalizability of a large-scale pre-trained generalist with the efficient, task-specific adaptation of a specialist.
Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation
NeurIPS 2024
bilibili
CLOVER employs a text-conditioned video diffusion model to generate visual plans as reference inputs; these sub-goals then guide a feedback-driven policy to generate actions via an error-measurement strategy.
Learning Manipulation by Predicting Interaction
Jia Zeng
Wenke Xia
Hao Dong
Haoming Song
Dong Wang
Di Hu
Heming Cui
Bin Zhao
Xuelong Li
RSS 2024
page
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI).
ReSim: Reliable World Simulation for Autonomous Driving
arXiv 2025
page
ReSim is a driving world model that enables Reliable Simulation of diverse open-world driving scenarios under various actions, including hazardous non-expert ones. A Video2Reward model estimates the reward from ReSim's simulated future.
Decoupled Diffusion Sparks Adaptive Scene Generation
ICCV 2025
page
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
NeurIPS 2024
dataset
page
huggingface
bilibili
blog
poster
A generalizable driving world model with high-fidelity open-world prediction, continuous long-horizon rollout, and zero-shot action controllability.
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
Daniel Dauner
Marcel Hallgarten
Xinshuo Weng
Zhiyu Huang
Igor Gilitschenski
Boris Ivanovic
Marco Pavone
NeurIPS 2024 Track Datasets and Benchmarks
Generalized Predictive Model for Autonomous Driving
CVPR 2024 Highlight
dataset
youtube
bilibili
blog
slides
We aim to establish a generalized video prediction paradigm for autonomous driving by presenting the largest multimodal driving video dataset to date, OpenDV-2K, and a generative model that predicts the future given past visual and textual input, GenAD.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
CVPR 2024 Highlight
A new self-supervised pre-training task for end-to-end autonomous driving that predicts future point clouds from historical visual inputs, jointly modeling 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.
Open-sourced Data Ecosystem in Autonomous Driving: The Present and Future
Yang Li
Jia Zeng
Huilin Xu
Pinlong Cai
Feng Xu
Lu Xiong
Jingdong Wang
Futang Zhu
Kai Yan
Chunjing Xu
Tiancai Wang
Fei Xia
Beipeng Mu
Zhihui Peng
Dahua Lin
SCIENCE CHINA Information Sciences Survey
arxiv
Embodied Understanding of Driving Scenarios
ECCV 2024
page
Revive driving scene understanding by delving into the embodiment philosophy.
DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
ICCV 2023 Oral
A new paradigm for end-to-end autonomous driving without causal confusion issue.
Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
CVPR 2023
A scalable decoder paradigm that generates the future trajectory and action of the ego vehicle for end-to-end autonomous driving.
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
arXiv 2023 Survey
A collection of research papers about LLM-for-Autonomous-Driving (LLM4AD).
Policy Pre-Training for End-to-End Autonomous Driving via Self-Supervised Geometric Modeling
ICLR 2023
slides
An intuitive and straightforward fully self-supervised framework curated for the policy pre-training in visuomotor driving.
Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline
NeurIPS 2022
zhihu
We take the initiative to explore combining a controller based on a planned trajectory with direct control prediction.
ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning
ECCV 2022
zhihu
A spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
MTGS: Multi-Traversal Gaussian Splatting
Zhenhua Wu
Carl Lindström
Peng Su
Matthias Nießner
arXiv 2025
LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving
ICLR 2024
We advocate Lane Segment as a map learning paradigm that seamlessly incorporates both map geometry and topology information.
Fully Sparse 3D Occupancy Prediction
Haisong Liu
Yang Chen
Haiguang Wang
Jia Zeng
Limin Wang
ECCV 2024
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
NeurIPS 2023
A unified framework that enhances 3D object detection by uniting a multi-modal expert model with a trajectory distillation module.
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
Yang Li
Zhenbo Liu
Peijin Jia
Yuting Wang
Shengyin Jiang
Feng Wen
Hang Xu
Wei Zhang
NeurIPS 2023 Track Datasets and Benchmarks
dataset
The world's first perception and reasoning benchmark for scene structure in autonomous driving.
Delving into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe
Jifeng Dai
Lewei Lu
Jia Zeng
Hanming Deng
Hao Tian
Enze Xie
Jiangwei Xie
Yang Li
Si Liu
Jianping Shi
Dahua Lin
IEEE-TPAMI 2023 Survey
We review the most recent work on BEV perception and provide analysis of different solutions.
Scene as Occupancy
Wenwen Tong
Tai Wang
Silei Wu
Hanming Deng
Yi Gu
Lewei Lu
Dahua Lin
ICCV 2023
Occupancy serves as a general representation of the scene and could facilitate perception and planning in the full-stack of autonomous driving.
Sparse Dense Fusion for 3D Object Detection
IROS 2023
We propose Sparse Dense Fusion (SDF), a complementary framework that incorporates both sparse-fusion and dense-fusion modules via the Transformer architecture.
HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding
IEEE-TPAMI 2023
HDGT formulates the driving scene as a heterogeneous graph with different types of nodes and edges.
Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection
CVPR 2023
We investigate how to distill knowledge from an imperfect expert and propose FD3D, a Focal Distiller for 3D object detection.
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
Chenyu Yang
Yuntao Chen
Hao Tian
Chenxin Tao
Xizhou Zhu
Zhaoxiang Zhang
Gao Huang
Lewei Lu
Jie Zhou
Jifeng Dai
CVPR 2023 Highlight
A novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.
Graph-based Topology Reasoning for Driving Scenes
Yang Li
Shengyin Jiang
Yuting Wang
Hang Xu
Chunjing Xu
arXiv 2023
A new baseline for scene topology reasoning, which unifies heterogeneous feature learning and enhances feature interactions via the graph neural network architecture and the knowledge graph design.
Geometric-aware Pretraining for Vision-centric 3D Object Detection
arXiv 2023
We propose GAPretrain, a plug-and-play framework that boosts 3D detection by pretraining with spatial-structural cues and BEV representation.
3D Data Augmentation for Driving Scenes on Camera
Wenwen Tong
Jiangwei Xie
Yang Li
Hanming Deng
Bo Dai
Lewei Lu
Hao Zhao
PRCV 2024
We propose a 3D data augmentation approach termed Drive-3DAug to augment the driving scenes on camera in the 3D space.
Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach
CoRL 2022
We find that a refinement module built on structures with temporal priors, taking coarse trajectories generated by an MLP as input, can boost prediction accuracy.
BEVFormer: Learning Bird’s-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers
IEEE-TPAMI 2025
zhihu
A paradigm for autonomous driving that applies both Transformer and temporal structures to generate BEV features.
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
ECCV 2022 Oral
dataset
blog
zhihu
PersFormer adopts a unified 2D/3D anchor design and an auxiliary task to detect 2D/3D lanes; we release one of the first large-scale real-world 3D lane datasets, OpenLane.
Detect Anything 3D in the Wild
Qingsong Yao
Yanan Sun
Renrui Zhang
Hao Zhao
Hongzi Zhu
ICCV 2025
DetAny3D is a promptable 3D detection foundation model that leverages 2D foundation knowledge to enable zero-shot 3D object detection from monocular images across novel categories and camera settings.
FastMAC: Stochastic Spectral Sampling of Correspondence Graph
CVPR 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Zhongjiang He
Peng Gao
AAAI 2024
Stare at What You See: Masked Image Modeling without Reconstruction
CVPR 2023
An efficient MIM paradigm, MaskAlign, with a Dynamic Alignment module that applies learnable alignment to tackle input inconsistency.
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Peng Gao
Renrui Zhang
Rongyao Fang
Ziyi Lin
Hongsheng Li
Qiao Yu
IJCV 2023
Introducing high-level and low-level representations to MAE without interference during pre-training.
Align Representations With Base: A New Approach to Self-Supervised Learning
Shaofeng Zhang
Lyn Qiu
Feng Zhu
Hengrui Zhang
Rui Zhao
Xiaokang Yang
CVPR 2022