Autonomous Driving
Autonomous driving enables vehicles to perceive, reason, and act safely in complex environments. We focus on whole-scene perception, critical data generation, and end-to-end decision-making. Our goal is to build a unified and scalable autonomy pipeline grounded in large-scale real-world driving data and efficient world representations. For a complete list of publications, please see here.

- SimScale: Learning to Drive via Real-World Simulation at Scale. A scalable sim-real learning framework that synthesizes high-fidelity driving data and boosts end-to-end planners to achieve robust, generalizable autonomy, with principled scaling insights.
- ReSim: Reliable World Simulation for Autonomous Driving. ReSim is a driving world model that enables Reliable Simulation of diverse open-world driving scenarios under various actions, including hazardous non-expert ones. A Video2Reward model estimates the reward from ReSim's simulated future.
- NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking.
- Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability. A generalizable driving world model with high-fidelity open-world prediction, continuous long-horizon rollout, and zero-shot action controllability.
- Generalized Predictive Model for Autonomous Driving. We aim to establish a generalized video prediction paradigm for autonomous driving by presenting the largest multimodal driving video dataset to date, OpenDV-2K, and a generative model that predicts the future given past visual and textual input, GenAD.
- Visual Point Cloud Forecasting enables Scalable Autonomous Driving. A new self-supervised pre-training task for end-to-end autonomous driving: predicting future point clouds from historical visual inputs, jointly modeling the 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning (see the Chamfer-distance sketch after this list).
- LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving. We advocate Lane Segment as a map learning paradigm that seamlessly incorporates both map geometry and topology information.
- DriveLM: Driving with Graph Visual Question Answering. Unlocking the future where autonomous driving meets the unlimited potential of language.
- DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving. A new paradigm for end-to-end autonomous driving without the causal confusion issue.
- End-to-End Autonomous Driving: Challenges and Frontiers. In this survey, we provide a comprehensive analysis of more than 270 papers on the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
- Scene as Occupancy. Occupancy serves as a general representation of the scene and can facilitate perception and planning across the full stack of autonomous driving (a minimal occupancy-grid sketch appears after this list).
- Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving. A scalable decoder paradigm that generates the future trajectory and action of the ego vehicle for end-to-end autonomous driving.
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping. The world's first perception and reasoning benchmark for scene structure in autonomous driving.
- Graph-based Topology Reasoning for Driving Scenes. A new baseline for scene topology reasoning that unifies heterogeneous feature learning and enhances feature interactions via a graph neural network architecture and a knowledge graph design.
- Policy Pre-Training for End-to-End Autonomous Driving via Self-Supervised Geometric Modeling. An intuitive and straightforward fully self-supervised framework for policy pre-training in visuomotor driving.
- Planning-oriented Autonomous Driving. UniAD: the first comprehensive framework that incorporates full-stack driving tasks.
- Delving into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe. We review the most recent work on BEV perception and provide an analysis of different solutions.
- Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach. We find that a refinement module built on structures with a temporal prior, taking coarse trajectories generated by an MLP as input, can boost prediction accuracy (see the two-stage sketch after this list).
- ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning. A spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction, and planning tasks simultaneously.
- Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline. We take the initiative to explore combining a controller based on a planned trajectory with direct control prediction.
- HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding. HDGT formulates the driving scene as a heterogeneous graph with different types of nodes and edges (a minimal typed-graph sketch appears after this list).
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers. A paradigm for autonomous driving that applies both transformer and temporal structures to generate BEV features (see the BEV cross-attention sketch after this list).
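
To make the point cloud forecasting pretext task concrete, here is a minimal numpy sketch of a Chamfer-distance objective, a standard choice for comparing a predicted point set against an observed future sweep. The function name and toy data are ours for illustration, not the paper's code.

```python
import numpy as np

def chamfer_distance(pred: np.ndarray, target: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3).

    For each predicted point, take the squared distance to its nearest target
    point, and vice versa, then sum the two directional averages.
    """
    diff = pred[:, None, :] - target[None, :, :]   # pairwise differences, (N, M, 3)
    d2 = np.sum(diff ** 2, axis=-1)                # squared distances, (N, M)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Toy usage: a forecaster would be trained to minimize this between its
# predicted future point cloud and the actual future LiDAR sweep.
rng = np.random.default_rng(0)
pred_future = rng.normal(size=(128, 3))
true_future = rng.normal(size=(160, 3))
print(chamfer_distance(pred_future, true_future))
```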
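
For the occupancy entry, a minimal sketch of a dense semantic voxel grid as a scene representation. The voxel size, ranges, and class labels below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

VOXEL_SIZE = 0.5                                   # metres per voxel (illustrative)
X_RANGE, Y_RANGE, Z_RANGE = (-50, 50), (-50, 50), (-3, 3)

# Each cell stores a semantic label; 0 means free space.
shape = tuple(int((hi - lo) / VOXEL_SIZE) for lo, hi in (X_RANGE, Y_RANGE, Z_RANGE))
occupancy = np.zeros(shape, dtype=np.uint8)

def world_to_voxel(xyz: np.ndarray) -> tuple:
    """Map a world-frame point (x, y, z) to its voxel index."""
    lows = np.array([X_RANGE[0], Y_RANGE[0], Z_RANGE[0]])
    return tuple(((xyz - lows) / VOXEL_SIZE).astype(int))

# Mark a point belonging to a hypothetical "vehicle" class (label 1) as occupied.
occupancy[world_to_voxel(np.array([12.3, -4.1, 0.2]))] = 1

# Downstream perception and planning modules can query any location uniformly.
print(occupancy[world_to_voxel(np.array([12.3, -4.1, 0.2]))])  # -> 1
```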
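
For the coarse-to-fine trajectory prediction entry, a PyTorch sketch of the two-stage idea: an MLP regresses a coarse trajectory in one shot, then a temporal module (here a GRU over the coarse waypoints) predicts per-step refinement offsets. Module names and sizes are our illustration, not the released code.

```python
import torch
import torch.nn as nn

class CoarseToFinePredictor(nn.Module):
    """Illustrative two-stage predictor: coarse MLP regression, temporal refinement."""

    def __init__(self, hist_len=20, fut_len=30, dim=64):
        super().__init__()
        self.fut_len = fut_len
        # Stage 1: flatten the (x, y) history and regress all future waypoints at once.
        self.coarse = nn.Sequential(
            nn.Linear(hist_len * 2, dim), nn.ReLU(),
            nn.Linear(dim, fut_len * 2),
        )
        # Stage 2: a recurrent refiner encodes the coarse waypoints in temporal
        # order and outputs per-step (dx, dy) offsets.
        self.encoder = nn.GRU(input_size=2, hidden_size=dim, batch_first=True)
        self.offset = nn.Linear(dim, 2)

    def forward(self, history):                    # history: (B, hist_len, 2)
        B = history.size(0)
        coarse = self.coarse(history.flatten(1)).view(B, self.fut_len, 2)
        feats, _ = self.encoder(coarse)            # temporal prior over waypoints
        return coarse + self.offset(feats)         # refined trajectory, (B, fut_len, 2)

traj = CoarseToFinePredictor()(torch.randn(4, 20, 2))
print(traj.shape)  # torch.Size([4, 30, 2])
```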
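
For HDGT, a minimal sketch of a heterogeneous scene graph with typed nodes and typed directed edges; the schema and relation names below are illustrative, not HDGT's actual ones.

```python
# Typed nodes: different entity classes carry different attributes.
nodes = {
    "agent": {"a0": {"kind": "vehicle"}, "a1": {"kind": "pedestrian"}},
    "lane":  {"l0": {"length_m": 42.0}},
}

# Typed edges keyed by (source type, relation, destination type). A
# relation-aware model keeps separate parameters per edge type, so message
# passing over ("agent", "near", "agent") differs from ("agent", "on", "lane").
edges = {
    ("agent", "on", "lane"):       [("a0", "l0")],
    ("agent", "near", "agent"):    [("a0", "a1"), ("a1", "a0")],
    ("lane", "successor", "lane"): [],
}

for (src_t, rel, dst_t), pairs in edges.items():
    for src, dst in pairs:
        print(f"{src_t}:{src} -[{rel}]-> {dst_t}:{dst}")
```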
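
For BEVFormer, a minimal sketch of the core of BEV-query approaches: single-head cross-attention from a grid of BEV queries to flattened multi-camera image features, followed by a crude temporal mix with the previous frame's BEV. The shapes and the averaging step are simplified assumptions, not the actual architecture.

```python
import torch

B, H, W, C = 1, 50, 50, 64            # BEV grid of 50x50 queries, 64-dim (illustrative)
num_img_tokens = 6 * 300              # e.g. six cameras' flattened feature tokens

bev_queries = torch.randn(B, H * W, C)
img_feats   = torch.randn(B, num_img_tokens, C)

# Cross-attention: every BEV cell attends over all image tokens.
attn = torch.softmax(bev_queries @ img_feats.transpose(1, 2) / C ** 0.5, dim=-1)
bev = attn @ img_feats                # (B, H*W, C): one feature per BEV cell

# Temporal fusion: blend with the previous frame's (ego-motion aligned) BEV.
# A plain average is a crude stand-in for temporal self-attention.
prev_bev = torch.randn(B, H * W, C)
bev = 0.5 * bev + 0.5 * prev_bev
print(bev.view(B, H, W, C).shape)     # torch.Size([1, 50, 50, 64])
```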