The Autonomous Grand Challenge at the CVPR 2024 Workshop has wrapped up! The Challenge drew worldwide participation from ALL continents, including Africa and Oceania, and the diversity of participating institutions speaks to its success. This year, we received 3,000+ submissions from 480+ teams across 28 countries and regions on all continents. The Challenge's popularity is also attested by 30,000+ total views on the websites and 110,000+ total views on social media posts. All submissions were carefully reviewed, and the award recipients were finalized by the Award Committee. A complete list of awardees is provided on the CVPR website (see "Foundation Models for Autonomous Systems"). We thank all participants, organizers, volunteers, and webmasters of the Challenge. None of this would have been possible without you.

We provide certificates for all participants. Please contact us via [email protected]



Test servers remain active



Why these tracks?


The field of autonomy is rapidly evolving, and recent advancements from the machine learning community, such as large language models (LLMs) and world models, bring great potential. We believe the future lies in explainable, end-to-end models that understand the world and generalize to unseen environments. In light of this, we propose seven new challenges that push the boundary of existing perception, prediction, and planning pipelines.




Features





Host Organizations



Contact


Contact us via [email protected]
Join Slack to chat with Challenge organizers. Please follow the guidelines in #general and join the track-specific channels.
Join discussions in our WeChat group



Tracks: End-to-End Driving at Scale, Predictive World Model, Occupancy and Flow, Multi-View 3D Visual Grounding, Driving with Language, Mapless Driving, CARLA Autonomous Driving Challenge

Track

End-to-End Driving at Scale






Leaderboard (Server remains active)


  • # Participating Teams: 143
  • # Countries and Regions: 13
  • # Submissions: 463

  • Rank 🗺️ Institution PDM Score (primary) ▼ Team Name

    * Not involved in the final official ranking.

      Innovation Award goes to Team NVIDIA for

    presenting a simple yet effective idea to improve any end-to-end driving model using learned open-loop proxy metrics.



    Award


    Innovation Award: USD 9,000
    Outstanding Champion: USD 9,000
    Honorable Runner-up: USD 3,000


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @article{Dauner2024ARXIV,
        title = {NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking},
        author = {Daniel Dauner and Marcel Hallgarten and Tianyu Li and Xinshuo Weng and Zhiyu Huang and Zetong Yang and Hongyang Li and Igor Gilitschenski and Boris Ivanovic and Marco Pavone and Andreas Geiger and Kashyap Chitta},
        journal = {arXiv preprint arXiv:2406.15349},
        year = {2024}
    }


    Contact




    Task Description


    Benchmarking sensorimotor driving policies with real data is challenging due to the limited scale of prior datasets and the misalignment between open- and closed-loop metrics. Our NAVSIM framework aims to address these issues for end-to-end driving by unrolling simplified bird's eye view abstractions of scenes for a short simulation horizon.

    The private test set is provided by nuPlan from Motional.
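
    To make the scoring idea concrete, here is a minimal, hedged sketch of how a PDM-style score could be aggregated from per-scenario sub-metrics: hard penalties (collision, drivable-area compliance) multiply a weighted average of softer sub-scores. The weights and sub-metric names below are illustrative assumptions, not the official NAVSIM configuration.

    # Illustrative aggregation of a PDM-style score; weights and names are assumptions.
    def pdm_style_score(no_at_fault_collision: float,
                        drivable_area_compliance: float,
                        ego_progress: float,
                        time_to_collision: float,
                        comfort: float,
                        weights=(5.0, 5.0, 2.0)) -> float:
        """All inputs lie in [0, 1]; penalties multiply a weighted average of sub-scores."""
        w_ep, w_ttc, w_c = weights
        weighted = w_ep * ego_progress + w_ttc * time_to_collision + w_c * comfort
        weighted /= (w_ep + w_ttc + w_c)
        return no_at_fault_collision * drivable_area_compliance * weighted

    # Example: a collision-free, compliant trajectory with moderate progress.
    print(pdm_style_score(1.0, 1.0, ego_progress=0.6, time_to_collision=1.0, comfort=1.0))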


    Related Literature


    Track

    Predictive World Model




    Leaderboard


  • # Participating Teams: 69
  • # Countries and Regions: 13
  • # Submissions: 82

  • Rank 🗺️ Institution CD@overall (primary) ▲ Team Name [email protected]     [email protected]    

    * Not involved in the final official ranking.

      Innovation Award goes to Huawei-Noah & CUHK-SZ for

    a methodology distinct from the official baseline, ViDAR. Though it scores lower, the technical report is clearly written and presents comprehensive results. The proposed approach aims to speed up point cloud prediction by decomposing the task into two sub-tasks: occupancy prediction and flow estimation.


    Award


    Innovation Award: USD 9,000
    Outstanding Champion: USD 9,000
    Honorable Runner-up: USD 3,000


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @article{yang2023visual,
        title={Visual Point Cloud Forecasting enables Scalable Autonomous Driving},
        author={Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},
        journal={arXiv preprint arXiv:2312.17655},
        year={2023}
    }


    Contact




    Task Description


    Serving as an abstract spatio-temporal representation of reality, a world model can predict future states based on the current state. Learning world models has the potential to provide a pre-trained foundation model for autonomous driving. Given vision-only inputs, the neural network predicts future point clouds to demonstrate its predictive capability of the world.

    The private test set is provided by nuPlan from Motional.
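
    As an illustration of the evaluation idea, the sketch below computes a symmetric Chamfer Distance (CD) between a predicted and a ground-truth point cloud. It is a brute-force reference written for clarity and assumes small clouds; the official evaluation toolkit should be used for actual submissions.

    import numpy as np

    def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
        """pred: (N, 3), gt: (M, 3); sum of mean nearest-neighbour distances in both directions."""
        d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, M) pairwise distances
        return float(d.min(axis=1).mean() + d.min(axis=0).mean())

    pred = np.random.rand(1024, 3)   # predicted future point cloud
    gt = np.random.rand(2048, 3)     # ground-truth LiDAR sweep
    print(chamfer_distance(pred, gt))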


    Related Literature


    Track

    Occupancy and Flow






    Leaderboard (Server remains active)


  • # Participating Teams: 97
  • # Countries and Regions: 17
  • # Submissions: 153

  • Rank 🗺️ Institution OccScore (primary) ▼ Team Name RayIoU     mAVE    

    * Not involved in the final official ranking.

      Innovation Award is not granted because

    all candidates rely on existing techniques and lack novelty.


    Award


    Innovation Award: USD 9,000
    Outstanding Champion: USD 9,000
    Honorable Runner-up: USD 3,000


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @inproceedings{sima2023_occnet,
        title={Scene as Occupancy},
        author={Chonghao Sima and Wenwen Tong and Tai Wang and Li Chen and Silei Wu and Hanming Deng and Yi Gu and Lewei Lu and Ping Luo and Dahua Lin and Hongyang Li},
        booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
        year={2023}
    }
    @article{liu2023fully,
        title={Fully Sparse 3D Panoptic Occupancy Prediction},
        author={Liu, Haisong and Wang, Haiguang and Chen, Yang and Yang, Zetong and Zeng, Jia and Chen, Li and Wang, Limin},
        journal={arXiv preprint arXiv:2312.17118},
        year={2023}
    }


    Contact




    Task Description


    The 3D bounding box representation is not sufficient to describe general objects (obstacles). Instead, inspired by concepts in robotics, we perform general object detection via an occupancy representation to cover more irregularly shaped (e.g., protruding) objects. The goal of this task is to predict the 3D occupancy of the complete scene and the flow of foreground objects, given input images from six cameras.

    The private test set is provided by nuScenes from Motional.
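
    To make the expected outputs concrete, the sketch below shows one assumed format (a semantic voxel grid plus a per-voxel flow field) and a simplified per-class voxel IoU. The grid size, class count, and plain-IoU computation are illustrative assumptions; the official primary metrics are RayIoU and mAVE.

    import numpy as np

    X, Y, Z, NUM_CLASSES = 200, 200, 16, 17                   # assumed grid size and label count
    semantics = np.random.randint(0, NUM_CLASSES, (X, Y, Z))  # predicted class per voxel
    flow = np.random.randn(X, Y, Z, 2)                        # predicted per-voxel flow (vx, vy)
    gt = np.random.randint(0, NUM_CLASSES, (X, Y, Z))         # ground-truth class per voxel

    def voxel_iou(pred: np.ndarray, target: np.ndarray, cls: int) -> float:
        """Simplified per-class voxel IoU (not the official RayIoU)."""
        p, g = pred == cls, target == cls
        inter, union = np.logical_and(p, g).sum(), np.logical_or(p, g).sum()
        return float(inter / union) if union else float("nan")

    print([round(voxel_iou(semantics, gt, c), 3) for c in range(3)])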


    Related Literature


    Track

    Multi-View 3D Visual Grounding






    Leaderboard (Server remains active)


  • # Participating Teams: 64
  • # Countries and Regions: 12
  • # Submissions: 154

  • Rank 🗺️ Institution [email protected] (primary) ▼ Team Name [email protected]    

    * Not involved in the final official ranking.

      Innovation Award goes to THU-LenovoAI for

    introducing an LLM and bidirectional image-text interaction, which enriches the multi-view perception context and thus mitigates the challenge of vision-language feature sparsity. Model performance surpasses the baseline by a large margin.


    Award


    Innovation Award: USD 6,000
    Outstanding Champion: USD 3,000
    Honorable Runner-up: USD 1,000


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @article{wang2023embodiedscan,
        title={EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI},
        author={Wang, Tai and Mao, Xiaohan and Zhu, Chenming and Xu, Runsen and Lyu, Ruiyuan and Li, Peisen and Chen, Xiao and Zhang, Wenwei and Chen, Kai and Xue, Tianfan and Liu, Xihui and Lu, Cewu and Lin, Dahua and Pang, Jiangmiao},
        journal={arXiv preprint arXiv:2312.16170},
        year={2023}
    }


    Contact




    Task Description


    Compared to driving scenes, indoor embodied 3D perception systems face multi-modal inputs (including language instructions), more complex semantic understanding, diverse object categories and orientations, and different perceptual spaces and needs. Based on this, we construct EmbodiedScan, a holistic, multi-modal, ego-centric 3D perception suite. In this track, given language prompts describing specific objects, models are required to detect them in the scene and predict their oriented 3D bounding boxes.

    Details and test server are available at Hugging Face.
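
    As a toy illustration of the thresholding behind a metric like AP@0.25, the sketch below checks whether one predicted box overlaps its annotated target by at least 0.25 IoU. It uses axis-aligned 3D IoU for brevity; the official EmbodiedScan evaluation uses oriented (9-DoF) boxes, so treat this purely as an illustration.

    import numpy as np

    def aabb_iou_3d(a: np.ndarray, b: np.ndarray) -> float:
        """Boxes as (cx, cy, cz, dx, dy, dz); returns axis-aligned 3D IoU."""
        a_min, a_max = a[:3] - a[3:] / 2, a[:3] + a[3:] / 2
        b_min, b_max = b[:3] - b[3:] / 2, b[:3] + b[3:] / 2
        inter = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None).prod()
        union = a[3:].prod() + b[3:].prod() - inter
        return float(inter / union)

    pred = np.array([1.0, 2.0, 0.5, 0.8, 0.6, 1.2])   # predicted box for the prompt
    gt   = np.array([1.1, 2.0, 0.5, 0.8, 0.6, 1.2])   # annotated target box
    print(aabb_iou_3d(pred, gt) >= 0.25)              # counts as a true positive at the 0.25 threshold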


    Related Literature


    Track

    Driving with Language






    Leaderboard (Server remains active)


  • # Participating Teams: 152
  • # Countries and Regions: 14
  • # Submissions: 978

  • Rank 🗺️ Institution final_score (primary) ▼ Team Name accuracy     chatgpt    

    * Not involved in the final official ranking. † Submission selected from previous scores, not the highest one.

    Submissions with Offline Label


    Rank 🗺️ Institution final_score (primary) ▼ Team Name accuracy     chatgpt    

      Innovation Award goes to ADLM for

    Team ADLM uses a spatially unified BEV feature and aligns it with an LLM to help the LLM comprehend spatial relationships. The whole pipeline runs in an end-to-end fashion.


    Award


    Innovation Award: USD 1,000
    Outstanding Champion: USD 500


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @article{sima2023drivelm,
        title={DriveLM: Driving with Graph Visual Question Answering},
        author={Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Luo, Ping and Geiger, Andreas and Li, Hongyang},
        journal={arXiv preprint arXiv:2312.14150},
        year={2023}
    }
    @misc{contributors2023drivelmrepo,
        title={DriveLM: Driving with Graph Visual Question Answering},
        author={DriveLM contributors},
        howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
        year={2023}
    }
    @article{zhou2024embodied,
        title={Embodied Understanding of Driving Scenarios},
        author={Zhou, Yunsong and Huang, Linyan and Bu, Qingwen and Zeng, Jia and Li, Tianyu and Qiu, Hang and Zhu, Hongzi and Guo, Minyi and Qiao, Yu and Li, Hongyang},
        journal={arXiv preprint arXiv:2403.04593},
        year={2024}
    }


    Contact




    Task Description


    Incorporating the language modality, this task connects Vision Language Models (VLMs) and autonomous driving systems. The model leverages the reasoning ability of LLMs to make decisions and pursues generalizable and explainable driving behavior. Given multi-view images as inputs, models are required to answer questions covering various aspects of driving.

    Details and test server are available at Hugging Face.
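
    The sketch below illustrates assembling question-answer predictions into a JSON file of the kind a VQA-style submission requires. The exact DriveLM submission schema is defined on the Hugging Face page; the field names used here ("id", "question", "answer") are illustrative assumptions only.

    import json

    def build_submission(predictions):
        """predictions: iterable of (sample_id, question, answer) triples."""
        return [{"id": sid, "question": q, "answer": a} for sid, q, a in predictions]

    preds = [("scene-0001_0", "Is it safe to turn left now?",
              "No, an oncoming vehicle is approaching the intersection.")]
    with open("submission.json", "w") as f:
        json.dump(build_submission(preds), f, indent=2)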


    Related Literature


    Track

    Mapless Driving






    Leaderboard (Server remains active)


  • # Participating Teams: 120
  • # Countries and Regions: 10
  • # Submissions: 729

  • Rank 🗺️ Institution UniScore (primary) ▼ Team Name DET_l     DET_a    

    * Not involved in the final official ranking.

      Innovation Award goes to LGMap for

    designing a comprehensive and robust pipeline for mapless driving. The team integrates SD map input as prior road-topology knowledge, uses depth supervision, and proposes a streaming-stacking temporal module, achieving the most accurate lane topology perception to date.


    Award


    Innovation Award: USD 1,000
    Outstanding Champion: USD 500


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @inproceedings{wang2023openlanev2,
        title={OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping},
        author={Wang, Huijie and Li, Tianyu and Li, Yang and Chen, Li and Sima, Chonghao and Liu, Zhenbo and Wang, Bangjun and Jia, Peijin and Wang, Yuting and Jiang, Shengyin and Wen, Feng and Xu, Hang and Luo, Ping and Yan, Junchi and Zhang, Wei and Li, Hongyang},
        booktitle={NeurIPS},
        year={2023}
    }
    @inproceedings{li2023lanesegnet,
        title={LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving},
        author={Li, Tianyu and Jia, Peijin and Wang, Bangjun and Chen, Li and Jiang, Kun and Yan, Junchi and Li, Hongyang},
        booktitle={ICLR},
        year={2024}
    }


    Contact




    Task Description


    Driving without High-Definition (HD) maps demands a high level of active scene understanding. This challenge aims to explore the boundaries of scene reasoning. Neural networks take multi-view images and a Standard-Definition (SD) map as input, and simultaneously output not only perception results for lanes and traffic elements but also the topology relationships among lanes and between lanes and traffic elements.

    Details and test server are available at Hugging Face.
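
    The sketch below illustrates one way to represent the required outputs: lane centerlines, traffic elements, and the two topology relationships as adjacency matrices, in the spirit of OpenLane-V2. All shapes and the 0.5 threshold are illustrative assumptions.

    import numpy as np

    num_lanes, num_traffic_elements = 4, 3
    lane_centerlines = np.random.randn(num_lanes, 10, 3)             # 10 points per lane, (x, y, z)
    traffic_element_boxes = np.random.rand(num_traffic_elements, 4)  # 2D boxes in image space

    # Predicted connectivity scores in [0, 1]; thresholding yields the topology graph.
    topology_ll = np.random.rand(num_lanes, num_lanes)               # lane i connects to lane j
    topology_lt = np.random.rand(num_lanes, num_traffic_elements)    # lane i governed by element k

    adjacency_ll = topology_ll > 0.5
    adjacency_lt = topology_lt > 0.5
    print(adjacency_ll.astype(int))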


    Related Literature


    Track

    CARLA Autonomous Driving Challenge






    Leaderboard (Server remains active)


  • # Participating Teams: 40
  • # Countries and Regions: 14
  • # Submissions: 280

  • SENSORS


    Rank 🗺️ Institution Driving Score (primary) ▼ Team Name


    MAP


    Rank 🗺️ Institution Driving Score (primary) ▼ Team Name

      Innovation Award goes to LLM4AD for

    They use LLaVA as the vision encoder and LLaMA as the decoder to generate waypoints, and propose a strategy to select challenging scenarios for efficient training, achieving a Driving Score of 6.9, 23% higher than the second-place team.


    Award


    Innovation Award: USD 10,000
    Outstanding Champion: HP Workstation
    Honorable Runner-up: HP Workstation


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @inproceedings{dosovitskiy2017carla,
        title={CARLA: An open urban driving simulator},
        author={Dosovitskiy, Alexey and Ros, German and Codevilla, Felipe and Lopez, Antonio and Koltun, Vladlen},
        booktitle={Conference on Robot Learning},
        year={2017}
    }


    Contact




    Task Description


    To verify the effectiveness of AD systems, we need a comprehensive planning framework with a closed-loop setting. The CARLA AD Leaderboard challenges agents to drive through a set of predefined routes. For each route, agents are initialized at a starting point and directed to drive to a destination, provided with a description of the route through GPS-style coordinates, map coordinates, or route instructions. Routes are defined in a variety of situations, including freeways, urban areas, residential districts, and rural settings. The Leaderboard evaluates AD agents under a variety of weather conditions, including daylight, sunset, rain, fog, and night, among others.

    Details and test server are available at EvalAI.
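
    As a rough illustration of how closed-loop performance is summarized, the sketch below combines route completion with a multiplicative infraction penalty into a Driving-Score-style number. The penalty coefficients are assumptions for illustration; the authoritative rules and values are on the CARLA Leaderboard and EvalAI pages.

    # Assumed multiplicative penalty factors applied once per infraction (illustrative values).
    PENALTY = {
        "collision_pedestrian": 0.50,
        "collision_vehicle": 0.60,
        "collision_static": 0.65,
        "red_light": 0.70,
        "stop_sign": 0.80,
    }

    def driving_score(route_completion: float, infractions: dict) -> float:
        """route_completion in [0, 100]; infractions maps infraction type -> count."""
        penalty = 1.0
        for kind, count in infractions.items():
            penalty *= PENALTY.get(kind, 1.0) ** count
        return route_completion * penalty

    print(driving_score(80.0, {"red_light": 1, "collision_vehicle": 1}))  # 80 * 0.7 * 0.6 = 33.6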


    Related Literature





    General Rules


    Eligibility




    Technical




    Awards




    FAQ








    Organizers



    General


    Huijie Wang

    Shanghai AI Lab

    Tianyu Li

    Fudan University

    Yihang Qiu

    Shanghai AI Lab

    Jiangmiao Pang

    Shanghai AI Lab

    Fei Xia

    Meituan

    End-to-End Driving at Scale


    Kashyap Chitta

    University of Tübingen

    Daniel Dauner

    University of Tübingen

    Marcel Hallgarten

    University of Tübingen

    Boris Ivanovic

    NVIDIA

    Marco Pavone

    NVIDIA

    Igor Gilitschenski

    University of Toronto

    Predictive World Model


    Zetong Yang

    Shanghai AI Lab

    Pat Karnchanachari

    Motional

    Linyan Huang

    Shanghai AI Lab

    Occupancy and Flow


    Yihang Qiu

    Shanghai AI Lab

    Whye Kit Fong

    Motional

    Haisong Liu

    Nanjing University

    Multi-View 3D Visual Grounding


    Tai Wang

    Shanghai AI Lab

    Xiaohan Mao

    Shanghai AI Lab

    Chenming Zhu

    HKU

    Ruiyuan Lyu

    Shanghai AI Lab

    CARLA Autonomous Driving Challenge


    German Ros

    NVIDIA

    Matt Rowe

    CARLA Team

    Joel Moriana

    CARLA Team

    Guillermo Lopez

    CARLA Team

    Driving with Language


    Chonghao Sima

    Shanghai AI Lab & HKU

    Kashyap Chitta

    University of Tübingen

    Jiajie Xu

    CMU

    Mapless Driving


    Tianyu Li

    Fudan University

    Huijie Wang

    Shanghai AI Lab