Why these tracks?


The field of autonomy is rapidly evolving, and recent advancements from the machine learning community, such as large language models (LLMs) and world models, bring great potential. We believe the future lies in explainable, end-to-end models that understand the world and generalize to unseen environments. In light of this, we propose seven new challenges that push the boundary of existing perception, prediction, and planning pipelines.




Features




Timeline

Jan. 31: Challenge Announcement
Feb. 10: Challenge Setup Ready [internal]
Mar. 01: Challenge DevKit Release
Late March: Test Server Open
Late April: Sibling Event at China3DV
Jun. 01: Test Server Close / Registration Deadline
Jun. 04: Technical Report Submission Deadline
Jun. 06: Award Committee Meeting [internal]
Jun. 08: Official Winner Notification [internal]
Mid-June: Sibling Event in Beijing or Shanghai
Jun. 17: Winner Announcement & Presentation
2H 2024: Journal Special Issue Invitation

Note: This is the general timeline for the whole challenge; please also check the timeline of each track.



Official Partner




Host Organizations



How to participate


Many of the greatest ideas come from a diverse mix of minds, backgrounds, and experiences. We provide equal opportunities to all participants without regard to nationality, affiliation, race, religion, color, age, disability, or any other restriction. We believe diversity drives innovation. When we say we welcome participation from everyone, we mean everyone.
To participate in the challenge, registering your team by filling in this Google Form is strictly required. The registration information can be modified until June 01, 2024. For more details, please check the general rules.

Contact


Contact us via [email protected]
Join Slack to chat with Challenge organizers. Please follow the guidelines in #general and join the track-specific channels.
Join discussions in our WeChat group: About Us (关于我们) -> Join the Community (加入社群)




Track

End-to-End Driving at Scale






Task Description


Benchmarking sensorimotor driving policies with real data is challenging due to the limited scale of prior datasets and the misalignment between open- and closed-loop metrics. Our NAVSIM framework aims to address these issues for end-to-end driving by unrolling simplified bird's eye view abstractions of scenes for a short simulation horizon.

The private test set is provided by nuPlan from Motional. Details and test server are available at Hugging Face.
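
The sketch below gives a minimal, illustrative view of what a sensorimotor policy looks like in this setting: it maps one observation to a short trajectory of future waypoints, which the non-reactive simulator then unrolls in bird's eye view and scores. The class and field names are hypothetical and do not follow the official NAVSIM devkit interface.

# Minimal sketch (not the official NAVSIM interface): an end-to-end agent
# maps the current observation to a short future trajectory, which the
# non-reactive simulator unrolls in bird's eye view and scores.
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class Observation:
    # Hypothetical container for one frame of sensor/ego data.
    cameras: List[np.ndarray]   # multi-view RGB images, each (H, W, 3)
    ego_velocity: np.ndarray    # (vx, vy) in m/s
    driving_command: int        # e.g., 0 = left, 1 = straight, 2 = right


class ConstantVelocityAgent:
    """Trivial baseline: keep the current velocity for the whole horizon."""

    def __init__(self, horizon_s: float = 4.0, dt: float = 0.5):
        self.num_steps = int(horizon_s / dt)
        self.dt = dt

    def compute_trajectory(self, obs: Observation) -> np.ndarray:
        # Output: (num_steps, 2) future (x, y) waypoints in the ego frame.
        steps = np.arange(1, self.num_steps + 1, dtype=np.float32)[:, None]
        return steps * self.dt * obs.ego_velocity[None, :]


if __name__ == "__main__":
    obs = Observation(cameras=[np.zeros((256, 512, 3), np.uint8)] * 8,
                      ego_velocity=np.array([5.0, 0.0], np.float32),
                      driving_command=1)
    traj = ConstantVelocityAgent().compute_trajectory(obs)
    print(traj.shape)  # (8, 2)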


Important Dates


Test Server Open: March 25, 2024
Test Server Close: June 01, 2024


Award


Innovation Award: USD 18,000
Outstanding Champion: USD 9,000
Honorable Runner-up: USD 3,000


Citation


If the challenge helps your research, please consider citing our work with the following BibTeX:
@inproceedings{Dauner2023CORL,
    title = {Parting with Misconceptions about Learning-based Vehicle Motion Planning},
    author = {Daniel Dauner and Marcel Hallgarten and Andreas Geiger and Kashyap Chitta},
    booktitle = {Conference on Robot Learning (CoRL)},
    year = {2023}
}
@misc{Contributors2024navsim,
    title={NAVSim: Data-Driven Non-Reactive Autonomous Vehicle Simulation},
    author={NAVSim Contributors},
    howpublished={\url{https://github.com/autonomousvision/navsim}},
    year={2024}
}


Contact




Related Literature


Track

Predictive World Model






Task Description


Serving as an abstract spatio-temporal representation of reality, a world model can predict future states based on the current state. Learning world models has the potential to provide a pre-trained foundation model for autonomous driving. Given vision-only inputs, the neural network outputs future point clouds to demonstrate its capability to predict how the world evolves.

The private test set is provided by nuPlan from Motional. Details and test server are available at Hugging Face.
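
As a rough illustration of how such a model can be scored, the snippet below computes a symmetric Chamfer distance between a predicted and a ground-truth future point cloud. The metric and the point-cloud shapes are assumptions for this sketch; refer to the devkit for the official evaluation protocol.

# Illustrative sketch (assumed metric, not necessarily the official one):
# visual point cloud forecasting is typically scored by comparing predicted
# and ground-truth future point clouds, e.g. with a Chamfer distance.
import numpy as np


def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance between (N, 3) and (M, 3) point sets."""
    # Pairwise squared distances, shape (N, M).
    d2 = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min(axis=1)).mean() + np.sqrt(d2.min(axis=0)).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred = rng.normal(size=(1024, 3))                      # predicted future point cloud
    gt = pred + rng.normal(scale=0.01, size=pred.shape)    # near-identical ground truth
    print(chamfer_distance(pred, gt))                      # small value, roughly 0.02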


Important Dates


Test Server Open: March 25, 2024
Test Server Close: June 01, 2024


Award


Innovation Award: USD 18,000
Outstanding Champion: USD 9,000
Honorable Runner-up: USD 3,000


Citation


If the challenge helps your research, please consider citing our work with the following BibTeX:
@article{yang2023visual,
    title={Visual Point Cloud Forecasting enables Scalable Autonomous Driving},
    author={Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},
    journal={arXiv preprint arXiv:2312.17655},
    year={2023}
}


Contact




Related Literature


Track

Occupancy and Flow






Task Description


3D bounding boxes are not sufficient to describe general objects (obstacles). Instead, inspired by concepts in robotics, we perform general object detection via an occupancy representation to cover more irregularly shaped (e.g., protruding) objects. The goal of this task is to predict the 3D occupancy of the complete scene and the flow of foreground objects, given input images from six cameras.

The private test set is provided by nuScenes from Motional. Details and test server are available at Hugging Face.
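
At a shape level, the task can be pictured as in the sketch below: six camera images go in, and a semantic label plus a flow vector come out for every voxel of the scene. The grid resolution, class count, and the per-class voxel IoU shown here are assumptions for illustration; the devkit defines the exact specification and metric.

# Shape-level sketch (grid size and class count are assumptions; check the
# devkit for the exact specification): the model consumes six camera images
# and outputs per-voxel semantics plus a 2D flow vector for foreground voxels.
import numpy as np

NUM_CLASSES = 18          # assumed number of semantic classes incl. "free"
GRID = (200, 200, 16)     # assumed voxel grid resolution (X, Y, Z)


def occupancy_iou(pred_sem: np.ndarray, gt_sem: np.ndarray, cls: int) -> float:
    """IoU of one semantic class over the voxel grid."""
    pred, gt = pred_sem == cls, gt_sem == cls
    inter, union = np.logical_and(pred, gt).sum(), np.logical_or(pred, gt).sum()
    return float(inter / union) if union else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.integers(0, NUM_CLASSES, size=GRID)
    pred = gt.copy()                          # pretend we predicted perfectly
    flow = np.zeros(GRID + (2,), np.float32)  # per-voxel (vx, vy) flow
    print(occupancy_iou(pred, gt, cls=1), flow.shape)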


Important Dates


Test Server Open: March 25, 2024
Test Server Close: June 01, 2024


Award


Innovation Award: USD 18,000
Outstanding Champion: USD 9,000
Honorable Runner-up: USD 3,000


Citation


If the challenge helps your research, please consider citing our work with the following BibTeX:
@inproceedings{sima2023_occnet,
    title={Scene as Occupancy},
    author={Chonghao Sima and Wenwen Tong and Tai Wang and Li Chen and Silei Wu and Hanming Deng and Yi Gu and Lewei Lu and Ping Luo and Dahua Lin and Hongyang Li},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
    year={2023}
}
@article{liu2023fully,
    title={Fully Sparse 3D Panoptic Occupancy Prediction},
    author={Liu, Haisong and Wang, Haiguang and Chen, Yang and Yang, Zetong and Zeng, Jia and Chen, Li and Wang, Limin},
    journal={arXiv preprint arXiv:2312.17118},
    year={2023}
}


Contact




Related Literature


Track

Multi-View 3D Visual Grounding






Task Description


Compared to driving scenes, indoor embodied 3D perception systems face multi-modal inputs (including language instructions), more complex semantic understanding, diverse object categories and orientations, and different perceptual spaces and needs. Based on this, we construct EmbodiedScan, a holistic, multi-modal, ego-centric 3D perception suite. In this track, given language prompts describing specific objects, models are required to detect them in the scene and predict their oriented 3D bounding boxes.

Details and test server are available at Hugging Face.
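
The snippet below sketches the input and output of one grounding sample: a language prompt with multi-view frames on one side, and an oriented 9-DoF box (center, size, Euler angles) on the other. The field names are illustrative rather than the official submission format.

# Data-structure sketch (field names are illustrative; the 9-DoF box
# parameterization of center, size, and Euler angles follows EmbodiedScan,
# but check the devkit for the exact submission format).
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class GroundingSample:
    scan_id: str
    prompt: str                  # e.g. "the chair closest to the window"
    images: List[np.ndarray]     # multi-view RGB(-D) frames of the scene


@dataclass
class OrientedBox3D:
    center: np.ndarray           # (x, y, z) in meters
    size: np.ndarray             # (dx, dy, dz) in meters
    euler: np.ndarray            # (roll, pitch, yaw) in radians
    score: float                 # detection confidence


if __name__ == "__main__":
    sample = GroundingSample("scene0000_00", "the lamp on the desk",
                             [np.zeros((480, 640, 3), np.uint8)])
    pred = OrientedBox3D(np.array([1.2, 0.4, 0.8]), np.array([0.3, 0.3, 0.5]),
                         np.array([0.0, 0.0, 1.57]), score=0.91)
    print(sample.prompt, pred.center)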


Important Dates


Test Server Open: March 25, 2024
Test Server Close: June 01, 2024


Award


Innovation Award: USD 6,000
Outstanding Champion: USD 3,000
Honorable Runner-up: USD 1,000


Citation


If the challenge helps your research, please consider citing our work with the following BibTeX:
@article{wang2023embodiedscan,
    title={EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI},
    author={Wang, Tai and Mao, Xiaohan and Zhu, Chenming and Xu, Runsen and Lyu, Ruiyuan and Li, Peisen and Chen, Xiao and Zhang, Wenwei and Chen, Kai and Xue, Tianfan and Liu, Xihui and Lu, Cewu and Lin, Dahua and Pang, Jiangmiao},
    journal={arXiv preprint arXiv:2312.16170},
    year={2023}
}


Contact




Related Literature


Track

CARLA Autonomous Driving Challenge






Task Description


Verifying the effectiveness of AD systems requires evaluating planning in a closed-loop setting. The CARLA AD Leaderboard challenges agents to drive through a set of predefined routes. For each route, agents are initialized at a starting point and directed to drive to a destination, provided with a description of the route through GPS-style coordinates, map coordinates, or route instructions. Routes cover a variety of situations, including freeways, urban areas, residential districts, and rural settings. The Leaderboard evaluates AD agents under a variety of weather conditions, including daylight, sunset, rain, fog, and night, among others.

Details and test server are available at EvalAI.
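
Structurally, a submission is an agent with a setup / sensors / run_step loop that the Leaderboard calls at every simulation tick. The standalone sketch below mirrors that contract without depending on the carla package, so the class and dictionary fields are illustrative rather than the exact Leaderboard API.

# Structural sketch of a closed-loop driving agent, mirroring the contract of
# the CARLA Leaderboard (setup / sensors / run_step); this standalone version
# avoids the carla dependency, so class and field names are illustrative only.
from typing import Dict, List


class DummyAgent:
    """Drives straight at low throttle regardless of the observations."""

    def setup(self, path_to_conf_file: str) -> None:
        self.config_path = path_to_conf_file

    def sensors(self) -> List[Dict]:
        # Each entry requests one sensor with its mounting pose and id.
        return [
            {"type": "sensor.camera.rgb", "x": 1.3, "y": 0.0, "z": 2.3,
             "width": 800, "height": 600, "fov": 100, "id": "front_cam"},
            {"type": "sensor.other.gnss", "x": 0.0, "y": 0.0, "z": 0.0,
             "id": "gps"},
        ]

    def run_step(self, input_data: Dict, timestamp: float) -> Dict[str, float]:
        # input_data maps sensor ids to their latest readings; the real
        # Leaderboard expects a vehicle control object, approximated by a dict here.
        return {"throttle": 0.3, "steer": 0.0, "brake": 0.0}


if __name__ == "__main__":
    agent = DummyAgent()
    agent.setup("config.yaml")
    print(agent.run_step({"front_cam": None, "gps": None}, timestamp=0.05))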


Important Dates


Test Server Open: March 25, 2024
Test Server Close: June 01, 2024


Leaderboard


See CARLA Leaderboard 2.0.

Award


Innovation Award: USD 10,000
Outstanding Champion: HP Workstation
Honorable Runner-up: HP Workstation


Citation


If the challenge helps your research, please consider citing our work with the following BibTeX:
@inproceedings{dosovitskiy2017carla,
    title={CARLA: An open urban driving simulator},
    author={Dosovitskiy, Alexey and Ros, German and Codevilla, Felipe and Lopez, Antonio and Koltun, Vladlen},
    booktitle={Conference on Robot Learning (CoRL)},
    year={2017}
}


Contact




Related Literature


Track

Driving with Language






Task Description


Incorporating the language modality, this task connects Vision Language Models (VLMs) and autonomous driving systems. The model leverages the reasoning ability of LLMs to make decisions and pursues generalizable and explainable driving behavior. Given multi-view images as inputs, models are required to answer questions covering various aspects of driving.

Details and test server are available at Hugging Face.
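
The sketch below shows the rough shape of one sample: multi-view camera frames plus a driving question, answered with free-form text. The field names and the placeholder answer function are illustrative, not the official DriveLM schema.

# Format sketch (field names are illustrative, not the official DriveLM schema):
# each sample pairs multi-view camera frames with a driving-related question,
# and the model returns a free-form textual answer.
from dataclasses import dataclass
from typing import Dict
import numpy as np


@dataclass
class DriveQASample:
    frame_id: str
    images: Dict[str, np.ndarray]   # camera name -> RGB image
    question: str                   # e.g. "What should the ego vehicle do next?"


def answer(sample: DriveQASample) -> str:
    # Placeholder for a VLM call; a real submission would run the model here.
    return "Slow down and yield to the pedestrian crossing ahead."


if __name__ == "__main__":
    cams = {name: np.zeros((450, 800, 3), np.uint8)
            for name in ["CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
                         "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT"]}
    sample = DriveQASample("scene-0001_frame-0", cams,
                           "Is it safe to change into the left lane?")
    print(answer(sample))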


Important Dates


Test Server Open: March 25, 2024
Test Server Close: June 01, 2024


Award


Innovation Award: USD 1,000
Outstanding Champion: USD 500


Citation


If the challenge helps your research, please consider citing our work with the following BibTeX:
@article{sima2023drivelm,
    title={DriveLM: Driving with Graph Visual Question Answering},
    author={Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Luo, Ping and Geiger, Andreas and Li, Hongyang},
    journal={arXiv preprint arXiv:2312.14150},
    year={2023}
}
@misc{contributors2023drivelmrepo,
    title={DriveLM: Driving with Graph Visual Question Answering},
    author={DriveLM contributors},
    howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
    year={2023}
}
@article{zhou2024embodied,
    title={Embodied Understanding of Driving Scenarios},
    author={Zhou, Yunsong and Huang, Linyan and Bu, Qingwen and Zeng, Jia and Li, Tianyu and Qiu, Hang and Zhu, Hongzi and Guo, Minyi and Qiao, Yu and Li, Hongyang},
    journal={arXiv preprint arXiv:2403.04593},
    year={2024}
}


Contact




Related Literature


Track

Mapless Driving






Task Description


Driving without High-Definition (HD) maps demands a high level of active scene understanding. This challenge aims to explore the boundaries of scene reasoning. Neural networks take multi-view images and a Standard-Definition (SD) map as input, and simultaneously provide not only perception results for lanes and traffic elements but also the topology relationships among lanes and between lanes and traffic elements.

Details and test server are available at Hugging Face.
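
A prediction for one frame can be pictured as the dictionary below: lane centerlines as 3D polylines, traffic elements as image-plane boxes with attributes, and two adjacency matrices encoding the lane-lane and lane-traffic-element topology. The keys follow the spirit of the OpenLane-V2 format but are illustrative; the devkit defines the exact submission schema.

# Output-structure sketch (keys follow the spirit of the OpenLane-V2 format,
# but they are illustrative; consult the devkit for the exact submission schema).
import numpy as np

num_lanes, num_traffic_elements = 2, 1

prediction = {
    # 3D lane centerlines: each a polyline of (x, y, z) points with a score.
    "lane_centerline": [
        {"points": np.linspace([0, -1.75, 0], [30, -1.75, 0], 11), "confidence": 0.9},
        {"points": np.linspace([0, 1.75, 0], [30, 1.75, 0], 11), "confidence": 0.8},
    ],
    # Traffic elements: 2D image-plane boxes with an attribute (e.g. red light).
    "traffic_element": [
        {"points": np.array([[700, 200], [740, 260]], float),
         "attribute": 1, "confidence": 0.7},
    ],
    # Lane-to-lane and lane-to-traffic-element topology as adjacency matrices.
    "topology_lclc": np.zeros((num_lanes, num_lanes), np.float32),
    "topology_lcte": np.ones((num_lanes, num_traffic_elements), np.float32),
}

if __name__ == "__main__":
    print(len(prediction["lane_centerline"]), prediction["topology_lcte"].shape)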


Important Dates


Test Server Open: March 25, 2024
Test Server Close: June 01, 2024


Award


Innovation Award: USD 1,000
Outstanding Champion: USD 500


Citation


If the challenge helps your research, please consider citing our work with the following BibTeX:
@inproceedings{wang2023openlanev2,
    title={OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping},
    author={Wang, Huijie and Li, Tianyu and Li, Yang and Chen, Li and Sima, Chonghao and Liu, Zhenbo and Wang, Bangjun and Jia, Peijin and Wang, Yuting and Jiang, Shengyin and Wen, Feng and Xu, Hang and Luo, Ping and Yan, Junchi and Zhang, Wei and Li, Hongyang},
    booktitle={NeurIPS},
    year={2023}
}
@inproceedings{li2023lanesegnet,
    title={LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving},
    author={Li, Tianyu and Jia, Peijin and Wang, Bangjun and Chen, Li and Jiang, Kun and Yan, Junchi and Li, Hongyang},
    booktitle={ICLR},
    year={2024}
}


Contact




Related Literature





General Rules


Eligibility




Technical




Awards




FAQ








Organizers



General


Huijie Wang, Shanghai AI Lab
Tianyu Li, Fudan University
Yihang Qiu, Shanghai AI Lab
Jiangmiao Pang, Shanghai AI Lab
Fei Xia, Meituan

End-to-End Driving at Scale

Kashyap Chitta, University of Tübingen
Daniel Dauner, University of Tübingen
Marcel Hallgarten, University of Tübingen
Boris Ivanovic, NVIDIA
Marco Pavone, NVIDIA
Igor Gilitschenski, University of Toronto

Predictive World Model

Zetong Yang, Shanghai AI Lab
Pat Karnchanachari, Motional
Linyan Huang, Shanghai AI Lab

Occupancy and Flow

Yihang Qiu, Shanghai AI Lab
Whye Kit Fong, Motional
Haisong Liu, Nanjing University

Multi-View 3D Visual Grounding

Tai Wang, Shanghai AI Lab
Xiaohan Mao, Shanghai AI Lab
Chenming Zhu, HKU
Ruiyuan Lyu, Shanghai AI Lab

CARLA Autonomous Driving Challenge

German Ros, NVIDIA
Matt Rowe, CARLA Team
Joel Moriana, CARLA Team
Guillermo Lopez, CARLA Team

Driving with Language

Chonghao Sima, Shanghai AI Lab & HKU
Kashyap Chitta, University of Tübingen
Jiajie Xu, CMU
Hanxue Zhang, Shanghai AI Lab

Mapless Driving

Tianyu Li, Fudan University
Huijie Wang, Shanghai AI Lab