The Autonomous Grand Challenge at the CVPR 2024 Workshop has wrapped up! The Challenge drew worldwide participation from ALL continents, including Africa and Oceania, and the diversity of participating institutions speaks to its success. This year, we received 3,000+ submissions from 480+ teams across 28 countries and regions on all continents. The Challenge's popularity is also attested by 30,000+ total views on the websites and 110,000+ total views on social media posts. All submissions were carefully reviewed, and the award recipients were finalized by the Award Committee. A complete list of awardees is provided on the CVPR website (see "Foundation Models for Autonomous Systems"). We thank all participants, organizers, volunteers, and webmasters of the Challenge. None of this would have been possible without you.

We provide certificates for all participants. Please contact us via [email protected]



Test servers remain active



Why these tracks?


The field of autonomy is rapidly evolving, and recent advancements from the machine learning community, such as large language models (LLMs) and world models, bring great potential. We believe the future lies in explainable, end-to-end models that understand the world and generalize to unseen environments. In light of this, we propose seven new challenges that push the boundary of existing perception, prediction, and planning pipelines.




Features





Host Organizations



Contact


Contact us via [email protected]
Join Slack to chat with Challenge organizers. Please follow the guidelines in #general and join the track-specific channels.
Join discussions in our WeChat group



Tracks: End-to-End Driving at Scale, Predictive World Model, Occupancy and Flow, Multi-View 3D Visual Grounding, Driving with Language, Mapless Driving, CARLA Autonomous Driving Challenge

Track

End-to-End Driving at Scale






Leaderboard (Server remains active)


  • # Participating Teams: 143
  • # Countries and Regions: 13
  • # Submissions: 463

  • Rank 🗺️ Institution PDM Score (primary) ▼ Team Name

    * Not involved in the final official ranking.

      Innovation Award goes to Team NVIDIA for

    presenting a simple yet effective idea to improve any end-to-end driving model using learned open-loop proxy metrics.



    Award


    Innovation Award: USD 9,000
    Outstanding Champion: USD 9,000
    Honorable Runner-up: USD 3,000


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @article{Dauner2024ARXIV,
        title = {NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking},
        author = {Daniel Dauner and Marcel Hallgarten and Tianyu Li and Xinshuo Weng and Zhiyu Huang and Zetong Yang and Hongyang Li and Igor Gilitschenski and Boris Ivanovic and Marco Pavone and Andreas Geiger and Kashyap Chitta},
        journal = {arXiv preprint arXiv:2406.15349},
        year = {2024}
    }


    Contact




    Task Description


    Benchmarking sensorimotor driving policies with real data is challenging due to the limited scale of prior datasets and the misalignment between open- and closed-loop metrics. Our NAVSIM framework aims to address these issues for end-to-end driving by unrolling simplified bird's eye view abstractions of scenes for a short simulation horizon.

    The private test set is provided by nuPlan from Motional.
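
    To make the scoring idea concrete, here is a minimal, hedged sketch of how a PDM-style score could be aggregated from per-scenario sub-metrics: hard penalties (collision, drivable-area compliance) multiply a weighted average of softer sub-scores. The weights and sub-metric names below are illustrative assumptions, not the official NAVSIM configuration.

    # Illustrative aggregation of a PDM-style score; weights and names are assumptions.
    def pdm_style_score(no_at_fault_collision: float,
                        drivable_area_compliance: float,
                        ego_progress: float,
                        time_to_collision: float,
                        comfort: float,
                        weights=(5.0, 5.0, 2.0)) -> float:
        """All inputs lie in [0, 1]; penalties multiply a weighted average of sub-scores."""
        w_ep, w_ttc, w_c = weights
        weighted = w_ep * ego_progress + w_ttc * time_to_collision + w_c * comfort
        weighted /= (w_ep + w_ttc + w_c)
        return no_at_fault_collision * drivable_area_compliance * weighted

    # Example: a collision-free, compliant trajectory with moderate progress.
    print(pdm_style_score(1.0, 1.0, ego_progress=0.6, time_to_collision=1.0, comfort=1.0))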


    Related Literature


    Track

    Predictive World Model




    Leaderboard


  • # Participating Teams: 69
  • # Countries and Regions: 13
  • # Submissions: 82

  • Rank 🗺️ Institution CD@overall (primary) ▲ Team Name [email protected]     [email protected]    

    * Not involved in the final official ranking.

      Innovation Award goes to Huawei-Noah & CUHK-SZ for

    a methodology distinct from the official baseline, ViDAR. Though it scores lower, the technical report is clearly written and presents comprehensive results. The proposed approach aims to speed up point cloud prediction by decomposing the task into two sub-tasks: occupancy prediction and flow estimation.


    Award


    Innovation Award: USD 9,000
    Outstanding Champion: USD 9,000
    Honorable Runner-up: USD 3,000


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @article{yang2023visual,
        title={Visual Point Cloud Forecasting enables Scalable Autonomous Driving},
        author={Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},
        journal={arXiv preprint arXiv:2312.17655},
        year={2023}
    }


    Contact




    Task Description


    Serving as an abstract spatio-temporal representation of reality, a world model can predict future states based on the current state. Learning world models has the potential to provide a pre-trained foundation model for autonomous driving. Given vision-only inputs, the neural network predicts future point clouds to demonstrate its predictive capability of the world.

    The private test set is provided by nuPlan from Motional.
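
    As an illustration of the evaluation idea, the sketch below computes a symmetric Chamfer Distance (CD) between a predicted and a ground-truth point cloud. It is a brute-force reference written for clarity and assumes small clouds; the official evaluation toolkit should be used for actual submissions.

    import numpy as np

    def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
        """pred: (N, 3), gt: (M, 3); sum of mean nearest-neighbour distances in both directions."""
        d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, M) pairwise distances
        return float(d.min(axis=1).mean() + d.min(axis=0).mean())

    pred = np.random.rand(1024, 3)   # predicted future point cloud
    gt = np.random.rand(2048, 3)     # ground-truth LiDAR sweep
    print(chamfer_distance(pred, gt))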


    Related Literature


    Track

    Occupancy and Flow






    Leaderboard (Server remains active)


  • # Participating Teams: 97
  • # Countries and Regions: 17
  • # Submissions: 153

  • Rank 🗺️ Institution OccScore (primary) ▼ Team Name RayIoU     mAVE    

    * Not involved in the final official ranking.

      Innovation Award is not granted because

    all candidates rely on existing techniques and lack novelty.


    Award


    Innovation Award: USD 9,000
    Outstanding Champion: USD 9,000
    Honorable Runner-up: USD 3,000


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @inproceedings{sima2023_occnet,
        title={Scene as Occupancy},
        author={Chonghao Sima and Wenwen Tong and Tai Wang and Li Chen and Silei Wu and Hanming Deng and Yi Gu and Lewei Lu and Ping Luo and Dahua Lin and Hongyang Li},
        booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
        year={2023}
    }
    @article{liu2023fully,
        title={Fully Sparse 3D Panoptic Occupancy Prediction},
        author={Liu, Haisong and Wang, Haiguang and Chen, Yang and Yang, Zetong and Zeng, Jia and Chen, Li and Wang, Limin},
        journal={arXiv preprint arXiv:2312.17118},
        year={2023}
    }


    Contact




    Task Description


    The 3D bounding box representation is not sufficient to describe general objects (obstacles). Instead, inspired by concepts in robotics, we perform general object detection via an occupancy representation to cover more irregularly shaped (e.g., protruding) objects. The goal of this task is to predict the 3D occupancy of the complete scene and the flow of foreground objects, given input images from six cameras.

    The private test set is provided by nuScenes from Motional.
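
    To make the expected outputs concrete, the sketch below shows one assumed format (a semantic voxel grid plus a per-voxel flow field) and a simplified per-class voxel IoU. The grid size, class count, and plain-IoU computation are illustrative assumptions; the official primary metrics are RayIoU and mAVE.

    import numpy as np

    X, Y, Z, NUM_CLASSES = 200, 200, 16, 17                   # assumed grid size and label count
    semantics = np.random.randint(0, NUM_CLASSES, (X, Y, Z))  # predicted class per voxel
    flow = np.random.randn(X, Y, Z, 2)                        # predicted per-voxel flow (vx, vy)
    gt = np.random.randint(0, NUM_CLASSES, (X, Y, Z))         # ground-truth class per voxel

    def voxel_iou(pred: np.ndarray, target: np.ndarray, cls: int) -> float:
        """Simplified per-class voxel IoU (not the official RayIoU)."""
        p, g = pred == cls, target == cls
        inter, union = np.logical_and(p, g).sum(), np.logical_or(p, g).sum()
        return float(inter / union) if union else float("nan")

    print([round(voxel_iou(semantics, gt, c), 3) for c in range(3)])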


    Related Literature


    Track

    Multi-View 3D Visual Grounding






    Leaderboard (Server remains active)


  • # Participating Teams: 64
  • # Countries and Regions: 12
  • # Submissions: 154

  • Rank 🗺️ Institution [email protected] (primary) ▼ Team Name [email protected]    

    * Not involved in the final official ranking.

      Innovation Award goes to THU-LenovoAI for

    introducing an LLM and bidirectional image-text interaction, which enriches the multi-view perception context and thus mitigates the challenge of vision-language feature sparsity. Model performance surpasses the baseline by a large margin.


    Award


    Innovation Award: USD 6,000
    Outstanding Champion: USD 3,000
    Honorable Runner-up: USD 1,000


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @article{wang2023embodiedscan,
        title={EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI},
        author={Wang, Tai and Mao, Xiaohan and Zhu, Chenming and Xu, Runsen and Lyu, Ruiyuan and Li, Peisen and Chen, Xiao and Zhang, Wenwei and Chen, Kai and Xue, Tianfan and Liu, Xihui and Lu, Cewu and Lin, Dahua and Pang, Jiangmiao},
        journal={arXiv preprint arXiv:2312.16170},
        year={2023}
    }


    Contact




    Task Description


    Compared to driving scenes, indoor embodied 3D perception systems face multi-modal inputs (including language instructions), more complex semantic understanding, diverse object categories and orientations, and different perceptual spaces and needs. Based on this, we construct EmbodiedScan, a holistic, multi-modal, ego-centric 3D perception suite. In this track, given language prompts describing specific objects, models are required to detect them in the scene and predict their oriented 3D bounding boxes.

    Details and test server are available at Hugging Face.
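
    As a toy illustration of the thresholding behind a metric like AP@0.25, the sketch below checks whether one predicted box overlaps its annotated target by at least 0.25 IoU. It uses axis-aligned 3D IoU for brevity; the official EmbodiedScan evaluation uses oriented (9-DoF) boxes, so treat this purely as an illustration.

    import numpy as np

    def aabb_iou_3d(a: np.ndarray, b: np.ndarray) -> float:
        """Boxes as (cx, cy, cz, dx, dy, dz); returns axis-aligned 3D IoU."""
        a_min, a_max = a[:3] - a[3:] / 2, a[:3] + a[3:] / 2
        b_min, b_max = b[:3] - b[3:] / 2, b[:3] + b[3:] / 2
        inter = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None).prod()
        union = a[3:].prod() + b[3:].prod() - inter
        return float(inter / union)

    pred = np.array([1.0, 2.0, 0.5, 0.8, 0.6, 1.2])   # predicted box for the prompt
    gt   = np.array([1.1, 2.0, 0.5, 0.8, 0.6, 1.2])   # annotated target box
    print(aabb_iou_3d(pred, gt) >= 0.25)              # counts as a true positive at the 0.25 threshold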


    Related Literature


    Track

    Driving with Language






    Leaderboard (Server remains active)


  • # Participating Teams: 152
  • # Countries and Regions: 14
  • # Submissions: 978

  • Rank 🗺️ Institution final_score (primary) ▼ Team Name accuracy     chatgpt    

    * Not involved in the final official ranking. † Submission selected from previous scores, not the highest one.

    Submissions with Offline Label


    Rank 🗺️ Institution final_score (primary) ▼ Team Name accuracy     chatgpt    

      Innovation Award goes to ADLM for

    Team ADLM uses a spatially unified BEV feature and aligns it with an LLM to help the LLM comprehend spatial relationships. The whole pipeline runs in an end-to-end fashion.


    Award


    Innovation Award: USD 1,000
    Outstanding Champion: USD 500


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @article{sima2023drivelm,
        title={DriveLM: Driving with Graph Visual Question Answering},
        author={Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Luo, Ping and Geiger, Andreas and Li, Hongyang},
        journal={arXiv preprint arXiv:2312.14150},
        year={2023}
    }
    @misc{contributors2023drivelmrepo,
        title={DriveLM: Driving with Graph Visual Question Answering},
        author={DriveLM contributors},
        howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
        year={2023}
    }
    @article{zhou2024embodied,
        title={Embodied Understanding of Driving Scenarios},
        author={Zhou, Yunsong and Huang, Linyan and Bu, Qingwen and Zeng, Jia and Li, Tianyu and Qiu, Hang and Zhu, Hongzi and Guo, Minyi and Qiao, Yu and Li, Hongyang},
        journal={arXiv preprint arXiv:2403.04593},
        year={2024}
    }


    Contact




    Task Description


    Incorporating the language modality, this task connects Vision Language Models (VLMs) and autonomous driving systems. The model leverages the reasoning ability of LLMs to make decisions and pursues generalizable and explainable driving behavior. Given multi-view images as inputs, models are required to answer questions covering various aspects of driving.

    Details and test server are available at Hugging Face.
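
    The sketch below illustrates assembling question-answer predictions into a JSON file of the kind a VQA-style submission requires. The exact DriveLM submission schema is defined on the Hugging Face page; the field names used here ("id", "question", "answer") are illustrative assumptions only.

    import json

    def build_submission(predictions):
        """predictions: iterable of (sample_id, question, answer) triples."""
        return [{"id": sid, "question": q, "answer": a} for sid, q, a in predictions]

    preds = [("scene-0001_0", "Is it safe to turn left now?",
              "No, an oncoming vehicle is approaching the intersection.")]
    with open("submission.json", "w") as f:
        json.dump(build_submission(preds), f, indent=2)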


    Related Literature


    Track

    Mapless Driving






    Leaderboard (Server remains active)


  • # Participating Teams: 120
  • # Countries and Regions: 10
  • # Submissions: 729

  • Rank 🗺️ Institution UniScore (primary) ▼ Team Name DET_l     DET_a    

    * Not involved in the final official ranking.

      Innovation Award goes to LGMap for

    designing a comprehensive and robust pipeline for mapless driving. The team integrates SD map input as prior road-topology knowledge, uses depth supervision, and proposes a streaming-stacking temporal module, achieving the most accurate lane topology perception to date.


    Award


    Innovation Award: USD 1,000
    Outstanding Champion: USD 500


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @inproceedings{wang2023openlanev2,
        title={OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping},
        author={Wang, Huijie and Li, Tianyu and Li, Yang and Chen, Li and Sima, Chonghao and Liu, Zhenbo and Wang, Bangjun and Jia, Peijin and Wang, Yuting and Jiang, Shengyin and Wen, Feng and Xu, Hang and Luo, Ping and Yan, Junchi and Zhang, Wei and Li, Hongyang},
        booktitle={NeurIPS},
        year={2023}
    }
    @inproceedings{li2023lanesegnet,
        title={LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving},
        author={Li, Tianyu and Jia, Peijin and Wang, Bangjun and Chen, Li and Jiang, Kun and Yan, Junchi and Li, Hongyang},
        booktitle={ICLR},
        year={2024}
    }


    Contact




    Task Description


    Driving without High-Definition (HD) maps demands a high level of active scene understanding. This challenge aims to explore the boundaries of scene reasoning. Neural networks take multi-view images and a Standard-Definition (SD) map as input, and simultaneously output not only perception results for lanes and traffic elements but also the topology relationships among lanes and between lanes and traffic elements.

    Details and test server are available at Hugging Face.
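
    The sketch below illustrates one way to represent the required outputs: lane centerlines, traffic elements, and the two topology relationships as adjacency matrices, in the spirit of OpenLane-V2. All shapes and the 0.5 threshold are illustrative assumptions.

    import numpy as np

    num_lanes, num_traffic_elements = 4, 3
    lane_centerlines = np.random.randn(num_lanes, 10, 3)             # 10 points per lane, (x, y, z)
    traffic_element_boxes = np.random.rand(num_traffic_elements, 4)  # 2D boxes in image space

    # Predicted connectivity scores in [0, 1]; thresholding yields the topology graph.
    topology_ll = np.random.rand(num_lanes, num_lanes)               # lane i connects to lane j
    topology_lt = np.random.rand(num_lanes, num_traffic_elements)    # lane i governed by element k

    adjacency_ll = topology_ll > 0.5
    adjacency_lt = topology_lt > 0.5
    print(adjacency_ll.astype(int))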


    Related Literature


    Track

    CARLA Autonomous Driving Challenge






    Leaderboard (Server remains active)


  • # Participating Teams: 40
  • # Countries and Regions: 14
  • # Submissions: 280

  • SENSORS


    Rank 🗺️ Institution Driving Score (primary) ▼ Team Name


    MAP


    Rank 🗺️ Institution Driving Score (primary) ▼ Team Name

      Innovation Award goes to LLM4AD for

    They use LLaVA as the vision encoder and LLaMA as the decoder to generate waypoints, and propose a strategy to select challenging scenarios for efficient training, achieving a Driving Score of 6.9, 23% higher than the second-place team.


    Award


    Innovation Award: USD 10,000
    Outstanding Champion: HP Workstation
    Honorable Runner-up: HP Workstation


    Citation


    If the challenge helps your research, please consider citing our work with the following BibTeX:
    @inproceedings{dosovitskiy2017carla,
        title={CARLA: An open urban driving simulator},
        author={Dosovitskiy, Alexey and Ros, German and Codevilla, Felipe and Lopez, Antonio and Koltun, Vladlen},
        booktitle={Conference on Robot Learning},
        year={2017}
    }


    Contact




    Task Description


    To verify the effectiveness of AD systems, we need a comprehensive planning framework with a closed-loop setting. The CARLA AD Leaderboard challenges agents to drive through a set of predefined routes. For each route, agents are initialized at a starting point and directed to drive to a destination, provided with a description of the route through GPS-style coordinates, map coordinates, or route instructions. Routes are defined in a variety of situations, including freeways, urban areas, residential districts, and rural settings. The Leaderboard evaluates AD agents under a variety of weather conditions, including daylight, sunset, rain, fog, and night, among others.

    Details and test server are available at EvalAI.
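
    As a rough illustration of how closed-loop performance is summarized, the sketch below combines route completion with a multiplicative infraction penalty into a Driving-Score-style number. The penalty coefficients are assumptions for illustration; the authoritative rules and values are on the CARLA Leaderboard and EvalAI pages.

    # Assumed multiplicative penalty factors applied once per infraction (illustrative values).
    PENALTY = {
        "collision_pedestrian": 0.50,
        "collision_vehicle": 0.60,
        "collision_static": 0.65,
        "red_light": 0.70,
        "stop_sign": 0.80,
    }

    def driving_score(route_completion: float, infractions: dict) -> float:
        """route_completion in [0, 100]; infractions maps infraction type -> count."""
        penalty = 1.0
        for kind, count in infractions.items():
            penalty *= PENALTY.get(kind, 1.0) ** count
        return route_completion * penalty

    print(driving_score(80.0, {"red_light": 1, "collision_vehicle": 1}))  # 80 * 0.7 * 0.6 = 33.6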


    Related Literature





    General Rules


    Eligibility




    Technical




    Awards




    FAQ








    Organizers



    General


    Huijie Wang

    Shanghai AI Lab

    Tianyu Li

    Fudan University

    Yihang Qiu

    Shanghai AI Lab

    Jiangmiao Pang

    Shanghai AI Lab

    Fei Xia

    Meituan

    End-to-End Driving at Scale


    Kashyap Chitta

    University of Tübingen

    Daniel Dauner

    University of Tübingen

    Marcel Hallgarten

    University of Tübingen

    Boris Ivanovic

    NVIDIA

    Marco Pavone

    NVIDIA

    Igor Gilitschenski

    University of Toronto

    Predictive World Model


    Zetong Yang

    Shanghai AI Lab

    Pat Karnchanachari

    Motional

    Linyan Huang

    Shanghai AI Lab

    Occupancy and Flow


    Yihang Qiu

    Shanghai AI Lab

    Whye Kit Fong

    Motional

    Haisong Liu

    Nanjing University

    Multi-View 3D Visual Grounding


    Tai Wang

    Shanghai AI Lab

    Xiaohan Mao

    Shanghai AI Lab

    Chenming Zhu

    HKU

    Ruiyuan Lyu

    Shanghai AI Lab

    CARLA Autonomous Driving Challenge


    German Ros

    NVIDIA

    Matt Rowe

    CARLA Team

    Joel Moriana

    CARLA Team

    Guillermo Lopez

    CARLA Team

    Driving with Language


    Chonghao Sima

    Shanghai AI Lab & HKU

    Kashyap Chitta

    University of Tübingen

    Jiajie Xu

    CMU

    Mapless Driving


    Tianyu Li

    Fudan University

    Huijie Wang

    Shanghai AI Lab