Workshop at ICCV 2025 | OpenDriveLab

Introduction

The world is three-dimensional. This fact was first seen by trilobites, the first organisms capable of sensing light. From that moment, nervous systems began to evolve, gradually transforming mere sight into insight, understanding, and action. All of these combined give rise to intelligence.

Despite remarkable technological advancements in recent decades, modern embodied systems remain far from achieving full intelligence. Their scene representations fall short in several key aspects. Such representations should: (i) contain the information necessary for physical interaction, such as the temporal dynamics of the scene; (ii) carry a prior over semantic relevance, focusing on task-relevant features like objects and their relationships; and (iii) be compact, avoiding irrelevant details such as background elements. Attempts have been made, including integrating foundation models and utilizing large-scale data. Yet the path to true intelligence remains long, with significant progress still required.

Contact

Contact us via [email protected]
Join Discord to chat with the organizers
Join discussions in our WeChat group

Schedule

Li Chen
HKU, China

Opening Remarks
Ankit Goyal
NVIDIA, USA

Keep it simple when building VLAs Recordings (YouTube, bilibili)

Biography

Ankit Goyal is a Research Scientist in Robotics at NVIDIA working with Dieter Fox. He did his Ph.D. in Computer Science at Princeton University, where he was advised by Prof. Jia Deng. He completed his master's at the University of Michigan and his bachelor's at IIT Kanpur.
Track Organizer and Winners

NAVSIM v2 End-to-End Driving Challenge
Coffee Break
Yunzhu Li
Columbia, USA

Learning Structured World Models From and For Physical Interactions Recordings (YouTube, bilibili)

Biography

Yunzhu Li is an Assistant Professor of Computer Science at Columbia University.

Before joining Columbia, he was an Assistant Professor at UIUC CS. He also spent time as a Postdoc at the Stanford Vision and Learning Lab (SVL), working with Fei-Fei Li and Jiajun Wu. He received his PhD from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, where he was advised by Antonio Torralba and Russ Tedrake, and he obtained his bachelor's degree from Peking University.
Track Organizer and Winners

World Model Challenge by 1X
Matthias Nießner
TUM, Germany

3D Generative AI Models Recordings (YouTube, bilibili)

Biography

Matthias Nießner is a Professor at the Technical University of Munich, where he leads the Visual Computing Lab. Before, he was a Visiting Assistant Professor at Stanford University. Prof. Nießner's research lies at the intersection of computer vision, graphics, and machine learning, where he is particularly interested in cutting-edge techniques for 3D reconstruction, semantic 3D scene understanding, video editing, and AI-driven video synthesis. In total, he has published over 150 academic publications, including 25 papers at the prestigious ACM Transactions on Graphics (SIGGRAPH / SIGGRAPH Asia) journal and 55 works at the leading vision conferences (CVPR, ECCV, ICCV); several of these works won best paper awards, including at SIGCHI'14, HPG'15, SPG'18, and the SIGGRAPH'16 Emerging Technologies Award for the best Live Demo.

Prof. Nießner's work enjoys wide media coverage, with many articles featured in mainstream media including the New York Times, Wall Street Journal, Spiegel, MIT Technology Review, and many more. His work has led to several TV appearances, such as on Jimmy Kimmel Live, where Prof. Nießner demonstrated the popular Face2Face technique. Prof. Nießner's academic YouTube channel currently has over 5 million views.

For his work, Prof. Nießner received several awards: he is a TUM-IAS Rudolph Moessbauer Fellow (2017 - ongoing), he won the Google Faculty Award for Machine Perception (2017), the Nvidia Professor Partnership Award (2018), as well as the prestigious ERC Starting Grant 2018 which comes with 1.5 million Euro in research funding; in 2019, he received the Eurographics Young Researcher Award honoring the best upcoming graphics researcher in Europe.

In addition to his academic impact, Prof. Nießner is a co-founder and director of Synthesia Inc., a leading startup dedicated to democratizing synthetic media generation with cutting-edge AI-driven video synthesis technology.
Lunch Break
Kun Zhan
Li Auto, China

World Models for Autonomous Driving: A Closed-Loop Training Framework Recordings (YouTube, bilibili)

Biography

Kun Zhan received his B.Eng. from the School of Automation, Beihang University, in 2016 and joined the Baidu Apollo L4 team, focusing on behavior prediction for autonomous driving. Since 2021, he has led the Vision-Language-Action (VLA) algorithm roadmap at Li Auto, overseeing perception, planning, and large-scale model deployment. He has published over 30 papers and holds more than 20 patents across China and the U.S., with representative works including DriveVLM and StreetGaussians, both deployed in Li Auto's production autonomous driving systems.
Pulkit Agrawal
MIT, USA

Questioning the role of vision in robot manipulation

Biography

Pulkit Agrawal is an Associate Professor in the Department of Electrical Engineering and Computer Science (EECS) at MIT. His lab is part of the Computer Science and Artificial Intelligence Lab (CSAIL), is affiliated with the Laboratory for Information and Decision Systems (LIDS), and is involved with the NSF AI Institute for Artificial Intelligence and Fundamental Interactions (IAIFI).

He completed his Ph.D. at UC Berkeley and his undergraduate studies at IIT Kanpur. He co-founded SafelyYou Inc., which builds fall-prevention technology, and is an advisor to Tutor Intelligence, Lab0 Inc., and Common Sense Machines.

Recordings (YouTube, bilibili)
Coffee Break
Katerina Fragkiadaki
CMU, USA

Embodied Reasoning with 3D robot policies and Grounded Reinforcement Learning

Biography

Katerina Fragkiadaki is a JPMorgan Chase Associate Professor of Computer Science in the Machine Learning Department at Carnegie Mellon University. She works in Artificial Intelligence at the intersection of Computer Vision, Machine Learning, Language Understanding, and Robotics. Prior to joining MLD's faculty, she spent three wonderful years as a postdoctoral researcher, first at UC Berkeley working with Jitendra Malik and then at Google Research in Mountain View working with the video group. She completed her Ph.D. in GRASP, UPenn, with Jianbo Shi. She did her undergraduate studies at the National Technical University of Athens, and before that she was in Crete.

Recordings (YouTube, bilibili)
Jia Deng
Princeton, USA

Procedural Generation of Virtual Worlds for Embodied AI

Biography

Jia Deng is a Professor of Computer Science at Princeton University. His research focuses on computer vision and machine learning. He received his Ph.D. from Princeton University and his B.Eng. from Tsinghua University, both in computer science. He is a recipient of the Sloan Research Fellowship, the NSF CAREER award, the ONR Young Investigator award, an ICCV Marr Prize, a CVPR test-of-time award and two ECCV Best Paper Awards.

Recordings (YouTube, bilibili)
Speakers and Organizers
Closing Remarks

Speakers

Matthias Nießner

Professor TUM

Jia Deng

Associate Professor Princeton

Pulkit Agrawal

Associate Professor MIT

Katerina Fragkiadaki

Associate Professor CMU

Yunzhu Li

Assistant Professor Columbia

Ankit Goyal

Research Scientist NVIDIA

Kun Zhan

Senior Director of VLA Model Li Auto

Organizers

-	Hongyang Li	HKU
-	Philipp Krähenbühl	UT Austin
-	Kashyap Chitta	NVIDIA
-	Eric Jang	1X Technologies
-	Andrei Bursuc	Valeo
-	Long Chen	Xiaomi
-	Huijie Wang	OpenDriveLab

Learning to See:
Advancing Spatial Understanding for Embodied Intelligence

ICCV 2025 Workshop
Room 312, October 19, Honolulu, USA

Recordings: YouTube | bilibili

Autonomous Grand Challenge 2025

Past Editions

Scene Representations for Autonomous Driving

ICLR 2023
