Learning to See:
Advancing Spatial Understanding for Embodied Intelligence

ICCV 2025 Workshop
October 19, Honolulu, USA

Introduction

The world is three-dimensional. Trilobites, among the earliest animals with complex eyes, were some of the first to see it. From that moment, nervous systems began to evolve, gradually transforming mere sight into insight, understanding, and action. Together, these give rise to intelligence.

Despite remarkable technological advances in recent decades, modern embodied systems remain far from full intelligence, and their spatial representations fall short in several key respects. Such representations should (i) contain the information necessary for physical interaction, such as the temporal dynamics of the scene; (ii) carry a prior over semantic relevance, focusing on task-relevant features such as objects and their relationships; and (iii) be compact, excluding irrelevant details such as background elements. Prior efforts have integrated foundation models and leveraged large-scale data, yet the path to true intelligence remains long, and significant progress is still required.

Contact

Schedule

Full-day workshop. The schedule is to be announced.

Speakers

Matthias Nießner

Professor, TUM

Jia Deng

Associate Professor, Princeton

Pulkit Agrawal

Associate Professor, MIT

Katerina Fragkiadaki

Associate Professor, CMU

Yunzhu Li

Assistant Professor, Columbia

Ankit Goyal

Research Scientist, NVIDIA

Li Chen

Research Scientist, HKU

Organizers

- Hongyang Li, HKU
- Philipp Krähenbühl, UT Austin
- Kashyap Chitta, NVIDIA
- Eric Jang, 1X Technologies
- Huijie Wang, OpenDriveLab