Learning to See:
Advancing Spatial Understanding for Embodied Intelligence

ICCV 2025 Workshop
Honolulu, USA

Introduction

The world is three-dimensional. This fact was first seen by trilobites, the first organisms capable of sensing light. From that moment, nervous systems began to evolve, gradually transforming mere sight into insight, understanding, and action. Together, these give rise to intelligence.

Despite remarkable technological advancements in recent decades, modern embodied systems remain far from achieving full intelligence. They fall short in several key aspects. Ideally, their internal representations would (i) contain the information necessary for physical interaction, such as the temporal dynamics of the scene; (ii) carry a prior over semantic relevance, focusing on task-relevant features like objects and their relationships; and (iii) be compact, omitting irrelevant details such as background elements. Attempts have been made to close this gap, including integrating foundation models and utilizing large-scale data. Yet the path to true intelligence remains long, with significant progress still required.

Contact

Schedule

Full-day workshop. The schedule is to be announced.

Speakers

The speakers are to be announced.

Organizers

- Hongyang Li, HKU
- Philipp Krähenbühl, UT Austin
- Kashyap Chitta, NVIDIA
- Eric Jang, 1X Technologies
- Huijie Wang, OpenDriveLab