Workshop


Autonomous systems, such as robots and self-driving cars, have rapidly evolved over the past few decades. Recently, foundation models have emerged as a promising approach to building more generalist autonomous systems due to their ability to learn from vast amounts of data and generalize to new tasks. The motivation behind this workshop is to explore the potential of foundation models for autonomous agents and to discuss the challenges and opportunities associated with this approach.

Contact


Contact us via [email protected]

Autonomous Grand Challenge

The field of autonomy is rapidly evolving, and recent advancements from the machine learning community, such as large language models (LLMs) and world models, bring great potential. We believe the future lies in explainable, end-to-end models that understand the world and generalize to unseen environments. In light of this, we propose seven new challenges that push the boundaries of existing perception, prediction, and planning pipelines.

Check out the Challenge website for more details.






Schedule


Hongyang Li
Opening Remarks

Sergey Levine (UC Berkeley, USA)
Robotic Foundation Models

Sherry Yang (Google DeepMind, USA)
Foundation Models as Real-World Simulators

Coffee Break ☕️

Alex Kendall (Wayve, UK)
Building Embodied AI to be Safe and Scalable

Autonomous Grand Challenge Part Ⅰ
  • End-to-End Driving at Scale
    • Introduction (5 min, Igor Gilitschenski)
    • Innovation Award & Outstanding Champion (10 min)
  • Predictive World Model
    • Introduction (5 min, Zetong Yang)
    • Innovation Award & Honorable Runner-up (10 min)
  • Multi-View 3D Visual Grounding
    • Introduction (5 min, Tai Wang)
    • Innovation Award & Outstanding Champion (10 min)
  • Occupancy and Flow
    • Introduction (5 min, Jiazhi Yang)
    • Outstanding Champion (10 min)

Lunch Break
Poster Session of the Autonomous Grand Challenge

Rares Ambrus (Toyota Research Institute, USA)
Visual Foundation Models for Embodied Applications

Autonomous Grand Challenge Part Ⅱ
  • Mapless Driving
    • Introduction (5 min, Huijie Wang)
    • Innovation Award & Outstanding Champion (10 min)
  • Driving with Language
    • Introduction (5 min, Chonghao Sima)
    • 1st Place (5 min)
    • 2nd Place (5 min)
  • CARLA Autonomous Driving Challenge
    • Introduction (5 min, Matt Rowe)
    • Innovation Award & Outstanding Champion (10 min)

Andrei Bursuc (Valeo, France)
Foundation Models in the Automotive Industry

Coffee Break ☕️

Ted Xiao (Google DeepMind, USA)
What's Missing for Robotics-first Foundation Models?

Li Chen (Shanghai AI Lab, China)
Visual World Models as Foundation Models for Autonomous Systems

Panel: Challenges in Building Foundation Models for Embodied AI
Host: Anthony Hu
Panelists: Andrei Bursuc, Alex Kendall, Hongyang Li, Christos Sakaridis, Ted Xiao

Brief takeaways from the afternoon session, by Christos Sakaridis


Visual Foundation Models for Embodied Applications
- Geometric representations and models that are metric, domain-generic, and uncertainty-aware are key to embodied AI.
- Motion is a central cue for unsupervised learning of semantic and geometric features and concepts.
Foundation Models in the Automotive Industry
- Two stages: large-scale self-supervised pre-training, followed by versatile supervised adaptation/fine-tuning to different tasks and input-output configurations.
- Modular pre-training is preferable for addressing diverse setups.
- Driving scenes usually do not have the diversity of internet-scale data: data richness is very important for driving-related pre-training.
What's Missing for Robotics-first Foundation Models?
- Narrow communication bottlenecks between different intelligence spaces.
- Missing properties from robotics foundation models: positive transfer from scaling, promptability, scalable evaluation.
- Injecting physics, kinematics, and trajectory information into our robotic foundation models can upgrade language-heavy models.
Visual World Models as Foundation Models for Autonomous Systems
- LiDARs capture accurate geometric cues while cameras capture rich semantic cues: combine them for pre-training representations for automated driving.
- BEV models are useful for building world models.






Speakers



Sergey Levine

Associate Professor
UC Berkeley, USA

Alex Kendall

Co-Founder & CEO
Wayve, UK

Andrei Bursuc

Deputy Scientific Director
Valeo, France

Rares Ambrus

Head of Computer Vision
Toyota Research Institute, USA

Ted Xiao

Senior Research Scientist
Google DeepMind, USA

Sherry Yang

Senior Research Scientist
Google DeepMind, USA

Li Chen

Research Scientist
Shanghai AI Lab, China




Organizers



Hongyang Li

Shanghai AI Lab

Kashyap Chitta

University of Tübingen

Andreas Geiger

University of Tübingen

Holger Caesar

TU Delft

Christos Sakaridis

ETH Zürich

Anthony Hu

Wayve

Fatma Güney

Koç University

German Ros

NVIDIA

Hang Qiu

UC Riverside

Dian Chen

Waabi

Huijie Wang

Shanghai AI Lab

Jiajie Xu

CMU

For the list of challenge organizers, please refer to the challenge website.