Vista: A Generalizable Driving World Model with
High Fidelity and Versatile Controllability

¹Hong Kong University of Science and Technology; ²OpenDriveLab at Shanghai AI Lab;
³University of Tübingen; ⁴Tübingen AI Center; ⁵University of Hong Kong
*Equal advising.

1. High-Fidelity Open-World Prediction

Videos in this section are 5 seconds long, at 10 Hz and 576×1024 resolution.

2. Continuous Long-Horizon Rollout

Videos in this section are 16 seconds long, at 10 Hz and 576×1024 resolution.
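
For a sense of scale, the clip specifications quoted in Sections 1 and 2 can be converted into frame and pixel counts. The snippet below is only a back-of-the-envelope sketch; the helper name `frame_count` is hypothetical and not part of any released code.

```python
# Illustrative arithmetic only; frame_count is a hypothetical helper.
def frame_count(duration_s: float, fps: float) -> int:
    """Number of frames in a clip of the given duration and frame rate."""
    return round(duration_s * fps)


HEIGHT, WIDTH = 576, 1024                # resolution stated on this page
FPS = 10                                 # frame rate stated on this page

short_clip = frame_count(5, FPS)         # Section 1 predictions
long_clip = frame_count(16, FPS)         # Section 2 long-horizon rollouts
pixels_per_frame = HEIGHT * WIDTH

print(short_clip, long_clip, pixels_per_frame)
```

Under these numbers, a long-horizon rollout contains a little over three times as many frames as a short prediction clip.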

3. Zero-Shot Action Controllability

In this section, we control the ego-vehicle with either [trajectory] or [angle+speed].
Hover over a video to see its action type. For clarity of demonstration, the action types shown are derived from [trajectory] and [angle+speed].
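
As a rough illustration of how a coarse action type could be derived from the two control formats, the sketch below represents each format as a small data class and maps it to a label. Everything here is an assumption made for illustration: the class names, the sign and unit conventions, the thresholds, and the label vocabulary are hypothetical and are not taken from the Vista implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Trajectory:
    # Future ego waypoints as (forward, left) offsets in metres.
    # The coordinate convention is an assumption for this sketch.
    waypoints: List[Tuple[float, float]]


@dataclass
class AngleSpeed:
    steering_angle_deg: float  # assumed convention: positive steers left
    speed_mps: float           # assumed unit: metres per second


Action = Union[Trajectory, AngleSpeed]


def describe(action: Action,
             turn_threshold: float = 1.0,
             stop_threshold: float = 0.1) -> str:
    """Map a control signal to a coarse label.

    Both thresholds are hypothetical. turn_threshold is compared against
    degrees for AngleSpeed and metres for Trajectory, which is a
    simplification made to keep the sketch short.
    """
    if isinstance(action, AngleSpeed):
        if action.speed_mps < stop_threshold:
            return "stop"
        lateral = action.steering_angle_deg
    else:
        # Use the final waypoint (the list must be non-empty).
        forward, lateral = action.waypoints[-1]
        if abs(forward) < stop_threshold and abs(lateral) < stop_threshold:
            return "stop"
    if lateral > turn_threshold:
        return "turn left"
    if lateral < -turn_threshold:
        return "turn right"
    return "go straight"


print(describe(AngleSpeed(steering_angle_deg=15.0, speed_mps=5.0)))
print(describe(Trajectory(waypoints=[(2.0, 0.0), (6.0, -3.0)])))
```

A single label per clip is enough for a hover caption, which is why this sketch reduces each control signal to its final lateral offset or steering angle instead of describing the whole manoeuvre.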