Vista driving world model

Background

Vista is a generative driving world model. Built on Stable Video Diffusion, it generates driving scenes conditioned on a single input image and, optionally, additional control inputs. In this example we visualize the latent diffusion steps and the final decoded image sequence.
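
The example's actual logging code lives in the external repository; the following is a minimal sketch of how intermediate diffusion results could be streamed to Rerun with the Python SDK. The step count, tensor shapes, and the commented-out `denoise_step`/`decode_latents` calls are hypothetical placeholders, not Vista's real API.

```python
# Minimal sketch (not the example's actual code) of streaming intermediate
# diffusion results to Rerun. `denoise_step` and `decode_latents` are
# hypothetical stand-ins for the model's real sampling and decoding functions.
import numpy as np
import rerun as rr

rr.init("vista_sketch", spawn=True)

latent = np.random.randn(32, 32, 4).astype(np.float32)  # placeholder latent
for step in range(25):  # hypothetical number of denoising steps
    # latent = denoise_step(latent)  # one reverse-diffusion step (stand-in)
    rr.set_time_sequence("diffusion_step", step)
    rr.log("diffusion/latent", rr.Tensor(latent))  # inspect the evolving latent

# Decode the final latents into the generated image sequence.
# frames = decode_latents(latent)  # hypothetical decoder call
frames = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]  # placeholders
for i, frame in enumerate(frames):
    rr.set_time_sequence("frame", i)
    rr.log("generated/image", rr.Image(frame))
```

Logging each denoising step on its own timeline ("diffusion_step") keeps it separate from the decoded video's "frame" timeline, so both can be scrubbed independently in the viewer.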

Run the code

This is an external example; check the repository for more information.

You can try the example directly in Rerun's HuggingFace space.

If you have a GPU with ~20GB of memory, you can run the example locally. To do so, clone the repository and run:

pixi run example
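
The `example` task is presumably defined in the repository's pixi manifest; pixi should resolve and install the required dependencies automatically before launching it, so no manual environment setup is needed.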