Vista is a generative driving world model. Built on Stable Video Diffusion, it can generate driving scenes conditioned on a single input image and, optionally, additional control inputs. In this example we visualize the latent diffusion steps and the generated, decoded image sequence.
This is an external example; check the repository for more information.
You can try the example on Rerun's HuggingFace space here.
If you have a GPU with ~20GB of memory, you can run the example locally. To do so, clone the repository and run:
```shell
pixi run example
```