Definition

Latent diffusion

The deep learning technique behind most modern AI video models: iteratively denoising a compressed (latent) representation of a video.

Latent diffusion is the technique behind most modern AI video and image models, including Sora 2, Veo 3.1, Kling, and Stable Diffusion. Instead of generating pixels directly, the model works in a 'latent space', a compressed representation of an image or video. It starts from random noise in latent space and iteratively denoises it, step by step, toward a coherent output that matches the prompt. A decoder then converts the denoised latent back into pixel space. Because the latent holds far fewer values than the full pixel grid, each denoising step costs much less than it would in pixel-space diffusion, which is what made high-resolution AI image and video generation practical. Most AI video models in VIBE use a variant of latent diffusion.
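The loop described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular model is implemented: the latent shape, the upscale factor, the noise schedule, and the functions `toy_noise_predictor` and `toy_decode` are all assumptions made up for the example. In a real model the noise predictor is a trained, prompt-conditioned neural network and the decoder is a trained autoencoder; here an "oracle" that already knows the target latent stands in for the network so the sketch runs on its own. The update rule is a deterministic DDIM-style step.

```python
import numpy as np

LATENT_SHAPE = (4, 8, 8)  # hypothetical latent: channels, height, width
UPSCALE = 8               # hypothetical decoder upsampling factor
STEPS = 50                # number of denoising steps


def make_alpha_bars(steps):
    """Signal fraction per step: close to 1 at step 0, close to 0 at the last step."""
    betas = np.linspace(1e-4, 0.2, steps)  # assumed linear noise schedule
    return np.cumprod(1.0 - betas)


def toy_noise_predictor(z_t, alpha_bar_t, target):
    """Stand-in for the trained network.

    A real model learns to predict the noise from z_t and the prompt.
    This oracle computes it directly from a known target latent.
    """
    return (z_t - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)


def toy_decode(z):
    """Stand-in for the decoder: keep 3 channels, upsample by nearest neighbour."""
    return z[:3].repeat(UPSCALE, axis=1).repeat(UPSCALE, axis=2)


def sample(target, steps=STEPS, seed=0):
    """Start from random noise in latent space and denoise it step by step."""
    rng = np.random.default_rng(seed)
    alpha_bars = make_alpha_bars(steps)
    z = rng.standard_normal(LATENT_SHAPE)  # pure noise in latent space
    for t in range(steps - 1, -1, -1):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = toy_noise_predictor(z, a_t, target)
        # Estimate the clean latent, then move to the next, less noisy step.
        z0_hat = (z - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        z = np.sqrt(a_prev) * z0_hat + np.sqrt(1.0 - a_prev) * eps
    return z


# The "prompt" is represented here by a fixed target latent (an assumption).
target = np.ones(LATENT_SHAPE)
latent = sample(target)
pixels = toy_decode(latent)

print("latent values:", latent.size)
print("pixel values: ", pixels.size)
print("pixel/latent ratio:", pixels.size / latent.size)
```

With these made-up sizes, every denoising step touches 256 latent values instead of the 12,288 values of the decoded 3 x 64 x 64 output, which is the kind of saving that makes the approach cheaper than denoising pixels directly.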

Related terms

AI video model
Text-to-video
