Definition
Text-to-video
An AI video generation method where the input is a text prompt and the output is a generated video clip.
Text-to-video (T2V) is the most common form of AI video generation. The user writes a text prompt, for example 'Cinematic drone shot of a dragon over a frozen fjord at twilight', and the model generates a clip that tries to match it. Text-to-video models include Sora 2, Google Veo 3.1, Kling 3.0, Hailuo, Seedance 2.0, and most other major AI video models. Output quality depends heavily on prompt craft: prompts written in cinematic language (camera moves, lens, lighting) tend to produce better results than plain, literal descriptions of the scene. VIBE includes 19 text-to-video models that you can compare side by side using the same prompt.
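One way to make the prompt-craft point concrete is to treat a prompt as a few named parts (subject, camera move, lens, lighting) and join them into one line. The sketch below is illustrative only: `ShotPrompt` and its field names are hypothetical, the lens and lighting wording is made up for the example, and nothing here calls VIBE or any model's API. It only builds the text you would paste into a prompt box.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical helper: holds the parts of a cinematic text prompt."""

    subject: str
    camera_move: str = ""
    lens: str = ""
    lighting: str = ""

    def render(self) -> str:
        # Lead with the camera move when one is given, then the subject.
        if self.camera_move:
            head = f"{self.camera_move} of {self.subject}"
        else:
            head = self.subject
        # Append lens and lighting details, skipping any left empty.
        details = [part for part in (self.lens, self.lighting) if part]
        return ", ".join([head] + details)


# Example values are assumptions chosen to extend the prompt quoted above.
prompt = ShotPrompt(
    subject="a dragon over a frozen fjord at twilight",
    camera_move="Cinematic drone shot",
    lens="wide-angle lens",
    lighting="cold blue twilight light",
)
print(prompt.render())
```

Keeping the parts separate makes it easy to change one element, such as the lighting, and send the otherwise identical prompt to several models for comparison.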