12 min read · Published 2026-04-06 · Updated 2026-04-20 · By Flik AI
How to Prompt AI Video: The Complete 2026 Framework
The six-part prompt formula that turns two-word ideas into cinematic clips across Veo 3.1, Kling 3.0 Pro, Seedance 2.0, and Hailuo 2.3 — with real examples.
Bad AI video prompts are two words. Good AI video prompts are six sentences with intent. The difference isn't length — it's structure. This guide is the prompting framework Flik AI's Agent mode uses internally to rewrite user briefs before they hit Veo 3.1, Kling 3.0 Pro, Seedance 2.0, or Hailuo 2.3. Use it in Manual mode to get the same quality without the agent.
Why generic AI video prompts fail
When you write "a cyclist on a street" as a prompt, every frontier model has to make dozens of decisions you didn't specify: gender, age, clothing, bike model, time of day, weather, camera angle, camera movement, lighting direction, color palette, frame rate, style reference. Without guidance, each model falls back on defaults drawn from its training data, which is why prompts this short feel generic regardless of which model you use.
The fix isn't longer prompts. It's more structured prompts. Film directors don't say "make a movie" — they specify subject, context, action, style, camera, and lighting. AI video responds to the same framework.
1. Subject — who or what is in the frame
Describe the subject with specific detail. Not "cyclist" — "a lone cyclist in a red wind-breaker, mid-30s, hands locked on drop-bars." Specificity feeds the model's ability to generate a coherent figure across frames. Gender, age, clothing, posture, accessories, species (if non-human), emotional state all help.
2. Context — where and when
Establish location, time of day, and weather in a single phrase. "On a wet Tokyo street at 2am" sets all three. All three influence lighting, palette, motion physics, and mood. If you skip this, the model invents — and its invention usually feels generic.
3. Action — what are they doing
Specific action verbs matter more than abstract ones. "Slowly turning the handlebars right" is more deterministic than "biking." Action verbs compound with subject detail — "the cyclist slowly turns the handlebars right, glancing at a neon sign over their shoulder" is a directable shot; "the cyclist bikes" is not.
4. Style — the aesthetic lever
Style is the biggest single lever in the prompt. A swap from "cinematic" to "anime" rewrites the entire output. Useful style vocabulary: cinematic, anime, documentary, noir, vintage 16mm, Studio Ghibli, Wes Anderson symmetry, Blade Runner neon, A24 palette, HBO prestige, Saturday morning cartoon, claymation, stop-motion, 8-bit, vaporwave.
Use style vocabulary the model has seen — proper-noun references ("Christopher Doyle cinematography", "Roger Deakins lighting") consistently produce stronger output than abstract descriptors.
5. Camera — direct the shot
Lead with a cinematography term: tracking shot, dolly-in, push-in, pull-back, whip pan, crane shot, handheld, locked-off, Dutch angle, over-the-shoulder, two-shot. Frontier models interpret cinematography language directly. "Slow push-in" physically moves the camera toward the subject, while "slow zoom" triggers a lens zoom from a fixed camera position, and the two produce very different motion.
If you want a specific lens feel, specify it: "shot on 35mm anamorphic, shallow depth of field" produces cinema-lens aesthetics; "shot on iPhone, handheld" produces casual-documentary aesthetics.
6. Lighting and mood
Specify illumination direction, color, and mood, for example "moody blue-magenta neon reflections," "golden-hour backlight," "low-key noir with hard shadows," or "soft overcast window light." Lighting, more than resolution or motion quality, is what separates an amateur-feeling render from a cinematic one.
Full prompt example
A full prompt built from the running cyclist example hits all six parts: (1) subject = "lone cyclist in a red wind-breaker, mid-30s, hands locked on drop-bars", (2) context = "on a wet Tokyo street at 2am", (3) action = "slowly turning the handlebars right while glancing over their shoulder at a neon sign", (4) style = "cinematic", (5) camera = "tracking shot, shot on 35mm anamorphic, shallow depth of field, slow push-in", (6) lighting = "moody blue-magenta neon reflections on pavement, light rainfall."
This prompt runs cleanly on Veo 3.1, Kling 3.0 Pro, and Seedance 2.0. The output looks different per model, but the creative intent is locked — which is the point of the formula.
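If you assemble prompts in a script rather than by hand, the six parts map naturally onto a small data structure. The sketch below is one way to do it, not how Flik AI does it internally: `ShotBrief`, `missing_parts`, and `to_prompt` are hypothetical names, and the ordering of the parts in the final string (style and camera first, lighting last) is an assumption rather than a documented requirement of any model.

```python
from dataclasses import dataclass, fields


def _sentence_case(text: str) -> str:
    """Uppercase only the first character, leaving names like 'Tokyo' intact."""
    return text[:1].upper() + text[1:]


@dataclass(frozen=True)
class ShotBrief:
    """The six parts of the formula, in the order the guide lists them."""
    subject: str
    context: str
    action: str
    style: str
    camera: str
    lighting: str

    def missing_parts(self) -> list:
        """Names of parts left blank, so gaps are visible before generating."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_prompt(self) -> str:
        """Join the parts into one prompt string (ordering is an assumption)."""
        return (
            f"{_sentence_case(self.style)} {self.camera}. "
            f"{_sentence_case(self.subject)} {self.context}, {self.action}. "
            f"{_sentence_case(self.lighting)}."
        )


brief = ShotBrief(
    subject="a lone cyclist in a red wind-breaker, mid-30s, hands locked on drop-bars",
    context="on a wet Tokyo street at 2am",
    action="slowly turning the handlebars right while glancing over their shoulder at a neon sign",
    style="cinematic",
    camera="tracking shot, shot on 35mm anamorphic, shallow depth of field, slow push-in",
    lighting="moody blue-magenta neon reflections on pavement, light rainfall",
)
prompt = brief.to_prompt()
print(prompt)
```

Keeping the parts as separate fields, instead of one long string, is what makes it cheap to swap a single part later and regenerate.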
Model-specific prompting tips
Veo 3.1 — add audio direction
Veo 3.1 generates synchronized audio natively. Include audio direction in the prompt: "soft footfall, distant city hum, light rain on pavement" produces a richer output than video-only prompts. For dialogue, include the line explicitly in quotes within the prompt: "the cyclist whispers 'I shouldn't be here' as they look over their shoulder."
Kling 3.0 Pro — specify 4K intent
Kling 3.0 Pro outputs native 4K/60fps but defaults can be lower. Anchor the output: "broadcast-ready 4K, cinematic color grade, film grain" reliably locks the 4K tier. Kling also responds well to multi-shot direction: "three-shot sequence: establishing wide, medium on the cyclist, close-up on the handlebars" produces coherent shot progression.
Seedance 2.0 — lean on references
Seedance 2.0 is strongest when you provide references. With a 9-image moodboard, short text prompts work fine because the references carry the style. With no references, write longer prompts. For beat-sync, attach the audio track and tell Seedance what to do on the downbeats: "camera push-in on the first downbeat, whip pan on the chorus hit."
Hailuo 2.3 — describe motion intensity
Hailuo 2.3 is tuned for kinetic output but responds to motion-intensity language: "explosive", "snap movement", "whip-fast", "decelerate-to-frame" all unlock its strengths. For sports and stunts, specify the physics: "boxer throws a right cross, slow-mo impact, water spraying from the glove."
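The four tips above amount to "same base prompt, different add-on per model." Here is a minimal sketch of that idea. The model labels (`"veo-3.1"`, `"kling-3.0-pro"`, and so on) are hypothetical keys for this example, not official API identifiers, and the `Audio:` and `Motion:` prefixes are an illustrative convention, not syntax any model is known to require.

```python
KLING_4K_ANCHOR = "broadcast-ready 4K, cinematic color grade, film grain"


def _sentence_case(text: str) -> str:
    """Uppercase only the first character of a fragment."""
    return text[:1].upper() + text[1:]


def add_model_direction(base, model, *, audio="", shot_sequence="",
                        beat_sync="", motion=""):
    """Append the model-specific direction described above to a base prompt.

    Only the keyword that matches the chosen model is used; the others
    are ignored so one call site can serve every model.
    """
    extras = []
    if model == "veo-3.1":
        if audio:
            extras.append(f"Audio: {audio}")
    elif model == "kling-3.0-pro":
        extras.append(KLING_4K_ANCHOR)  # always anchor the 4K intent
        if shot_sequence:
            extras.append(shot_sequence)
    elif model == "seedance-2.0":
        if beat_sync:
            extras.append(beat_sync)
    elif model == "hailuo-2.3":
        if motion:
            extras.append(f"Motion: {motion}")
    else:
        raise ValueError(f"unknown model label: {model!r}")

    sentences = [base.rstrip(". ")] + [_sentence_case(e) for e in extras]
    return ". ".join(sentences) + "."


base = "Cinematic tracking shot of a lone cyclist on a wet Tokyo street at 2am."
veo_prompt = add_model_direction(
    base, "veo-3.1",
    audio="soft footfall, distant city hum, light rain on pavement",
)
kling_prompt = add_model_direction(
    base, "kling-3.0-pro",
    shot_sequence="three-shot sequence: establishing wide, medium on the cyclist, "
                  "close-up on the handlebars",
)
print(veo_prompt)
print(kling_prompt)
```

Reference images and audio tracks for Seedance 2.0 are attachments rather than prompt text, so they are deliberately left out of this string-only sketch.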
Common prompting mistakes
- Prompt ambiguity: "something futuristic" means a different thing to every model. Be specific.
- Conflicting instructions: "cinematic handheld" fights itself; pick one camera grammar per prompt.
- Over-stuffing: packing 30 adjectives dilutes signal. 10–20 words per part is the sweet spot.
- No reference when one would help: if the style is hard to articulate, use a reference image or video instead of 100 words.
- Skipping lighting: the most underused of the 6 parts. Always specify.
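Several of these mistakes can be caught mechanically before you spend a generation on them. The sketch below is a rough pre-flight check under stated assumptions: `lint_brief` is a hypothetical helper, the vague-word list and the conflicting camera pair are illustrative examples you would replace with your own, and the per-part word ceiling is a parameter rather than a fixed rule.

```python
import re

SIX_PARTS = ("subject", "context", "action", "style", "camera", "lighting")


def lint_brief(parts, max_words,
               vague=("something", "stuff", "cool"),
               camera_conflicts=(("handheld", "locked-off"),)):
    """Return a list of warnings for a six-part brief given as a dict.

    Checks: missing parts, parts over the word ceiling, vague filler
    words, and camera terms that contradict each other.
    """
    warnings = []
    for name in SIX_PARTS:
        text = parts.get(name, "").strip()
        if not text:
            warnings.append(f"{name}: missing")
            continue
        word_count = len(text.split())
        if word_count > max_words:
            warnings.append(f"{name}: {word_count} words (over {max_words})")
        for word in vague:
            # Whole-word match so 'cool' does not fire on 'cooler'.
            if re.search(rf"\b{re.escape(word)}\b", text.lower()):
                warnings.append(f"{name}: vague term '{word}'")

    camera = parts.get("camera", "").lower()
    for first, second in camera_conflicts:
        if first in camera and second in camera:
            warnings.append(f"camera: '{first}' conflicts with '{second}'")
    return warnings


draft = {
    "subject": "a cyclist",
    "context": "something futuristic",
    "action": "riding",
    "style": "cinematic",
    "camera": "handheld, locked-off wide",
    "lighting": "",
}
for warning in lint_brief(draft, max_words=20):
    print(warning)
```

A check like this cannot judge whether a prompt is good; it only flags the gaps and contradictions listed above so you can fix them before generating.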
How to iterate productively
Once you have a v1, don't rewrite the whole prompt. Change one variable at a time and regenerate. Change lighting only → compare. Change camera only → compare. Change style only → compare. Flik AI keeps prior takes side-by-side so you can A/B across the six dimensions and keep whatever wins.
This approach surfaces what each variable contributes. After 3–5 generations you'll have a strong sense of which direction to push — and the keeper prompt is almost always the merge of the best single-variable choices, not the first prompt you wrote.
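The one-variable-at-a-time loop is easy to script if your brief is stored as six separate parts. This is a minimal sketch, assuming the brief is a plain dict; `single_variable_variants` and `merge_winners` are hypothetical helper names, and actually sending each variant to a model is left out.

```python
def single_variable_variants(base, alternatives):
    """Build variants that each differ from the base in exactly one part.

    `alternatives` maps a part name to the options you want to try for it.
    Returns (label, variant) pairs so takes can be compared side by side.
    """
    variants = []
    for part, options in alternatives.items():
        if part not in base:
            raise KeyError(f"unknown part: {part!r}")
        for option in options:
            if option == base[part]:
                continue  # identical to v1, nothing to compare
            variants.append((f"{part} -> {option}", {**base, part: option}))
    return variants


def merge_winners(base, winners):
    """Combine the best single-variable choices into the keeper brief."""
    return {**base, **winners}


v1 = {
    "subject": "a lone cyclist in a red wind-breaker",
    "context": "on a wet Tokyo street at 2am",
    "action": "slowly turning the handlebars right",
    "style": "cinematic",
    "camera": "tracking shot",
    "lighting": "moody blue-magenta neon reflections",
}
variants = single_variable_variants(v1, {
    "lighting": ["golden-hour backlight", "low-key noir with hard shadows"],
    "camera": ["slow push-in"],
})
for label, _ in variants:
    print(label)

keeper = merge_winners(v1, {"lighting": "low-key noir with hard shadows",
                            "camera": "slow push-in"})
```

Because every variant changes a single part, any difference you see between two takes can be attributed to that part alone.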
Model-specific prompting guides
Each frontier model has its own prompting quirks beyond the 6-part formula. We've published dedicated prompting guides for every supported model:
- /veo-3-prompts — Veo 3.1 dialogue and native-audio prompting
- /kling-prompts — Kling 3.0 Pro 4K and multi-shot prompting
- /seedance-prompts — Seedance 2.0 multimodal and beat-sync prompting
- /hailuo-prompts — Hailuo 2.3 kinetic and action prompting
- /seedream-prompts — Seedream 4.5 image prompting
- /nano-banana-prompts — Nano Banana Pro/2 image editing prompting
- /elevenlabs-prompts — ElevenLabs 3.0 voice and emotion prompting
- /suno-prompts — Suno 5.0 music prompting
The bottom line
The six-part formula works because it mirrors how film sets actually brief shots. Every professional production gets a subject, context, action, style, camera, and lighting direction before the camera rolls. AI video prompting isn't magic — it's shot-briefing, typed into a prompt box. Master the formula and the model matters less.
To try the formula inside Flik AI, open any template from /templates, rewrite its starter prompt using the 6-part structure, and compare outputs across models. The creative agent will also pick the right model for you in Agent mode — see /agent-mode for how.
Tags: prompting how-to ai video cinematography guide tutorial
Frequently asked questions
What is the best way to prompt AI video in 2026?
Use the six-part cinematographer's formula: (1) Subject, (2) Context, (3) Action, (4) Style, (5) Camera, (6) Lighting. This structure works across every frontier AI video model — Veo 3.1, Kling 3.0 Pro, Seedance 2.0, Hailuo 2.3. Two-word prompts produce generic output; six-part prompts produce cinematic output.
How long should an AI video prompt be?
Aim for 60–120 words total across the six parts (10–20 words per part). Longer isn't better — packing 30 adjectives dilutes signal. If style is hard to articulate in words, use a reference image or video instead.
Do I need to prompt differently for each AI video model?
The 6-part formula is universal. Each model has specific tips on top of the formula — Veo 3.1 benefits from audio direction, Kling 3.0 Pro benefits from "4K" anchoring, Seedance 2.0 benefits from references, Hailuo 2.3 benefits from motion-intensity verbs. See our model-specific prompting guides at /veo-3-prompts, /kling-prompts, /seedance-prompts, /hailuo-prompts.
How do I control the camera in an AI video prompt?
Lead with a cinematography term: tracking shot, dolly-in, push-in, pull-back, whip pan, crane shot, handheld, locked-off, Dutch angle. Frontier models interpret cinematography vocabulary directly. Specify lens feel for additional control: "shot on 35mm anamorphic, shallow depth of field."
How do I iterate on AI video prompts efficiently?
Change one variable at a time — lighting only, then camera only, then style only — and compare. Don't rewrite the whole prompt between generations. Flik AI keeps prior takes side-by-side so you can A/B across the six dimensions and merge the best single-variable choices into a final winning prompt.
Can AI models handle dialogue in video prompts?
Veo 3.1 from Google DeepMind generates synchronized dialogue natively. Include the line in quotes inside the prompt: "the cyclist whispers 'I shouldn't be here.'" Other frontier models (Kling 3.0 Pro, Seedance 2.0, Hailuo 2.3) generate ambient audio, but their dialogue is weaker; for those, layer an ElevenLabs 3.0 voice track over the video.
© 2026 Flik AI. All rights reserved.