Models

Every base AI generation model available in the Playground — from cinematic video to image generation.


Models are the underlying AI engines that power everything in Ad Studio. In the Playground, you interact with them directly — no guided format, just raw generation with full control over settings. Each model has different strengths, speed profiles, and output styles.

Which model should I use?

Start with a Preset if you are new — Presets use the right model automatically. Come back to Models when you want to fine-tune or experiment beyond the guided formats.

Nano Banana 2

Advanced image generation with support for up to 16 reference images and 4K resolution output. Faster than Nano Banana Pro with very similar quality, making it a strong option when you need high-quality images without the longer wait time.

Best for

Quick image generation, concept art, and reference images when speed matters.

Provider: Google Gemini

Nano Banana Pro

The most advanced image generation model available on the platform. Takes 2 to 3 minutes per image but produces exceptional detail and accuracy. Supports up to 16 reference images and 4K resolution.

Best for

High-quality marketing materials, detailed product photography, and professional artwork where quality matters more than speed.

Pro tip

Turn on Google Search Grounding whenever you mention a real brand or product released in the last two years. It dramatically improves accuracy.

Google Search Grounding

A toggle available in Nano Banana Pro. Turn it on when you reference real-world products, brands, or objects, such as a Bentley interior, the latest iPhone, or new AirPods. It searches Google Images to understand your reference and produce more accurate results.

Provider: Google Gemini

Sora 2 Pro

The most advanced, most popular, and most expensive video generation model on the platform. Produces the most natural, human-like movement — including the small imperfections that make a video look like it was shot on an iPhone. Best for realistic UGC, authentic social media ads, and any generation where natural lip sync and actor consistency matter.

Duration: Up to 20 seconds at full HD
Cost: Highest tier — up to 1,200 credits per video

Best for

Realistic UGC content, natural lip sync, actor and product consistency, and authentic social media video ads.

Note

With Sora 2, uploaded images act as references, not start frames. The AI uses the image for visual guidance but does not necessarily begin the video from that exact frame.

Provider: OpenAI

Sora 2 Remix

Select a Sora 2 video and describe the visual edits you want to make. Think of it as a visual editor for Sora-generated content — change the setting, lighting, background, or look without regenerating from scratch.

Best for

Transforming existing Sora videos with visual creative variations.

Warning

Trying to change what a person is saying can cause unintended visual changes to other parts of the video. Stick to visual edits — backgrounds, settings, lighting — for best results.

Provider: OpenAI

Kling 2.6

The previous-generation Kling model, and the leading video model of 2025. Reliable for B-rolls and low-movement videos. Not well-suited for realistic talking heads or authentic UGC, because it tends to produce smoother, more polished movement that can look less natural than Sora.

Best for

B-roll footage, product videos, and dynamic scenes where natural human movement is not the priority.

Note

For Kling 2.6, Kling 3.0, Kling V3 OMNI, and Seedance 1.5 Pro, uploaded images act as start frames or end frames, not references.

Provider: Kling

Kling 3.0

The newest Kling model, released February 2026. Significantly better quality, sound, and lip sync compared to Kling 2.6. Much better at keeping text consistent across frames — for example, text printed on a T-shirt stays readable and stable. Still not the best option for natural, authentic UGC movement.

Duration: Up to 15 seconds

Best for

High-quality videos, cinematic shots, and scenes where on-screen text needs to stay visually consistent.

Provider: Kling

VEO 3.1

Google's video generation model. Shorter maximum duration than most models but higher resolution output. Good for B-rolls, controlled product animations, and visuals where resolution quality is the priority. More expensive per second than comparable models.

Duration: Up to 8 seconds at up to 4K resolution

Best for

High-resolution B-rolls, product animations, and controlled visuals where output resolution matters.

Note

For VEO 3.1, uploaded images act as start frames.

Provider: Google Veo

VEO Reference

VEO 3.1 with support for up to 3 reference images. Excellent for multi-character scenes where you want to reference specific people and describe interactions between them.

Best for

Multi-character scenes, style-consistent videos, and reference-guided animations.

Note

For VEO Reference, uploaded images act as references — not start frames.

Provider: Google Veo

Seedance 1.5 Pro

Strong for B-rolls and static-to-motion content. If you have an image of a person and want subtle, natural movement — a slight head turn, a live-photo feel, someone glancing up — Seedance handles this well. Similar quality tier to Kling for B-roll work. Not the best choice for authentic UGC or social media content.

Best for

B-roll footage, subtle animation of still images, and live-photo style motion from a single image.

Note

For Seedance 1.5 Pro, uploaded images act as start frames.

Provider: ByteDance / Fal

Motion Control

Upload a video of yourself performing an action, then upload an image of a different person or setting. The AI maps the movement from your video onto the target image. Useful for specific movements that are difficult to describe in a prompt — meme-style content, creative transitions, or character animations.

Best for

Specific action replication, meme-style content, creative motion transfers, and movements that are hard to describe in text.

Provider: Kling

Kling V3 OMNI

Multi-shot scene generation — up to five shots per video, each with an independent duration and its own prompt. Best for cinematic ads, brand awareness sequences, and product showcases that require multiple connected shots (for example: someone reaching into a bag, pulling out a product, and using it).

Best for

Cinematic ads, brand awareness campaigns, and multi-shot product showcase sequences.

Pro tip

Elements are not limited to products. Use them for characters, environments, or any visual that needs to stay consistent across all shots in the same generation.

Note

Not recommended for UGC-style content. Best for brand-focused, cinematic multi-shot sequences.

Elements

A unique feature of Kling V3 OMNI. Create named elements by uploading 2 to 4 reference images with a description. Then reference them in your prompts using @elementname tags. This keeps characters, products, or objects visually consistent across every shot. You can use multiple elements per generation and combine them with multi-shot for complex sequences.

Provider: Kling

GPT Image

Standard image generation powered by OpenAI. Faster and cheaper than Nano Banana Pro. Best for quick visualizations, concept iterations, or any time you need an image fast without requiring maximum quality.

Best for

Quick image generation, fast concept visualization, and situations where speed matters more than quality.

Provider: OpenAI
