Models
Every base AI generation model available in the Playground — from cinematic video to image generation.
Models are the underlying AI engines that power everything in Ad Studio. In the Playground, you interact with them directly — no guided format, just raw generation with full control over settings. Each model has different strengths, speed profiles, and output styles.
Nano Banana 2
Advanced image generation with support for up to 16 reference images and 4K resolution output. Faster than Nano Banana Pro with very similar quality, making it a strong option when you need high-quality images without the longer wait time.
Provider: Google Gemini
Nano Banana Pro
The most advanced image generation model available on the platform. Takes 2 to 3 minutes per image but produces exceptional detail and accuracy. Supports up to 16 reference images and 4K resolution.
Google Search Grounding
A toggle available in Nano Banana Pro. Turn it on when you reference real-world products, brands, or objects, such as a Bentley interior, the latest iPhone, or new AirPods. It searches Google Images to understand your reference and produce more accurate results.
Provider: Google Gemini
Sora 2 Pro
The most advanced, most popular, and most expensive video generation model on the platform. Produces the most natural, human-like movement — including the small imperfections that make a video look like it was shot on an iPhone. Best for realistic UGC, authentic social media ads, and any generation where natural lip sync and actor consistency matter.
- Duration: Up to 20 seconds at full HD
- Cost: Highest tier — up to 1,200 credits per video
Provider: OpenAI
Sora 2 Remix
Select a Sora 2 video and describe the visual edits you want to make. Think of it as a visual editor for Sora-generated content — change the setting, lighting, background, or look without regenerating from scratch.
Provider: OpenAI
Kling 2.6
Last year's leading video model. Reliable for B-rolls and low-movement videos. Not well suited to realistic talking heads or authentic UGC: it tends to produce smoother, more polished movement that can look less natural than Sora 2 Pro.
Provider: Kling
Kling 3.0
The newest Kling model, released February 2026. Significantly better quality, sound, and lip sync compared to Kling 2.6. Much better at keeping text consistent across frames — for example, text printed on a T-shirt stays readable and stable. Still not the best option for natural, authentic UGC movement.
- Duration: Up to 15 seconds
Provider: Kling
VEO 3.1
Google's video generation model. Shorter maximum duration than most models but higher resolution output. Good for B-rolls, controlled product animations, and visuals where resolution quality is the priority. More expensive per second than comparable models.
- Duration: Up to 8 seconds at up to 4K resolution
Provider: Google Veo
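The maximum durations listed so far can be compared side by side. The snippet below is only an illustrative sketch: the dictionary and the `models_supporting` helper are hypothetical names, not part of any Ad Studio API, and the numbers are copied from the descriptions on this page.

```python
# Maximum clip lengths in seconds, copied from the model descriptions above.
# The dictionary and function names are hypothetical, for illustration only.
MAX_DURATION_SECONDS = {
    "Sora 2 Pro": 20,
    "Kling 3.0": 15,
    "VEO 3.1": 8,
}


def models_supporting(duration_seconds):
    """Return the models whose listed maximum covers the requested length, longest first."""
    fits = [
        name
        for name, limit in MAX_DURATION_SECONDS.items()
        if limit >= duration_seconds
    ]
    return sorted(fits, key=lambda name: MAX_DURATION_SECONDS[name], reverse=True)


if __name__ == "__main__":
    print(models_supporting(10))
```

Duration is only one factor; resolution, cost, and how natural the movement looks still decide which model fits a given ad.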
VEO Reference
VEO 3.1 with support for up to 3 reference images. Excellent for multi-character scenes where you want to reference specific people and describe interactions between them.
Provider: Google Veo
Seedance 1.5 Pro
Strong for B-rolls and static-to-motion content. If you have an image of a person and want subtle, natural movement — a slight head turn, a live-photo feel, someone glancing up — Seedance handles this well. Similar quality tier to Kling for B-roll work. Not the best choice for authentic UGC or social media content.
Provider: ByteDance / Fal
Motion Control
Upload a video of yourself performing an action, then upload an image of a different person or setting. The AI maps the movement from your video onto the target image. Useful for specific movements that are difficult to describe in a prompt — meme-style content, creative transitions, or character animations.
Provider: Kling
Kling V3 OMNI
Multi-shot scene generation — up to five shots per video, each with an independent duration and its own prompt. Best for cinematic ads, brand awareness sequences, and product showcases that require multiple connected shots (for example: someone reaching into a bag, pulling out a product, and using it).
Elements
A unique feature of Kling V3 OMNI. Create named elements by uploading 2 to 4 reference images with a description. Then reference them in your prompts using @elementname tags. This keeps characters, products, or objects visually consistent across every shot. You can use multiple elements per generation and combine them with multi-shot for complex sequences.
Provider: Kling
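To make the multi-shot and Elements limits concrete, here is a minimal sketch of how a shot plan could be checked before you type it into the Playground. Everything in it is an assumption for illustration: the `Element` and `Shot` classes, the `validate_plan` helper, and the rule that element names contain only letters, digits, and underscores are hypothetical and do not reflect Kling's or Ad Studio's actual API. Only the limits themselves (up to five shots, 2 to 4 reference images per element, `@elementname` tags) come from the description above.

```python
import re
from dataclasses import dataclass

MAX_SHOTS = 5               # "up to five shots per video"
MIN_REFS, MAX_REFS = 2, 4   # "2 to 4 reference images" per element

# Assumption: an element tag is "@" followed by letters, digits, or underscores.
TAG_PATTERN = re.compile(r"@(\w+)")


@dataclass
class Element:
    name: str
    description: str
    reference_images: list


@dataclass
class Shot:
    prompt: str
    duration_seconds: float


def validate_plan(elements, shots):
    """Return a list of problems; an empty list means the plan fits the stated limits."""
    problems = []
    if not 1 <= len(shots) <= MAX_SHOTS:
        problems.append(f"expected 1 to {MAX_SHOTS} shots, got {len(shots)}")

    known = set()
    for element in elements:
        count = len(element.reference_images)
        if not MIN_REFS <= count <= MAX_REFS:
            problems.append(
                f"element @{element.name}: needs {MIN_REFS} to {MAX_REFS} "
                f"reference images, got {count}"
            )
        known.add(element.name)

    # Every @tag used in a prompt should point at an element that was defined.
    for index, shot in enumerate(shots, start=1):
        for tag in TAG_PATTERN.findall(shot.prompt):
            if tag not in known:
                problems.append(f"shot {index}: unknown element @{tag}")
    return problems


if __name__ == "__main__":
    bag = Element("bag", "tan leather tote", ["bag_front.png", "bag_side.png"])
    plan = [
        Shot("A woman reaches into @bag on a cafe table", 3),
        Shot("She pulls @serum out of @bag and smiles", 4),
    ]
    print(validate_plan([bag], plan))
```

In this example the second shot references `@serum`, which was never created as an element, so the check flags it. Defining the element first is what keeps the product consistent across shots.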
GPT Image
Standard image generation powered by OpenAI. Faster and cheaper than Nano Banana Pro. Best for quick visualizations, concept iterations, or any time you need an image fast without requiring maximum quality.
Provider: OpenAI