Veo 3.1 in the Gemini API: Build AI Video Generation Into Your Apps

Google released Veo 3.1 into the Gemini API this week, and it shipped with a new Lite tier that cuts cost by more than half. Three model variants, one API surface, and you can now generate 8-second 4K videos with natively synchronized audio directly from Python. If you’ve been watching the AI video generation space and waiting for something that fits a real developer workflow — not a creative studio’s budget — this is the release worth your attention.

Three Tiers, One API

Google is offering three variants under the Veo 3.1 family, and the choice between them isn’t complicated once you map them to actual use cases:

Veo 3.1 — 4K output, cinematic spatial audio, highest character consistency. Use this when quality is the constraint and cost isn’t — final hero clips, client-facing demos, premium feature output.
Veo 3.1 Fast — Nearly identical visual quality at roughly 2x the generation speed. The default choice for most production apps. Priced at around $0.15/second (~$1.20 per 8-second clip with audio included).
Veo 3.1 Lite — Less than 50% of Veo 3.1 Fast’s cost at the same speed. Built for high-volume generation: B-roll pipelines, prototype outputs, bulk marketing content. Roughly $0.06/second means an 8-second clip for under $0.50.

Most teams will settle on Veo 3.1 Fast as their default and reach for Lite when they need to generate at scale. The full Veo 3.1 makes sense only when 4K and audio perfection genuinely matter to the output.
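The rule of thumb above fits in a few lines of routing logic. This is a minimal sketch, not an official helper. The Fast model ID comes from the code example in this article, and the standard ID follows the same naming pattern. The Lite ID is a hypothetical placeholder, so confirm it against the current model list before you use it.

```python
# Model IDs. The Fast ID matches the code example in this article and the
# standard ID follows the same pattern; the Lite ID is a hypothetical
# placeholder, so confirm it against the current model list before use.
VEO_STANDARD = "veo-3.1-generate-preview"
VEO_FAST = "veo-3.1-fast-generate-preview"
VEO_LITE = "veo-3.1-lite-generate-preview"  # hypothetical ID


def choose_veo_model(needs_premium_quality: bool, high_volume: bool) -> str:
    """Pick a Veo 3.1 tier using the rule of thumb described above.

    Premium quality takes priority over volume: a hero clip goes to the
    full model even when it is part of a large batch.
    """
    if needs_premium_quality:
        return VEO_STANDARD
    if high_volume:
        return VEO_LITE
    return VEO_FAST


print(choose_veo_model(needs_premium_quality=False, high_volume=True))
```

Keeping the choice in one function means that only one place needs to change when the preview suffixes are dropped from the model IDs.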

Native Audio Is the Real Story Here

The original Sora produced impressive visuals, but it shipped silent. Every clip from that model needed audio added in post (foley, SFX, dialogue, ambient sound), which meant extra tooling, extra time, and extra cost in every production pipeline. Veo 3.1 generates all of that natively, in sync, at 48kHz stereo with spatial audio that follows on-screen motion. One developer put it plainly in a community thread: “Sora videos are beautiful but silent — useless for most content.”

The audio improvement over Veo 3 is not incremental. Veo 3 had functional sound effects. Veo 3.1 has cinematic sound design — reverb that reflects the scene’s environment, audio that pans as subjects move, ambient sound that matches the lighting and location cues in the prompt. For product demos, social content, and app features that surface video to users, this changes the production math significantly.

New Capabilities: The Pipeline Features

Beyond audio, three new capabilities make Veo 3.1 useful in production pipelines rather than just as a standalone generation tool:

Portrait mode (9:16) — Generate mobile-first vertical video natively. No cropping, no letterboxing. Set the aspect ratio in your config and you get content built for Reels, Shorts, and TikTok-formatted app surfaces.
Video extension — Pass an existing Veo clip back as input and extend it forward. Each extension call adds about 7 seconds, and you build longer sequences by chaining these calls. For example, an 8-second clip extended twice runs roughly 22 seconds of coherent video.
First/last frame control — Specify exact start and end frames; the model fills the transition. Character consistency across this boundary is significantly better than Veo 3, which makes serialized content finally practical.
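All three features are request parameters rather than separate endpoints. The sketch below gathers them into one pure function that builds the keyword arguments for `client.models.generate_videos()`. The `image`, `video`, and config keys (`aspect_ratio`, `duration_seconds`, `last_frame`, `resolution`) follow the google-genai SDK. The function name, the duration list, and the 720p pin for extensions are assumptions labeled in the code, so check them against the current parameter reference.

```python
from typing import Any, Dict, Optional

DEFAULT_MODEL = "veo-3.1-fast-generate-preview"
# Assumption: Veo 3.1 accepts 4-, 6- and 8-second clips; check the
# parameter reference for the current list.
SUPPORTED_DURATIONS = {4, 6, 8}


def build_video_request(
    prompt: str,
    model: str = DEFAULT_MODEL,
    portrait: bool = False,
    duration_seconds: int = 8,
    first_frame: Optional[Any] = None,
    last_frame: Optional[Any] = None,
    source_video: Optional[Any] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for client.models.generate_videos().

    first_frame and last_frame are expected to be types.Image objects and
    source_video a generated Video from an earlier Veo result. They are
    passed through untouched, so this helper has no SDK dependency.
    """
    if source_video is not None:
        if first_frame is not None or last_frame is not None:
            raise ValueError("extend a clip or interpolate frames, not both")
        # Extension: send the earlier clip as `video`. Assumption: extension
        # output is 720p, so the resolution is pinned here.
        return {
            "model": model,
            "prompt": prompt,
            "video": source_video,
            "config": {"number_of_videos": 1, "resolution": "720p"},
        }

    if duration_seconds not in SUPPORTED_DURATIONS:
        raise ValueError(f"unsupported duration: {duration_seconds}s")
    if last_frame is not None and first_frame is None:
        # Assumption: an end frame is only used together with a start frame.
        raise ValueError("last_frame requires first_frame")

    config: Dict[str, Any] = {
        "number_of_videos": 1,
        "aspect_ratio": "9:16" if portrait else "16:9",  # 9:16 = portrait mode
        "duration_seconds": duration_seconds,
    }
    request: Dict[str, Any] = {"model": model, "prompt": prompt, "config": config}
    if first_frame is not None:
        request["image"] = first_frame  # start frame
    if last_frame is not None:
        config["last_frame"] = last_frame  # end frame lives in the config
    return request
```

With a `client` set up as in the text-to-video example, `client.models.generate_videos(**build_video_request("...", portrait=True))` submits the request, because the SDK accepts a plain dict for `config`. Because the validation lives in a pure function, you can unit-test it without spending generation credits.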

The Code: Minimal Working Example

Install the SDK and get your API key from Google AI Studio:

pip install google-genai

Text-to-video with the Fast model — the async pattern you need for production:

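import os
import time

from google import genai
from google.genai import types

# Read the key from the environment rather than hard-coding it.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

operation = client.models.generate_videos(
    model="veo-3.1-fast-generate-preview",
    prompt="Developer typing code at a keyboard, close-up, neon-lit room, cinematic",
    # Veo 3.x generates synchronized audio by default, so no audio flag is set.
    config=types.GenerateVideosConfig(
        number_of_videos=1,
        duration_seconds=8,
    ),
)

# generate_videos returns a long-running operation: poll until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

if operation.error:
    raise RuntimeError(f"Video generation failed: {operation.error}")

for i, generated in enumerate(operation.response.generated_videos):
    # Generated videos are remote files; download the bytes before saving.
    client.files.download(file=generated.video)
    generated.video.save(f"output_{i}.mp4")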

For image-to-video — useful for product demos or reference-based generation:

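operation = client.models.generate_videos(
    model="veo-3.1-fast-generate-preview",
    prompt="The product slowly rotates, studio lighting, clean background",
    # The image becomes the opening frame; from_file takes a keyword `location`.
    image=types.Image.from_file(location="product.jpg"),
)
# Poll, download, and save exactly as in the text-to-video example.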

Note the async design. Generation takes one to two minutes even on the Fast tier. Build your app around this from the start: queue submissions, poll asynchronously, and never block a user thread waiting on a video. The official Gemini API video generation docs have the full parameter reference.
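The polling loop in the text-to-video example waits indefinitely at a fixed interval. Below is a minimal sketch of a more defensive poller, assuming it runs in a background worker rather than in a request handler. It backs off gradually and gives up after a timeout. `wait_for_operation` and its default values are illustrative choices, not part of the SDK, and the `refresh` argument stands in for `client.operations.get`.

```python
import time
from typing import Callable, TypeVar

Op = TypeVar("Op")


def wait_for_operation(
    operation: Op,
    refresh: Callable[[Op], Op],
    interval: float = 10.0,
    max_interval: float = 60.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Op:
    """Poll a long-running operation until its `.done` attribute is truthy.

    The wait between polls grows by 1.5x up to `max_interval`, and a
    TimeoutError is raised once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while not operation.done:
        if clock() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        sleep(interval)
        operation = refresh(operation)
        interval = min(interval * 1.5, max_interval)
    return operation
```

In the worker, `operation = wait_for_operation(operation, client.operations.get)` replaces the bare `while` loop. The `sleep` and `clock` parameters exist so that tests can drive the poller with fake time.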

Cost and Production Reality

Pricing is per second of video generated, audio included. Veo 3.1 standard runs about $0.40/second, Veo 3.1 Fast around $0.15/second, and Veo 3.1 Lite roughly $0.06/second. An 8-second clip on the Fast tier costs roughly $1.20 (8 × $0.15). Generate 1,000 clips and you’re looking at $1,200 for that batch: not cheap, but within the range of a real production budget for the right use case.
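The budgeting arithmetic is simple enough to keep next to your generation code. This sketch uses the per-second rates quoted in this article, with Lite at the roughly $0.06/second figure from the tier list. The tier labels are informal dictionary keys rather than model IDs, and you should treat every rate as approximate until you check it against current pricing.

```python
# Per-second rates quoted in this article (audio included); approximate,
# so check current pricing before budgeting.
PRICE_PER_SECOND_USD = {
    "veo-3.1": 0.40,
    "veo-3.1-fast": 0.15,
    "veo-3.1-lite": 0.06,
}


def batch_cost_usd(tier: str, clips: int, seconds_per_clip: int = 8) -> float:
    """Estimated cost of generating `clips` videos of `seconds_per_clip` each."""
    return round(PRICE_PER_SECOND_USD[tier] * seconds_per_clip * clips, 2)


for tier in PRICE_PER_SECOND_USD:
    print(f"{tier}: ${batch_cost_usd(tier, 1):.2f}/clip, "
          f"${batch_cost_usd(tier, 1000):,.2f} per 1,000 clips")
```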

Rate limits are a legitimate concern: 50 requests per minute at baseline, 10 on Vertex AI. If your app has any burst generation requirements, queue management is not optional — it’s the first thing to architect.
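Here is a minimal client-side sketch of that queue guard. It assumes the 50-requests-per-minute baseline quoted above and uses a sliding-window limiter that a submission worker checks before each `generate_videos` call. The class name and its defaults are hypothetical. The server still enforces its own quota, so rejected requests need retry handling as well.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` submissions in any `window` seconds."""

    def __init__(self, max_requests: int = 50, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent submissions

    def try_acquire(self) -> bool:
        """Record a submission and return True if one is allowed right now."""
        now = self.clock()
        # Forget submissions that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the oldest recorded submission leaves the window."""
        if len(self._sent) < self.max_requests:
            return 0.0
        return max(0.0, self._sent[0] + self.window - self.clock())
```

In a worker loop, `while not limiter.try_acquire(): time.sleep(limiter.wait_time())` holds each queued job until there is room. A timestamp window mirrors a per-minute quota more literally than a token bucket does, at the cost of storing up to `max_requests` timestamps.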

The honest assessment: some developers have noted that Veo 3.1 doesn’t consistently beat Sora 2 on pure visual quality in direct comparisons. That’s a fair point. What those comparisons miss is the audio-native output, the video extension capability, the Firebase and Vertex AI integration, and the three-tier pricing that makes high-volume generation economically viable. According to Google’s developer blog, Veo 3.1 is designed specifically for developers building production applications — not just creative tools.

What to Try First

Start with Google AI Studio — no GCP setup, direct browser access, and you can prototype with the Veo 3.1 models before writing any API calls. When you’re ready to build programmatically, use the Python SDK with the Fast model and build in the async polling pattern from day one. If you’re already on a Google Cloud stack, the Vertex AI path gives you the rate controls and enterprise billing you’ll need at production scale.

The video generation API space has been fragmented and expensive. Veo 3.1 with native audio and a three-tier pricing model is the clearest signal yet that AI video is becoming a standard developer primitive — something you add to an app the same way you add text generation or image analysis. The full announcement from Google has additional context on what’s coming next for the API.

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover the latest tech news and controversies and summarize them into byte-sized, easily digestible information.