VEO 3.1 Video Generation

Overview

VEO 3.1 is Google’s flagship AI video generation series, producing video with synchronized audio natively — fixed 8-second clips from text prompts or reference images. APIYI exposes VEO 3.1 through a reverse-engineered channel that proxies Google Flow, billed per-clip with both synchronous streaming and async task modes.

🎬 Highlights: Native synchronized audio + video output, fixed 8-second clips, Frame-to-Video creative mode, HD portrait/landscape, dramatically lower pricing than Google official (from $0.15), and live progress streaming. Best for short-form video, ad clips, product demos, and social-media assets in high-throughput production scenarios.

Sync API

POST /v1/chat/completions reuses the OpenAI Chat Completions protocol; set stream: true to receive live progress.

Async API

POST /v1/videos starts a three-step async flow (submit, poll, download) that supports text-to-video and Frame-to-Video uploads; built for batch management.

Why APIYI’s VEO 3.1?

VEO 3.1 is delivered through a reverse-engineered channel (transparent proxy to Google Flow), optimized for production scenarios across price, integration friction, and feature completeness:

Price Killer · Far Below Official Pricing

Starts at $0.15 per 8-second clip — over 80% cheaper than Google’s official pricing. No need to provision Google Cloud / Vertex AI accounts; per-clip billing is fully transparent.

Unlimited Concurrency · Production Scale

APIYI maintains a transparent account pool — linearly scale batch shoots, short-form video matrices, and ad pipelines. No Google account tier ceilings.

Same Per-Clip Pricing + Top-Up Bonuses

Stack top-up bonuses for further savings. Failed generations are not billed — settlement is by successful results only.

Global Zero-Friction Access

No overseas server or proxy required — connect to api.apiyi.com directly from Mainland China data centers, residential networks, or overseas nodes. Skip the Google Flow cross-border setup entirely.

OpenAI-Compatible · Dual-Mode Access

Sync uses /v1/chat/completions (same as chat models); async uses /v1/videos (OpenAI Video API style). Both protocols drop into your existing SDK / engineering code with no changes beyond the base URL, API key, and model name.

Professional Support · Enterprise Onboarding

Our team has deep video-generation expertise: prompt engineering, Frame-to-Video reference prep, batch production, and post-processing. Full PoC-to-production technical support for enterprise customers.

Key Features

Native Synchronized Audio

VEO 3.1 outputs video with synchronized native audio (ambient sound, dialogue, score) generated alongside the visuals — no separate audio post-production needed.

Generation Speed Leader

The -fast series finishes in 30–60 seconds and the standard series in 1–2 minutes, over 50% faster than Sora 2 (3–10 minutes); ideal for high-throughput content production.

Frame-to-Video Creative Mode

-fl suffix models accept 1 reference image (start frame) or 2 (start + end frames) to animate static visuals or generate seamless transitions between two frames.

Portrait / Landscape Switching

Portrait 720×1280 (social-media short-form) and landscape 1280×720 (ads, demos) — toggled via the -landscape model suffix.

Live Streaming Progress

Sync mode (/v1/chat/completions + stream: true) returns real-time progress as text fragments such as `> 🏃 Progress: XX%`, which your frontend can render directly as a progress bar.

Async Task Model

Async mode returns a video_id for independent polling and download — ideal for batch management, resume-on-failure, and long-running background jobs.

Pay on Success

Failed generations / content-policy rejections / capacity errors are not billed — you only pay for the videos you actually receive.

Multi-Video Parallel (n parameter)

In sync mode, the n parameter generates up to 4 different videos per request (same prompt, multiple results) so you can pick the best variant.

Pricing

Billed per clip (each clip is a fixed 8-second video). Only successfully generated videos are billed — failed tasks are free.

HD Series (720p, Live)

Model	Description	Resolution	Price
`veo-3.1`	Default portrait	720×1280	$0.25
`veo-3.1-fl`	Portrait + Frame-to-Video	720×1280	$0.25
`veo-3.1-fast`	Portrait + fast	720×1280	$0.15
`veo-3.1-fast-fl`	Portrait + fast + Frame-to-Video	720×1280	$0.15
`veo-3.1-landscape`	Landscape	1280×720	$0.25
`veo-3.1-landscape-fl`	Landscape + Frame-to-Video	1280×720	$0.25
`veo-3.1-landscape-fast`	Landscape + fast	1280×720	$0.15
`veo-3.1-landscape-fast-fl`	Landscape + fast + Frame-to-Video	1280×720	$0.15

4K Series (Rolling Out)

4K HD variants are rolling out. Model variants will cover the same matrix (portrait / landscape × standard / fast × text-to-video / Frame-to-Video), with naming following the HD series convention. Per-clip pricing will be added to this table once finalized; enterprise customers with batch needs can contact sales for early access.

Billing notes:

Per-clip billing: Each 8-second video is a fixed unit price, independent of prompt length, reference images, or n (n=2 means billed for 2 clips)
Failures are free: Tasks ending in failed / content-policy rejection / gateway errors are not billed — retry safely
Top-up bonuses: See Top-Up Promotions
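
The billing rules above reduce to simple arithmetic. The following is a minimal sketch, assuming the HD prices in the table ($0.25 standard, $0.15 fast) and ignoring top-up bonuses; `estimate_cost` is a hypothetical helper, not part of any APIYI SDK.

```python
from decimal import Decimal

# Per-clip HD prices taken from the pricing table above.
STANDARD_PRICE = Decimal("0.25")
FAST_PRICE = Decimal("0.15")


def estimate_cost(model: str, successful_clips: int = 1) -> Decimal:
    """Estimate the charge in USD for `successful_clips` clips of `model`.

    Only successful clips count; failed generations are not billed.
    """
    if successful_clips < 0:
        raise ValueError("successful_clips must be >= 0")
    # The price depends only on the speed tier, not on orientation or -fl.
    is_fast = "fast" in model.split("-")
    unit_price = FAST_PRICE if is_fast else STANDARD_PRICE
    return unit_price * successful_clips
```

For example, a sync request with n=2 on `veo-3.1-landscape-fast` that returns both videos would be estimated at 2 × $0.15 = $0.30.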

Technical Specs

Dimension	Spec
Base model name	`veo-3.1` (HD) / 4K series TBD
Variant axes	Orientation (portrait/landscape) × Speed (standard/fast) × Mode (text-only / Frame-to-Video `-fl`)
Video duration	Fixed 8 seconds (not adjustable)
HD resolutions	Portrait 720×1280, landscape 1280×720
4K resolutions	Rolling out, specs TBD
Audio track	✅ Synchronized native audio
Frame-to-Video (-fl)	✅ Models with `-fl` suffix; 1 image (start frame) or 2 images (start + end)
Sync generation time	`-fast` series 30–60 sec, standard series 1–2 min
Sync progress streaming	✅ `/v1/chat/completions` + `stream: true`
Async polling	✅ `/v1/videos` + task ID + `/content` download
`n` parameter	Sync mode max 4 per request (async mode recommended at 1)
Video URL TTL	24 hours

API Endpoints

Endpoint	Method	Purpose	Content-Type
`/v1/chat/completions`	POST	Sync streaming generation (recommended for real-time UX)	`application/json`
`/v1/videos`	POST	Async task: submit text-to-video or Frame-to-Video	`application/json` or `multipart/form-data`
`/v1/videos/{video_id}`	GET	Async poll task status	—
`/v1/videos/{video_id}/content`	GET	Async download video URL	—

Domain options: api.apiyi.com is the primary endpoint. vip.apiyi.com / b.apiyi.com are equivalent backup gateways with identical behavior.

Key Parameters

Model Variant Naming Rules

VEO 3.1 toggles capabilities via model name suffixes — not separate parameters:

Suffix	Effect	Default (no suffix)
`-landscape`	Landscape (1280×720)	Portrait (720×1280)
`-fast`	Fast tier (speed-first, lower price)	Standard tier
`-fl`	Frame-to-Video (requires uploaded image)	Pure text-to-video

Combination examples:

veo-3.1 — Standard portrait text-to-video (default)
veo-3.1-landscape-fast — Fast landscape text-to-video (best value)
veo-3.1-landscape-fl — Standard landscape Frame-to-Video
veo-3.1-landscape-fast-fl — Fast landscape Frame-to-Video (cheapest image-to-video)

-fl models require input_reference image upload, otherwise you get an error; pure text-to-video must not use the -fl suffix
Async Frame-to-Video requests must use multipart/form-data (not JSON); upload 1 image for start frame, 2 for start + end
Combining the 3 axes (orientation × speed × mode) yields 2 × 2 × 2 = 8 HD model IDs in total; the suffix order is fixed: landscape → fast → fl
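
The naming rules can be captured in a small helper. This is an illustrative sketch only; `build_model_id` is a hypothetical function name, and it covers the HD series because the 4K model names are not yet published.

```python
def build_model_id(landscape: bool = False, fast: bool = False,
                   frame_to_video: bool = False) -> str:
    """Compose a VEO 3.1 HD model ID; suffix order is landscape -> fast -> fl."""
    parts = ["veo-3.1"]
    if landscape:
        parts.append("landscape")
    if fast:
        parts.append("fast")
    if frame_to_video:
        parts.append("fl")
    return "-".join(parts)


# Enumerate every orientation / speed / mode combination.
ALL_HD_MODELS = [
    build_model_id(landscape, fast, fl)
    for landscape in (False, True)
    for fast in (False, True)
    for fl in (False, True)
]
```

Building the ID from booleans keeps the suffix order correct by construction, which avoids 400 errors from names such as `veo-3.1-fast-landscape`.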

`n` (Number of Videos per Sync Request)

Range: 1 to 4, default 1
Only the sync mode (/v1/chat/completions) supports n; async mode ignores it
Billed per video (n=2 means billed for 2 clips)

Best Practices

Validate prompts with -fast first

Run each new prompt on veo-3.1-fast or veo-3.1-landscape-fast first ($0.15, 30–60 seconds), then switch to the standard tier for the final asset.

Pick orientation by use case

Social-media short-form (TikTok, Reels) → portrait (no -landscape)
YouTube / ads / product demos → landscape (-landscape)

Sync vs async by need

Need live progress feedback to users → sync streaming (/v1/chat/completions + stream: true)
Background batch processing or long tasks → async task model (/v1/videos + polling)
Details: Sync API / Async API

Frame-to-Video prompts focus on "motion"

-fl models already define visuals (start frame or start+end frames). The prompt should focus on how the image animates: camera motion, object motion, lighting changes, character expressions. Example: "Camera slowly pushes in, leaves gently swaying, sunlight flickering through branches".

Frame-to-Video shines for "transitions"

The strongest Frame-to-Video use case is smooth transitions between two frames (day → night, season changes, expression shifts, object morphing). Describe the transition process and motion changes — no need to detail visuals.

Client timeout ≥ 2 minutes

Sync streaming holds the connection until generation completes (-fast ≈ 60 sec, standard ≈ 2 min) — set client timeout to 120 seconds minimum. Async POST submission is sub-second, but use 30 seconds as a baseline.

Download videos immediately

Video URLs expire in 24 hours. Production flows must download to your own OSS / CDN as soon as completed to avoid expired links.
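
One way to persist a finished video is sketched below using only the Python standard library. It assumes the returned URL can be fetched with a plain GET and no extra headers, which this page does not confirm; `download_video` is a hypothetical helper, and uploading the file to your OSS / CDN is left out.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream `url` to `dest`, writing to a temporary .part file first."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_suffix(dest.suffix + ".part")
    # Copy in chunks so the whole video never has to sit in memory.
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    # Rename only after a complete download, so readers never see partial files.
    tmp.replace(dest)
    return dest
```

Writing to a `.part` file and renaming at the end means an interrupted download never leaves a truncated `.mp4` that looks valid.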

Run multiple tasks via n or parallel POSTs

Same prompt, multiple variants → use n: 4 for 4 results in one call
Different prompts in batch → submit multiple async POSTs, each with an independent video_id, then poll independently
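
The second pattern (many prompts, one async task each) might be sketched as follows. The `submit` callable is an assumption: it stands for whatever function you write to POST one prompt to /v1/videos and return its video_id, and it is injected so the fan-out logic can be tested without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def submit_batch(prompts: List[str], submit: Callable[[str], str],
                 max_workers: int = 4) -> List[Tuple[str, str]]:
    """Submit every prompt in parallel and return (prompt, video_id) pairs.

    `submit` is a caller-supplied function (hypothetical here) that creates
    one async task and returns its video_id.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so IDs line up with prompts.
        video_ids = list(pool.map(submit, prompts))
    return list(zip(prompts, video_ids))
```

Each returned video_id can then be polled independently; keep `max_workers` within your account's concurrency quota.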

Error Codes & Retries

Status	Meaning	Recommended Action
`400`	Invalid parameters (model name doesn’t exist, `-fl` missing image, `n` out of range)	Validate parameters; Frame-to-Video must use multipart upload
`401` / `invalid_api_key`	Invalid API Key	Check Bearer Token; verify console group setting
`403`	Content-policy rejection	Adjust prompt; ensure reference images are non-sensitive
`429` / `quota_exceeded`	Rate limit / quota exceeded / insufficient balance	Exponential backoff; contact sales for higher quota
`5xx`	Gateway / upstream error	Retry async tasks 1–2 times (no charge)
Task `failed`	Generation failed (mostly content policy or upstream capacity)	Adjust prompt and retry; failed task is not billed
`video_not_found`	video_id doesn’t exist or has expired	Verify ID; query within 24 hours
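
The retry guidance in the table could be implemented roughly as below. This is a sketch under stated assumptions: `send` is a hypothetical caller-supplied function that performs one HTTP request and returns `(status_code, body)`, and the 1-second base delay is an arbitrary choice, not a documented value.

```python
import time
from typing import Any, Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, Any]],
                      max_retries: int = 2,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, Any]:
    """Call `send`, retrying 429 and 5xx responses with exponential backoff.

    400 / 401 / 403 are returned immediately: retrying them unchanged cannot help.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_retries:
            return status, body
        # Wait 1x, 2x, 4x ... the base delay before the next attempt.
        sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")  # the loop always returns
```

Passing `sleep` as a parameter keeps the function testable without real waiting.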

Recommended client config:

Sync request timeout: 120 seconds baseline (standard tier); -fast can drop to 60 seconds
Async POST submission timeout: 30 seconds; GET polling interval 5–10 seconds, max wait 10 minutes
Exponential backoff retries on 5xx and failed tasks (recommend 2 retries)
Log the x-request-id response header for debugging
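
A polling loop that follows these settings might look like the sketch below. The response shape is an assumption: this page confirms a `failed` task state, but the `status` field name and the `completed` value are guesses to be checked against the Async API page. `fetch_status` is a hypothetical callable that performs GET /v1/videos/{video_id} and returns the parsed JSON.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(fetch_status: Callable[[], Dict[str, Any]],
                    interval: float = 5.0,
                    max_wait: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll until the task reaches a terminal state or `max_wait` seconds pass."""
    deadline = clock() + max_wait
    while True:
        task = fetch_status()
        # "completed" is an assumed value; "failed" appears in this document.
        if task.get("status") in ("completed", "failed"):
            return task
        if clock() + interval > deadline:
            raise TimeoutError("task did not finish within max_wait seconds")
        sleep(interval)
```

The defaults mirror the recommendations above: a 5-second interval and a 10-minute ceiling.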

FAQ

Is VEO 3.1 official-relay or reverse-engineered? Is an official channel available?

Reverse-engineered. VEO 3.1 is delivered through APIYI’s transparent account pool to Google Flow — pricing is dramatically lower than Google’s official Veo Studio rates, billed per clip with failures not billed. No official-relay channel currently — once Google’s official Vertex AI Veo API becomes generally available, we’ll evaluate adding it and update this page accordingly.

VEO 3.1 vs Sora 2 — which should I choose?

Dimension	VEO 3.1	Sora 2 (Official)
Price	$0.15–$0.25 / 8 sec (per clip)	$0.40–$8.40 / 4–12 sec (per second)
Duration	Fixed 8 sec	4 / 8 / 12 sec
Generation time	30 sec – 2 min	3–10 min
Audio	✅ Native sync	✅ Native sync
Frame-to-Video	✅ `-fl` series	✅ `input_reference` single image
Stability	Reverse-engineered, subject to risk control	Official 99.99%
Resolution	720p (4K rolling out)	720p / 1024p / 1080p

Pick VEO for fast, cheap, batch use cases; pick Sora 2 Pro for highest quality and stability. See the Sora 2 Overview.

Why is the video duration fixed at 8 seconds? Can I extend it?

Google Flow upstream itself only exposes 8-second fixed duration — there’s currently no parameter to adjust length. For longer videos, chain Frame-to-Video clips: generate multiple 8-second segments with -fl models using each clip’s end frame as the next clip’s start frame, then stitch with ffmpeg.
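
The ffmpeg side of that chaining workflow can be sketched as command builders. Two assumptions apply: all clips share the same codec and resolution (so the concat demuxer can stream-copy them), and file paths contain no single quotes. The function names are hypothetical; the ffmpeg options used (`-sseof`, `-update`, `-f concat`, `-safe`) are standard ffmpeg flags.

```python
from typing import List


def last_frame_cmd(clip_path: str, frame_path: str) -> List[str]:
    """Command that saves the final frame of `clip_path` as an image.

    -sseof -1 starts decoding 1 second before the end; -update 1 keeps
    overwriting the output image, so the last frame written is the final one.
    """
    return ["ffmpeg", "-y", "-sseof", "-1", "-i", clip_path,
            "-update", "1", "-q:v", "2", frame_path]


def concat_list(clip_paths: List[str]) -> str:
    """Contents of the list file expected by ffmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in clip_paths)


def concat_cmd(list_path: str, output_path: str) -> List[str]:
    """Command that stitches the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output_path]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`; the extracted frame then serves as the start-frame image for the next `-fl` request.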

How do I choose between standard and -fast?

Highest quality / hero assets → standard (veo-3.1 / veo-3.1-landscape), $0.25/clip
Volume / experimentation / internal preview → fast (-fast suffix), $0.15/clip, faster
Quality difference between fast and standard is small — fast tier is sufficient for most production use cases

How do Frame-to-Video (-fl) models work?

-fl series requires input_reference image upload:

1 image → start-frame mode: image becomes the video’s opening, AI generates subsequent frames
2 images → start + end mode: first image opens, second image closes, AI generates the transition

Must use multipart/form-data (not JSON). See Async API - Frame-to-Video.

Are failed generations billed?

No. VEO 3.1 bills by successful results: tasks that end in failed, content-policy rejections, gateway 5xx errors, and parameter errors are all not billed. Only videos that actually complete (with a returned URL) are billed.

How long are video URLs valid?

24 hours. Download to your own OSS / CDN immediately after generation completes to avoid losing access.

How do I read progress in sync streaming mode?

/v1/chat/completions + stream: true returns SSE format with progress text in each chunk:

data: {"choices":[{"delta":{"content":"> 🏃 Progress: 45.0%\n\n"}}]}
...
data: {"choices":[{"delta":{"content":"> ✅ Video 1 complete, [click here](https://.../xxx.mp4) to view~~~\n\n"}}]}
data: [DONE]

Frontend just needs to parse “progress” and the video URL out of delta.content. Full example in Sync API.
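
A parser for the chunks shown above could look like this minimal sketch. It assumes every progress fragment matches `Progress: <number>%` and that the video link arrives as a Markdown link, as in the sample; `parse_sse_line` is a hypothetical helper name.

```python
import json
import re
from typing import Optional, Tuple

PROGRESS_RE = re.compile(r"Progress:\s*(\d+(?:\.\d+)?)%")
# Markdown link target, e.g. [click here](https://.../xxx.mp4)
VIDEO_URL_RE = re.compile(r"\((https?://[^)\s]+)\)")


def parse_sse_line(line: str) -> Tuple[Optional[float], Optional[str]]:
    """Return (progress_percent, video_url) found in one SSE line; None if absent."""
    line = line.strip()
    if not line.startswith("data:"):
        return None, None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None, None
    try:
        chunk = json.loads(payload)
    except json.JSONDecodeError:
        return None, None
    choices = chunk.get("choices") or []
    if not choices:
        return None, None
    content = (choices[0].get("delta") or {}).get("content") or ""
    progress = PROGRESS_RE.search(content)
    url = VIDEO_URL_RE.search(content)
    return (float(progress.group(1)) if progress else None,
            url.group(1) if url else None)
```

Feed each received line through the function, update the progress bar whenever the first element is not None, and store the URL when the second element appears.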

Which image formats are supported? Reference image size limits?

-fl models accept jpeg / png for input_reference, recommended size ≤ 5 MB per image. No strict resolution requirement (unlike Sora 2), but the image aspect ratio should match the target video orientation: portrait video → portrait image, landscape → landscape; otherwise the AI will auto-crop.
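
These constraints can be checked on the client before uploading. The sketch below is illustrative: `check_reference_images` is a hypothetical helper, it identifies formats by file extension only, and it treats the recommended 5 MB size as a hard limit, which is stricter than the guidance above.

```python
import os
from typing import Sequence, Tuple

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # recommended ceiling, enforced in this sketch
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def check_reference_images(model: str, images: Sequence[Tuple[str, int]]) -> str:
    """Validate (filename, size_in_bytes) pairs against the rules for `model`.

    Returns "text-to-video", "start-frame", or "start+end"; raises ValueError
    when the combination would be rejected.
    """
    if not model.endswith("-fl"):
        if images:
            raise ValueError("reference images require a model with the -fl suffix")
        return "text-to-video"
    if len(images) not in (1, 2):
        raise ValueError("-fl models need 1 image (start) or 2 images (start + end)")
    for name, size in images:
        extension = os.path.splitext(name)[1].lower()
        if extension not in ALLOWED_EXTENSIONS:
            raise ValueError(f"{name}: only jpeg / png are accepted")
        if size > MAX_IMAGE_BYTES:
            raise ValueError(f"{name}: larger than 5 MB")
    return "start-frame" if len(images) == 1 else "start+end"
```

Aspect-ratio matching is not checked here because it needs an image library to read dimensions.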

Can I use the official OpenAI SDK?

Yes. Sync mode is fully OpenAI Chat Completions-compatible:

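from openai import OpenAI

# Point the official SDK at APIYI; only the key and base URL change.
client = OpenAI(api_key="sk-your-key", base_url="https://api.apiyi.com/v1")

# stream=True yields progress fragments first, then the final video link.
resp = client.chat.completions.create(
    model="veo-3.1-fast",
    messages=[{"role": "user", "content": "A cat flying in the sky"}],
    stream=True,
    n=1,
)
for chunk in resp:
    # Skip chunks that carry no choices; print text fragments as they arrive.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")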

Async mode also works via client.videos.create(), but Frame-to-Video must use raw requests for multi-file upload (the OpenAI SDK only handles single-file uploads natively).

Can I run multiple tasks in parallel? What are the rate limits?

Yes. Each POST /v1/videos returns an independent video_id. Submit and poll in parallel. Default quota covers most business needs; for enterprise batch use cases (>10 concurrent, >100 clips per day), contact sales for a dedicated resource pool.

Can I cancel a running task?

No. There’s no cancel endpoint currently — once submitted, a task runs to completion. Validate prompts at -fast first to avoid wasting standard-tier runs.

Can I disable the audio track?

Not currently. VEO 3.1 outputs synchronized audio by default and Google does not expose a parameter to disable it. For audio-free output, strip with ffmpeg after download: ffmpeg -i input.mp4 -an output.mp4.

When does the 4K version launch? What's the price?

The 4K series is in gradual rollout, with model variants following the HD naming convention (covering portrait / landscape × fast / standard × Frame-to-Video). Final per-clip pricing will be reflected in the pricing table above once confirmed; enterprise customers with batch needs can contact sales for early access.

Related Docs

Sync API — /v1/chat/completions + stream: true live streaming, text-to-video + Frame-to-Video samples
Async API — /v1/videos three-step async flow, Frame-to-Video upload, full Python client example
Sora 2 Video Generation — OpenAI official-relay channel comparison
Top-Up Promotions — Bonus tiers and applicable channels
API Manual — General request, timeout, and retry guidance
Google official Veo introduction: deepmind.google/technologies/veo/

VEO 3.1 on APIYI is delivered through a Google Flow reverse-engineered channel for high-value-for-money video generation — leading speed and dramatically lower pricing than official. Two call modes (sync streaming, async task) accommodate different scenarios and integrate seamlessly with your existing OpenAI SDK / engineering code. Open a ticket from your console for any feedback.
