Overview

VEO 3.1 is Google’s flagship AI video generation series, producing video with synchronized audio natively — fixed 8-second clips from text prompts or reference images. APIYI exposes VEO 3.1 through a reverse-engineered channel that proxies Google Flow, billed per-clip with both synchronous streaming and async task modes.
🎬 Highlights: Native synchronized audio + video output, fixed 8-second clips, Frame-to-Video creative mode, HD portrait/landscape, dramatically lower pricing than Google official (from $0.15), and live progress streaming. Best for short-form video, ad clips, product demos, and social-media assets in high-throughput production scenarios.

Sync API

POST /v1/chat/completions, reuses the OpenAI Chat Completions protocol with stream: true for live progress.

Async API

POST /v1/videos three-step async flow, supports text-to-video and Frame-to-Video uploads — built for batch management.

Why APIYI’s VEO 3.1?

VEO 3.1 is delivered through a reverse-engineered channel (transparent proxy to Google Flow), optimized for production scenarios across price, integration friction, and feature completeness:

Price Killer · Far Below Official Pricing

Starts at $0.15 per 8-second clip — over 80% cheaper than Google’s official pricing. No need to provision Google Cloud / Vertex AI accounts; per-clip billing is fully transparent.

Unlimited Concurrency · Production Scale

APIYI maintains a transparent account pool — linearly scale batch shoots, short-form video matrices, and ad pipelines. No Google account tier ceilings.

Same Per-Clip Pricing + Top-Up Bonuses

Stack top-up bonuses for further savings. Failed generations are not billed — settlement is by successful results only.

Global Zero-Friction Access

No overseas server or proxy required — connect to api.apiyi.com directly from Mainland China data centers, residential networks, or overseas nodes. Skip the Google Flow cross-border setup entirely.

OpenAI-Compatible · Dual-Mode Access

Sync uses /v1/chat/completions (same as chat models); async uses /v1/videos (OpenAI Video API style). Both protocols work with your existing OpenAI SDK and engineering code; only async Frame-to-Video uploads need a raw multipart request.

Professional Support · Enterprise Onboarding

Our team has deep video-generation expertise: prompt engineering, Frame-to-Video reference prep, batch production, and post-processing. Full PoC-to-production technical support for enterprise customers.

Key Features

Native Synchronized Audio

VEO 3.1 outputs video with synchronized native audio (ambient sound, dialogue, score) generated alongside the visuals — no separate audio post-production needed.

Generation Speed Leader

-fast series in 30–60 seconds, standard series in 1–2 minutes — 50% faster than Sora 2, ideal for high-throughput content production.

Frame-to-Video Creative Mode

-fl suffix models accept 1 reference image (start frame) or 2 (start + end frames) to animate static visuals or generate seamless transitions between two frames.

Portrait / Landscape Switching

Portrait 720×1280 (social-media short-form) and landscape 1280×720 (ads, demos) — toggled via the -landscape model suffix.

Live Streaming Progress

Sync mode (/v1/chat/completions + stream: true) returns real-time > 🏃 Progress: XX% text fragments — your frontend can render a progress bar directly.

Async Task Model

Async mode returns a video_id for independent polling and download — ideal for batch management, resume-on-failure, and long-running background jobs.

Pay on Success

Failed generations / content-policy rejections / capacity errors are not billed — you only pay for the videos you actually receive.

Multi-Video Parallel (n parameter)

Sync mode n parameter generates up to 4 different videos per request (same prompt, multiple results) for variety selection.

Pricing

Billed per clip (each clip is a fixed 8-second video). Only successfully generated videos are billed — failed tasks are free.

HD Series (720p, Live)

| Model | Description | Resolution | Price |
| --- | --- | --- | --- |
| veo-3.1 | Default portrait | 720×1280 | $0.25 |
| veo-3.1-fl | Portrait + Frame-to-Video | 720×1280 | $0.25 |
| veo-3.1-fast | Portrait + fast | 720×1280 | $0.15 |
| veo-3.1-fast-fl | Portrait + fast + Frame-to-Video | 720×1280 | $0.15 |
| veo-3.1-landscape | Landscape | 1280×720 | $0.25 |
| veo-3.1-landscape-fl | Landscape + Frame-to-Video | 1280×720 | $0.25 |
| veo-3.1-landscape-fast | Landscape + fast | 1280×720 | $0.15 |
| veo-3.1-landscape-fast-fl | Landscape + fast + Frame-to-Video | 1280×720 | $0.15 |

4K Series (Rolling Out)

4K HD variants are rolling out. Model variants will cover the same matrix (portrait / landscape × standard / fast × text-to-video / Frame-to-Video), with naming following the HD series convention. Per-clip pricing will be added to this table once finalized; enterprise customers with batch needs can contact sales for early access.
Billing notes:
  • Per-clip billing: Each 8-second video is a fixed unit price, independent of prompt length, reference images, or n (n=2 means billed for 2 clips)
  • Failures are free: Tasks ending in failed / content-policy rejection / gateway errors are not billed — retry safely
  • Top-up bonuses: See Top-Up Promotions
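
The billing rules above can be sketched as a small cost estimator. This is only an illustration: the function names are hypothetical, the prices are copied from the HD table, and billing only the successful clips of a multi-clip request is an inference from the "failures are free" rule rather than a documented guarantee.

```python
from decimal import Decimal
from typing import Optional

# Per-clip prices in USD, copied from the HD series table.
STANDARD_PRICE = Decimal("0.25")
FAST_PRICE = Decimal("0.15")


def price_per_clip(model: str) -> Decimal:
    """Return the per-clip price; fast variants carry the -fast suffix."""
    if not model.startswith("veo-3.1"):
        raise ValueError(f"not a VEO 3.1 model: {model}")
    suffixes = model[len("veo-3.1"):].split("-")
    return FAST_PRICE if "fast" in suffixes else STANDARD_PRICE


def estimate_cost(model: str, n: int = 1, succeeded: Optional[int] = None) -> Decimal:
    """Estimate the charge for one request that asks for n clips.

    Pass `succeeded` once the outcome is known; by default every
    requested clip is assumed to succeed (the worst case for cost).
    """
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    billed = n if succeeded is None else succeeded
    if not 0 <= billed <= n:
        raise ValueError("succeeded must be between 0 and n")
    return price_per_clip(model) * billed
```

For example, `estimate_cost("veo-3.1-landscape-fast", n=4)` gives the upper bound for a four-variant request before any failures are known.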

Technical Specs

| Dimension | Spec |
| --- | --- |
| Base model name | veo-3.1 (HD) / 4K series TBD |
| Variant axes | Orientation (portrait/landscape) × Speed (standard/fast) × Mode (text-only / Frame-to-Video -fl) |
| Video duration | Fixed 8 seconds (not adjustable) |
| HD resolutions | Portrait 720×1280, landscape 1280×720 |
| 4K resolutions | Rolling out, specs TBD |
| Audio track | ✅ Synchronized native audio |
| Frame-to-Video (-fl) | ✅ Models with -fl suffix; 1 image (start frame) or 2 images (start + end) |
| Sync generation time | -fast series 30–60 sec, standard series 1–2 min |
| Sync progress streaming | /v1/chat/completions + stream: true |
| Async polling | /v1/videos + task ID + /content download |
| n parameter | Sync mode max 4 per request (async mode recommended at 1) |
| Video URL TTL | 24 hours |

API Endpoints

| Endpoint | Method | Purpose | Content-Type |
| --- | --- | --- | --- |
| /v1/chat/completions | POST | Sync streaming generation (recommended for real-time UX) | application/json |
| /v1/videos | POST | Async task: submit text-to-video or Frame-to-Video | application/json or multipart/form-data |
| /v1/videos/{video_id} | GET | Async poll task status | |
| /v1/videos/{video_id}/content | GET | Async download video URL | |

Domain options: api.apiyi.com is the primary endpoint. vip.apiyi.com / b.apiyi.com are equivalent backup gateways with identical behavior.
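
The async endpoints listed above can be combined into a polling loop along the following lines. This is a sketch under stated assumptions: this page does not specify the JSON shape of the status response, so the "status" field and its "completed" / "failed" values are assumptions, and the helper names are hypothetical. The status lookup is injected as a callable so those assumptions stay in one place.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.apiyi.com"


def status_url(video_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /v1/videos/{video_id} (poll task status)."""
    return f"{base_url}/v1/videos/{video_id}"


def content_url(video_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /v1/videos/{video_id}/content (download step)."""
    return f"{base_url}/v1/videos/{video_id}/content"


def fetch_status(video_id: str, api_key: str) -> dict:
    """GET the task status; the returned JSON shape is an assumption."""
    req = urllib.request.Request(
        status_url(video_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_video(
    video_id: str,
    get_status: Callable[[str], dict],
    interval: float = 5.0,
    max_wait: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task completes, fails, or max_wait seconds pass."""
    waited = 0.0
    while True:
        task = get_status(video_id)
        state = task.get("status")  # assumed field name
        if state == "completed":    # assumed terminal value
            return task
        if state == "failed":
            raise RuntimeError(f"task {video_id} failed: {task}")
        if waited >= max_wait:
            raise TimeoutError(f"task {video_id} not finished after {max_wait}s")
        sleep(interval)
        waited += interval
```

A caller would typically pass `lambda vid: fetch_status(vid, api_key)` as `get_status`, then request `content_url(video_id)` once the task completes.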

Key Parameters

Model Variant Naming Rules

VEO 3.1 toggles capabilities via model name suffixes — not separate parameters:
| Suffix | Effect | Default (no suffix) |
| --- | --- | --- |
| -landscape | Landscape (1280×720) | Portrait (720×1280) |
| -fast | Fast tier (speed-first, lower price) | Standard tier |
| -fl | Frame-to-Video (requires uploaded image) | Pure text-to-video |
Combination examples:
  • veo-3.1 — Standard portrait text-to-video (default)
  • veo-3.1-landscape-fast — Fast landscape text-to-video (best value)
  • veo-3.1-landscape-fl — Standard landscape Frame-to-Video
  • veo-3.1-landscape-fast-fl — Fast landscape Frame-to-Video (cheapest image-to-video)
  • -fl models require input_reference image upload, otherwise you get an error; pure text-to-video must not use the -fl suffix
  • Async Frame-to-Video requests must use multipart/form-data (not JSON); upload 1 image for start frame, 2 for start + end
  • Combining the 3 axes (orientation × speed × mode) yields 8 HD model IDs in total; suffix order is fixed: -landscape, then -fast, then -fl
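
The naming rules above can be captured in a small helper that always emits suffixes in the fixed order. The function and constant names are illustrative, not part of any SDK.

```python
from itertools import product

BASE = "veo-3.1"


def model_id(landscape: bool = False, fast: bool = False,
             frame_to_video: bool = False) -> str:
    """Build a model ID; suffix order is landscape, then fast, then fl."""
    parts = [BASE]
    if landscape:
        parts.append("landscape")
    if fast:
        parts.append("fast")
    if frame_to_video:
        parts.append("fl")
    return "-".join(parts)


# Three on/off axes give 2 * 2 * 2 = 8 HD model IDs.
ALL_HD_MODELS = [model_id(*flags) for flags in product([False, True], repeat=3)]
```

Building IDs this way avoids typos such as veo-3.1-fast-landscape, which does not match the fixed suffix order.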

n (Number of Videos per Sync Request)

  • Range: 1 to 4, default 1
  • Only the sync mode (/v1/chat/completions) supports n; async mode ignores it
  • Billed per video (n=2 means billed for 2 clips)

Best Practices

1. Validate prompts with -fast first

Run each new prompt at veo-3.1-fast or veo-3.1-landscape-fast first ($0.15, 30–60 seconds), then switch to standard tier for the final asset.

2. Pick orientation by use case

  • Social-media short-form (TikTok, Reels) → portrait (no -landscape)
  • YouTube / ads / product demos → landscape (-landscape)

3. Sync vs async by need

  • Need live progress feedback to users → sync streaming (/v1/chat/completions + stream: true)
  • Background batch processing or long tasks → async task model (/v1/videos + polling)
  • Details: Sync API / Async API

4. Frame-to-Video prompts focus on "motion"

-fl models already define visuals (start frame or start+end frames). The prompt should focus on how the image animates: camera motion, object motion, lighting changes, character expressions. Example: "Camera slowly pushes in, leaves gently swaying, sunlight flickering through branches".

5. Frame-to-Video shines for "transitions"

The strongest Frame-to-Video use case is smooth transitions between two frames (day → night, season changes, expression shifts, object morphing). Describe the transition process and motion changes — no need to detail visuals.

6. Client timeout ≥ 2 minutes

Sync streaming holds the connection until generation completes (-fast ≈ 60 sec, standard ≈ 2 min) — set client timeout to 120 seconds minimum. Async POST submission is sub-second, but use 30 seconds as a baseline.

7. Download videos immediately

Video URLs expire in 24 hours. Production flows must download to your own OSS / CDN as soon as completed to avoid expired links.
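
As one possible way to persist a clip before its URL expires, the sketch below streams the file to local disk with the standard library. The function name is hypothetical, and pushing the saved file on to your own OSS / CDN is left to whatever storage client you already use.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream a finished video to disk and return the saved path.

    The data is written to a temporary ".part" file first, so an
    interrupted download never leaves a truncated file at `dest`.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_suffix(dest.suffix + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)
    return dest
```

Calling this right after a task completes keeps the 24-hour expiry from affecting later pipeline stages.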

8. Run multiple tasks via n or parallel POSTs

  • Same prompt, multiple variants → use n: 4 for 4 results in one call
  • Different prompts in batch → submit multiple async POSTs, each with an independent video_id, then poll independently
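
For the batch case, a thread pool is one simple way to submit several async tasks at once. In this sketch `submit` is a hypothetical callable that wraps POST /v1/videos for a single prompt and returns its video_id; the request body it sends is not shown here because this page does not document it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def submit_batch(prompts: List[str], submit: Callable[[str], str],
                 max_workers: int = 4) -> Dict[str, str]:
    """Submit one async task per prompt and return {prompt: video_id}.

    pool.map preserves input order, so each ID lines up with its prompt.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        video_ids = list(pool.map(submit, prompts))
    return dict(zip(prompts, video_ids))
```

Each returned video_id can then be polled independently, as described above.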

Error Codes & Retries

| Status | Meaning | Recommended Action |
| --- | --- | --- |
| 400 | Invalid parameters (model name doesn’t exist, -fl missing image, n out of range) | Validate parameters; Frame-to-Video must use multipart upload |
| 401 / invalid_api_key | Invalid API Key | Check Bearer Token; verify console group setting |
| 403 | Content-policy rejection | Adjust prompt; ensure reference images are non-sensitive |
| 429 / quota_exceeded | Rate limit / quota exceeded / insufficient balance | Exponential backoff; contact sales for higher quota |
| 5xx | Gateway / upstream error | Retry async tasks 1–2 times (no charge) |
| Task failed | Generation failed (mostly content policy or upstream capacity) | Adjust prompt and retry; failed task is not billed |
| video_not_found | video_id doesn’t exist or has expired | Verify ID; query within 24 hours |
Recommended client config:
  • Sync request timeout: 120 seconds baseline (standard tier); -fast can drop to 60 seconds
  • Async POST submission timeout: 30 seconds; GET polling interval 5–10 seconds, max wait 10 minutes
  • Exponential backoff retries on 5xx and failed tasks (recommend 2 retries)
  • Log the x-request-id response header for debugging
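
The retry recommendation above could look roughly like this. It is a minimal sketch: `RetryableError` is a hypothetical exception that your own request code would raise for 5xx responses or failed tasks, and the 1-second base delay is an illustrative choice, not a documented value.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by the caller's request code for 5xx responses or failed tasks."""


def with_backoff(call: Callable[[], T], retries: int = 2,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying up to `retries` times with doubling delays."""
    attempt = 0
    while True:
        try:
            return call()
        except RetryableError:
            if attempt >= retries:
                raise  # retry budget exhausted, surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
            attempt += 1
```

Errors that will not succeed on retry, such as 400 or 401, should not be mapped to `RetryableError`.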

FAQ

Is this an official relay or a reverse-engineered channel?
Reverse-engineered. VEO 3.1 is delivered through APIYI’s transparent account pool to Google Flow. Pricing is dramatically lower than Google’s official Veo Studio rates, billed per clip, and failures are not billed. There is currently no official-relay channel; once Google’s official Vertex AI Veo API becomes generally available, we’ll evaluate adding it and update this page accordingly.
How does VEO 3.1 compare with Sora 2?
| Dimension | VEO 3.1 | Sora 2 (Official) |
| --- | --- | --- |
| Price | $0.15–$0.25 / 8 sec (per clip) | $0.40–$8.40 / 4–12 sec (per second) |
| Duration | Fixed 8 sec | 4 / 8 / 12 sec |
| Generation time | 30 sec – 2 min | 3–10 min |
| Audio | ✅ Native sync | ✅ Native sync |
| Frame-to-Video | -fl series | input_reference single image |
| Stability | Reverse-engineered, subject to risk control | Official 99.99% |
| Resolution | 720p (4K rolling out) | 720p / 1024p / 1080p |

Pick VEO for fast, cheap, batch use cases; pick Sora 2 Pro for highest quality and stability. See the Sora 2 Overview.
Why is the duration fixed at 8 seconds, and how do I make longer videos?
The Google Flow upstream only exposes a fixed 8-second duration; there’s currently no parameter to adjust length. For longer videos, chain Frame-to-Video clips: generate multiple 8-second segments with -fl models, using each clip’s end frame as the next clip’s start frame, then stitch them with ffmpeg.
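
The stitching step can be sketched with ffmpeg's concat demuxer. This covers only the final join: extracting each clip's end frame is not shown, the helper names are hypothetical, and stream copy (-c copy) assumes all segments share the same codec and resolution, which should hold when they come from the same model variant.

```python
from pathlib import Path
from typing import List


def write_concat_list(clips: List[Path], list_file: Path) -> Path:
    """Write the text file the concat demuxer reads, one clip per line.

    Paths containing single quotes are not escaped in this sketch.
    """
    lines = [f"file '{clip.resolve().as_posix()}'" for clip in clips]
    list_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_file


def concat_command(list_file: Path, output: Path) -> List[str]:
    """Build the ffmpeg argument list that joins clips without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",  # use the concat demuxer
        "-safe", "0",    # allow absolute paths in the list file
        "-i", str(list_file),
        "-c", "copy",    # copy streams instead of re-encoding
        str(output),
    ]
```

With ffmpeg installed, the command can be run via `subprocess.run(concat_command(list_file, output), check=True)`.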
Should I choose the standard or fast tier?
  • Highest quality / hero assets → standard (veo-3.1 / veo-3.1-landscape), $0.25/clip
  • Volume / experimentation / internal preview → fast (-fast suffix), $0.15/clip, faster
  • The quality difference between fast and standard is small; the fast tier is sufficient for most production use cases
How do I use Frame-to-Video (-fl) models?
The -fl series requires an input_reference image upload:
  • 1 image → start-frame mode: image becomes the video’s opening, AI generates subsequent frames
  • 2 images → start + end mode: first image opens, second image closes, AI generates the transition
Requests must use multipart/form-data (not JSON). See Async API - Frame-to-Video.
Are failed generations billed?
No. VEO 3.1 bills by successful results: tasks that end in failed, content-policy rejections, gateway 5xx errors, and parameter errors are all not billed. Only videos that actually complete (with a returned URL) are billed.
How long do video URLs remain valid?
24 hours. Download to your own OSS / CDN immediately after generation completes to avoid losing access.
What does the sync streaming response look like?
/v1/chat/completions + stream: true returns SSE format with progress text in each chunk:
```
data: {"choices":[{"delta":{"content":"> 🏃 Progress: 45.0%\n\n"}}]}
...
data: {"choices":[{"delta":{"content":"> ✅ Video 1 complete, [click here](https://.../xxx.mp4) to view~~~\n\n"}}]}
data: [DONE]
```
The frontend just needs to parse the progress value and the video URL out of delta.content. Full example in Sync API.
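
Parsing those fragments might look like the sketch below. The regular expressions are inferred from the sample chunks only, so treat the exact text format as an assumption that could change; the function names are illustrative.

```python
import re
from typing import List, Optional

# Patterns inferred from the sample chunks; the wire format may differ.
PROGRESS_RE = re.compile(r"Progress:\s*(\d+(?:\.\d+)?)%")
LINK_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def parse_progress(content: str) -> Optional[float]:
    """Return the percentage in a progress fragment, or None if absent."""
    match = PROGRESS_RE.search(content)
    return float(match.group(1)) if match else None


def parse_video_urls(content: str) -> List[str]:
    """Return every Markdown link target found in the fragment."""
    return LINK_RE.findall(content)
```

Because a link could in principle be split across two chunks, it is safer to run `parse_video_urls` on the accumulated text rather than on each delta alone.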
What are the requirements for Frame-to-Video reference images?
-fl models accept jpeg / png for input_reference, with a recommended size of ≤ 5 MB per image. There is no strict resolution requirement (unlike Sora 2), but the image aspect ratio should match the target video orientation: portrait video → portrait image, landscape → landscape; otherwise the AI will auto-crop.
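
A pre-upload check for these image rules could be sketched as follows. Several things here are assumptions: the helper name is hypothetical, sending both images under the same input_reference field name is a guess at the multipart layout, and treating the recommended 5 MB size as a hard limit is a deliberate choice of this sketch.

```python
import mimetypes
from pathlib import Path
from typing import List, Tuple

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # recommended ceiling, enforced strictly here
ALLOWED_TYPES = {"image/jpeg", "image/png"}

FilePart = Tuple[str, Tuple[str, bytes, str]]


def build_reference_files(image_paths: List[Path]) -> List[FilePart]:
    """Validate 1 or 2 reference images and build multipart file parts.

    The result uses the (field, (filename, data, mime)) tuple shape
    that the requests library accepts for its `files=` argument.
    """
    if len(image_paths) not in (1, 2):
        raise ValueError("-fl models take 1 image (start) or 2 (start + end)")
    parts: List[FilePart] = []
    for path in image_paths:
        mime, _ = mimetypes.guess_type(path.name)
        if mime not in ALLOWED_TYPES:
            raise ValueError(f"{path.name}: only jpeg / png are accepted")
        data = path.read_bytes()
        if len(data) > MAX_IMAGE_BYTES:
            raise ValueError(f"{path.name}: larger than 5 MB")
        parts.append(("input_reference", (path.name, data, mime)))
    return parts
```

The list could then be passed as `files=` to `requests.post` against /v1/videos, alongside form fields for the model and prompt; those form field names are not documented on this page, so check the Async API page before relying on them.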
Can I call VEO 3.1 with the OpenAI SDK?
Yes. Sync mode is fully OpenAI Chat Completions-compatible:
```python
from openai import OpenAI

# Point the official OpenAI SDK at the APIYI gateway.
client = OpenAI(api_key="sk-your-key", base_url="https://api.apiyi.com/v1")

# stream=True yields progress fragments, then the final video link.
resp = client.chat.completions.create(
    model="veo-3.1-fast",
    messages=[{"role": "user", "content": "A cat flying in the sky"}],
    stream=True,
    n=1,
)
for chunk in resp:
    if chunk.choices:  # skip chunks that carry no choices
        print(chunk.choices[0].delta.content or "", end="")
```
Async mode also works via client.videos.create(), but Frame-to-Video must use raw requests for multi-file upload (the OpenAI SDK only handles single-file uploads natively).
Can I run multiple tasks concurrently?
Yes. Each POST /v1/videos returns an independent video_id. Submit and poll in parallel. Default quota covers most business needs; for enterprise batch use cases (>10 concurrent, >100 clips per day), contact sales for a dedicated resource pool.
Can I cancel a submitted task?
No. There’s no cancel endpoint currently; once submitted, a task runs to completion. Validate prompts at -fast first to avoid wasting standard-tier runs.
Can I generate video without audio?
Not currently. VEO 3.1 outputs synchronized audio by default and Google does not expose a parameter to disable it. For audio-free output, strip with ffmpeg after download: ffmpeg -i input.mp4 -an output.mp4.
When will the 4K series be available?
The 4K series is in gradual rollout, with model variants following the HD naming convention (covering portrait / landscape × fast / standard × Frame-to-Video). Final per-clip pricing will be reflected in the pricing table above once confirmed; enterprise customers with batch needs can contact sales for early access.
  • Sync API: /v1/chat/completions + stream: true live streaming, text-to-video + Frame-to-Video samples
  • Async API: /v1/videos three-step async flow, Frame-to-Video upload, full Python client example
  • Sora 2 Video Generation: OpenAI official-relay channel comparison
  • Top-Up Promotions: Bonus tiers and applicable channels
  • API Manual: General request, timeout, and retry guidance
  • Google official Veo introduction: deepmind.google/technologies/veo/
VEO 3.1 on APIYI is delivered through a Google Flow reverse-engineered channel for high-value-for-money video generation — leading speed and dramatically lower pricing than official. Two call modes (sync streaming, async task) accommodate different scenarios and integrate seamlessly with your existing OpenAI SDK / engineering code. Open a ticket from your console for any feedback.