Overview

Wan (Tongyi Wanxiang) is Alibaba Cloud’s video generation model series. APIYI connects directly to Alibaba Cloud Model Studio through a DashScope passthrough channel, so a single APIYI Key (starting with sk-) unlocks all Wan video capabilities with no separate Alibaba Cloud account required. The current flagship is Wan2.7, covering four core use cases:
| Use case | Model ID | Your input | Output |
| --- | --- | --- | --- |
| Text-to-video | wan2.7-t2v | A text prompt | 5-15 second short video |
| Image-to-video | wan2.7-i2v | First frame + prompt (optional driving audio) | Brings a static image to life; add audio for lip-sync / rap |
| Reference-to-video | wan2.7-r2v | 1-5 reference images/videos + prompt | Single- or multi-character video that preserves the reference subjects, with voice reference |
| Video edit | wan2.7-videoedit | A video + 1-5 reference images + edit instruction | Edited video: outfit swap, background swap, etc. |
🎬 Key highlight: all four capabilities share the same async endpoint and the same request structure. Switch use cases by changing only the model field. Native support for 720P / 1080P resolutions and 2-15 second integer durations; wan2.7-i2v also supports driving audio for lip-sync. Ideal for short-video production, e-commerce assets, digital-human narration, and creative marketing.
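The four capabilities differ only in `model` and in the contents of `input.media`, so one helper can build every request body. The sketch below is illustrative only. `build_body` is a hypothetical name, the example.com URLs are placeholders, and the i2v prompt is made up.

```python
import json

def build_body(model, prompt, media=None, **parameters):
    """Build a DashScope-style body: {model, input: {prompt, media[]}, parameters}.

    Hypothetical helper: between the four Wan2.7 use cases only `model`
    and `media` change; the overall structure stays the same.
    """
    body_input = {"prompt": prompt}
    if media:  # t2v needs no media; i2v / r2v / videoedit do
        body_input["media"] = list(media)
    return {"model": model, "input": body_input, "parameters": dict(parameters)}

t2v = build_body("wan2.7-t2v", "A lighthouse at dusk", resolution="720P", duration=5)
i2v = build_body(
    "wan2.7-i2v",
    "The woman in the photo starts singing",
    media=[{"type": "first_frame", "url": "https://example.com/frame.png"}],
    resolution="720P",
    duration=5,
)
print(json.dumps(i2v, indent=2))
```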

Text-to-Video API

wan2.7-t2v generates video from a pure text prompt, the simplest entry point.

Image-to-Video API

wan2.7-i2v takes a first frame + optional driving audio for lip-sync / rap.

Reference-to-Video API

wan2.7-r2v preserves subject features from reference images/videos, with voice reference.

Video Edit API

wan2.7-videoedit edits a video with reference images: outfit swap, background swap, etc.

Why use Wan on APIYI

One Key for every capability

No Alibaba Cloud signup, no region setup, no environment variables. A single APIYI Key calls all four Wan2.7 capabilities plus the HappyHorse series.

Direct access, no VPN

Connect straight to api.apiyi.com, reachable from mainland data centers and home networks alike, with no need to configure an Alibaba Cloud regional endpoint.

No charge on failure

Tasks that end in failed (unreachable media URL, sensitive prompt, upstream capacity, etc.) are not billed, so retry freely.

DashScope protocol passthrough

The request body maps one-to-one to Alibaba Cloud’s native DashScope protocol, so you can migrate by following the official docs; responses are normalized for easy polling.

Core features

Four-in-one async endpoint

t2v / i2v / r2v / video edit all share POST /wan/api/v1/...video-synthesis. You submit, get a task_id, poll, and download, and the shared flow makes batch management easy.

Audio-driven lip-sync

wan2.7-i2v supports driving_audio, making a static portrait match the audio’s mouth movements and rhythm. Great for rap / narration / digital humans.

Multi-subject reference

wan2.7-r2v mixes reference images + reference videos (5 total max), referenced in the prompt as “image 1 / video 1”, with voice reference support.

Multiple resolutions and durations

720P / 1080P resolutions, 2-15 second integer durations. prompt_extend smart rewriting further improves quality for short prompts.

Supported models

| Model ID | Capability | Required media input | Notes |
| --- | --- | --- | --- |
| wan2.7-t2v | Text-to-video | None | Pure text generation |
| wan2.7-i2v | Image-to-video | first_frame (+ optional driving_audio) | The only capability supporting audio drive |
| wan2.7-r2v | Reference-to-video | reference_image / reference_video (5 total max) | Supports reference_voice voice reference |
| wan2.7-videoedit | Video edit | video + reference_image (1-5) | The model name videoedit has no hyphen |
wan2.7-videoedit is for editing video using images. A separate wan2.7-image-pro is an image model (uses /v1/images/generations) and is outside this video endpoint’s scope, so do not mix them up. For the legacy Wan2.6 series, see Historical Versions.

⚠️ Endpoint choice (most important)

APIYI mounts two paths, but only the DashScope passthrough endpoint fully supports every Wan capability:
| Path | Protocol style | i2v / r2v availability | Verdict |
| --- | --- | --- | --- |
| /v1/videos | OpenAI flat style | ❌ Media fields are dropped | Do not use |
| /wan/api/v1/services/aigc/video-generation/video-synthesis | DashScope native passthrough | ✅ Fully supported | Always use this |
If any doc or example tells you to submit Wan video tasks via /v1/videos, ignore it. That path’s adaptation for i2v / r2v media fields is incomplete and causes the upstream error [InvalidParameter] Field required: input.media. All Wan video creation requests go to /wan/api/v1/...video-synthesis.

Async call flow

The whole flow is three async steps: create task → poll status → download video.
1. Create the task

POST /wan/api/v1/services/aigc/video-generation/video-synthesis with the header X-DashScope-Async: enable. The call returns a task_id immediately.

2. Poll the status

GET /v1/tasks/{task_id} (with Authorization) every 5-10 seconds, never more often than once every 3 seconds, until status becomes completed or failed.

3. Download the video

GET the mp4 directly from the response's result_url without the Authorization header. The link is an OSS signed direct link, and adding Auth causes a 403.
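The three steps differ mainly in which headers they send. Below is a minimal sketch of that rule, assuming the header names given above. `headers_for` and `task_path` are hypothetical helpers and not part of any APIYI SDK.

```python
# Hypothetical helpers: which headers each step of the async flow sends.
CREATE_PATH = "/wan/api/v1/services/aigc/video-generation/video-synthesis"

def headers_for(step, api_key):
    if step == "create":
        # The async header is mandatory on every creation request
        return {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-DashScope-Async": "enable",
        }
    if step == "poll":
        # GET /v1/tasks/{task_id}: Authorization only, no async header needed
        return {"Authorization": f"Bearer {api_key}"}
    if step == "download":
        # result_url is an OSS pre-signed link: sending Authorization causes a 403
        return {}
    raise ValueError(f"unknown step: {step!r}")

def task_path(task_id):
    """Query path for a task, regardless of which path created it."""
    return f"/v1/tasks/{task_id}"

print(headers_for("poll", "sk-your-api-key"))
```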

Task status reference

The top-level status field of the GET /v1/tasks/{task_id} response (already normalized by APIYI):
| Status | Meaning | Next step |
| --- | --- | --- |
| submitted | Submitted, queued | Keep polling |
| in_progress | Generating | Keep polling (progress often stalls at 30%; that is the upstream's coarse reporting, not a hang) |
| completed | Success | Download from result_url |
| failed | Failed | Check error.message / fail_reason |
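The status table maps naturally onto a polling loop with a backstop timeout. This is a sketch and not an official client. `wait_for_task`, `TaskFailed`, and the injectable `fetch` / `sleep` / `clock` arguments are hypothetical; they exist so the loop can be exercised without network access. The 3-second floor and the 20-minute default timeout come from the guidance on this page.

```python
import time

POLL_MIN_SECONDS = 3  # never poll more often than this

class TaskFailed(RuntimeError):
    """Raised when the task ends with status=failed."""

def wait_for_task(fetch, task_id, interval=10, timeout=20 * 60,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until completed/failed and return result_url.

    `fetch(task_id)` must return the parsed JSON of GET /v1/tasks/{task_id}.
    """
    interval = max(interval, POLL_MIN_SECONDS)
    deadline = clock() + timeout
    while True:
        info = fetch(task_id)
        status = info.get("status")
        if status == "completed":
            return info["result_url"]
        if status == "failed":
            raise TaskFailed(info.get("error") or info.get("fail_reason"))
        # submitted / in_progress: keep polling (progress often sits at 30%)
        if clock() + interval > deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)
```

With the `get()` helper from the full Python client, a call could look like `wait_for_task(lambda tid: get(f"/v1/tasks/{tid}"), task_id)`.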

Full Python client

```python
import json, time, urllib.request

BASE = "https://api.apiyi.com"
KEY  = "sk-your-api-key"   # your APIYI Key

def post(path, body):
    # Creation requests must carry X-DashScope-Async: enable
    h = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json",
         "X-DashScope-Async": "enable"}
    req = urllib.request.Request(BASE + path, data=json.dumps(body).encode(), headers=h, method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())

def get(path):
    # Task queries need only the Authorization header
    req = urllib.request.Request(BASE + path, headers={"Authorization": f"Bearer {KEY}"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())

# 1. Create the task (switch use cases by changing only model and media)
r = post("/wan/api/v1/services/aigc/video-generation/video-synthesis", {
    "model": "wan2.7-t2v",
    "input": {"prompt": "A lighthouse on the seashore at dusk, the camera slowly pushing in, waves gently lapping the rocks, seabirds calling"},
    "parameters": {"resolution": "720P", "duration": 5, "prompt_extend": True, "watermark": True}
})
task_id = r["output"]["task_id"]
print("task_id:", task_id)

# 2. Poll every 10 seconds (within the recommended 5-10 s), with a 20-minute backstop
deadline = time.monotonic() + 20 * 60
while True:
    info = get(f"/v1/tasks/{task_id}")
    status = info["status"]
    print("status:", status, "progress:", info.get("progress"))
    if status == "completed":
        url = info["result_url"]
        break
    if status == "failed":
        raise RuntimeError(info.get("error") or info.get("fail_reason"))
    if time.monotonic() > deadline:
        raise TimeoutError(f"task {task_id} still {status} after 20 minutes")
    time.sleep(10)

# 3. Download (do not send Authorization! result_url is an OSS signed direct link)
urllib.request.urlretrieve(url, "out.mp4")
print("saved out.mp4")
```

Key parameters explained

When submitting, the body uses DashScope’s nested structure: { model, input: { prompt, media[] }, parameters: {...} }.

input fields

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| prompt | string | | Natural-language description; wan2.7-r2v supports “image 1 / video 1” markers to reference media |
| negative_prompt | string | | Negative prompt, ≤500 characters |
| media | array | Required for i2v/r2v/edit | Media asset array, see below |

media[] types

| type | Purpose | Applicable models |
| --- | --- | --- |
| first_frame | First frame image (≤1) | i2v, r2v |
| reference_image | Reference image (preserve subject/scene) | r2v, videoedit |
| reference_video | Reference video (subject/voice reference) | r2v |
| driving_audio | Driving audio (lip-sync) | i2v only |
| video | Input video | videoedit |
| reference_voice | Voice reference (attached to reference_image/video) | r2v |
Each media object needs at least type + url. The url must be a public https link that can be fetched directly with GET (upload local files to OSS / CDN first).
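The media rules in the tables above can be checked on the client before you submit, which catches the cheapest class of creation-stage errors. This is a sketch under stated assumptions:

- The allowed-type sets are transcribed from the tables.
- "Exactly one input video" for `wan2.7-videoedit` is our reading of "a video".
- `reference_voice` is accepted for r2v without modelling how it attaches to a reference item.
- `validate_media` is a hypothetical name.

The server remains the authority on what it accepts.

```python
from collections import Counter
from urllib.parse import urlparse

# Allowed media types per model, transcribed from the tables above.
ALLOWED = {
    "wan2.7-t2v": set(),
    "wan2.7-i2v": {"first_frame", "driving_audio"},
    "wan2.7-r2v": {"first_frame", "reference_image", "reference_video", "reference_voice"},
    "wan2.7-videoedit": {"video", "reference_image"},
}

def validate_media(model, media):
    """Return a list of problems; an empty list means the pre-check passed."""
    allowed = ALLOWED.get(model)
    if allowed is None:
        return [f"unknown model {model!r}"]
    problems = []
    counts = Counter(m.get("type") for m in media)
    for m in media:
        t, url = m.get("type"), m.get("url", "")
        if t not in allowed:
            problems.append(f"{t!r} is not accepted by {model}")
        if urlparse(url).scheme != "https":
            problems.append(f"{t!r} url must be a public https link: {url!r}")
    if counts["first_frame"] > 1:
        problems.append("at most one first_frame")
    if model == "wan2.7-i2v" and counts["first_frame"] != 1:
        problems.append("wan2.7-i2v needs exactly one first_frame")
    if model == "wan2.7-r2v":
        refs = counts["reference_image"] + counts["reference_video"]
        if not 1 <= refs <= 5:
            problems.append(f"wan2.7-r2v needs 1-5 reference images/videos, got {refs}")
    if model == "wan2.7-videoedit":
        if counts["video"] != 1:
            problems.append("wan2.7-videoedit needs exactly one video")
        if not 1 <= counts["reference_image"] <= 5:
            problems.append("wan2.7-videoedit needs 1-5 reference_image items")
    return problems
```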

parameters fields

| Field | Type | Values | Notes |
| --- | --- | --- | --- |
| resolution | string | 720P / 1080P | Uppercase; specifying it explicitly is recommended |
| ratio | string | 16:9 / 9:16 / 1:1 / 4:3 / 3:4 | Aspect ratio; ignored automatically when a first frame is supplied |
| duration | int | 2-15 | Seconds (integer), commonly 5 / 10; capped at 10 when a reference video is included |
| prompt_extend | bool | true / false | Smart prompt rewriting; true is strongly recommended |
| watermark | bool | true / false | "AI generated" watermark in the bottom-right corner |
| seed | int | 0-2147483647 | Fixing it improves reproducibility |
duration must be an integer 5, not the string "5", or you get cannot unmarshal string into Go struct field ... of type int. Writing resolution in uppercase (720P) is more reliable.
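You can also normalize the parameters before sending. This avoids the `cannot unmarshal string` error and lowercase resolutions. A minimal sketch: `normalize_parameters` is hypothetical, and the allowed values and ranges are copied from the parameters table.

```python
RESOLUTIONS = {"720P", "1080P"}
RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
SEED_MAX = 2147483647

def normalize_parameters(params, has_reference_video=False):
    """Return a cleaned copy of `parameters`; raise ValueError on values the table rules out."""
    p = dict(params)
    if "resolution" in p:
        p["resolution"] = str(p["resolution"]).upper()  # "720p" -> "720P"
        if p["resolution"] not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if "ratio" in p and p["ratio"] not in RATIOS:
        raise ValueError(f"ratio must be one of {sorted(RATIOS)}")
    if "duration" in p:
        d = p["duration"]
        if isinstance(d, str) and d.strip().isdigit():
            d = int(d)  # "5" -> 5; sending the string triggers "cannot unmarshal string"
        # bool is a subclass of int in Python, so rule it out explicitly
        if isinstance(d, bool) or not isinstance(d, int):
            raise ValueError("duration must be an integer number of seconds")
        max_d = 10 if has_reference_video else 15
        if not 2 <= d <= max_d:
            raise ValueError(f"duration must be 2-{max_d} seconds, got {d}")
        p["duration"] = d
    if "seed" in p:
        s = p["seed"]
        if isinstance(s, bool) or not isinstance(s, int) or not 0 <= s <= SEED_MAX:
            raise ValueError(f"seed must be an int in 0-{SEED_MAX}")
    return p
```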

Choosing between Wan and HappyHorse

Wan and HappyHorse are both Alibaba video models and share the same endpoint and schema (swap them by changing only the model name), but their strengths differ:
| Dimension | Wan2.7 | HappyHorse-1.0 |
| --- | --- | --- |
| Audio-driven lip-sync (i2v) | wan2.7-i2v supports driving_audio | ❌ Not supported; i2v takes a first frame only |
| Reference-to-video image cap | Reference images + reference videos, 5 total max | Up to 9 reference images |
| Video-edit reference images | ≤5 | ≤5 |
| Subject-consistency style | Multi-subject interaction, voice reference | Leans toward “faithful reproduction of dynamic footage”, keeps subjects stable |
Need lip-sync / rap / digital-human narration → choose wan2.7-i2v (the only one with audio drive). Need many reference images to keep a subject consistent → consider HappyHorse r2v (up to 9 images).

Best practices

1. Iterate first at 720P / 5 seconds

During development, validate prompts and camera direction quickly with short clips at 720P (the lowest supported tier). Once they are finalized, scale up to 1080P and longer durations. This cuts both cost and wait time.

2. Always enable prompt_extend

prompt_extend: true clearly improves quality for short prompts, at the cost of only a few extra seconds of generation time.

3. Poll every 5-10 seconds

Never poll more often than once every 3 seconds (you will be rate-limited), and do not block indefinitely on long tasks. 720P / 5 seconds typically takes 70-140 seconds; 1080P / longer clips may exceed 5 minutes.

4. Set a 20-minute client timeout as a backstop

1080P or clips over 10 seconds are noticeably slower; give your polling loop a 20-minute backstop timeout.

5. Download as soon as you get result_url

result_url expires in 24 hours by default and is an OSS signed direct link, so do not send the Authorization header when downloading. In production, always re-store it to your own OSS / CDN.

6. Make submissions idempotent

Failed tasks are not billed, but resubmitting the same task bills again. Maintain a “business ID → task_id” mapping in your app layer to avoid accidental charges.
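The idempotency advice above amounts to keeping a lookup table in front of task creation. This sketch assumes a hypothetical `create_task(body) -> task_id` callable that wraps the POST. A real deployment would persist the mapping in a database rather than an in-memory dict.

```python
import threading

class IdempotentSubmitter:
    """Map each business ID to the task_id it produced, so retries don't re-bill."""

    def __init__(self, create_task):
        self._create_task = create_task  # hypothetical: body -> task_id
        self._tasks = {}                 # business_id -> task_id
        # For simplicity the lock is held during creation, which serializes submissions
        self._lock = threading.Lock()

    def submit(self, business_id, body):
        with self._lock:
            if business_id in self._tasks:
                return self._tasks[business_id]  # already submitted: reuse, no new charge
            task_id = self._create_task(body)
            self._tasks[business_id] = task_id
            return task_id

    def forget(self, business_id):
        """Call after a task ends in `failed` (not billed) to allow a fresh attempt."""
        with self._lock:
            self._tasks.pop(business_id, None)
```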

Error codes and retries

Errors come from two stages and are handled differently:
| Source | Signature | Handling |
| --- | --- | --- |
| Creation stage (rejected by APIYI) | HTTP 4xx/5xx; type is task_error / parse_request_failed / build_request_failed | Fix the body and retry (usually a wrong field type, missing media, or wrong endpoint) |
| Execution stage (rejected by upstream Alibaba Cloud) | Task ends with status=failed; error.message is prefixed with a bracketed code such as [InvalidParameter] / [InvalidImageUrl] | Read the bracketed hint; usually an unreachable media URL or a sensitive prompt |
Recommended client behavior: exponential backoff retry on HTTP 5xx / network errors (1s / 4s / 16s); surface HTTP 4xx immediately without retry; a failed task with [InvalidImageUrl] can be retried (possibly a transient network issue), while [InvalidParameter] / sensitive words should not be retried.
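The recommended behavior fits into two small classifiers plus a backoff wrapper. This is a sketch only. The function names are hypothetical. The bracket parsing assumes `error.message` starts with a code such as `[InvalidImageUrl]`, as described above. Any bracketed code other than `[InvalidImageUrl]` is treated as non-retryable.

```python
import re
import time
import urllib.error

BACKOFF_SECONDS = (1, 4, 16)  # the 1s / 4s / 16s schedule recommended above

def should_retry_http(status_code):
    """Creation stage: retry 5xx, surface 4xx immediately."""
    return 500 <= status_code < 600

def should_retry_failed_task(error_message):
    """Execution stage: retry [InvalidImageUrl]; everything else is not retried."""
    m = re.match(r"\s*\[(\w+)\]", error_message or "")
    return bool(m) and m.group(1) == "InvalidImageUrl"

def is_retryable(exc):
    # HTTPError subclasses URLError, so check it first
    if isinstance(exc, urllib.error.HTTPError):
        return should_retry_http(exc.code)
    return isinstance(exc, (urllib.error.URLError, TimeoutError, ConnectionError))

def with_backoff(call, is_retryable=is_retryable, sleep=time.sleep):
    """Run call(); on a retryable exception wait 1s / 4s / 16s between attempts."""
    for delay in BACKOFF_SECONDS:
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            sleep(delay)
    return call()  # final attempt: any exception propagates
```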

FAQ

Why shouldn't I submit Wan tasks to /v1/videos?

/v1/videos is an OpenAI flat-style endpoint with incomplete support for Wan's i2v / r2v. The media array gets dropped, and upstream Alibaba Cloud returns [InvalidParameter] Field required: input.media. All Wan video creation requests go to /wan/api/v1/services/aigc/video-generation/video-synthesis, and queries always go to /v1/tasks/{task_id}.
What does the X-DashScope-Async: enable header do?

It tells the endpoint "this is an async task: return a task_id immediately and do not block." It is required on every creation request; omitting it returns current user api does not support synchronous calls. The query call (GET) does not need this header.
Which endpoint do I use to query a task?

APIYI normalizes all video task queries to /v1/tasks/{task_id}. No matter which path you used to create the task, you query it through this one endpoint, and the response's top-level status / progress / result_url / error fields are consistent.
Downloading result_url fails with a 403. What is wrong?

Drop the Authorization header. result_url is already an Alibaba Cloud OSS pre-signed direct link, and adding an APIYI Key makes OSS reject the request:

```shell
curl -L -o out.mp4 "$RESULT_URL"          # ✅ correct
curl -L -H "Authorization: Bearer $KEY" -o out.mp4 "$RESULT_URL"   # ❌ wrong
```
How long does result_url stay valid?

The link is valid for 24 hours by default. After it expires, re-GET /v1/tasks/{task_id} and you usually get a fresh result_url. However, the task itself can only be queried for 24 hours; after that the query returns UNKNOWN. For long-term storage, download to your own storage as soon as possible.
Progress is stuck at 30%. Is the task hung?

No. The progress reported by upstream Alibaba Cloud is coarse-grained (only 0% / 10% / 30% / 100% buckets). As long as status is still in_progress, keep waiting; it usually jumps straight from 30% to 100%.
How many tasks can I submit concurrently?

In practice you can submit 4-8 tasks at once without hitting rate limits. In production, keep the number of simultaneously active tasks at 10 or fewer; anything beyond that queues. The query API has a fairly high default RPS, but a 5-10 second polling interval is still recommended.
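A fixed-size thread pool is enough to stay within that concurrency. This sketch uses the standard library's `concurrent.futures`. `run_batch` and the `run_one` callable are hypothetical; `run_one` is assumed to submit one task and poll it to completion. `max_active=8` is simply the upper end of the 4-8 range mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ACTIVE = 8  # upper end of the 4-8 range; stay at or below 10 in production

def run_batch(jobs, run_one, max_active=MAX_ACTIVE):
    """Run run_one(job) for every job with at most `max_active` in flight.

    Results are returned in input order; the first exception raised by
    run_one propagates when its result is reached.
    """
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        return list(pool.map(run_one, jobs))
```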
Are failed tasks billed?

No, a task that ends with status=failed is not billed. Note, however, that resubmitting the same task bills again, so make submissions idempotent. During testing you can turn off prompt_extend and use 720P / 5 seconds / short prompts to lower the unit cost.
Can I still use the Wan2.6 series?

Yes. The Wan2.6 series (including wan2.6-r2v-flash) is still on the callable list and uses the same protocol as Wan2.7; just change the model name. See Historical Versions.

Group Setup

The Wan and HappyHorse series share a single Wan group — one Token can call both series (the Token in the screenshot is named Wan2.7&HappyHorse). Video models are billed per second, so the Token must meet two conditions to route successfully:
  1. Billing model: choose Pay-as-you-go Priority or Pay-as-you-go — video is billed per second, so Pay-per-request Tokens cannot route
  2. Group: select a group that includes Wan
(Screenshot: Create Token dialog with the billing model set to Pay-as-you-go Priority and the group dropdown showing Wan (rate 0.14x); one Token is usable for both Wan2.7 and HappyHorse.)

Pricing

Default price = 98% of Alibaba’s official price (simple to reason about)

In the console the Wan group shows a rate of 0.14x, which is denominated in the built-in RMB pricing unit. Because APIYI bills in USD at a fixed 1:7 exchange rate, the effective conversion is:
0.14 (RMB pricing unit) × 7 (fixed exchange rate) = 0.98
In other words, the default price = 98% of Alibaba’s official price — cheaper than buying direct from Alibaba, with no overseas link to build yourself.
Conversion: USD price per second = official RMB price × 0.14 (i.e. × 0.98 ÷ 7). For example, the official 1080P price of ¥1.0/s → $0.14/s, exactly the 0.14x shown in the console.

Price detail (default price, billed per second)

Wan2.7 text-to-video / image-to-video / reference-to-video are priced the same, with two tiers — 720P / 1080P (480P is not supported):
| Resolution | Official price | Our default /s | 5 s | 10 s | 12 s |
| --- | --- | --- | --- | --- | --- |
| 720P | ¥0.6/s | $0.084/s | $0.42 | $0.84 | $1.01 |
| 1080P | ¥1.0/s | $0.14/s | $0.70 | $1.40 | $1.68 |
  • wan2.7-r2v defaults to 1080P, and duration is capped at 10 seconds when reference media includes a video.
  • wan2.7-videoedit (video edit) output duration follows the source video and is billed by actual output seconds, not by duration.
  • Prices shown are the default (98% of official); with the maximum top-up bonus, the effective price is roughly the table value ÷ 1.2 (e.g. 1080P 5 s $0.70 → about $0.58).
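The per-clip prices in the table follow from official RMB price × 0.14 × seconds, and a small calculator reproduces them. The sketch assumes the t2v / i2v / r2v rates from the table, the approximate 1.2x maximum top-up bonus, and rounding to whole cents. `estimate_usd` and the constant names are hypothetical. The console rate remains authoritative.

```python
from decimal import Decimal, ROUND_HALF_UP

OFFICIAL_RMB_PER_SECOND = {"720P": Decimal("0.6"), "1080P": Decimal("1.0")}
GROUP_RATE = Decimal("0.14")       # console rate, equal to 0.98 / 7
MAX_TOPUP_BONUS = Decimal("1.2")   # approximate maximum bonus multiplier

def estimate_usd(resolution, seconds, with_max_bonus=False):
    """Estimate the USD cost of one Wan2.7 t2v / i2v / r2v clip."""
    cost = OFFICIAL_RMB_PER_SECOND[resolution] * GROUP_RATE * seconds
    if with_max_bonus:
        cost /= MAX_TOPUP_BONUS
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

for res in ("720P", "1080P"):
    print(res, [str(estimate_usd(res, s)) for s in (5, 10, 12)])
```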

Stack top-up bonuses for an even lower effective price

After joining the top-up bonus program, credited balance can be boosted up to ~1.2x, pushing the effective price lower still:
0.98 ÷ 1.2 ≈ 0.817
So large customers can reach as low as ~81.7% of the official price.
| Tier | Effective price (vs Alibaba official) | Formula |
| --- | --- | --- |
| Default | 98% | rate 0.14x × fixed exchange rate 7 |
| With top-up bonuses (max tier for large customers) | ~81.7% | 0.98 ÷ 1.2 |
  • Billing dimension = resolution tier × duration (seconds); failed tasks are not billed.
  • 1:7 is a fixed settlement exchange rate (not a preferential rate); it applies uniformly to all USD top-ups.
  • For the highest bonus tiers and eligible channels, see top-up bonuses. The latest rate is authoritative in the console.

Text-to-Video Playground

wan2.7-t2v live debugging + code samples

Image-to-Video Playground

wan2.7-i2v first frame + driving audio

Reference-to-Video Playground

wan2.7-r2v multi-subject reference + voice

Video Edit Playground

wan2.7-videoedit outfit / background swap

Historical Versions (Wan2.6)

Wan2.6 series and migration notes

HappyHorse Series

Also Alibaba-based, side-by-side selection guide
Alibaba Cloud official docs (reference): help.aliyun.com/zh/model-studio/text-to-video-api-reference. For questions or suggestions, please open a ticket in the APIYI console.