Wan Video Generation (Alibaba Cloud Tongyi Wanxiang)

Overview

Wan (Tongyi Wanxiang) is Alibaba Cloud’s video generation model series. APIYI connects directly to Alibaba Cloud Model Studio through a DashScope passthrough channel, so a single APIYI Key (starting with sk-) unlocks all Wan video capabilities with no separate Alibaba Cloud account required. The current flagship is Wan2.7, covering four core use cases:

Use case	Model ID	Your input	Output
Text-to-video	`wan2.7-t2v`	A text prompt	2-15 second short video
Image-to-video	`wan2.7-i2v`	First frame + prompt (optional driving audio)	Bring a static image to life; add audio for lip-sync / rap
Reference-to-video	`wan2.7-r2v`	1-5 reference images/videos + prompt	Single- or multi-character video that preserves reference subjects, with voice reference
Video edit	`wan2.7-videoedit`	A video + 1-5 reference images + edit instruction	Edited video: outfit swap, background swap, etc.

🎬 Key highlight: all four capabilities share the same async endpoint and the same request structure. Switch use cases by changing the model field and the media array that model requires. Native support for 720P / 1080P resolutions and 2-15 second integer durations; wan2.7-i2v also supports driving audio for lip-sync. Ideal for short-video production, e-commerce assets, digital-human narration, and creative marketing.

Text-to-Video API

wan2.7-t2v generates video from a pure text prompt, the simplest entry point.

Image-to-Video API

wan2.7-i2v takes a first frame + optional driving audio for lip-sync / rap.

Reference-to-Video API

wan2.7-r2v preserves subject features from reference images/videos, with voice reference.

Video Edit API

wan2.7-videoedit edits a video with reference images: outfit swap, background swap, etc.

Visual API Testing

Debug this endpoint directly in the iCover visual testing tool — no code required.

Async Task Lookup / Download

View submitted video tasks and download video links in the APIYI console — a lookup entry outside the API.

Why use Wan on APIYI

One Key for every capability

No Alibaba Cloud signup, no region setup, no environment variables. A single APIYI Key calls all four Wan2.7 capabilities plus the HappyHorse series.

Direct access, no VPN

Connect straight to api.apiyi.com, which is reachable from data centers and home networks in mainland China alike, with no need to configure an Alibaba Cloud regional endpoint.

No charge on failure

Tasks that end in failed (unreachable media URL, sensitive prompt, upstream capacity, etc.) are not billed, so retry freely.

DashScope protocol passthrough

The request body maps one-to-one to Alibaba Cloud’s native DashScope protocol, so you can migrate by following the official docs; responses are normalized for easy polling.

Core features

Four-in-one async endpoint

t2v / i2v / r2v / videoedit share POST /wan/api/v1/...video-synthesis. Submit, get a task_id, poll, and download; one flow for every capability keeps batch management simple.

Audio-driven lip-sync

wan2.7-i2v supports driving_audio, making a static portrait match the audio’s mouth movements and rhythm. Great for rap / narration / digital humans.

Multi-subject reference

wan2.7-r2v mixes reference images + reference videos (5 total max), referenced in the prompt as “image 1 / video 1”, with voice reference support.

Multiple resolutions and durations

720P / 1080P resolutions, 2-15 second integer durations. prompt_extend smart rewriting further improves quality for short prompts.

Supported models

Model ID	Capability	Required media input	Notes
`wan2.7-t2v`	Text-to-video	None	Pure text generation
`wan2.7-i2v`	Image-to-video	`first_frame` (+ optional `driving_audio`)	The only capability supporting audio drive
`wan2.7-r2v`	Reference-to-video	`reference_image` / `reference_video` (5 total max)	Supports `reference_voice` voice reference
`wan2.7-videoedit`	Video edit	`video` + `reference_image` (1-5)	The model name is `videoedit`, with no hyphen between “video” and “edit”

wan2.7-videoedit is for editing video using images. A separate wan2.7-image-pro is an image model (uses /v1/images/generations) and is outside this video endpoint’s scope, so do not mix them up. For the legacy Wan2.6 series, see Historical Versions.

⚠️ Endpoint choice (most important)

APIYI mounts two paths, but only the DashScope passthrough endpoint fully supports every Wan capability:

Path	Protocol style	i2v / r2v availability	Verdict
`/v1/videos`	OpenAI flat style	❌ Media fields are dropped	Do not use
`/wan/api/v1/services/aigc/video-generation/video-synthesis`	DashScope native passthrough	✅ Fully supported	Always use this

If any doc or example tells you to submit Wan video tasks via /v1/videos, ignore it. That path’s adaptation for i2v / r2v media fields is incomplete and causes the upstream error [InvalidParameter] Field required: input.media. All Wan video creation requests go to /wan/api/v1/...video-synthesis.

Async call flow

The whole flow is three async steps: create task → poll status → download video.

Create the task

POST /wan/api/v1/services/aigc/video-generation/video-synthesis with the header X-DashScope-Async: enable. It returns a task_id immediately.

Poll the status

GET /v1/tasks/{task_id} (with Authorization), once every 5-10 seconds (never less than 3 seconds), until status becomes completed or failed.

Download the video

GET the mp4 directly from the response’s result_url. Do not send the Authorization header (it is an OSS signed direct link; adding Auth causes a 403).

Task status reference

The top-level status field of the GET /v1/tasks/{task_id} response (already normalized by APIYI):

Status	Meaning	Next step
`submitted`	Submitted, queued	Keep polling
`in_progress`	Generating	Keep polling (progress often stalls at 30%; that is the upstream’s coarse reporting, not a hang)
`completed`	Success	Download from `result_url`
`failed`	Failed	Check `error.message` / `fail_reason`
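The status table can be turned into a small dispatch helper. The sketch below is illustrative only: the function names are hypothetical, and the assumed shape of `error` (either a dict with a `message` key or a plain string) is a guess based on the `error.message` / `fail_reason` fields named in the table.

```python
# Non-terminal statuses from the table: keep polling while the task is in one of these.
POLLING_STATUSES = {"submitted", "in_progress"}


def next_step(task: dict) -> str:
    """Return 'poll', 'download', or 'fail' for a normalized task response."""
    status = task.get("status")
    if status in POLLING_STATUSES:
        return "poll"
    if status == "completed":
        return "download"
    if status == "failed":
        return "fail"
    # Assumption: anything unrecognized is treated as an error rather than polled forever.
    raise ValueError(f"unexpected task status: {status!r}")


def failure_reason(task: dict) -> str:
    """Pull a readable reason out of a failed task (field shapes are assumed)."""
    error = task.get("error")
    if isinstance(error, dict) and error.get("message"):
        return str(error["message"])
    if error:
        return str(error)
    return str(task.get("fail_reason") or "unknown failure")
```

Raising on an unknown status is a deliberate choice here: it stops a polling loop from spinning on a value the table does not describe.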

Full Python client

import json, time, urllib.request

BASE = "https://api.apiyi.com"
KEY  = "sk-your-api-key"   # your APIYI Key

def post(path, body):
    h = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json",
         "X-DashScope-Async": "enable"}
    req = urllib.request.Request(BASE + path, data=json.dumps(body).encode(), headers=h, method="POST")
    return json.loads(urllib.request.urlopen(req).read())

def get(path):
    req = urllib.request.Request(BASE + path, headers={"Authorization": f"Bearer {KEY}"})
    return json.loads(urllib.request.urlopen(req).read())

# 1. Create the task (switch use cases by changing only model and media)
r = post("/wan/api/v1/services/aigc/video-generation/video-synthesis", {
    "model": "wan2.7-t2v",
    "input": {"prompt": "A lighthouse on the seashore at dusk, the camera slowly pushing in, waves gently lapping the rocks, seabirds calling"},
    "parameters": {"resolution": "720P", "duration": 5, "prompt_extend": True, "watermark": True}
})
task_id = r["output"]["task_id"]
print("task_id:", task_id)

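# 2. Poll every 10 seconds, giving up after a 20-minute backstop
deadline = time.time() + 20 * 60
while True:
    info = get(f"/v1/tasks/{task_id}")
    status = info["status"]
    print("status:", status, "progress:", info.get("progress"))
    if status == "completed":
        url = info["result_url"]
        break
    if status == "failed":
        raise RuntimeError(info.get("error") or info.get("fail_reason"))
    if time.time() > deadline:
        raise TimeoutError(f"task {task_id} did not finish within 20 minutes")
    time.sleep(10)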

# 3. Download (do not send Authorization! result_url is an OSS signed direct link)
urllib.request.urlretrieve(url, "out.mp4")
print("saved out.mp4")

Key parameters explained

When submitting, the body uses DashScope’s nested structure: { model, input: { prompt, media[] }, parameters: {...} }.

`input` fields

Field	Type	Required	Notes
`prompt`	string	✓	Natural-language description; wan2.7-r2v supports “image 1 / video 1” markers to reference media
`negative_prompt`	string		Negative prompt, ≤500 characters
`media`	array	Required for i2v/r2v/edit	Media asset array, see below

`media[]` types

`type`	Purpose	Applicable models
`first_frame`	First frame image (≤1)	i2v, r2v
`reference_image`	Reference image (preserve subject/scene)	r2v, videoedit
`reference_video`	Reference video (subject/voice reference)	r2v
`driving_audio`	Driving audio (lip-sync)	i2v only
`video`	Input video	videoedit
`reference_voice`	Voice reference (attached to reference_image/video)	r2v

Each media object needs at least type + url. The url must be a public https link that can be fetched directly with GET (upload local files to OSS / CDN first).
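To make the nested structure concrete, here is a minimal sketch that assembles request bodies for image-to-video and reference-to-video. The helper names `media_item` and `build_body` are hypothetical, the example.com URLs are placeholders, and the way the prompt addresses media (“image 1”, “video 1”) follows the description above rather than a verified upstream rule.

```python
import json


def media_item(kind: str, url: str) -> dict:
    """One media[] entry; each entry needs at least type + url."""
    if not url.startswith("https://"):
        raise ValueError("media url must be a public https link")
    return {"type": kind, "url": url}


def build_body(model: str, prompt: str, media=None, **parameters) -> dict:
    """Assemble { model, input: { prompt, media[] }, parameters: {...} }."""
    body = {"model": model, "input": {"prompt": prompt}}
    if media:
        body["input"]["media"] = list(media)
    if parameters:
        body["parameters"] = parameters
    return body


# Image-to-video with lip-sync: first frame plus driving audio (placeholder URLs).
i2v = build_body(
    "wan2.7-i2v",
    "The woman in the portrait raps to the beat",
    media=[
        media_item("first_frame", "https://example.com/portrait.png"),
        media_item("driving_audio", "https://example.com/verse.mp3"),
    ],
    resolution="720P",
    duration=5,
)

# Reference-to-video: mixed references, addressed in the prompt by position.
r2v = build_body(
    "wan2.7-r2v",
    "The person from image 1 walks through the street shown in video 1",
    media=[
        media_item("reference_image", "https://example.com/person.png"),
        media_item("reference_video", "https://example.com/street.mp4"),
    ],
    resolution="1080P",
    duration=10,
)

print(json.dumps(i2v, indent=2))
```

Either body can be passed as the second argument to the `post` helper from the full Python client.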

`parameters` fields

Field	Type	Values	Notes
`resolution`	string	`720P` / `1080P`	Uppercase; specifying it explicitly is recommended
`ratio`	string	`16:9` / `9:16` / `1:1` / `4:3` / `3:4`	Aspect ratio; ignored automatically when a first frame is supplied
`duration`	int	2-15	Seconds (integer), commonly 5 / 10; capped at 10 when a reference video is included
`prompt_extend`	bool	`true` / `false`	Smart prompt rewriting, strongly recommend `true`
`watermark`	bool	`true` / `false`	“AI generated” watermark in the bottom-right corner
`seed`	int	0-2147483647	Fixing it improves reproducibility

duration must be a JSON integer (5), not a string ("5"); otherwise the request fails with cannot unmarshal string into Go struct field ... of type int. Writing resolution in uppercase (720P) is more reliable.
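A client-side check can catch these mistakes before a request is sent. The sketch below only encodes the ranges listed in the table; `check_parameters` is a hypothetical helper, and the server may enforce rules this check does not know about.

```python
ALLOWED_RESOLUTIONS = {"720P", "1080P"}
ALLOWED_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
MAX_SEED = 2147483647


def check_parameters(params: dict, has_reference_video: bool = False) -> list:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []

    duration = params.get("duration")
    if duration is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(duration, bool) or not isinstance(duration, int):
            problems.append("duration must be an integer, not " + type(duration).__name__)
        else:
            upper = 10 if has_reference_video else 15
            if not 2 <= duration <= upper:
                problems.append(f"duration must be between 2 and {upper} seconds")

    resolution = params.get("resolution")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be '720P' or '1080P' (uppercase)")

    ratio = params.get("ratio")
    if ratio is not None and ratio not in ALLOWED_RATIOS:
        problems.append(f"unsupported ratio: {ratio!r}")

    for flag in ("prompt_extend", "watermark"):
        if flag in params and not isinstance(params[flag], bool):
            problems.append(f"{flag} must be true or false")

    seed = params.get("seed")
    if seed is not None and (
        isinstance(seed, bool) or not isinstance(seed, int) or not 0 <= seed <= MAX_SEED
    ):
        problems.append("seed must be an integer in 0-2147483647")

    return problems
```

Returning a list instead of raising on the first problem lets a caller report every issue in one pass.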

Choosing between Wan and HappyHorse

Wan and HappyHorse are both Alibaba video models and share the same endpoint and schema (swap them by changing only the model name), but their strengths differ:

Dimension	Wan2.7	HappyHorse-1.0
Audio-driven lip-sync (i2v)	✅ `wan2.7-i2v` supports `driving_audio`	❌ Not supported, i2v takes a first frame only
Reference-to-video image cap	reference image + reference video, 5 total max	up to 9 reference images
Video-edit reference images	≤5	≤5
Subject-consistency style	Multi-subject interaction, voice reference	Leans toward “faithful reproduction of dynamic footage”, keeps subjects stable

Need lip-sync / rap / digital-human narration → choose wan2.7-i2v (the only one with audio drive). Need many reference images to keep a subject consistent → consider HappyHorse r2v (up to 9 images).

Best practices

Iterate first at 720P / 5 seconds

During development, validate prompts and camera direction quickly with 720P / 5-second clips, then scale up to 1080P and longer durations once the prompt is finalized, to cut cost and wait time.

Always enable prompt_extend

prompt_extend: true clearly improves quality for short prompts, at the cost of only a few extra seconds of generation time.

Poll every 5-10 seconds

Never less than 3 seconds (you will be rate-limited), and do not block indefinitely on long tasks. 720P / 5 seconds typically takes 70-140 seconds; 1080P / longer clips may exceed 5 minutes.

Set a 20-minute client timeout as a backstop

1080P or clips over 10 seconds are noticeably slower; give your polling loop a 20-minute backstop timeout.
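One way to combine the polling interval and the backstop timeout is a reusable wait function. This is a sketch under assumptions: `wait_for_task` is a hypothetical helper, and `fetch` stands for any callable that returns the normalized task dict, for example `lambda tid: get(f"/v1/tasks/{tid}")` using the `get` helper from the full Python client.

```python
import time


def wait_for_task(fetch, task_id, interval=10.0, timeout=20 * 60,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch(task_id) until the task is completed or failed.

    Returns result_url on success, raises RuntimeError on failure and
    TimeoutError once the backstop timeout would be exceeded.
    """
    if interval < 3:
        raise ValueError("poll no more often than every 3 seconds")
    deadline = clock() + timeout
    while True:
        info = fetch(task_id)
        status = info.get("status")
        if status == "completed":
            return info["result_url"]
        if status == "failed":
            raise RuntimeError(info.get("error") or info.get("fail_reason"))
        # Give up if the next poll would land after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters is only there to make the loop testable without real waiting; production code can rely on the defaults.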

Download as soon as you get result_url

result_url expires in 24 hours by default and is an OSS signed direct link, so do not send the Authorization header when downloading. In production, always re-store it to your own OSS / CDN.

Make submissions idempotent

Failed tasks are not billed, but resubmitting the same task bills again. Maintain a “business ID → task_id” mapping in your app layer to avoid accidental charges.
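The “business ID → task_id” mapping can be as simple as a small persisted dictionary. The sketch below uses a local JSON file purely for illustration; `TaskLedger` and its methods are hypothetical names, and a real deployment would more likely use a database with a unique constraint on the business ID.

```python
import json
from pathlib import Path


class TaskLedger:
    """Maps a business ID to the task_id already created for it."""

    def __init__(self, path):
        self.path = Path(path)
        self.tasks = json.loads(self.path.read_text()) if self.path.exists() else {}

    def _save(self):
        self.path.write_text(json.dumps(self.tasks, indent=2))

    def submit_once(self, business_id, create_task):
        """Call create_task() only if business_id has no recorded task_id."""
        if business_id in self.tasks:
            return self.tasks[business_id]
        task_id = create_task()
        self.tasks[business_id] = task_id
        self._save()
        return task_id

    def forget(self, business_id):
        """Drop a mapping, e.g. after the task failed and may be resubmitted."""
        self.tasks.pop(business_id, None)
        self._save()
```

Here `create_task` would wrap the creation request and return `r["output"]["task_id"]`; calling `forget` after a failed task lets the same business ID be submitted again.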

Error codes and retries

Errors come from two stages and are handled differently:

Source	Signature	Handling
Creation stage (rejected by APIYI)	HTTP 4xx/5xx, `type` is `task_error` / `parse_request_failed` / `build_request_failed`	Fix the body and retry (usually a wrong field type, missing media, or wrong endpoint)
Execution stage (rejected by upstream Alibaba Cloud)	Task ends as `status=failed`, `error.message` prefixed with `[InvalidParameter]` / `[InvalidImageUrl]` etc. in brackets	Read the bracketed hint; usually an unreachable media URL or a sensitive prompt

Recommended client behavior: exponential backoff retry on HTTP 5xx / network errors (1s / 4s / 16s); surface HTTP 4xx immediately without retry; a failed task with [InvalidImageUrl] can be retried (possibly a transient network issue), while [InvalidParameter] / sensitive words should not be retried.
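The recommended behavior could be sketched as follows, assuming the standard-library `urllib` client used elsewhere on this page. `call_with_retry` and `should_resubmit` are hypothetical helpers; the schedule gives up to four attempts in total (three waits of 1s, 4s, and 16s, then a final try).

```python
import time
import urllib.error

BACKOFF_SECONDS = (1, 4, 16)  # the 1s / 4s / 16s schedule described above


def call_with_retry(request, sleep=time.sleep):
    """Run request(); retry HTTP 5xx and network errors, surface 4xx at once."""
    for delay in BACKOFF_SECONDS:
        try:
            return request()
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                raise  # 4xx: fix the request body instead of retrying
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            pass  # network-level failure: retry after the delay
        sleep(delay)
    return request()  # final attempt; any error propagates to the caller


def should_resubmit(error_message: str) -> bool:
    """Decide from the bracketed prefix whether a failed task is worth resubmitting."""
    return error_message.startswith("[InvalidImageUrl]")
```

`HTTPError` is caught before `URLError` because it is a subclass of it; reversing the order would swallow 4xx responses as network errors.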

FAQ

Why can't I use /v1/videos to submit Wan tasks?

/v1/videos is an OpenAI flat-style endpoint with incomplete support for Wan’s i2v / r2v: the media array gets dropped, and upstream Alibaba Cloud returns [InvalidParameter] Field required: input.media. All Wan video creation requests go to /wan/api/v1/services/aigc/video-generation/video-synthesis, and queries always go to /v1/tasks/{task_id}.

What does the X-DashScope-Async: enable header do? Is it required?

It tells the endpoint “this is an async task, return a task_id immediately and do not block.” It is required on every creation request; omitting it returns current user api does not support synchronous calls. The query call (GET) does not need this header.

Why query at /v1/tasks/{id} instead of /wan/api/v1/tasks/{id}?

APIYI normalizes all video task queries to /v1/tasks/{task_id}. No matter which path you used to create the task, you query it through this one endpoint, and the response’s top-level status / progress / result_url / error fields are consistent.

result_url download returns 403 / SignatureDoesNotMatch, what now?

Drop the Authorization header. result_url is already an Alibaba Cloud OSS pre-signed direct link; adding an APIYI Key makes OSS reject it:

curl -L -o out.mp4 "$RESULT_URL"          # ✅ correct
curl -L -H "Authorization: Bearer $KEY" -o out.mp4 "$RESULT_URL"   # ❌ wrong

What if result_url has expired?

The link is valid for 24 hours by default. After it expires, re-GET /v1/tasks/{task_id} and you usually get a fresh result_url, but the task_id’s own query validity is also 24 hours (returns UNKNOWN after that). For long-term storage, download to your own storage as soon as possible.

progress is stuck at 30%, is it hung?

No. The progress reported by upstream Alibaba Cloud is coarse-grained (only 0% / 10% / 30% / 100% buckets). As long as status is still in_progress, keep waiting; it usually jumps straight from 30% to 100%.

How many tasks can one Key run concurrently?

In practice you can submit 4-8 tasks at once without hitting rate limits. In production, keep simultaneously active tasks ≤10; anything beyond that queues. The query API has a fairly high default RPS, but a 5-10 second polling interval is still recommended.
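A simple way to stay under that limit is to cap the number of worker threads, with each worker owning one task from creation to download. This is only a sketch: `run_batch` is a hypothetical helper, and `run_one` stands for your own function that creates a task, polls it, and returns the result.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ACTIVE_TASKS = 8  # within the 4-8 range reported above as safe


def run_batch(jobs, run_one, max_active=MAX_ACTIVE_TASKS):
    """Run run_one(job) for every job with at most max_active in flight.

    run_one is expected to block until its task finishes, so one worker
    thread corresponds to one active task. Results keep the input order.
    """
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        return list(pool.map(run_one, jobs))
```

Threads suit this workload because each worker spends nearly all of its time sleeping between polls rather than computing.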

Are failed tasks billed?

status=failed is not billed. But note: resubmitting the same task bills again, so make it idempotent. During testing you can turn off prompt_extend and use 720P / 5 seconds / short prompts to lower the unit cost.

Is wan2.6 still usable?

Yes. The Wan2.6 series (including wan2.6-r2v-flash) is still on the callable list, with the same protocol as Wan2.7; just change the model name. See Historical Versions.

Group Setup

The Wan and HappyHorse series share a single Wan group — one Token can call both series (the Token in the screenshot is named Wan2.7&HappyHorse). Video models are billed per second, so the Token must meet two conditions to route successfully:

Billing model: choose Pay-as-you-go Priority or Pay-as-you-go — video is billed per second, so Pay-per-request Tokens cannot route
Group: select a group that includes Wan

Screenshot: the Create Token dialog with the billing model set to Pay-as-you-go Priority and the group dropdown showing Wan (rate 0.14x); one Token is usable for both Wan2.7 and HappyHorse.

Pricing

Default price = 98% of Alibaba’s official price (simple to reason about)

In the console the Wan group shows a rate of 0.14x, which is denominated in the built-in RMB pricing unit. Because APIYI bills in USD at a fixed 1:7 exchange rate, the effective conversion is:

0.14 (RMB pricing unit) × 7 (fixed exchange rate) = 0.98

In other words, the default price = 98% of Alibaba’s official price — cheaper than buying direct from Alibaba, with no overseas link to build yourself.

Conversion: USD price per second = official RMB price × 0.14 (i.e. × 0.98 ÷ 7). For example, the official 1080P price of ¥1.0/s → $0.14/s, exactly the 0.14x shown in the console.

Price detail (default price, billed per second)

Wan2.7 text-to-video / image-to-video / reference-to-video are priced the same, with two tiers — 720P / 1080P (480P is not supported):

Resolution	Official price	Our default /s	5 s	10 s	12 s
`720P`	¥0.6/s	$0.084/s	$0.42	$0.84	$1.01
`1080P`	¥1.0/s	$0.14/s	$0.70	$1.40	$1.68

wan2.7-r2v defaults to 1080P, and duration is capped at 10 seconds when reference media includes a video.
wan2.7-videoedit (video edit) output duration follows the source video and is billed by actual output seconds, not by the duration parameter.
Prices shown are the default (98% of official); with the maximum top-up bonus, the effective price is roughly the table value ÷ 1.2 (e.g. 1080P 5 s $0.70 → about $0.58).
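The table values can be reproduced with a few lines of arithmetic. The sketch below hardcodes the official prices and the 0.14x rate quoted on this page, so treat it as an estimate; `estimate_usd` is a hypothetical helper, and the console remains authoritative for current rates.

```python
OFFICIAL_RMB_PER_SECOND = {"720P": 0.6, "1080P": 1.0}  # official prices from the table
GROUP_RATE = 0.14  # Wan group rate shown in the console


def estimate_usd(resolution: str, seconds: int, bonus: float = 1.0) -> float:
    """Estimated charge in USD for one successful clip.

    bonus is the top-up multiplier on credited balance
    (1.0 = no bonus, 1.2 = the maximum tier mentioned above).
    """
    per_second = OFFICIAL_RMB_PER_SECOND[resolution] * GROUP_RATE
    return round(per_second * seconds / bonus, 2)


print(estimate_usd("720P", 5))               # 0.42
print(estimate_usd("1080P", 12))             # 1.68
print(estimate_usd("1080P", 5, bonus=1.2))   # 0.58
```

Failed tasks are not billed, so the estimate applies only to clips that reach `completed`.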

Stack top-up bonuses for an even lower effective price

After joining the top-up bonus program, credited balance can be boosted up to ~1.2x, pushing the effective price lower still:

0.98 ÷ 1.2 ≈ 0.817

So large customers can reach as low as ~81.7% of the official price.

Tier	Effective price (vs Alibaba official)	Formula
Default	98%	rate 0.14x × fixed exchange rate 7
With top-up bonuses (max tier for large customers)	~81.7%	0.98 ÷ 1.2

Billing dimension = resolution tier × duration (seconds); failed tasks are not billed.
1:7 is a fixed settlement exchange rate (not a preferential rate); it applies uniformly to all USD top-ups.
For the highest bonus tiers and eligible channels, see top-up bonuses. The latest rate is authoritative in the console.

Related docs

Text-to-Video Playground

wan2.7-t2v live debugging + code samples

Image-to-Video Playground

wan2.7-i2v first frame + driving audio

Reference-to-Video Playground

wan2.7-r2v multi-subject reference + voice

Video Edit Playground

wan2.7-videoedit outfit / background swap

Historical Versions (Wan2.6)

Wan2.6 series and migration notes

HappyHorse Series

Also Alibaba-based, side-by-side selection guide

Alibaba Cloud official docs (reference): help.aliyun.com/zh/model-studio/text-to-video-api-reference. For questions or suggestions, please open a ticket in the APIYI console.