POST /v1/videos
Image-to-video: submit a video generation task from a reference image
curl --request POST \
  --url https://api.apiyi.com/v1/videos \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form model=sora-2 \
  --form 'prompt=Animate this scene: gentle waves lapping, leaves swaying, cinematic camera push-in' \
  --form seconds=8 \
  --form size=1280x720 \
  --form 'input_reference=@./reference.png;type=image/png'
{
  "id": "video_abc123def456",
  "object": "video",
  "model": "sora-2",
  "status": "queued",
  "progress": 0,
  "created_at": 1712697600,
  "size": "1280x720",
  "seconds": "8",
  "quality": "standard"
}
The interactive Playground on the right supports live debugging. Set your API Key in the Authorization field (format: Bearer sk-xxx), upload a reference image, enter a prompt, choose model / size / seconds, and send.
Scope: This page covers “generate video from a reference image” — upload one image as the starting frame / visual anchor to animate static visuals. If you don’t need a reference image, use the Text-to-Video endpoint (same path, JSON body).
⚠️ Reference image dimensions must exactly match size
  • The uploaded image’s pixel dimensions must equal the size field (e.g. size=1280x720 requires a 1280×720 image)
  • Mismatch returns 400: Inpaint image must match the requested width and height
  • Pre-crop with ffmpeg / Pillow before upload
Other notes:
  • Content-Type must be multipart/form-data (not JSON)
  • Only one file is supported; the field name is fixed as input_reference
  • Accepted formats: image/jpeg / image/png / image/webp
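Because a dimension mismatch is the most common 400, it is worth validating the image before it ever leaves your machine. Below is a minimal sketch that reads the width and height straight out of a PNG's IHDR header using only the standard library (PNG only; for JPEG / WebP you would use Pillow's `Image.size` instead). The helper names are hypothetical, not part of any SDK:

```python
import struct

def png_dimensions(data: bytes) -> tuple:
    """Read (width, height) from a PNG's IHDR chunk (bytes 16-24)
    without any imaging library."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])

def check_reference(data: bytes, size: str) -> None:
    """Fail fast locally instead of burning a request on the 400
    'Inpaint image must match the requested width and height'."""
    w, h = png_dimensions(data)
    expected = tuple(int(x) for x in size.split("x"))
    if (w, h) != expected:
        raise ValueError(f"image is {w}x{h}, but size={size}")
```

Call `check_reference(open("reference.png", "rb").read(), "1280x720")` right before building the multipart request.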

Code Samples

Python (OpenAI SDK Drop-In)

from openai import OpenAI
import time

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.apiyi.com/v1"
)

# Step 1: Submit (the OpenAI SDK auto-handles multipart when input_reference is provided)
with open("./reference.png", "rb") as f:
    video = client.videos.create(
        model="sora-2",
        prompt="Animate this scene: gentle waves lapping against the shore, leaves swaying in the breeze",
        seconds="8",
        size="1280x720",
        input_reference=f
    )
print(f"Video ID: {video.id}, status: {video.status}")

# Step 2: Poll
while True:
    video = client.videos.retrieve(video.id)
    print(f"Status: {video.status}, progress: {getattr(video, 'progress', 0)}%")
    if video.status == "completed":
        break
    if video.status == "failed":
        raise RuntimeError(f"Generation failed: {video}")
    time.sleep(15)

# Step 3: Download
client.videos.download_content(video.id).write_to_file("output.mp4")
print("Saved: output.mp4")

Python (Raw requests + multipart)

import requests
import time

API_KEY = "sk-your-api-key"
BASE_URL = "https://api.apiyi.com/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: Multipart upload (image dimensions must equal size)
with open("./reference.png", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/videos",
        headers=HEADERS,  # Don't manually set Content-Type — requests handles the multipart boundary
        data={
            "model": "sora-2",
            "prompt": "Animate this scene with cinematic camera push-in, soft golden hour lighting",
            "seconds": "8",
            "size": "1280x720"
        },
        files={
            "input_reference": ("reference.png", f, "image/png")
        },
        timeout=60  # Multipart uploads of large images can be slow; use a 60-second timeout
    ).json()
video_id = resp["id"]
print(f"Video ID: {video_id}, status: {resp['status']}")

# Step 2: Poll
deadline = time.time() + 900
while time.time() < deadline:
    status_resp = requests.get(f"{BASE_URL}/videos/{video_id}", headers=HEADERS).json()
    print(f"Status: {status_resp['status']}, progress: {status_resp.get('progress', 0)}%")
    if status_resp["status"] == "completed":
        break
    if status_resp["status"] == "failed":
        raise RuntimeError(f"Generation failed: {status_resp}")
    time.sleep(15)

# Step 3: Download
with requests.get(f"{BASE_URL}/videos/{video_id}/content", headers=HEADERS, stream=True) as r:
    r.raise_for_status()
    with open("output.mp4", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
print("Saved: output.mp4")

cURL

# Step 1: Multipart upload + submit
curl -X POST "https://api.apiyi.com/v1/videos" \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "model=sora-2" \
  -F "prompt=Animate this scene: gentle waves lapping, leaves swaying, cinematic" \
  -F "seconds=8" \
  -F "size=1280x720" \
  -F "input_reference=@./reference.png;type=image/png"

# Step 2: Poll
curl -X GET "https://api.apiyi.com/v1/videos/video_abc123" \
  -H "Authorization: Bearer sk-your-api-key"

# Step 3: Download
curl -X GET "https://api.apiyi.com/v1/videos/video_abc123/content" \
  -H "Authorization: Bearer sk-your-api-key" \
  -o output.mp4

Node.js (fetch + FormData)

import fs from 'node:fs';
import { fileFromPath } from 'formdata-node/file-from-path';
import { FormData } from 'formdata-node';

const API_KEY = 'sk-your-api-key';
const BASE_URL = 'https://api.apiyi.com/v1';

// Step 1: Multipart upload
const form = new FormData();
form.set('model', 'sora-2');
form.set('prompt', 'Animate this scene with cinematic camera push-in, soft lighting');
form.set('seconds', '8');
form.set('size', '1280x720');
form.set('input_reference', await fileFromPath('./reference.png'));

const submitResp = await fetch(`${BASE_URL}/videos`, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${API_KEY}` },  // Don't manually set Content-Type
    body: form
});
const { id: videoId } = await submitResp.json();
console.log(`Video ID: ${videoId}`);

// Step 2: Poll
let status = 'queued';
while (status !== 'completed' && status !== 'failed') {
    await new Promise(r => setTimeout(r, 15000));
    const data = await (await fetch(`${BASE_URL}/videos/${videoId}`, {
        headers: { 'Authorization': `Bearer ${API_KEY}` }
    })).json();
    status = data.status;
    console.log(`Status: ${status}, progress: ${data.progress ?? 0}%`);
}

if (status === 'failed') throw new Error('Generation failed');

// Step 3: Download
const contentResp = await fetch(`${BASE_URL}/videos/${videoId}/content`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` }
});
fs.writeFileSync('output.mp4', Buffer.from(await contentResp.arrayBuffer()));
console.log('Saved: output.mp4');

Browser JavaScript

// Demo only; route through your backend in production to avoid leaking the API key.
const fileInput = document.getElementById('refImage');  // <input type="file" />
const file = fileInput.files[0];

const form = new FormData();
form.append('model', 'sora-2');
form.append('prompt', 'Animate this scene, gentle motion');
form.append('seconds', '4');
form.append('size', '1280x720');
form.append('input_reference', file);

const submitResp = await fetch('https://api.apiyi.com/v1/videos', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer sk-your-api-key' },
    body: form
});
const { id } = await submitResp.json();
console.log('Video ID:', id);

// After polling completes, route the video URL through a backend proxy to avoid downloading large files in the browser.

Parameters Quick Reference

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | sora-2 | sora-2 (720p only) or sora-2-pro (720p / 1024p / 1080p tiers) |
| prompt | string | Yes | — | Video description; focus on how the static image should animate (camera motion, object motion, lighting changes) |
| seconds | string | No | "4" | Duration as string enum: "4" / "8" / "12" |
| size | string | No | 720x1280 | Output resolution; must equal the input_reference image dimensions exactly |
| input_reference | file | Yes | — | Reference image file (image/jpeg / image/png / image/webp); dimensions must equal size |
Detailed parameter constraints, allowed values, and examples are visible in the right-hand Playground. input_reference must be uploaded via multipart — URLs and base64 are not accepted.

Reference Image Preparation

1. Pick the target resolution

Choose size first based on your use case: portrait 720x1280, landscape 1280x720, Pro 1080p landscape 1920x1080, etc.
2. Crop locally to exact pixels

Use Pillow / ffmpeg to crop or resize the image to the target dimensions:
from PIL import Image
img = Image.open("source.jpg")
img = img.resize((1280, 720), Image.LANCZOS)  # Or crop first then resize to preserve aspect ratio
img.save("reference.png")
Or one-line ffmpeg:
ffmpeg -i source.jpg -vf "scale=1280:720" reference.png
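The comment in the Pillow snippet above suggests cropping before resizing to preserve aspect ratio. A small pure-Python helper (hypothetical, not part of any SDK) that computes the center-crop box to feed `Image.crop()` before the resize:

```python
def center_crop_box(width, height, target_w, target_h):
    """Return a (left, top, right, bottom) box that center-crops a
    width x height image to the aspect ratio of target_w x target_h.
    Pass the box to Pillow's Image.crop(), then resize to the target."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Source is too wide: trim the sides.
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is too tall (or already matches): trim top and bottom.
    new_h = round(width / target_ratio)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)
```

For example, a 1600×1200 source cropped for 1280x720 yields the box (0, 150, 1600, 1050); crop to that, then `resize((1280, 720), Image.LANCZOS)` with no distortion.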
3. Pick the right format

Prefer PNG (lossless, ideal for illustrations / screenshots); use JPEG for photos to save bytes, or WebP when you want both small files and transparency.
4. Focus the prompt on "motion", not "appearance"

The reference image already defines the visuals. The prompt should describe how the scene should animate: camera push/pull, object motion, lighting changes, character expressions, etc. Example: "Camera slowly pushes in, leaves gently swaying, sunlight flickering through branches".

Response Format

The response shape is identical to Text-to-Video: submit returns id + status: "queued", polling reports progress, completion downloads via /v1/videos/{id}/content as MP4.
{
  "id": "video_abc123def456",
  "object": "video",
  "model": "sora-2",
  "status": "queued",
  "progress": 0,
  "created_at": 1712697600,
  "size": "1280x720",
  "seconds": "8",
  "quality": "standard"
}
⚠️ Common 400 errors
  • Inpaint image must match the requested width and height — the reference image dimensions don’t match size. This is the most common 400; validate dimensions client-side before upload
  • Invalid file format — uploaded file is not jpeg / png / webp, or is corrupted
  • Missing required parameter: input_reference — multipart field name is wrong (must be input_reference, not image or reference)
  • seconds must be one of "4", "8", "12" — passed integer 4 instead of string "4"
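All four of these errors are catchable before the request leaves your machine. A hypothetical pre-flight check (the allowed values are taken from the parameter table on this page; `preflight` itself is not part of any SDK):

```python
ALLOWED_SECONDS = {"4", "8", "12"}
ALLOWED_SIZES = {"720x1280", "1280x720", "1024x1792",
                 "1792x1024", "1080x1920", "1920x1080"}

def preflight(form: dict) -> list:
    """Return a list of problems the API would otherwise reject with a 400."""
    problems = []
    if "input_reference" not in form:
        problems.append("multipart field must be named input_reference")
    seconds = form.get("seconds", "4")
    if not (isinstance(seconds, str) and seconds in ALLOWED_SECONDS):
        problems.append('seconds must be one of the strings "4", "8", "12"')
    if form.get("size", "720x1280") not in ALLOWED_SIZES:
        problems.append("size must be one of the documented resolutions")
    return problems
```

Run it on the form dict just before the POST and abort if the list is non-empty; it costs nothing and saves a round trip per mistake.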
Image-to-video and text-to-video have the same per-second pricing (billed by seconds); uploading a reference image does not cost extra. See the pricing table.

Authorizations

Authorization
string
header
required

API Key from the APIYI console (requires the Sora2 official-relay ("官转") group and usage-based billing)

Body

multipart/form-data
model
enum<string>
default:sora-2
required

Model ID. sora-2 supports 720p only; sora-2-pro supports 720p / 1024p / 1080p

Available options:
sora-2,
sora-2-pro
prompt
string
required

Video generation prompt. Focus on how the image should animate: camera motion, object motion, lighting changes

Example:

"Animate this scene: gentle waves lapping, leaves swaying, cinematic camera push-in"

input_reference
file
required

Reference image file used as the video's starting frame / visual anchor.

  • Accepted formats: image/jpeg / image/png / image/webp
  • Dimensions must equal size, otherwise you get Inpaint image must match the requested width and height
  • Only one file is supported; field name is fixed as input_reference
seconds
enum<string>
default:4

Video duration as string enum: "4" / "8" / "12"

Available options:
4,
8,
12
size
enum<string>
default:720x1280

Output resolution. Must exactly match the input_reference image dimensions:

  • sora-2 (720p only): 720x1280 / 1280x720
  • sora-2-pro additionally: 1024x1792 / 1792x1024 / 1080x1920 / 1920x1080
Available options:
720x1280,
1280x720,
1024x1792,
1792x1024,
1080x1920,
1920x1080

Response

Task submitted, returns video_id with queued status

id
string

Task ID for subsequent polling and download

Example:

"video_abc123def456"

object
string

Object type; always video

Example:

"video"

model
string

Model ID used for this task

Example:

"sora-2"

status
enum<string>

Task status:

  • queued — submitted, waiting in queue
  • in_progress — generating
  • completed — done, ready to download (/v1/videos/{id}/content)
  • failed — failed (not billed), safe to retry
Available options:
queued,
in_progress,
completed,
failed
Example:

"queued"

progress
integer

Generation progress percentage (0–100), not strictly linear

Example:

0

created_at
integer

Task creation Unix timestamp (seconds)

Example:

1712697600

completed_at
integer

Task completion Unix timestamp (seconds), present only on completed status

Example:

1712697900

size
string

Actual output resolution (matches the requested size)

Example:

"1280x720"

seconds
string

Actual duration generated (matches the requested seconds)

Example:

"8"

quality
string

Quality tier (standard for sora-2, high for sora-2-pro)

Example:

"standard"