GPT-image-2 Official vs Reverse - API易文档中心

TL;DR

If you need	Pick
`quality` knob / mask inpainting / strict OpenAI-API field parity	`gpt-image-2` (Official)
Predictable flat $0.03/image + faster output	`gpt-image-2-all` (Reverse, ChatGPT-web line, ~90s)
Predictable flat $0.03/image + locked output sizes (incl. 4K)	`gpt-image-2-vip` (Reverse, Codex line, ~120–200s)

All three models are built on OpenAI’s gpt-image-2 underneath. The differences are in channel nature (official direct vs reverse-engineered), pricing model, and parameter granularity.

The two reverse siblings (-all / -vip): This page’s “Reverse” column covers both gpt-image-2-all and gpt-image-2-vip — they share identical call format and the same $0.03/image flat price. The differences are confined to the size field and generation time:

gpt-image-2-all: no size (describe in prompt), ChatGPT web line, ~90s generation
gpt-image-2-vip: 30 explicit sizes (10 ratios × 1K/2K/4K, including 4K), Codex line, ~120–200s generation (on par with the official version)
Both: no quality, no n, no mask inpainting

Cells in the “Reverse” column where they differ are tagged with -all / -vip sub-rows. For quality tiers or mask inpainting, you still need the official gpt-image-2.

About current speeds: -all / -vip generation is slower than at launch due to OpenAI upstream compute fluctuations — this affects all reverse-channel users, not just APIYI; our account pool and ops are healthy. Set client timeouts to 300s+ and leave more headroom for complex prompts.

Full Comparison Table

Dimension	gpt-image-2-all / -vip (Reverse, cost-effective)	gpt-image-2 (Official)
Model name	`gpt-image-2-all` (no size, fastest) / `gpt-image-2-vip` (explicit size or 4K)	`gpt-image-2`
Channel nature	`-all`: reverse-engineered ChatGPT web line `-vip`: reverse-engineered Codex line	Official direct (OpenAI Images API)
Pricing	Per-call: flat $0.03/call (both models, all sizes — no 4K surcharge)	Token-metered: matches official; ~85% of list price after APIYI deposit bonuses
Typical cost/image	$0.03 (regardless of size / quality / model)	Measured $0.03 – $0.2 (correlates with prompt length, size, quality)
Token group	Default	Default
Token type	Per-call or Token-priority both work	Token-priority only (this model is token-billed; per-call tokens will be rejected)
Recommended endpoint	`/v1/images/generations` + `/v1/images/edits` (more stable, more upstream supply, and same code as official — just swap the `model` name to switch during risk-control turbulence)	`/v1/images/generations` + `/v1/images/edits`
Alt endpoints	`/v1/chat/completions` (multi-turn chat-based editing, pass online image URLs directly)	(only the two official ones)
Upload format	base64 or https URL (chat endpoint) / multipart file (edits endpoint)	multipart file (edit endpoint)
Output format	`b64_json` (includes prefix) or `url` (R2 CDN)	`b64_json` (raw base64, no prefix)
Reference image count	Multiple (chat-mode upper bound is high)	Max 16 (`image[]`)
Mask inpainting	❌ Not supported	✅ Supported (alpha channel required)
Prompt adherence	Good	Excellent
Generation speed	`-all`: ~90 seconds (faster) `-vip`: ~120–200 seconds (on par with official) 📌 Currently slower than at launch — OpenAI upstream compute, not an APIYI-side issue	~100-120 seconds, complex + 4K can reach 3-5 minutes
`size` parameter	`-all`: ❌ Not accepted (describe in prompt) `-vip`: ✅ 30 explicit sizes (10 ratios × 1K/2K/4K)	✅ Any valid custom size
4K support	`-all`: ❌ `-vip`: ✅ 4K Detail tier (e.g., `3840x2160` / `2880x2880`)	✅ Including `3840×2160`
Common output sizes	`-all`: 16:9 → 1672×941, 9:16 → 941×1672, 1:1 → 1254×1254 (adaptive) `-vip`: see full 30-size table	8 presets + any valid custom size
`quality` parameter	❌ Both reverse models reject it (do not pass)	✅ `low` / `medium` / `high` / `auto`
`n` parameter	❌ Both reverse models reject it (1 image per call)	✅ Supported
Transparent background	—	❌ Not supported (`background: transparent` errors)
Chinese prompts	✅ Native	✅ Native
Text rendering	High fidelity	High fidelity (strongest at `high` tier)
Content restrictions	Looser	Stricter (OpenAI official policy)
API docs	GPT-Image-2-All Overview / GPT-Image-2-VIP Overview	GPT-Image-2 Overview

🔑 Create or manage API tokens: https://api.apiyi.com/token
When creating a token in the console, choose a group (Default is fine) and a token type (Per-call / Token-priority). Calling gpt-image-2 (official) requires a “Token-priority” token — per-call tokens will be rejected due to billing-mode mismatch.

When to Pick Each

Pick `gpt-image-2-all` (Reverse) when

💰 Predictable cost

Stable $0.03/image with no size/quality tier. Ideal for batch production with hard cost ceilings (infographics, marketing assets, e-commerce thumbnails).

⚡ Faster output

~90s generation — slightly faster than both -vip and the official version. Better real-time UX.

🗨️ Chat-style workflows

/v1/chat/completions handles multi-turn iterative editing, text-to-image, and reference editing — all from one endpoint. Simplest integration.

🌏 Chinese + marketing text

Native Chinese prompt support, excellent text rendering for signage / posters / infographics — great for Chinese-audience content production.

Pick `gpt-image-2-vip` (Reverse, when you need locked size or 4K) when

📐 Strictly locked output sizes

Supports 30 explicit sizes (10 ratios × 1K/2K/4K) — perfect for e-commerce hero shots, poster templates, video thumbnails, wallpapers where exact dimensions matter.

🖼️ 4K at the same flat price

The 4K Detail tier (2880×2880 / 3840×2160 / 3840×1632 etc.) costs the same $0.03/image as 1K/2K — no 4K surcharge.

🔁 Code shared with -all

Identical request structure to -all — just one extra size field. One codebase can switch between both models as needed.

💰 Cost still predictable

Flat $0.03/image — dramatically cheaper than the official 4K high-quality tier for the locked-4K use case.

Pick `gpt-image-2` (Official) when

🎚️ Quality tiers

quality supports low/medium/high/auto. Use low for drafts to save cost; high for print-grade finals — official-only; both reverse models reject it.

🎯 Mask inpainting

Alpha-channel mask supported — precisely modify a region while preserving the rest. Both reverse models do not support this.

🖼️ Any custom size

size accepts any valid resolution. Pick official when the size you need isn’t in -vip’s 30-size set, or you need finer-grained dimension control.

🔌 Same as OpenAI Official

Goes through the official Images API — fields and behavior identical to OpenAI official. Existing OpenAI-SDK-based code / systems migrate with zero changes and stay stable long-term.

Key Differences in Detail

1. b64_json format gotcha (migration trap!)

# gpt-image-2-all: b64_json already has prefix — drop into <img src>
all_b64 = resp["data"][0]["b64_json"]
# "data:image/png;base64,iVBORw0KGgo..."
img_tag = f'<img src="{all_b64}">'  # ✅ direct use

# gpt-image-2: b64_json is raw base64, no prefix — decode or prepend manually
official_b64 = resp.data[0].b64_json
# "iVBORw0KGgo..."
with open("out.png", "wb") as f:
    f.write(base64.b64decode(official_b64))  # ✅ write file
img_tag = f'<img src="data:image/png;base64,{official_b64}">'  # ✅ browser render

When switching between the two, the b64_json handling code must change, or you’ll get a corrupted data URL or a decode failure.

2. Resolution control

gpt-image-2-all (in the prompt):

"Landscape 16:9 cinematic, old lighthouse at sunset"   → ~1672×941
"Portrait 9:16 phone wallpaper, cyberpunk city"        → ~941×1672
"1024×1024 square logo, minimalist cat line art"        → ~1254×1254

gpt-image-2-vip (also reverse-engineered, accepts size directly, 30 sizes including 4K):

curl "https://api.apiyi.com/v1/images/generations" \
  -H "Authorization: Bearer $YI_API_KEY" \
  -d '{
    "model": "gpt-image-2-vip",
    "prompt": "White ceramic mug on a gray desk",
    "size": "2048x1360"
  }'

# 4K is also supported at the same $0.03 flat price
# "size": "3840x2160"

gpt-image-2 (size parameter strict + quality tiers):

client.images.generate(
    model="gpt-image-2",
    prompt="...",
    size="2048x1152",   # ✅ output exactly this
    quality="high"      # official-only
)

3. Upload / output format differences

Operation	gpt-image-2-all	gpt-image-2
Upload reference	base64 data URL or https URL (in chat messages’ `image_url`)	multipart `image[]` file field
Download output	Default `url` (R2 CDN, 24h validity), can switch to `b64_json` (with prefix)	`b64_json` (raw base64, requires decode)
Multi-image fusion	Multiple `image_url` blocks in chat	`image[]` array, max 16

4. Cost ballpark

Scenario	gpt-image-2-all / -vip	gpt-image-2
1024×1024 draft	$0.03	~$0.006 (low)
1024×1024 medium quality	$0.03	~$0.053 (medium)
1024×1024 high quality	$0.03	~$0.211 (high)
2048×1152 high quality	$0.03	~$0.20+ (token-metered)
3840×2160 4K high quality	$0.03 (only `-vip` supports 4K)	Token-metered, significantly higher than 1K
Edit / multi-image fusion	$0.03	Input tokens rise sharply, single call can hit $0.1+

Bottom line: For batch / low-quality workloads, the reverse channel isn’t always cheaper (1K low is actually less expensive on the official tier). For mid-to-high quality and 4K, the reverse channel’s flat $0.03 wins — -vip’s 4K matches 1K/2K pricing and beats the official 4K high-quality tier by an order of magnitude. Pick official only when you need quality tiers / mask inpainting / strict OpenAI-API field parity.

Client Settings

Setting	gpt-image-2-all / -vip	gpt-image-2
Timeout (conservative)	`-all`: 300s (typical ~90s) `-vip`: 300s (typical 120–200s, 4K long-tail goes higher)	360s (4K high quality realistically reaches 3-5 minutes)
Retry strategy	Exponential backoff on 5xx / timeout, max 2 retries	Same
Concurrency	chat endpoint is naturally concurrency-friendly; 1 image per call — parallel for multiple	1 image per call — issue parallel requests for multiple
Request ID	`request-id` response header	`x-request-id` response header

Common to all three models: for image edit / multi-image fusion, compress each input image to under 1.5MB (JPEG quality 80-90 / down-sized resolution). Sporadic shell_api_error / Unknown error responses are most often triggered by oversized inputs — compressing measurably improves success rate and latency. Output resolution is independent of input size — quality is set on the output side (size for -vip / official, prompt phrasing for -all), not by input file size.

FAQ

Should I compress input images? Does writing 4K / 8K in the prompt help?

Yes, strongly recommended. For all three models, compress each input image to under 1.5MB (JPEG quality 80-90 / down-sized resolution): sporadic shell_api_error / Unknown error responses are most often triggered by oversized inputs, and compressing measurably improves success rate and latency.Don’t worry about compression hurting quality — output resolution is independent of input size. The “output-side” controls differ across the three:

gpt-image-2-all: controlled by prompt composition phrasing (see the verified phrasing table on the -all overview page) — 4K / 8K in the prompt does not count
gpt-image-2-vip: controlled by the size field (30 sizes incl. 4K, flat $0.03/image)
gpt-image-2: controlled by size + quality (any valid size)

Bottom line: shrinking inputs only speeds things up — quality is set by output-side configuration, not input file size.

Can the same API Key call all three models?

Yes. All three run on the Default channel — the same API Key calls them with no extra config. Note: calling gpt-image-2 (official) requires a “Token-priority” token; -all / -vip accept either token type.

Which of the three reverse-channel endpoints should I use first?

Use the OpenAI Images API first (/v1/images/generations for text-to-image + /v1/images/edits for editing), for two reasons:

More stable: upstream resource supply for the Images API channel is more plentiful than for chat completions, so call success rates are higher
Compatible with the official relay for easy switching: the call method and parameters like size are fully compatible with the official-relay gpt-image-2 — if the reverse channel hits risk-control turbulence, just swap the model name to switch to the official relay with zero code changes

The chat-based API (/v1/chat/completions) is a supplement, best for multi-turn iterative editing or passing online image URLs directly. See GPT-Image-2-All overview / GPT-Image-2-VIP overview.

Can the chat endpoint return text instead of an image?

It can. When the image-generation intent isn’t strong enough, the reverse models’ chat endpoint may return plain text. Workaround: prepend a fixed prefix like “Generate image:” to the user prompt, or constrain output via a system message.

Within the reverse channel, -all vs -vip — which to pick?

Both are reverse-engineered channels at the same flat price ($0.03/image), with identical call format. The differences:

size field: -all rejects it (describe in prompt); -vip accepts 30 explicit sizes (incl. 4K)
Generation time: -all ~90s; -vip ~120–200s (on par with the official version). Currently slower than at launch due to OpenAI upstream compute fluctuations

Decision: don’t need locked size, want fastest output → -all; need locked size or 4K → -vip. See the GPT-Image-2-VIP Overview for details.

If `-vip` already supports 4K, do I still need the official one?

Yes. Official-only features: quality tiers (low/medium/high/auto), mask inpainting (alpha-channel mask), strict OpenAI-API field parity (zero-change migration for existing OpenAI-SDK code), arbitrary sizes outside -vip’s 30-size set.On cost: -vip’s 4K matches 1K/2K at $0.03/image — for the locked-4K use case, -vip is an order of magnitude cheaper than the official 4K high-quality tier.

Migrating from 1.5 — which one should I pick?

Stick with the OpenAI SDK / must match OpenAI official: pick gpt-image-2 (official). Drop input_fidelity, avoid background: transparent, leave the rest unchanged.
Cut cost, size-insensitive: pick gpt-image-2-all (reverse, ~90s).
Cut cost, need locked size or 4K: pick gpt-image-2-vip (reverse, ~120–200s).

Can I deploy multiple models for failover?

Yes. A common pattern: primary -all or -vip (predictable cost — pick by whether you need locked sizes), fallback gpt-image-2 (switch when you need quality tiers or mask). The reverse and official response shapes differ — normalize at the business layer.

The R2 CDN image link is slow — what can I do?

See Slow CDN downloads — what to do

GPT-Image-2 Overview - Full official integration docs
GPT-Image-2-All Overview - Reverse ChatGPT-web line (fastest output) full integration docs
GPT-Image-2-VIP Overview - Reverse Codex line (30 sizes, 4K) full integration docs
Deep dive: gpt-image-2 launch - Official version launch
Deep dive: gpt-image-2-all launch - Reverse-engineered version launch
Community: Luck GPT-Image 2 ComfyUI Nodes - Multi-model ComfyUI node pack
Community: APIYI GPT-Image 2 Skills - Multi-model AI Agent Skill pack
Deposit promotions - Recharge bonus policy

​TL;DR

​Full Comparison Table

​When to Pick Each

​Pick gpt-image-2-all (Reverse) when

💰 Predictable cost

⚡ Faster output

🗨️ Chat-style workflows

🌏 Chinese + marketing text

​Pick gpt-image-2-vip (Reverse, when you need locked size or 4K) when

📐 Strictly locked output sizes

🖼️ 4K at the same flat price

🔁 Code shared with -all

💰 Cost still predictable

​Pick gpt-image-2 (Official) when

🎚️ Quality tiers

🎯 Mask inpainting

🖼️ Any custom size

🔌 Same as OpenAI Official

​Key Differences in Detail

​1. b64_json format gotcha (migration trap!)

​2. Resolution control

​3. Upload / output format differences

​4. Cost ballpark

​Client Settings

​FAQ

​Related Docs

TL;DR

Full Comparison Table

When to Pick Each

Pick `gpt-image-2-all` (Reverse) when

Pick `gpt-image-2-vip` (Reverse, when you need locked size or 4K) when

Pick `gpt-image-2` (Official) when

Key Differences in Detail

1. b64_json format gotcha (migration trap!)

2. Resolution control

3. Upload / output format differences

4. Cost ballpark

Client Settings

FAQ

Related Docs