Overview
gpt-image-2 is OpenAI’s latest flagship image generation model — the upgrade to gpt-image-1.5. Core upgrades: any valid resolution (including 2K and 4K up to 3840×2160), automatic high-fidelity handling of reference images, and 20-30% lower cost at the same tier. APIYI’s gateway is fully compatible with the OpenAI Images API — point the official OpenAI SDK’s base_url here for a zero-code direct connection.
Text-to-Image API
/v1/images/generations — generate images from text prompts with size / quality / output_format control.
Image Edit API
/v1/images/edits — multipart upload of reference images (up to 5) + edit/fusion instructions, with mask inpainting support.
Why Choose APIYI’s GPT-image-2 Official Relay?
Built on OpenAI’s official channel, deeply optimized for enterprise production workloads across reliability, cost, and integration experience:
Official Channel · Same as Official
No Concurrency Limits
Same Price + Up to 15% Off
Global Zero-Barrier Access
api.apiyi.com is reachable from domestic data centers, home broadband, or overseas nodes — stable latency, no cross-border re-architecture.
Full Model Lineup
gpt-image-2-all ($0.03/image flat), or the cost-leader Nano Banana Pro / 2 — mix and match per scenario.
Professional Enterprise Support
Core Features
Any Resolution (incl. 4K)
Auto High-Fidelity
Reference images are always processed at high fidelity; do not pass `input_fidelity` (will error).
20-30% Cheaper
Chinese + Text Rendering
Renders Chinese prompts and in-image text at high quality.
Multi-Image Fusion (up to 5)
The `image[]` array accepts up to 5 reference images. Use “image 1 / image 2 / image 3” in the prompt to reference them by upload order.
Mask Inpainting
Multiple Output Formats
png (default) / jpeg / webp, with `output_compression` for jpeg/webp to control file size.
OpenAI SDK Direct
Point `base_url` to `https://api.apiyi.com/v1` and call directly with the official OpenAI SDK — zero-code migration.
Pricing
Token-metered (sum of input text + input image + output image tokens). Official per-image pricing reference:

| Quality | 1024×1024 | 1024×1536 | 1536×1024 |
|---|---|---|---|
| Low | $0.006 | $0.005 | $0.005 |
| Medium | $0.053 | $0.041 | $0.041 |
| High | $0.211 | $0.165 | $0.165 |
- 2K / 4K has no fixed per-image price — billed by actual input + output tokens
- Edit requests have noticeably higher input tokens than text-to-image due to forced high-fidelity
- Streaming (`stream: true` + `partial_images: N`) costs an extra 100 output image tokens per partial
- Compared to `gpt-image-1.5` at the same size and quality, `gpt-image-2` is about 20-30% cheaper
Technical Specifications
| Dimension | Value |
|---|---|
| Model name | gpt-image-2 |
| Speed | ~120 seconds (4K high quality approaches 2 min) |
| Output resolution | Any valid size (1K/2K/4K, max 3840×2160) |
| Quality tiers | auto / low / medium / high |
| Output formats | png (default) / jpeg / webp |
| Chinese prompts | ✅ Native |
| Per call | 1 image (n=1) |
| Reference image limit | 5 (image[]) |
| Mask inpainting | ✅ Supported (alpha channel required) |
| Transparent background | ❌ Not supported (background: transparent errors) |
| Response field | b64_json (raw base64, no prefix) |
Endpoints
| Endpoint | Purpose | Content-Type |
|---|---|---|
| POST /v1/images/generations | Text-to-image | application/json |
| POST /v1/images/edits | Reference editing / multi-image fusion / mask inpainting | multipart/form-data |
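As a minimal sketch of the JSON endpoint above — assuming the APIYI gateway URL from this doc, a placeholder API key, and illustrative function names of my own — the request can be built with nothing but the Python standard library. Only `generate()` actually sends anything:

```python
import json
import urllib.request

API_BASE = "https://api.apiyi.com/v1"   # APIYI gateway (from this doc)
API_KEY = "sk-..."                      # placeholder: your APIYI token

def build_generation_request(prompt, size="1024x1024",
                             quality="medium", output_format="png"):
    """Build (but do not send) a POST /v1/images/generations request."""
    payload = {
        "model": "gpt-image-2",
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "output_format": output_format,
    }
    return urllib.request.Request(
        f"{API_BASE}/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def generate(prompt, **params):
    """Send the request; allow up to 360s (high quality / 4K can take minutes)."""
    req = build_generation_request(prompt, **params)
    with urllib.request.urlopen(req, timeout=360) as resp:
        return json.loads(resp.read())["data"][0]["b64_json"]
```

The edits endpoint works the same way except the body is multipart/form-data, which is easier to produce with a third-party HTTP client.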
Size Reference
Preset Sizes
| size | Meaning | Pixels |
|---|---|---|
| auto | Adaptive (default) | Model decides |
| 1024x1024 | Square 1:1 | 1K |
| 1536x1024 | Landscape 3:2 | 1K |
| 1024x1536 | Portrait 2:3 | 1K |
| 2048x2048 | Square 1:1 | 2K |
| 2048x1152 | Landscape 16:9 | 2K |
| 3840x2160 | Landscape 16:9 | 4K |
| 2160x3840 | Portrait 9:16 | 4K |
Custom Size Constraints
gpt-image-2 accepts any valid size that satisfies all of:
- Max edge ≤ 3840px
- Both edges are multiples of 16
- Aspect ratio ≤ 3:1
- Total pixels ∈ [655,360, 8,294,400] (~0.65MP to ~8.3MP)
Valid examples: 1600x1200, 1792x1024, 2048x1536, 3200x1792
Invalid examples: 1000x1000 (not multiple of 16), 4000x4000 (over max), 3840x1000 (ratio > 3:1)
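The four constraints above are easy to check client-side before spending a request. A small sketch (function name is my own):

```python
def is_valid_size(width: int, height: int) -> bool:
    """Check a custom size against the documented gpt-image-2 constraints."""
    if max(width, height) > 3840:                    # max edge <= 3840px
        return False
    if width % 16 or height % 16:                    # both edges multiples of 16
        return False
    if max(width, height) / min(width, height) > 3:  # aspect ratio <= 3:1
        return False
    return 655_360 <= width * height <= 8_294_400    # ~0.65MP .. ~8.3MP

# Spot-check against the doc's examples:
assert is_valid_size(1600, 1200)
assert not is_valid_size(1000, 1000)   # not multiples of 16
assert not is_valid_size(4000, 4000)   # over max edge
assert not is_valid_size(3840, 1000)   # ratio > 3:1 (and not a multiple of 16)
```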
Best Practices
Prefer preset sizes
Match quality to scenario
Drafts / previews → low; daily / final → medium; text, fine textures, print → high.
Choose JPEG output
output_format=jpeg + output_compression=85 is faster than PNG and roughly half the size.
Lock high for text scenarios
Use quality=high for signage and poster scenarios.
Prepare reference images
Timeout ≥ 360 seconds
quality=high + 2K/4K realistically takes several minutes. Configuring around the “~120s” headline figure causes many false timeouts. Set 360s as a conservative baseline; show progress in the UI; consider a task queue server-side.
Errors & Retries
| Status | Meaning | Suggested action |
|---|---|---|
| 400 | Invalid parameters (size constraint violation, unsupported field, etc.) | Validate against size constraints; do not pass input_fidelity / background: transparent |
| 401 | Invalid token | Check Bearer Token |
| 403 | Content moderation block | Adjust prompt or pass moderation: low |
| 429 | Rate limit / insufficient balance | Exponential backoff |
| 5xx | Gateway / backend error | Retry 1–2 times |
| Timeout | Long tail | Client timeout ≥ 360 seconds (high + 2K/4K can run 3-5 minutes) |
- Request timeout starts at 360 seconds (conservative baseline; `quality=high` + 2K/4K can take 3-5 minutes, and configuring around the ~120s figure causes many false timeouts)
- Exponential backoff for 5xx and timeouts (suggest 2 retries)
- Log the `x-request-id` header for support
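The backoff advice above can be sketched as a small wrapper — the function name is illustrative, and which exceptions count as retryable depends on your HTTP client, so this catches broadly:

```python
import time

def with_retries(call, max_retries=2, base_delay=2.0):
    """Retry a zero-arg callable on failure with exponential backoff.

    Intended for 5xx responses and timeouts (suggested: 2 retries);
    map your client's retryable errors onto exceptions before using.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise            # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, ...
```

Usage: `with_retries(lambda: generate_image(prompt))`. In production you would narrow the `except` to timeout and 5xx conditions and log `x-request-id` before retrying.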
FAQ
Do I need to add the data:image/png;base64, prefix to b64_json?
gpt-image-2 returns a raw base64 string (no prefix), unlike gpt-image-2-all. Two client patterns:
- Write file: `base64.b64decode(b64_str)` → write to disk
- Browser render: `img.src = 'data:image/png;base64,' + b64_str` (prepend manually)
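Both patterns in one small sketch (helper names are my own):

```python
import base64

def save_image(b64_str: str, path: str) -> None:
    """b64_json is raw base64 with no data-URL prefix: decode and write."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_str))

def to_data_url(b64_str: str, mime: str = "image/png") -> str:
    """For in-browser <img> rendering, prepend the data-URL prefix yourself."""
    return f"data:{mime};base64,{b64_str}"
```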
Why does passing input_fidelity return 400?
gpt-image-2 forces high-fidelity processing of reference images and no longer accepts input_fidelity. When migrating from 1.5, just remove this field — no replacement needed.
What if I need a transparent background?
gpt-image-2 does not support background: transparent (will error). Two workarounds:
- Set `background` to `opaque` (or omit it) and key out transparency yourself with PIL / sharp / online tools
- Temporarily fall back to gpt-image-1.5 for scenarios that genuinely need transparency
How many images per call?
Exactly one image per call (`n=1`). For N images, issue N parallel requests; each is independently token-billed.
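Fanning out N single-image requests is a natural fit for a thread pool, since each call is I/O-bound. A sketch, where `generate_one` stands in for your own single-request wrapper:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_many(generate_one, prompts, max_workers=4):
    """Run one request per prompt in parallel; each returns exactly 1 image.

    generate_one: your single-request function (e.g. a wrapper around
    POST /v1/images/generations). Each call is billed independently,
    so N prompts cost N requests' worth of tokens.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in the returned results
        return list(pool.map(generate_one, prompts))
```

Keep `max_workers` modest so a burst of slow high-quality renders does not pile up behind one timeout window.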
Why is 2K/4K so slow?
3840×2160 + quality=high realistically approaches 2 minutes. Recommendations:
- Client timeout ≥ 360 seconds (conservative)
- Show “generating” progress in the UI
- Use 1024×1024 / 1536×1024 1K presets when 4K isn’t needed
Why are edit requests more expensive than text-to-image?
gpt-image-2 auto-enables high-fidelity processing of reference images, so the references themselves convert to large input token counts under the Vision pricing rules. Edit input tokens are noticeably higher than text-to-image — budget accordingly.
How do I prepare a mask file?
- Same size and format as the original, ≤ 50MB
- Must have alpha channel: transparent (alpha=0) = inpaint area, opaque = preserve
- Only applies to the first image
- Mask is a “soft guide” — the model may extend or contract around the masked region
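The checklist above can be pre-flighted before uploading. A dependency-free sketch — the function name is my own, and sizes/modes are passed in as plain values (e.g. Pillow’s `Image.size` and `Image.mode`) so the check itself needs no imaging library:

```python
def check_mask(image_size, mask_size, mask_mode, mask_bytes_len):
    """Pre-flight a mask against the documented rules; return a list of errors.

    image_size / mask_size: (width, height) tuples, e.g. Pillow's Image.size
    mask_mode: Pillow-style mode string; must carry alpha ("RGBA", "LA", ...)
    mask_bytes_len: size of the mask file in bytes (documented limit: 50MB)
    """
    errors = []
    if mask_size != image_size:
        errors.append("mask must match the original image's dimensions")
    if "A" not in mask_mode:
        errors.append("mask needs an alpha channel (alpha=0 marks the inpaint area)")
    if mask_bytes_len > 50 * 1024 * 1024:
        errors.append("mask file exceeds 50MB")
    return errors
```

An empty list means the mask passes the documented constraints; remember the mask still only applies to the first reference image and acts as a soft guide.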
gpt-image-2 vs gpt-image-2-all: which to pick?
| Pick | When |
|---|---|
| gpt-image-2 (Official) | Need precise size/quality control, must match OpenAI official exactly, want 4K output, need mask inpainting |
| gpt-image-2-all (Reverse) | Want flat $0.03/image, ~30s render, minimal parameters, strong consistency / Chinese text |
Can I use the official OpenAI SDK directly?
Yes: set `base_url` to `https://api.apiyi.com/v1` and `api_key` to your APIYI token, then call the Images API as usual.
Can I cancel a generation in progress?
gpt-image-2 uses OpenAI’s official synchronous endpoint — once a request is submitted, it runs to completion with no “cancel” signal. Even if the client disconnects, the server still finishes generation and bills normally. Configure client-side timeouts carefully — do not assume “disconnect = no charge”.
Is there a rate limit (RPM)?
No hard RPM cap on APIYI’s relay channel (see “No Concurrency Limits” above); throughput is bounded mainly by per-request latency and your account balance.
Does it support async invocation?
gpt-image-2 strictly mirrors the OpenAI official API — synchronous only. The request blocks until the result is returned (high + 4K realistically 1–2 minutes). If you need an async queue or callback mechanism:
- Wrap it yourself with a task queue (Celery / BullMQ, etc.) at the business layer
- Or use gpt-image-2-all — generates in ~30s, easier to poll from the front end
Do failed generations get billed?
Requests rejected at parameter validation (e.g. an invalid size) return a 400 error, and no charge is incurred; the same holds for 401 (invalid token) and 429 (rate limit). Token billing only kicks in once the request actually reaches the model generation stage (i.e., a 200 response with b64_json is received).
Related Docs
- ⚖️ Official vs Reverse Comparison - Side-by-side selection guide
- Text-to-Image Playground - `/v1/images/generations` interactive testing
- Image Edit Playground - `/v1/images/edits` multi-image fusion + mask
- Deep Dive: gpt-image-2 Launch - News article
- Full Integration Doc - Complete API reference
- GPT-Image-2-All (Reverse-Engineered) - Cheaper, faster alternative
- Community: Luck GPT-Image 2 ComfyUI Nodes - Call `gpt-image-2` directly in ComfyUI (mask / 5 reference images / custom sizes)
- Community: APIYI GPT-Image 2 Skills - Invoke from Codex CLI / Cursor / Gemini CLI and other AI coding tools with one sentence
- API Manual - General usage guide
gpt-image-2 is OpenAI’s official flagship, billed by token. If you prioritize flat pricing ($0.03/image) and faster generation (~30s), see gpt-image-2-all.