Overview
gpt-image-2 is OpenAI’s latest flagship image generation model — the upgrade to gpt-image-1.5. Core upgrades: any valid resolution (including 2K and 4K up to 3840×2160), automatic high-fidelity handling of reference images, and 20-30% lower cost at the same tier. APIYI’s gateway is fully compatible with the OpenAI Images API — point the official OpenAI SDK’s base_url here for a zero-code direct connection.
Text-to-Image API
/v1/images/generations — generate images from text prompts with size / quality / output_format control.
Image Edit API
/v1/images/edits — multipart upload of reference images (up to 5) + edit/fusion instructions, with mask inpainting support.
Why Choose APIYI’s GPT-image-2 Official Relay?
Built on OpenAI’s official channel, deeply optimized for enterprise production workloads across reliability, cost, and integration experience:
Official Channel · Same as Official
No Concurrency Limits
Same Price + Up to 15% Off
Global Zero-Barrier Access
api.apiyi.com is reachable from domestic data centers, home broadband, or overseas nodes — stable latency, no cross-border re-architecture.
Full Model Lineup
gpt-image-2-all ($0.03/image flat), or the cost-leader Nano Banana Pro / 2 — mix and match per scenario.
Professional Enterprise Support
Core Features
Any Resolution (incl. 4K)
Auto High-Fidelity
Reference images are always processed at high fidelity; do not pass `input_fidelity` (will error).
20-30% Cheaper
Chinese + Text Rendering
Renders Chinese prompts and in-image text at high quality.
Multi-Image Fusion (up to 5)
The `image[]` array accepts up to 5 reference images. Use “image 1 / image 2 / image 3” in the prompt to reference them by upload order.
Mask Inpainting
Multiple Output Formats
png (default) / jpeg / webp, with `output_compression` for jpeg/webp to control file size.
OpenAI SDK Direct
Point `base_url` to `https://api.apiyi.com/v1` and call directly with the official OpenAI SDK — zero-code migration.
Pricing
Token-metered (sum of input text + input image + output image tokens). Official per-image pricing reference:

| Quality | 1024×1024 | 1024×1536 | 1536×1024 |
|---|---|---|---|
| Low | $0.006 | $0.005 | $0.005 |
| Medium | $0.053 | $0.041 | $0.041 |
| High | $0.211 | $0.165 | $0.165 |
- 2K / 4K has no fixed per-image price — billed by actual input + output tokens
- Edit requests have noticeably higher input tokens than text-to-image due to forced high-fidelity
- Streaming (`stream: true` + `partial_images: N`) costs an extra 100 output image tokens per partial
- Compared to `gpt-image-1.5` at the same size and quality, `gpt-image-2` is about 20-30% cheaper
Technical Specifications
| Dimension | Value |
|---|---|
| Model name | gpt-image-2 |
| Speed | ~120 seconds (4K high quality approaches 2 min) |
| Output resolution | Any valid size (1K/2K/4K, max 3840×2160) |
| Quality tiers | auto / low / medium / high |
| Output formats | png (default) / jpeg / webp |
| Chinese prompts | ✅ Native |
| Per call | 1 image (n=1) |
| Reference image limit | 5 (image[]) |
| Mask inpainting | ✅ Supported (alpha channel required) |
| Transparent background | ❌ Not supported (background: transparent errors) |
| Response field | b64_json (raw base64, no prefix) |
Endpoints
| Endpoint | Purpose | Content-Type |
|---|---|---|
| POST /v1/images/generations | Text-to-image | application/json |
| POST /v1/images/edits | Reference editing / multi-image fusion / mask inpainting | multipart/form-data |
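As a minimal sketch of the JSON endpoint above — assuming the APIYI gateway URL from this doc, a placeholder API key, and illustrative function names of my own — the request can be built with nothing but the Python standard library. Only `generate()` actually sends anything:

```python
import json
import urllib.request

API_BASE = "https://api.apiyi.com/v1"   # APIYI gateway (from this doc)
API_KEY = "sk-..."                      # placeholder: your APIYI token

def build_generation_request(prompt, size="1024x1024",
                             quality="medium", output_format="png"):
    """Build (but do not send) a POST /v1/images/generations request."""
    payload = {
        "model": "gpt-image-2",
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "output_format": output_format,
    }
    return urllib.request.Request(
        f"{API_BASE}/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def generate(prompt, **params):
    """Send the request; allow up to 360s (high quality / 4K can take minutes)."""
    req = build_generation_request(prompt, **params)
    with urllib.request.urlopen(req, timeout=360) as resp:
        return json.loads(resp.read())["data"][0]["b64_json"]
```

The edits endpoint works the same way except the body is multipart/form-data, which is easier to produce with a third-party HTTP client.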
Size Reference
Preset Sizes
| size | Meaning | Pixels |
|---|---|---|
| auto | Adaptive (default) | Model decides |
| 1024x1024 | Square 1:1 | 1K |
| 1536x1024 | Landscape 3:2 | 1K |
| 1024x1536 | Portrait 2:3 | 1K |
| 2048x2048 | Square 1:1 | 2K |
| 2048x1152 | Landscape 16:9 | 2K |
| 3840x2160 | Landscape 16:9 | 4K |
| 2160x3840 | Portrait 9:16 | 4K |
Custom Size Constraints
gpt-image-2 accepts any valid size that satisfies all of:
- Max edge ≤ 3840px
- Both edges are multiples of 16
- Aspect ratio ≤ 3:1
- Total pixels ∈ [655,360, 8,294,400] (~0.65MP to ~8.3MP)
Valid examples: 1600x1200, 1792x1024, 2048x1536, 3200x1792
Invalid examples: 1000x1000 (not multiple of 16), 4000x4000 (over max), 3840x1000 (ratio > 3:1)
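The four constraints above are easy to check client-side before spending a request. A small sketch (function name is my own):

```python
def is_valid_size(width: int, height: int) -> bool:
    """Check a custom size against the documented gpt-image-2 constraints."""
    if max(width, height) > 3840:                    # max edge <= 3840px
        return False
    if width % 16 or height % 16:                    # both edges multiples of 16
        return False
    if max(width, height) / min(width, height) > 3:  # aspect ratio <= 3:1
        return False
    return 655_360 <= width * height <= 8_294_400    # ~0.65MP .. ~8.3MP

# Spot-check against the doc's examples:
assert is_valid_size(1600, 1200)
assert not is_valid_size(1000, 1000)   # not multiples of 16
assert not is_valid_size(4000, 4000)   # over max edge
assert not is_valid_size(3840, 1000)   # ratio > 3:1 (and not a multiple of 16)
```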
Best Practices
Prefer preset sizes
Match quality to scenario
Drafts / previews → low; daily / final → medium; text, fine textures, print → high.
Choose JPEG output
output_format=jpeg + output_compression=85 is faster than PNG and roughly half the size.
Lock high for text scenarios
Use quality=high for signage and poster scenarios.
Prepare reference images
Timeout ≥ 360 seconds
quality=high + 2K/4K realistically takes several minutes. Configuring around the “~120s” headline figure causes many false timeouts. Set 360s as a conservative baseline; show progress in the UI; consider a task queue server-side.
Errors & Retries
| Status | Meaning | Suggested action |
|---|---|---|
| 400 | Invalid parameters (size constraint violation, unsupported field, etc.) | Validate against size constraints; do not pass input_fidelity / background: transparent |
| 401 | Invalid token | Check Bearer Token |
| 403 | Content moderation block | Adjust prompt or pass moderation: low |
| 429 | Rate limit / insufficient balance | Exponential backoff |
| 5xx | Gateway / backend error | Retry 1–2 times |
| Timeout | Long tail | Client timeout ≥ 360 seconds (high + 2K/4K can run 3-5 minutes) |
- Request timeout starts at 360 seconds (conservative baseline; `quality=high` + 2K/4K can take 3-5 minutes, and configuring around the ~120s figure causes many false timeouts)
- Exponential backoff for 5xx and timeouts (suggest 2 retries)
- Log the `x-request-id` header for support
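The backoff advice above can be sketched as a small wrapper — the function name is illustrative, and which exceptions count as retryable depends on your HTTP client, so this catches broadly:

```python
import time

def with_retries(call, max_retries=2, base_delay=2.0):
    """Retry a zero-arg callable on failure with exponential backoff.

    Intended for 5xx responses and timeouts (suggested: 2 retries);
    map your client's retryable errors onto exceptions before using.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise            # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, ...
```

Usage: `with_retries(lambda: generate_image(prompt))`. In production you would narrow the `except` to timeout and 5xx conditions and log `x-request-id` before retrying.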
FAQ
Do I need to add the data:image/png;base64, prefix to b64_json?
gpt-image-2 returns a raw base64 string (no prefix), unlike gpt-image-2-all. Two client patterns:
- Write file: `base64.b64decode(b64_str)` → write to disk
- Browser render: `img.src = 'data:image/png;base64,' + b64_str` (prepend manually)
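Both patterns in one small sketch (helper names are my own):

```python
import base64

def save_image(b64_str: str, path: str) -> None:
    """b64_json is raw base64 with no data-URL prefix: decode and write."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_str))

def to_data_url(b64_str: str, mime: str = "image/png") -> str:
    """For in-browser <img> rendering, prepend the data-URL prefix yourself."""
    return f"data:{mime};base64,{b64_str}"
```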
Why does passing input_fidelity return 400?
gpt-image-2 forces high-fidelity processing of reference images and no longer accepts input_fidelity. When migrating from 1.5, just remove this field — no replacement needed.
What if I need a transparent background?
gpt-image-2 does not support background: transparent (will error). Two workarounds:
- Set `background` to `opaque` (or omit it) and key out transparency yourself with PIL / sharp / online tools
- Temporarily fall back to gpt-image-1.5 for scenarios that genuinely need transparency
How many images per call?
Exactly one image per call (`n=1`). For N images, issue N parallel requests; each is independently token-billed.
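Fanning out N single-image requests is a natural fit for a thread pool, since each call is I/O-bound. A sketch, where `generate_one` stands in for your own single-request wrapper:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_many(generate_one, prompts, max_workers=4):
    """Run one request per prompt in parallel; each returns exactly 1 image.

    generate_one: your single-request function (e.g. a wrapper around
    POST /v1/images/generations). Each call is billed independently,
    so N prompts cost N requests' worth of tokens.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in the returned results
        return list(pool.map(generate_one, prompts))
```

Keep `max_workers` modest so a burst of slow high-quality renders does not pile up behind one timeout window.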
Why is 2K/4K so slow?
3840×2160 + quality=high realistically approaches 2 minutes. Recommendations:
- Client timeout ≥ 360 seconds (conservative)
- Show “generating” progress in the UI
- Use 1024×1024 / 1536×1024 1K presets when 4K isn’t needed
Why are edit requests more expensive than text-to-image?
gpt-image-2 auto-enables high-fidelity processing of reference images, so the references themselves convert to large input token counts under the Vision pricing rules. Edit input tokens are noticeably higher than text-to-image — budget accordingly.
How do I prepare a mask file?
- Same size and format as the original, ≤ 50MB
- Must have alpha channel: transparent (alpha=0) = inpaint area, opaque = preserve
- Only applies to the first image
- Mask is a “soft guide” — the model may extend or contract around the masked region
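The checklist above can be pre-flighted before uploading. A dependency-free sketch — the function name is my own, and sizes/modes are passed in as plain values (e.g. Pillow’s `Image.size` and `Image.mode`) so the check itself needs no imaging library:

```python
def check_mask(image_size, mask_size, mask_mode, mask_bytes_len):
    """Pre-flight a mask against the documented rules; return a list of errors.

    image_size / mask_size: (width, height) tuples, e.g. Pillow's Image.size
    mask_mode: Pillow-style mode string; must carry alpha ("RGBA", "LA", ...)
    mask_bytes_len: size of the mask file in bytes (documented limit: 50MB)
    """
    errors = []
    if mask_size != image_size:
        errors.append("mask must match the original image's dimensions")
    if "A" not in mask_mode:
        errors.append("mask needs an alpha channel (alpha=0 marks the inpaint area)")
    if mask_bytes_len > 50 * 1024 * 1024:
        errors.append("mask file exceeds 50MB")
    return errors
```

An empty list means the mask passes the documented constraints; remember the mask still only applies to the first reference image and acts as a soft guide.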
gpt-image-2 vs gpt-image-2-all: which to pick?
| Pick | When |
|---|---|
| gpt-image-2 (Official) | Need precise size/quality control, must match OpenAI official exactly, want 4K output, need mask inpainting |
| gpt-image-2-all (Reverse) | Want flat $0.03/image, ~30s render, minimal parameters, strong consistency / Chinese text |
Can I use the official OpenAI SDK directly?
Yes: set `base_url` to `https://api.apiyi.com/v1` and `api_key` to your APIYI token, then call the Images API as usual.
Can I cancel a generation in progress?
gpt-image-2 uses OpenAI’s official synchronous endpoint — once a request is submitted, it runs to completion with no “cancel” signal. Even if the client disconnects, the server still finishes generation and bills normally. Configure client-side timeouts carefully — do not assume “disconnect = no charge”.
Is there a rate limit (RPM)?
No hard RPM cap on APIYI’s relay channel (see “No Concurrency Limits” above); throughput is bounded mainly by per-request latency and your account balance.
Does it support async invocation?
gpt-image-2 strictly mirrors the OpenAI official API — synchronous only. The request blocks until the result is returned (high + 4K realistically 1–2 minutes). If you need an async queue or callback mechanism:
- Wrap it yourself with a task queue (Celery / BullMQ, etc.) at the business layer
- Or use gpt-image-2-all — generates in ~30s, easier to poll from the front end
Do failed generations get billed?
Requests rejected at parameter validation (e.g. an invalid size) return a 400 error, and no charge is incurred; the same holds for 401 (invalid token) and 429 (rate limit). Token billing only kicks in once the request actually reaches the model generation stage (i.e., a 200 response with b64_json is received).
Related Docs
- ⚖️ Official vs Reverse Comparison - Side-by-side selection guide
- Text-to-Image Playground - `/v1/images/generations` interactive testing
- Image Edit Playground - `/v1/images/edits` multi-image fusion + mask
- Deep Dive: gpt-image-2 Launch - News article
- Full Integration Doc - Complete API reference
- GPT-Image-2-All (Reverse-Engineered) - Cheaper, faster alternative
- Community: Luck GPT-Image 2 ComfyUI Nodes - Call `gpt-image-2` directly in ComfyUI (mask / 5 reference images / custom sizes)
- Community: APIYI GPT-Image 2 Skills - Invoke from Codex CLI / Cursor / Gemini CLI and other AI coding tools with one sentence
- API Manual - General usage guide
gpt-image-2 is OpenAI’s official flagship, billed by token. If you prioritize flat pricing ($0.03/image) and faster generation (~30s), see gpt-image-2-all.