Native Tool Image Generation

Overview

Beyond the standalone text-to-image / image-edit endpoints, APIYI also supports the OpenAI Responses API’s native image_generation tool: the main model gpt-5.5 decides on its own when to draw, internally selects a GPT Image model, and returns the image as base64 in the response output array.

Verified working (2026-06-17): gpt-5.5 + POST /v1/responses + tools: [{"type": "image_generation"}] returns a valid base64 PNG. Both image paths route directly through OpenAI’s official upstream.

Which should you use? For the vast majority of “I just want an image” cases, prefer the standalone /v1/images/generations endpoint — it bills purely by actual usage, which is cheaper and more controllable. Only use the native tool method on this page when your pipeline must go through Responses (e.g. letting gpt-5.5 autonomously decide whether to draw inside an Agent conversation). It adds a fixed tool-call fee of roughly $0.20 per image.

Comparison of the two methods

Aspect	Native tool method (this page)	images API
Channel	Direct OpenAI official forwarding	Direct OpenAI official forwarding
Endpoint	`/v1/responses`	`/v1/images/generations`, `/v1/images/edits`
Tool	`image_generation`	None (pass prompt directly)
Billing	Usage-based + tool-call fee	Usage-based
Billing detail	Text/image input-output priced same as official; fixed tool-call fee ≈ $0.20/call	Text/image input-output priced same as official
Best for	Cases that require Responses (e.g. Agent autonomy)	Most image scenarios — more reasonable billing

Core difference: the native tool method adds a fixed ≈$0.20 tool fee per image, while the images API bills purely by actual usage, so the images API is cheaper in most cases.

Minimal request

cURL

curl https://api.apiyi.com/v1/responses \
  -H "Authorization: Bearer $APIYI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Generate an image of a gray tabby cat hugging an otter with an orange scarf",
    "tools": [
      { "type": "image_generation" }
    ]
  }'

Python (requests)

import base64
import os

import requests

resp = requests.post(
    "https://api.apiyi.com/v1/responses",
    headers={
        # Read the key from the environment; Python does not expand "$APIYI_KEY" inside a string
        "Authorization": f"Bearer {os.environ['APIYI_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-5.5",
        "input": "Generate an image of a gray tabby cat hugging an otter with an orange scarf",
        "tools": [{"type": "image_generation"}],
    },
    timeout=300,           # Generation is slow; allow plenty of timeout (~60-90s per image)
)
resp.raise_for_status()    # Stop here on a non-200 response
data = resp.json()

# Pull the image tool result out of the output array
for item in data.get("output", []):
    if item.get("type") == "image_generation_call":
        raw = base64.b64decode(item["result"])   # result field is a base64 image
        with open("output.png", "wb") as f:
            f.write(raw)
        print("Saved output.png,", len(raw), "bytes")

Optional parameters go inside the tools item: {"type": "image_generation", "output_format": "png|jpeg|webp", "size": "1024x1024", ...}. Omit them to use the defaults (png).
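As a rough illustration of where those optional parameters sit, the sketch below builds the request body with the options placed inside the tools item and leaves out anything that was not set, so the defaults apply. The helper names build_image_tool and build_payload are hypothetical, and only the two options named above (output_format, size) are handled.

```python
# Hypothetical helpers; only the options named in the text above are handled.
ALLOWED_FORMATS = {"png", "jpeg", "webp"}


def build_image_tool(output_format=None, size=None):
    """Return the tools-array item, including only the options that were set."""
    tool = {"type": "image_generation"}
    if output_format is not None:
        if output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported output_format: {output_format!r}")
        tool["output_format"] = output_format
    if size is not None:
        tool["size"] = size
    return tool


def build_payload(prompt, **tool_options):
    """Return a /v1/responses request body with a single image_generation tool."""
    return {
        "model": "gpt-5.5",
        "input": prompt,
        "tools": [build_image_tool(**tool_options)],
    }
```

Calling build_payload("a red bicycle") yields the same body as the minimal request, while build_payload("a red bicycle", output_format="webp", size="1024x1024") adds both options to the tool item.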

Response structure (key fields)

On success (HTTP 200), the response body contains:

{
  "id": "resp_...",
  "model": "gpt-5.5-2026-04-23",
  "status": "completed",
  "output": [
    {
      "type": "image_generation_call",   // <- key: the tool actually fired
      "result": "<a very long base64 PNG string>"  // <- the image itself, base64, png by default
    },
    { "type": "message", "content": [ /* may be empty; image responses don't always include text */ ] }
  ],
  "usage": { "input_tokens": 2347, "output_tokens": 74 }
}

How to tell whether an image was actually produced:

✅ Success: output contains type="image_generation_call", and result decodes to a valid image starting with \x89PNG.
⚠️ Silently stripped: HTTP 200 but no image_generation_call in output, only text (common when a channel doesn’t support the tool).
❌ Error: non-200, or returns unknown tool / no available channels, etc. For the latter two, fall back to /v1/images/generations.
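The three outcomes above can be checked in code. The following is a minimal sketch, assuming the default png output format and a response body already parsed into a dict; the function name classify_response and its return labels are illustrative, not part of the API.

```python
import base64
import binascii

PNG_MAGIC = b"\x89PNG"


def classify_response(status_code, body):
    """Return "success", "stripped", or "error" for a /v1/responses result."""
    if status_code != 200:
        return "error"
    for item in body.get("output", []):
        if item.get("type") == "image_generation_call" and item.get("result"):
            try:
                raw = base64.b64decode(item["result"])
            except (binascii.Error, ValueError):
                return "error"
            # Assumes the default png format; other formats need a different check.
            return "success" if raw.startswith(PNG_MAGIC) else "error"
    # HTTP 200 with no image_generation_call: the tool was silently stripped.
    return "stripped"
```

A caller would fall back to /v1/images/generations whenever the result is anything other than "success".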

💰 Billing

Take one real call as an example (input 2347 tokens, output 74 tokens, generating one 1122×1402 PNG). The final charge = $0.213954, which is correct. Breakdown:

Component	Quota calculation	USD
Text portion	`(input 2347 + output 74×completion multiplier 6) × input multiplier 2.5` = 6977.5 quota	≈ $0.014
Image tool portion	≈ 100,000 quota (per image, independent of tokens)	≈ $0.20 / image
Total	106,977 quota	$0.213954

Conversion: 500,000 quota = $1 (derived from 106977 quota = $0.213954).
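The arithmetic in the table can be reproduced with a few lines of Python. This is a sketch using the multipliers from the example call above (completion multiplier 6, input multiplier 2.5) and the approximate per-image tool quota of 100,000; truncating the total to a whole quota is an assumption made to match the 106,977 figure, and the function names are illustrative.

```python
QUOTA_PER_USD = 500_000        # 500,000 quota = $1
COMPLETION_MULTIPLIER = 6      # from the example call above
INPUT_MULTIPLIER = 2.5         # from the example call above
IMAGE_TOOL_QUOTA = 100_000     # approximate per-image tool fee, independent of tokens


def text_quota(input_tokens, output_tokens):
    """Quota for the text portion of a call."""
    return (input_tokens + output_tokens * COMPLETION_MULTIPLIER) * INPUT_MULTIPLIER


def total_quota(input_tokens, output_tokens, images=1):
    """Text quota plus the per-image tool quota (truncation is an assumption)."""
    return int(text_quota(input_tokens, output_tokens) + images * IMAGE_TOOL_QUOTA)


def quota_to_usd(quota):
    """Convert quota to US dollars."""
    return quota / QUOTA_PER_USD


print(text_quota(2347, 74))                           # 6977.5
print(total_quota(2347, 74))                          # 106977
print(round(quota_to_usd(total_quota(2347, 74)), 6))  # 0.213954
```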

A display quirk in the console detail page (explain this to customers proactively)

On APIYI’s “conditional billing detail” page:

The top section only shows the text portion of the math (base cost = (2347 + 74×6) × 2.5 = 6977.50);
The image tool-call charge (≈100,000 quota / ≈$0.20) shows up as a blank row in the detail list — it isn’t rendered;
but it is correctly counted in the bottom-line “final quota 106977 / $0.213954”.

Conclusion: billing is normal and accurate — the detail UI simply fails to display the “image tool” row, so the line items don’t sum to the final total. When explaining to customers, emphasize: the total is correct; the difference is this image’s tool fee (≈$0.20/image), just not itemized separately.

Cost notes

The generation fee is fixed per image (≈$0.20/image) and does not vary with prompt length; the text token cost is small by comparison.
Each image takes ~60-90s; set a client timeout of ≥300s.
If you only need an image and don’t need the model to decide autonomously, the standalone /v1/images/generations endpoint is likely cheaper and more controllable.

Troubleshooting

Symptom	Likely cause	Fix
200 but no `image_generation_call`	Current channel doesn’t support the tool (silently stripped)	Switch key/channel, or use `/v1/images/generations`
`no available channels`	No matching channel under the key’s group	Switch to a key group with GPT/image channels
Request timeout	Generation is slow	Set client timeout to 300s
`result` doesn’t decode to PNG	Output format changed / channel anomaly	Check `output_format`, verify the magic bytes
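For the last row, verifying the magic bytes can look like the sketch below. It checks the decoded bytes against the standard file signatures for the three formats that output_format accepts; the function name detect_image_format is illustrative.

```python
def detect_image_format(raw):
    """Return "png", "jpeg", "webp", or None based on the leading bytes."""
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if raw.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "webp"
    return None
```

If the detected format differs from the requested output_format, or the function returns None, treat the result as a channel anomaly rather than saving the file.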

Related docs

GPT-Image-2 Overview - Model overview and pricing
Text-to-Image API Reference - /v1/images/generations, the default choice for most cases
Image Edit API Reference - /v1/images/edits, reference-image edits / multi-image fusion / mask
