Overview
Beyond the standalone text-to-image / image-edit endpoints, APIYI also supports the OpenAI Responses API’s native image_generation tool: the main model gpt-5.5 decides on its own when to draw, internally selects a GPT Image model, and returns the image as base64 in the response output array.
Verified working (2026-06-17): gpt-5.5 + POST /v1/responses + tools: [{"type": "image_generation"}] returns a valid base64 PNG. Both image paths route directly through OpenAI’s official upstream.
Which should you use? For the vast majority of “I just want an image” cases, prefer the standalone /v1/images/generations endpoint — it bills purely by actual usage, which is cheaper and more controllable. Only use the native tool method on this page when your pipeline must go through Responses (e.g. letting gpt-5.5 autonomously decide whether to draw inside an Agent conversation). It adds a fixed tool-call fee of roughly $0.20 per image.
Comparison of the two methods
| Aspect | Native tool method (this page) | images API |
|---|---|---|
| Channel | Direct OpenAI official forwarding | Direct OpenAI official forwarding |
| Endpoint | /v1/responses | /v1/images/generations, /v1/images/edits |
| Tool | image_generation | None (pass prompt directly) |
| Billing | Usage-based + tool-call fee | Usage-based |
| Billing detail | Text/image input-output priced same as official; fixed tool-call fee ≈ $0.20/call | Text/image input-output priced same as official |
| Best for | Cases that require Responses (e.g. Agent autonomy) | Most image scenarios — more reasonable billing |
Core difference: the native tool method adds a fixed ≈$0.20 tool fee per image, while the images API bills purely by actual usage, so the images API is cheaper in most cases.
Minimal request
cURL
curl https://api.apiyi.com/v1/responses \
-H "Authorization: Bearer $APIYI_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"input": "Generate an image of a gray tabby cat hugging an otter with an orange scarf",
"tools": [
{ "type": "image_generation" }
]
}'
Python (requests)
import base64
import os

import requests

# Read the key from the environment; Python does not expand "$APIYI_KEY" inside a string.
api_key = os.environ["APIYI_KEY"]

resp = requests.post(
    "https://api.apiyi.com/v1/responses",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-5.5",
        "input": "Generate an image of a gray tabby cat hugging an otter with an orange scarf",
        "tools": [{"type": "image_generation"}],
    },
    timeout=300,  # Generation is slow (~60-90s per image); allow a generous timeout
)
resp.raise_for_status()  # Surface non-200 responses before parsing the body
data = resp.json()

# Pull the image tool result out of the output array
for item in data.get("output", []):
    if item.get("type") == "image_generation_call":
        raw = base64.b64decode(item["result"])  # result field is a base64 image
        with open("output.png", "wb") as f:
            f.write(raw)
        print("Saved output.png,", len(raw), "bytes")
Optional parameters go inside the tools item: {"type": "image_generation", "output_format": "png|jpeg|webp", "size": "1024x1024", ...}. Omit them to use the defaults (png).
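As a rough illustration of where those optional fields sit, the sketch below builds the request body in Python. The helper name `build_payload` is hypothetical, and only `output_format` and `size` are handled because those are the two options named above; any other option would be added to the same tool item.

```python
def build_payload(prompt, output_format=None, size=None):
    """Build a /v1/responses body with one image_generation tool item (hypothetical helper)."""
    tool = {"type": "image_generation"}
    # Only add fields that were explicitly requested, so the defaults (png) apply otherwise.
    if output_format is not None:
        if output_format not in ("png", "jpeg", "webp"):
            raise ValueError(f"unsupported output_format: {output_format}")
        tool["output_format"] = output_format
    if size is not None:
        tool["size"] = size
    return {"model": "gpt-5.5", "input": prompt, "tools": [tool]}


payload = build_payload("A gray tabby cat hugging an otter", output_format="webp", size="1024x1024")
```

The resulting `payload` can be passed as the `json=` argument of the request shown earlier.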
Response structure (key fields)
On success (HTTP 200), the response body contains:
{
"id": "resp_...",
"model": "gpt-5.5-2026-04-23",
"status": "completed",
"output": [
{
"type": "image_generation_call", // <- key: the tool actually fired
"result": "<a very long base64 PNG string>" // <- the image itself, base64, png by default
},
{ "type": "message", "content": [ /* may be empty; image responses don't always include text */ ] }
],
"usage": { "input_tokens": 2347, "output_tokens": 74 }
}
How to tell whether an image was actually produced:
- ✅ Success: output contains type="image_generation_call", and result decodes to a valid image starting with \x89PNG.
- ⚠️ Silently stripped: HTTP 200 but no image_generation_call in output, only text (common when a channel doesn’t support the tool).
- ❌ Error: non-200, or returns unknown tool / no available channels, etc.
For the latter two, fall back to /v1/images/generations.
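The three outcomes above could be told apart with a small check like the one below. This is a sketch, not an official client: the function name `classify_response` and the returned labels are made up for illustration, and it assumes the default png output format and that an error payload, if any, appears under an `error` key.

```python
import base64

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # standard 8-byte PNG signature


def classify_response(status_code, body):
    """Return 'success', 'stripped', or 'error' for a /v1/responses reply (hypothetical helper)."""
    # Assumption: failures are signalled by a non-200 status or a non-empty "error" field.
    if status_code != 200 or not isinstance(body, dict) or body.get("error"):
        return "error"
    for item in body.get("output") or []:
        if item.get("type") == "image_generation_call" and item.get("result"):
            raw = base64.b64decode(item["result"])
            # Assumes the default png format; a different output_format needs a different check.
            return "success" if raw.startswith(PNG_MAGIC) else "error"
    # HTTP 200 with only text output: the tool was silently stripped.
    return "stripped"
```

A caller would typically proceed on `"success"` and retry through /v1/images/generations on the other two results.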
💰 Billing
Take one real call as an example (input 2347 tokens, output 74 tokens, generating one 1122×1402 PNG). The final charge was $0.213954, which is consistent with the following breakdown:
| Component | Quota calculation | USD |
|---|---|---|
| Text portion | (input 2347 + output 74×completion multiplier 6) × input multiplier 2.5 = 6977.5 quota | ≈ $0.014 |
| Image tool portion | ≈ 100,000 quota (per image, independent of tokens) | ≈ $0.20 / image |
| Total | 106,977 quota | $0.213954 |
Conversion: 500,000 quota = $1 (derived from 106977 quota = $0.213954).
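The arithmetic in the table can be reproduced with a few lines of Python. This is only a sketch of the example figures: the multipliers and the 100,000-quota tool fee are taken from the table, the function name `estimate_charge` is hypothetical, and truncating the total to a whole quota unit is an assumption made to match the 106,977 figure.

```python
QUOTA_PER_USD = 500_000       # conversion rate stated above
COMPLETION_MULTIPLIER = 6     # weight applied to output tokens
INPUT_MULTIPLIER = 2.5        # multiplier applied to the weighted token sum
IMAGE_TOOL_QUOTA = 100_000    # approximate fixed tool fee per image


def estimate_charge(input_tokens, output_tokens, images=1):
    """Return (text_quota, total_quota, usd) for one call (hypothetical helper)."""
    text_quota = (input_tokens + output_tokens * COMPLETION_MULTIPLIER) * INPUT_MULTIPLIER
    # Assumption: the total is truncated to a whole quota unit.
    total_quota = int(text_quota + images * IMAGE_TOOL_QUOTA)
    return text_quota, total_quota, total_quota / QUOTA_PER_USD


text_quota, total_quota, usd = estimate_charge(2347, 74)
print(text_quota, total_quota, round(usd, 6))
```

Because the tool fee is a per-image constant, the text portion stays a small fraction of the total unless the conversation context is very long.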
A display quirk in the console detail page (explain this to customers proactively)
On APIYI’s “conditional billing detail” page:
- The top section only shows the text portion of the math (base cost = (2347 + 74×6) × 2.5 = 6977.50);
- The image tool-call charge (≈100,000 quota / ≈$0.20) shows up as a blank row in the detail list (it isn’t rendered);
- but it is correctly counted in the bottom-line “final quota 106977 / $0.213954”.
Conclusion: billing is normal and accurate — the detail UI simply fails to display the “image tool” row, so the line items don’t sum to the final total. When explaining to customers, emphasize: the total is correct; the difference is this image’s tool fee (≈$0.20/image), just not itemized separately.
Cost notes
- The generation fee is fixed per image (≈$0.20/image) and does not vary with prompt length; the text token cost is small by comparison.
- Each image takes ~60-90s; set a client timeout of ≥300s.
- If you only need an image and don’t need the model to decide autonomously, the standalone /v1/images/generations endpoint is likely cheaper and more controllable.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| 200 but no image_generation_call | Current channel doesn’t support the tool (silently stripped) | Switch key/channel, or use /v1/images/generations |
| no available channels | No matching channel under the key’s group | Switch to a key group with GPT/image channels |
| Request timeout | Generation is slow | Set client timeout to 300s |
| result doesn’t decode to PNG | Output format changed / channel anomaly | Check output_format, verify the magic bytes |
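For the last row, one way to verify the magic bytes is to sniff the decoded data against the standard signatures of the three formats that output_format can select. The function name `sniff_image_format` is hypothetical; the byte signatures are the standard ones for PNG, JPEG, and WebP.

```python
def sniff_image_format(raw: bytes):
    """Guess the image format from its leading bytes; return None if unrecognized."""
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if raw.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "webp"
    return None
```

If the sniffed format differs from the requested output_format, or the result is `None`, the channel is probably returning something other than the image, and switching channel or endpoint is the safer move.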