A one-stop guide to model selection, billing, endpoints, development formats, and FAQs for the Nano Banana series (Pro / 2 / Gen 1), helping developers get started with the Gemini image generation API.
Follow the source image ratio: simply omit aspectRatio; in multi-image editing scenarios, the size of the last image takes precedence
Resolution imageSize: supports 1K / 2K / 4K
Nano Banana (Gen 1) supports 1K only
Nano Banana 2 adds 512px
When using the same code to call the first-gen gemini-2.5-flash-image, you must remove the imageSize parameter (it does not support 2K / 4K), otherwise the call will fail.
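The parameter rules above can be folded into one small helper so the same calling code works across models. This is a minimal sketch, not an official client: the function name `build_image_config` is hypothetical, and the only model-specific rule it encodes is the one stated above (the first-gen model must not receive `imageSize`).

```python
from typing import Optional

# Model ID taken from the text above. Treating it as the only model that
# rejects imageSize is an assumption based on that text.
FIRST_GEN_MODEL = "gemini-2.5-flash-image"


def build_image_config(model: str,
                       image_size: Optional[str] = None,
                       aspect_ratio: Optional[str] = None) -> dict:
    """Return an imageConfig dict that is safe to send to the given model."""
    config = {}
    # Omitting aspectRatio lets the output follow the source image ratio.
    if aspect_ratio is not None:
        config["aspectRatio"] = aspect_ratio
    # The first-gen model fails when imageSize is present, so drop it there.
    if image_size is not None and model != FIRST_GEN_MODEL:
        config["imageSize"] = image_size
    return config
```

The result can be placed under `generationConfig.imageConfig` in the request body. Passing neither argument yields an empty dict, which leaves both settings at their defaults.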
A single image cannot exceed 7MB (Google’s rule); if imported via Google Cloud Storage, the per-file limit is 30MB
Up to 14 images per prompt
Supported MIME types: image/png, image/jpeg, image/webp, image/heic, image/heif (the jpg format is already supported by API易)
Base64 size inflation: converting an image to Base64 increases its size by about 33.3% (a 7MB image becomes about 9.3MB)
API易 limit: the total volume of images uploaded in a single request must be under 100MB — all calls are synchronous, and oversized payloads can cause memory blowup
Best practice: apply lossless compression to images before sending them to the API, to avoid oversized resolutions slowing down requests.
Google official spec reference (please copy and visit it yourself): docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-pro-image
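A pre-flight check along these lines can catch oversized uploads before a request is sent. It is a sketch under two assumptions that the limits above do not spell out: 1MB is taken as 1024 × 1024 bytes, and the 100MB request total is measured on the Base64-encoded size. The names `base64_size` and `check_upload` are hypothetical.

```python
import math
from typing import List

MAX_IMAGE_BYTES = 7 * 1024 * 1024      # per-image limit (raw bytes)
MAX_IMAGES = 14                        # images per prompt
MAX_TOTAL_BYTES = 100 * 1024 * 1024    # per-request total (assumed: encoded size)


def base64_size(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding: 4 output bytes per 3 input bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def check_upload(raw_sizes: List[int]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(raw_sizes) > MAX_IMAGES:
        problems.append(f"{len(raw_sizes)} images exceeds the limit of {MAX_IMAGES}")
    for index, size in enumerate(raw_sizes):
        if size > MAX_IMAGE_BYTES:
            problems.append(f"image {index} is {size} bytes, over the 7MB limit")
    total = sum(base64_size(size) for size in raw_sizes)
    if total >= MAX_TOTAL_BYTES:
        problems.append(f"encoded payload is {total} bytes, not under 100MB")
    return problems
```

For a 7MB image, `base64_size` gives roughly 9.33MB, which matches the 33.3% inflation noted above.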
In addition to Base64, the Gemini native endpoint also supports passing image URLs (image hosts / OSS addresses) directly via fileData.fileUri, eliminating the need for local encoding.
URL upload places strict requirements on image hosts and OSS addresses: if the address is not served from a global CDN (e.g., Tencent Cloud Object Storage defaults to a China-only CDN), Google’s servers will very likely be unable to reach the image, and the request fails (typical symptom: the image is not referenced in the output). If possible, prefer Base64 upload for better stability; from the platform’s perspective, it is the path that receives the most operational investment and is the most reliable.
URL upload only works on the Gemini native endpoint; OpenAI-compatible mode does not support URL upload and requires Base64.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Gemini 3 Pro Image - Image editing (minimal file_uri version)
Purpose: only for a quick check that the endpoint works
"""
import requests
import base64
import json
from pathlib import Path
from datetime import datetime

# ============================================================================
# Configuration
# ============================================================================
API_KEY = "sk-"
API_URL = "https://api.apiyi.com/v1beta/models/gemini-3-pro-image-preview:generateContent"

# Image URL
IMAGE_URL = "https://raw.githubusercontent.com/apiyi-api/ai-pics/refs/heads/main/1762260696217_dd0352c1f9604540.png"
IMAGE_MIME_TYPE = "image/png"

# Edit instructions
EDIT_PROMPT = "Change the person's clothes to a blue jacket and hair to a purple gradient; keep pose, gaze direction, and other structural features unchanged."
SYSTEM_PROMPT = "You are a professional expert in image description and generation. Your task is to produce high-quality image prompts with rich detail and a clear artistic style, or to make accurate, creative edits to existing images, based on the user's request."

# Output parameters
ASPECT_RATIO = "9:16"
RESOLUTION = "4K"
MAX_OUTPUT_TOKENS = 8000
OUTPUT_FILE = f"minimal_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"


# ============================================================================
# Core
# ============================================================================
def main():
    print("=" * 60)
    print("Testing file_uri endpoint")
    print("=" * 60)
    print(f"Image URL: {IMAGE_URL[:80]}...")
    print(f"Edit prompt: {EDIT_PROMPT}")
    print(f"Output params: {RESOLUTION}, {ASPECT_RATIO}")
    print("-" * 60)

    # Build the request body
    # Note: fileData, mimeType, fileUri must be in camelCase
    payload = {
        "generationConfig": {
            "responseModalities": ["IMAGE", "TEXT"],
            "imageConfig": {
                "imageSize": RESOLUTION,
                "aspectRatio": ASPECT_RATIO
            },
            "maxOutputTokens": MAX_OUTPUT_TOKENS
        },
        "contents": [
            {
                "role": "model",
                "parts": [{"text": SYSTEM_PROMPT}]
            },
            {
                "role": "user",
                "parts": [
                    {
                        "fileData": {                      # camelCase: fileData (not file_data)
                            "mimeType": IMAGE_MIME_TYPE,   # camelCase: mimeType
                            "fileUri": IMAGE_URL           # camelCase: fileUri
                        }
                    },
                    {"text": EDIT_PROMPT}
                ]
            }
        ]
    }

    # Send the request
    print("\nSending request...")
    try:
        response = requests.post(
            API_URL,
            json=payload,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {API_KEY}"
            },
            timeout=300
        )
        print(f"Response status: {response.status_code}")
        if response.status_code != 200:
            print(f"❌ Error: {response.text}")
            return

        # Parse the response
        data = response.json()
        print("✅ Response received")

        # Save full response for debugging
        with open(OUTPUT_FILE + ".response.json", "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        print(f"📄 Response saved: {OUTPUT_FILE}.response.json")

        # Extract and print text
        parts = data["candidates"][0]["content"]["parts"]
        for part in parts:
            if "text" in part:
                print(f"\n💬 Text response: {part['text']}")

        # Save image
        for part in parts:
            if "inlineData" in part or "inline_data" in part:
                image_data = part.get("inlineData", part.get("inline_data", {})).get("data")
                if image_data:
                    image_bytes = base64.b64decode(image_data)
                    with open(OUTPUT_FILE, "wb") as f:
                        f.write(image_bytes)
                    print(f"\n✅ Image saved: {OUTPUT_FILE}")
                    print(f"📦 File size: {len(image_bytes) / 1024:.1f} KB")
                    print(f"🔗 File path: {Path(OUTPUT_FILE).resolve()}")
                    return

        print("⚠️ No image data found in the response")
    except requests.Timeout:
        print("❌ Request timed out")
    except Exception as e:
        print(f"❌ Error: {e}")


if __name__ == "__main__":
    main()
    print("\n" + "=" * 60)
    print("Test finished")
    print("=" * 60)
fileData, mimeType, and fileUri must be in camelCase (not file_data / file_uri); otherwise the parameters are ignored and the image will not be referenced.
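Since the text recommends Base64 for stability while also allowing URL input, a single helper that emits either form of image part keeps the camelCase keys in one place. This is a sketch: `image_part` is a hypothetical name, and the `inlineData` request shape (`mimeType` plus Base64 `data`) is assumed from the Gemini REST convention and from the response format the script above parses.

```python
import base64
from typing import Optional


def image_part(mime_type: str, *,
               url: Optional[str] = None,
               data: Optional[bytes] = None) -> dict:
    """Build one entry for the "parts" list from a URL or from raw bytes."""
    if (url is None) == (data is None):
        raise ValueError("pass exactly one of url or data")
    if url is not None:
        # URL input: only honored by the Gemini native endpoint.
        return {"fileData": {"mimeType": mime_type, "fileUri": url}}
    # Base64 input: the raw bytes are encoded into an ASCII string.
    encoded = base64.b64encode(data).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}
```

For a local file, something like `image_part("image/png", data=Path("input.png").read_bytes())` would produce the Base64 variant; the file name here is only an example.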
Synchronous call duration: for Pro / 2 at 4K, a generation time of roughly 30–150s is normal
Disconnecting on timeout still incurs charges: for example, if generation takes 120s but the client sets the timeout to 100s and disconnects, you are still charged
429 / 503 are not charged: failed requests are not billed (we try not to keep customers waiting or stuck without an image)
Content-safety refusals still incur charges: when a customer’s input has content-safety issues and Google refuses to generate the image, the request still returns status code 200 and is charged. See error handling and the guarantee plan below
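Because 429 and 503 responses are not billed while client-side timeouts are, one reasonable pattern is to retry only on those two status codes and to keep the request timeout generous. The sketch below illustrates that idea; `call_with_retry`, the attempt count, and the delay values are all assumptions rather than platform recommendations.

```python
import time
from typing import Callable

# Status codes the text above describes as not charged.
RETRYABLE_STATUS = {429, 503}


def call_with_retry(send: Callable[[], object],
                    max_attempts: int = 3,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send must return an object with a status_code attribute, such as a
    requests.Response. Exceptions (including timeouts) are not caught here,
    because a timed-out request may already have been billed.
    """
    response = None
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in RETRYABLE_STATUS:
            return response
        if attempt < max_attempts:
            # Linear backoff: base_delay, 2 * base_delay, ...
            sleep(base_delay * attempt)
    return response
```

With the `requests` library this could be used as `call_with_retry(lambda: requests.post(API_URL, json=payload, headers=headers, timeout=300))`. A 200 response still needs to be inspected for image data, since a content-safety refusal also returns 200.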
4K image generation takes longer overall because it involves several stages: image upload, API processing, and Base64 image download (our backend bills by API processing time). Under normal conditions, 4K takes about 50s (excluding polling), but if the client timeout is too short, the client disconnects before generation completes and reports an error:
API Connection Error: HTTPSConnectionPool(host='api.apiyi.com', port=443): Read timed out. (read timeout=120)
To be safer, we recommend setting the timeout by resolution:
Want to upload images via URL? The Gemini native endpoint supports passing an image URL through fileData.fileUri; however, OpenAI-compatible mode does not support URL upload, so use Base64 instead. See the code examples and caveats in URL Image Input above.
Want to get a download URL directly (instead of Base64)? Use the NB-OSS group — see Nano Banana OSS Group.