Short Answer

Concurrency limits vary by model type: text models have the highest concurrency, while image models are subject to moderate controls.
Important Note: Concurrency limits apply to individual models, not your entire account. For example, Nano Banana Pro allows 30 concurrent requests, which does not affect other models' concurrency.

Concurrency Limits by Model Type

Text Models

Default: 50 concurrent requests
  • ✅ High concurrency support
  • ✅ Suitable for batch processing
  • 🔓 Higher quotas available

Async Video Models

Default: High concurrency
  • ✅ Asynchronous processing
  • ✅ Large-scale call support
  • 📊 Ideal for batch video generation

Image Models

Default: 30 concurrent requests
  • ⚠️ Concurrency controlled
  • 📦 Base64 large data transfer
  • 🔓 Adjustable upon request

Why Do Image Models Have Concurrency Controls?

Technical Reason: Image generation APIs use Base64 encoding to transfer image data, resulting in large request payloads (typically 500KB-5MB per request). To ensure service stability and response speed, moderate concurrency control is necessary.
Example: Nano Banana Pro defaults to 30 concurrent requests, which is sufficient for most use cases.
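The payload inflation is easy to verify: Base64 maps every 3 raw bytes to 4 ASCII characters, so an encoded image is about a third larger than the original file. A minimal sketch:

```python
import base64

# Stand-in for a 1 MB image; real payloads run 500 KB - 5 MB before encoding
raw = b"\x00" * 1_000_000
encoded = base64.b64encode(raw)

# Base64 encodes every 3 bytes as 4 characters: ~33% size overhead
print(f"{len(encoded) / len(raw):.2f}x")  # -> 1.33x
```

A 4 MB image therefore travels as a request body of roughly 5.3 MB, which is why image endpoints cap concurrency more tightly than text endpoints.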

How Concurrency is Calculated

Per Individual Model

Concurrency limits apply to each specific model, not your entire account:
  • Same-model calls: subject to that model's limit (e.g., Nano Banana Pro: 30)
  • Different-model calls: each model is counted independently
  • Multiple tokens: each token has its own independent concurrency quota
Real Example: If you simultaneously use:
  • Nano Banana Pro (image): 30 concurrent
  • GPT-4o mini (text): 50 concurrent
  • FLUX.1 Pro (image): 30 concurrent
Total available concurrency: 110 requests (30 + 50 + 30), with each model's quota counted independently.
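The per-model accounting above can be sketched client-side with one semaphore per model, so saturating one model's quota never blocks calls to another. Model names and limits follow the example; the HTTP request itself is a stand-in:

```python
import asyncio

# Limits and model names taken from the example above
MODEL_LIMITS = {
    "nano-banana-pro": 30,  # image
    "gpt-4o-mini": 50,      # text
    "flux.1-pro": 30,       # image
}

async def run(limits: dict) -> list:
    # One semaphore per model: each model's in-flight count is capped separately
    sems = {m: asyncio.Semaphore(n) for m, n in limits.items()}

    async def call_model(model: str) -> str:
        async with sems[model]:
            await asyncio.sleep(0.01)  # stand-in for the real HTTP request
            return f"{model}: done"

    tasks = [call_model(m) for m in limits for _ in range(3)]
    return await asyncio.gather(*tasks)

results = asyncio.run(run(MODEL_LIMITS))
print(len(results))  # 9
```

Mirroring the server-side limits locally keeps you from sending requests that would only be rejected with 429 anyway.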

How to Request Higher Concurrency?

Individual Users

1. Assess Your Needs: Determine your required concurrency level and use case.
2. Contact Support: Reach out via WeChat 8765058 to explain your needs.
3. Technical Review: We'll evaluate based on your use case and historical data.
4. Quota Adjustment: Upon approval, we'll adjust the concurrency limits for your token.

Enterprise Customers

Dedicated Line Service: Enterprise customers can apply for dedicated line services with:
  • 🚀 Higher Concurrency Quotas: Customized for business needs
  • 🔒 Isolated Resource Pools: Unaffected by public traffic
  • Priority Scheduling: Guaranteed response speed
  • 📞 Dedicated Support: One-on-one service
Contact us to learn about enterprise service plans.

FAQ

Why do image models have stricter limits than text models?
Text models have smaller request/response payloads (typically a few KB), while image models transfer Base64-encoded image data (typically 500KB-5MB). Concurrency control ensures overall service quality.
How can I check my current concurrency quota?
You can check via:
  • Token configuration in the backend console
  • Rate Limit information in API response headers
  • Contacting customer service for specific quotas
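If quota details are exposed in response headers, they can be read generically. A minimal sketch; the `X-RateLimit-*` header names below are an assumption based on common conventions, so check the actual headers your responses carry:

```python
def read_rate_limit_headers(headers: dict) -> dict:
    """Extract rate-limit info from response headers.

    Header names are assumptions (common X-RateLimit-* conventions),
    not confirmed names for this API.
    """
    wanted = ("X-RateLimit-Limit", "X-RateLimit-Remaining", "X-RateLimit-Reset")
    # HTTP header lookup is case-insensitive; normalize before matching
    lower = {k.lower(): v for k, v in headers.items()}
    return {h: lower[h.lower()] for h in wanted if h.lower() in lower}

print(read_rate_limit_headers({"x-ratelimit-remaining": "29"}))
# -> {'X-RateLimit-Remaining': '29'}
```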
What happens when I exceed the limit?
The API returns 429 Too Many Requests. Recommendations:
  • Implement request queue management
  • Add a retry mechanism (exponential backoff)
  • Apply for a higher concurrency quota
Do multiple tokens share one concurrency quota?
No. Each token has an independent concurrency quota, with no interference between tokens. For higher total concurrency, create multiple tokens to distribute requests.
Does increasing concurrency cost extra?
Reasonable concurrency adjustments are generally free of charge. However, extremely high concurrency or dedicated line services may require an enterprise custom plan; please consult customer service for details.

Concurrency Optimization Tips

Use Request Queues

Implement local queue management to cap simultaneous requests and avoid hitting rate limits
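A fixed-size worker pool is one simple way to cap in-flight requests. A sketch assuming a limit of 30 (the image-model default above), with a placeholder in place of the real API call:

```python
import concurrent.futures

MAX_CONCURRENT = 30  # stay at or below your model's concurrency limit

def send_request(i: int) -> int:
    # Stand-in for the real API call
    return i * 2

# The pool's worker count bounds how many requests are in flight at once;
# the remaining work queues locally instead of triggering 429s
with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(send_request, range(100)))

print(results[:3])  # [0, 2, 4]
```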

Error Retry Mechanism

Use an exponential backoff retry strategy when encountering 429 errors
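A minimal backoff wrapper, with `TooManyRequests` as a stand-in for however your HTTP client surfaces a 429:

```python
import random
import time

class TooManyRequests(Exception):
    """Stand-in for an HTTP 429 response from your client library."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on 429, waiting 1s, 2s, 4s, ... plus random jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except TooManyRequests:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the 429 to the caller
            # Jitter spreads out retries so clients don't re-collide in sync
            time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)

# Demo: a call that fails twice, then succeeds (base_delay=0 skips the waits)
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TooManyRequests
    return "ok"

print(with_backoff(flaky, base_delay=0))  # -> ok
```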

Multiple Token Distribution

Create multiple tokens to distribute requests across different tokens, increasing total concurrency
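Since each token carries its own quota, a simple round-robin rotation spreads requests evenly. The token values below are hypothetical placeholders:

```python
import itertools

# Hypothetical token values; each token has an independent concurrency quota
TOKENS = ["sk-token-a", "sk-token-b", "sk-token-c"]
_rotation = itertools.cycle(TOKENS)

def next_token() -> str:
    """Pick tokens round-robin so load (and quota use) is spread evenly."""
    return next(_rotation)

print([next_token() for _ in range(4)])
# -> ['sk-token-a', 'sk-token-b', 'sk-token-c', 'sk-token-a']
```

With three tokens, the effective ceiling for a given model triples, since each token's quota is counted separately.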

Prioritize Async Processing

For non-real-time scenarios, prefer async APIs (like video generation)
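The async pattern is submit-then-poll: enqueue the job, then check its status periodically instead of holding a connection open. A sketch only; the endpoint paths, field names, and `api` client below are hypothetical placeholders, not the real async video API:

```python
import time

def submit_job(api, prompt: str) -> str:
    # Hypothetical endpoint/field names; consult the real async API docs
    return api.post("/v1/video/jobs", {"prompt": prompt})["job_id"]

def wait_for_job(api, job_id: str, poll_seconds: float = 5.0) -> dict:
    """Poll until the job reaches a terminal state."""
    while True:
        job = api.get(f"/v1/video/jobs/{job_id}")  # hypothetical endpoint
        if job["status"] in ("succeeded", "failed"):
            return job
        time.sleep(poll_seconds)
```

Because the job runs server-side, thousands of videos can be queued without tying up client connections, which is why async models tolerate much higher concurrency.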

Contact Us

To request higher concurrency quotas or inquire about enterprise dedicated line services:

Email Support

[email protected]Describe your concurrency needs in detail

WeChat Support

8765058
Quick response, real-time communication

Telegram

@apiyicom
Instant messaging, efficient answers
Please provide when requesting:
  • 📊 Use Case: Specific application (e.g., e-commerce batch image generation, content moderation)
  • 📈 Expected Concurrency: Required concurrency level
  • 🕐 Peak Hours: Primary usage time periods
  • 📜 Historical Data: Current call volume and frequency
This information helps us provide the most suitable concurrency quota plan for you.