Short Answer
Concurrency limits vary by model type: text models have the highest concurrency, while image models have moderate controls.
Important Note
Concurrency limits apply to individual models, not your entire account. For example, Nano Banana Pro has a limit of 30 concurrent requests, which does not affect the concurrency of other models.
Concurrency Limits by Model Type
Text Models
Default: 50 req/sec
- ✅ High concurrency support
- ✅ Suitable for batch processing
- 🔓 Higher quotas available
Async Video Models
Default: High concurrency
- ✅ Asynchronous processing
- ✅ Large-scale call support
- 📊 Ideal for batch video generation
Image Models
Default: 30 req/sec
- ⚠️ Concurrency controlled
- 📦 Base64 large data transfer
- 🔓 Adjustable upon request
Why Do Image Models Have Concurrency Controls?
How Concurrency is Calculated
Per Individual Model
Concurrency limits apply to each specific model, not your entire account:

| Scenario | Concurrency Calculation |
|---|---|
| Same model calls | Subject to that model’s limit (e.g., Nano Banana Pro: 30) |
| Different model calls | Each model calculated independently |
| Multiple tokens | Each token has independent concurrency |
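
The accounting in the table can be pictured as one counter per (token, model) pair. The following is a minimal client-side sketch under that assumption, not the service's actual implementation. The model identifiers `nano-banana-pro` and `example-text-model`, the token names, and the `PerModelLimiter` class are all hypothetical; the limits of 30 and 50 mirror the defaults quoted on this page.

```python
import threading
from collections import defaultdict

# Illustrative limits only; both model identifiers are hypothetical names.
MODEL_LIMITS = {"nano-banana-pro": 30, "example-text-model": 50}


class PerModelLimiter:
    """Tracks in-flight requests separately for each (token, model) pair."""

    def __init__(self, limits):
        self._limits = limits
        self._in_flight = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, token, model):
        """Reserve a slot; return False if that pair is already at its limit."""
        key = (token, model)
        with self._lock:
            if self._in_flight[key] >= self._limits[model]:
                return False
            self._in_flight[key] += 1
            return True

    def release(self, token, model):
        """Free a slot once the request has finished."""
        key = (token, model)
        with self._lock:
            if self._in_flight[key] > 0:
                self._in_flight[key] -= 1


limiter = PerModelLimiter(MODEL_LIMITS)
# Try to take 31 slots for one token on the image model; only 30 succeed.
results = [limiter.try_acquire("token-a", "nano-banana-pro") for _ in range(31)]
```

In this sketch, exhausting `token-a` on the image model leaves both `token-a` on the text model and `token-b` on the image model free to proceed, which matches the three rows of the table.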
How to Request Higher Concurrency?
Individual Users
1. Assess Your Needs: Determine your required concurrency level and use case.
2. Contact Support: Reach out via WeChat 8765058 to explain your needs.
3. Technical Review: We’ll evaluate based on your use case and historical data.
4. Quota Adjustment: Upon approval, we’ll adjust concurrency limits for your token.
Enterprise Customers
Dedicated Line Service
Enterprise customers can apply for dedicated line services with:
- 🚀 Higher Concurrency Quotas: Customized for business needs
- 🔒 Isolated Resource Pools: Unaffected by public traffic
- ⚡ Priority Scheduling: Guaranteed response speed
- 📞 Dedicated Support: One-on-one service
FAQ
Why do text models have higher concurrency than image models?
Text models have smaller request/response payloads (typically a few KB), while image models transfer Base64-encoded image data (typically 500KB-5MB). Concurrency control ensures overall service quality.
How can I check my current concurrency quota?
You can check via:
- Backend console token configuration
- Rate Limit information in API response headers
- Contact customer service for specific quotas
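
This page does not name the rate-limit headers, so the sketch below simply filters a response's headers for names that look rate-limit related. The function name `rate_limit_headers` and the sample header `X-RateLimit-Remaining` are illustrative assumptions, not confirmed for this API; `Retry-After` is a standard HTTP header.

```python
def rate_limit_headers(headers):
    """Return headers whose names look rate-limit related (case-insensitive)."""
    markers = ("ratelimit", "rate-limit", "retry-after")
    return {
        name: value
        for name, value in headers.items()
        if any(marker in name.lower() for marker in markers)
    }


# Sample headers are made up for illustration; inspect a real response instead.
sample = {
    "Content-Type": "application/json",
    "X-RateLimit-Remaining": "12",
    "Retry-After": "2",
}
found = rate_limit_headers(sample)
```

Passing the headers mapping of a real response from your HTTP client shows which of these, if any, the service actually sends.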
What happens if I exceed concurrency limits?
When you exceed the limit, the API returns 429 Too Many Requests. Recommendations:
- Implement request queue management
- Add retry mechanism (exponential backoff)
- Apply for higher concurrency quota
Are concurrency limits shared across different tokens?
No. Each token has its own independent concurrency, so requests made with one token do not count against another token's limit.
Is there an extra fee for adjusting concurrency quotas?
Generally, reasonable concurrency adjustments are free of charge. However, extremely high concurrency or dedicated line services may involve enterprise custom plans - please consult customer service for details.
Concurrency Optimization Tips
Use Request Queues
Implement local queue management to control simultaneous requests and avoid limits
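
One common way to build such a local queue is a semaphore that caps the number of requests in flight. The sketch below assumes an asyncio client; `run_with_limit` and `fake_request` are hypothetical names, and `fake_request` stands in for a real API call. The cap of 30 mirrors the image-model default quoted on this page.

```python
import asyncio


async def run_with_limit(jobs, max_concurrent):
    """Run coroutine factories with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def worker(job):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)  # record the highest concurrency seen
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(job) for job in jobs))
    return results, peak


async def fake_request(i):
    # Stand-in for a real API call; replace with your HTTP client.
    await asyncio.sleep(0.01)
    return i * 2


jobs = [lambda i=i: fake_request(i) for i in range(100)]
results, peak = asyncio.run(run_with_limit(jobs, max_concurrent=30))
```

All 100 jobs complete, but no more than 30 run at once; the remainder wait for a free slot instead of triggering a 429.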
Error Retry Mechanism
Use exponential backoff retry strategy when encountering 429 errors
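
A minimal sketch of exponential backoff follows, assuming your request function raises an exception when it receives a 429. The `RateLimited` exception, the `call_with_backoff` helper, and the delay values are hypothetical choices, not values recommended by the service.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function on a 429 response (hypothetical)."""


def call_with_backoff(request, max_retries=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call request(); on RateLimited, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


attempts = []
waits = []


def flaky():
    # Simulated request: fails with a 429 three times, then succeeds.
    attempts.append(1)
    if len(attempts) < 4:
        raise RateLimited()
    return "ok"


# Record the waits instead of sleeping so the demo runs instantly.
outcome = call_with_backoff(flaky, sleep=waits.append)
```

The waits roughly double on each attempt (about 0.5 s, 1 s, 2 s here) and are capped at `max_delay`, which gives the service time to free up slots.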
Multiple Token Distribution
Create multiple tokens to distribute requests across different tokens, increasing total concurrency
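
Distribution across tokens can be as simple as round-robin selection. The sketch below assumes that approach; the `TokenRotator` class and the token values are hypothetical placeholders.

```python
import itertools
import threading


class TokenRotator:
    """Hands out API tokens in round-robin order; safe to call from multiple threads."""

    def __init__(self, tokens):
        if not tokens:
            raise ValueError("at least one token is required")
        self._cycle = itertools.cycle(list(tokens))
        self._lock = threading.Lock()

    def next_token(self):
        with self._lock:
            return next(self._cycle)


# Placeholder values; real tokens come from your console.
rotator = TokenRotator(["token-a", "token-b", "token-c"])
assigned = [rotator.next_token() for _ in range(6)]
```

If each token has an independent limit of 30 on a given model, three tokens used this way would allow up to 90 concurrent requests to that model in total.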
Prioritize Async Processing
For non-real-time scenarios, prefer async APIs (like video generation)
Contact Us
To request higher concurrency quotas or inquire about enterprise dedicated line services:
Email Support
[email protected]
Describe your concurrency needs in detail
WeChat Support
8765058
Quick response, real-time communication
Telegram
@apiyicom
Instant messaging, efficient answers