Short Answer

Concurrency limits vary by model type: text models have the highest concurrency, while image models are subject to moderate controls.
Important Note: Concurrency limits apply to individual models, not your entire account. For example, Nano Banana Pro allows 30 concurrent requests, which does not affect other models' concurrency.

Concurrency Limits by Model Type

Text Models

Default: 50 concurrent requests
  • ✅ High concurrency support
  • ✅ Suitable for batch processing
  • 🔓 Higher quotas available

Async Video Models

Default: High concurrency
  • ✅ Asynchronous processing
  • ✅ Large-scale call support
  • 📊 Ideal for batch video generation

Image Models

Default: 30 concurrent requests
  • ⚠️ Concurrency controlled
  • 📦 Base64 large data transfer
  • 🔓 Adjustable upon request

Why Do Image Models Have Concurrency Controls?

Technical Reason: Image generation APIs use Base64 encoding to transfer image data, resulting in large request payloads (typically 500KB-5MB per request). To ensure service stability and response speed, moderate concurrency control is necessary.
Example: Nano Banana Pro defaults to 30 concurrent requests, which is sufficient for most use cases.
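The payload inflation is easy to verify: Base64 maps every 3 raw bytes to 4 ASCII characters, so an encoded image is about a third larger than the original file. A minimal sketch:

```python
import base64

# Stand-in for a 1 MB image; real payloads run 500 KB - 5 MB before encoding
raw = b"\x00" * 1_000_000
encoded = base64.b64encode(raw)

# Base64 encodes every 3 bytes as 4 characters: ~33% size overhead
print(f"{len(encoded) / len(raw):.2f}x")  # -> 1.33x
```

A 4 MB image therefore travels as a request body of roughly 5.3 MB, which is why image endpoints cap concurrency more tightly than text endpoints.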

How Concurrency is Calculated

Per Individual Model

Concurrency limits apply to each specific model, not your entire account:
  • Same-model calls: subject to that model's limit (e.g., Nano Banana Pro: 30)
  • Different-model calls: each model is counted independently
  • Multiple tokens: each token has its own independent concurrency quota
Real Example: If you simultaneously use:
  • Nano Banana Pro (image): 30 concurrent
  • GPT-4o mini (text): 50 concurrent
  • FLUX.1 Pro (image): 30 concurrent
Total available concurrency: 110 requests (30 + 50 + 30), with each model's quota counted independently.
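The per-model accounting above can be sketched client-side with one semaphore per model, so saturating one model's quota never blocks calls to another. Model names and limits follow the example; the HTTP request itself is a stand-in:

```python
import asyncio

# Limits and model names taken from the example above
MODEL_LIMITS = {
    "nano-banana-pro": 30,  # image
    "gpt-4o-mini": 50,      # text
    "flux.1-pro": 30,       # image
}

async def run(limits: dict) -> list:
    # One semaphore per model: each model's in-flight count is capped separately
    sems = {m: asyncio.Semaphore(n) for m, n in limits.items()}

    async def call_model(model: str) -> str:
        async with sems[model]:
            await asyncio.sleep(0.01)  # stand-in for the real HTTP request
            return f"{model}: done"

    tasks = [call_model(m) for m in limits for _ in range(3)]
    return await asyncio.gather(*tasks)

results = asyncio.run(run(MODEL_LIMITS))
print(len(results))  # 9
```

Mirroring the server-side limits locally keeps you from sending requests that would only be rejected with 429 anyway.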

How to Request Higher Concurrency?

Individual Users

1. Assess Your Needs: Determine your required concurrency level and use case.
2. Contact Support: Reach out via WeChat 8765058 to explain your needs.
3. Technical Review: We'll evaluate based on your use case and historical data.
4. Quota Adjustment: Upon approval, we'll adjust the concurrency limits for your token.

Enterprise Customers

Dedicated Line Service: Enterprise customers can apply for dedicated line services with:
  • 🚀 Higher Concurrency Quotas: Customized for business needs
  • 🔒 Isolated Resource Pools: Unaffected by public traffic
  • Priority Scheduling: Guaranteed response speed
  • 📞 Dedicated Support: One-on-one service
Contact us to learn about enterprise service plans.

FAQ

Why do image models have stricter limits than text models?
Text models have smaller request/response payloads (typically a few KB), while image models transfer Base64-encoded image data (typically 500KB-5MB). Concurrency control ensures overall service quality.
How can I check my current concurrency quota?
You can check via:
  • Token configuration in the backend console
  • Rate Limit information in API response headers
  • Contacting customer service for specific quotas
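If quota details are exposed in response headers, they can be read generically. A minimal sketch; the `X-RateLimit-*` header names below are an assumption based on common conventions, so check the actual headers your responses carry:

```python
def read_rate_limit_headers(headers: dict) -> dict:
    """Extract rate-limit info from response headers.

    Header names are assumptions (common X-RateLimit-* conventions),
    not confirmed names for this API.
    """
    wanted = ("X-RateLimit-Limit", "X-RateLimit-Remaining", "X-RateLimit-Reset")
    # HTTP header lookup is case-insensitive; normalize before matching
    lower = {k.lower(): v for k, v in headers.items()}
    return {h: lower[h.lower()] for h in wanted if h.lower() in lower}

print(read_rate_limit_headers({"x-ratelimit-remaining": "29"}))
# -> {'X-RateLimit-Remaining': '29'}
```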
What happens when I exceed the limit?
The API returns 429 Too Many Requests. Recommendations:
  • Implement request queue management
  • Add a retry mechanism (exponential backoff)
  • Apply for a higher concurrency quota
Do multiple tokens share one concurrency quota?
No. Each token has an independent concurrency quota, with no interference between tokens. For higher total concurrency, create multiple tokens to distribute requests.
Does increasing concurrency cost extra?
Reasonable concurrency adjustments are generally free of charge. However, extremely high concurrency or dedicated line services may require an enterprise custom plan; please consult customer service for details.

Concurrency Optimization Tips

Use Request Queues

Implement local queue management to cap simultaneous requests and avoid hitting rate limits
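A fixed-size worker pool is one simple way to cap in-flight requests. A sketch assuming a limit of 30 (the image-model default above), with a placeholder in place of the real API call:

```python
import concurrent.futures

MAX_CONCURRENT = 30  # stay at or below your model's concurrency limit

def send_request(i: int) -> int:
    # Stand-in for the real API call
    return i * 2

# The pool's worker count bounds how many requests are in flight at once;
# the remaining work queues locally instead of triggering 429s
with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(send_request, range(100)))

print(results[:3])  # [0, 2, 4]
```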

Error Retry Mechanism

Use an exponential backoff retry strategy when encountering 429 errors
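A minimal backoff wrapper, with `TooManyRequests` as a stand-in for however your HTTP client surfaces a 429:

```python
import random
import time

class TooManyRequests(Exception):
    """Stand-in for an HTTP 429 response from your client library."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on 429, waiting 1s, 2s, 4s, ... plus random jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except TooManyRequests:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the 429 to the caller
            # Jitter spreads out retries so clients don't re-collide in sync
            time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)

# Demo: a call that fails twice, then succeeds (base_delay=0 skips the waits)
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TooManyRequests
    return "ok"

print(with_backoff(flaky, base_delay=0))  # -> ok
```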

Multiple Token Distribution

Create multiple tokens to distribute requests across different tokens, increasing total concurrency
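Since each token carries its own quota, a simple round-robin rotation spreads requests evenly. The token values below are hypothetical placeholders:

```python
import itertools

# Hypothetical token values; each token has an independent concurrency quota
TOKENS = ["sk-token-a", "sk-token-b", "sk-token-c"]
_rotation = itertools.cycle(TOKENS)

def next_token() -> str:
    """Pick tokens round-robin so load (and quota use) is spread evenly."""
    return next(_rotation)

print([next_token() for _ in range(4)])
# -> ['sk-token-a', 'sk-token-b', 'sk-token-c', 'sk-token-a']
```

With three tokens, the effective ceiling for a given model triples, since each token's quota is counted separately.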

Prioritize Async Processing

For non-real-time scenarios, prefer async APIs (like video generation)
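The async pattern is submit-then-poll: enqueue the job, then check its status periodically instead of holding a connection open. A sketch only; the endpoint paths, field names, and `api` client below are hypothetical placeholders, not the real async video API:

```python
import time

def submit_job(api, prompt: str) -> str:
    # Hypothetical endpoint/field names; consult the real async API docs
    return api.post("/v1/video/jobs", {"prompt": prompt})["job_id"]

def wait_for_job(api, job_id: str, poll_seconds: float = 5.0) -> dict:
    """Poll until the job reaches a terminal state."""
    while True:
        job = api.get(f"/v1/video/jobs/{job_id}")  # hypothetical endpoint
        if job["status"] in ("succeeded", "failed"):
            return job
        time.sleep(poll_seconds)
```

Because the job runs server-side, thousands of videos can be queued without tying up client connections, which is why async models tolerate much higher concurrency.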

Contact Us

To request higher concurrency quotas or inquire about enterprise dedicated line services:

Email Support

[email protected]Describe your concurrency needs in detail

WeChat Support

8765058
Quick response, real-time communication

Telegram

@apiyicom
Instant messaging, efficient answers
Please provide when requesting:
  • 📊 Use Case: Specific application (e.g., e-commerce batch image generation, content moderation)
  • 📈 Expected Concurrency: Required concurrency level
  • 🕐 Peak Hours: Primary usage time periods
  • 📜 Historical Data: Current call volume and frequency
This information helps us provide the most suitable concurrency quota plan for you.