Short Answer
APIYI currently does not support cache billing. APIYI uses a distributed account-pool relay model: requests are spread across multiple upstream accounts, while prompt caches are bound to a single account and cannot be shared between accounts.

Why Doesn't APIYI Support Caching?
How the Relay Station Works
As an AI model relay platform, APIYI uses the following architecture to improve concurrency and service stability:

Account Pool Mechanism

Multiple Upstream Account Pools

APIYI maintains multiple upstream accounts (OpenAI, Claude, etc.) and intelligently distributes requests across them.
Load Balancing
Dynamic Request Distribution

Each API call may be assigned to a different upstream account, improving concurrent processing capacity.
How Caching Works
Large language model caching mechanisms (such as Prompt Caching) are account-specific:

First Request

The user's request goes through Account A, and the upstream API (e.g., OpenAI) stores the prompt in Account A's cache space. A subsequent identical request routed through Account B cannot hit that cache, because Account B has no access to Account A's cache space.
Why Can’t APIYI Support Caching?
Core Issue: The relay station's account pool mechanism conflicts with caching's account-bound nature.
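A toy simulation illustrates the conflict: each upstream account keeps its own private cache, so an identical prompt routed to a different account cannot hit it. This is a minimal sketch; the round-robin routing and account names are illustrative, not APIYI's actual scheduler.

```python
from itertools import cycle

class UpstreamAccount:
    """One upstream account with its own private prompt cache."""
    def __init__(self, name):
        self.name = name
        self.cache = set()  # prompts cached on this account only

    def handle(self, prompt):
        hit = prompt in self.cache
        self.cache.add(prompt)  # upstream caches the prompt after serving it
        return hit

class RelayPool:
    """Round-robin load balancer over independent accounts (illustrative)."""
    def __init__(self, accounts):
        self._rr = cycle(accounts)

    def request(self, prompt):
        acct = next(self._rr)
        return acct.name, acct.handle(prompt)

pool = RelayPool([UpstreamAccount("A"), UpstreamAccount("B")])
first = pool.request("long system prompt")   # routed to A: cold, cached on A
second = pool.request("long system prompt")  # routed to B: miss, A's cache unreachable
third = pool.request("long system prompt")   # back to A: only now a hit
```

Even though the second request is byte-identical to the first, it lands on a different account and pays full price, which is exactly why the relay cannot pass on cache discounts.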
What If I Need Caching Features?
Option 1: Use Official Direct API (Recommended)
If your business specifically requires caching (e.g., long context, repeated prompts), we recommend:

Official Direct API
Use Official Website APIs
- Use OpenAI, Claude, DeepSeek, etc. official APIs directly
- Ensures all requests use the same account
- Can properly benefit from cache billing discounts
Trade-offs of going direct:
- Requires an overseas credit card for payment
- Subject to network access restrictions
- Account registration barriers
Option 2: Evaluate Cache Benefits
Before switching to official APIs, evaluate the cache benefits:

Which scenarios have significant cache benefits?
High-Benefit Scenarios:
- 📄 Long System Prompts: If your system prompt is lengthy (thousands of tokens) and reused in every request
- 📚 Long Context RAG: Retrieval-Augmented Generation (RAG) scenarios with large document content in each request
- 🔁 Repeated Calls: Frequently calling identical or similar prompts in short timeframes
- 💬 Multi-turn Conversations: Long conversation history passed repeatedly
Low-Benefit Scenarios:
- 💬 Short Prompts: Very short system prompts (dozens of tokens)
- 🔀 Diverse Requests: Each request has different prompts
- ⏰ Infrequent Calls: Long intervals between requests (cache may expire)
How to calculate cache savings?
Cache Savings Formula:

Monthly savings = cacheable tokens per call × calls per day × (normal price − cached price) per token × 30 days

Example (Claude Sonnet 4):

| Item | Normal Cost | Cached Cost | Savings |
|---|---|---|---|
| Input price | $3/M tokens | $0.30/M tokens | 90% |
| 5,000-token system prompt | $0.015/call | $0.0015/call | $0.0135/call |
| 1,000 calls/day | $15/day | $1.50/day | $405/month |

Evaluation Recommendations:
- Switch if the monthly savings exceed the official API's additional costs and operational overhead
- Continue using APIYI if the monthly savings are under $50 (and skip the payment/network hassles)
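The arithmetic in the table above can be checked with a small helper. The prices plugged in are the Claude Sonnet 4 figures from the table; everything else is a plain calculation.

```python
def monthly_cache_savings(cacheable_tokens, calls_per_day,
                          normal_price_per_m, cache_price_per_m, days=30):
    """Savings = cacheable tokens/call x calls/day x price gap per token x days."""
    per_call = cacheable_tokens * (normal_price_per_m - cache_price_per_m) / 1_000_000
    return per_call * calls_per_day * days

# Claude Sonnet 4 figures from the table above:
# 5,000-token system prompt, 1,000 calls/day, $3/M normal vs $0.30/M cached
savings = monthly_cache_savings(5000, 1000, 3.00, 0.30)  # ≈ $405/month
```

Plug in your own token counts and call volume; if the result stays under the $50/month threshold mentioned above, the switch is unlikely to be worth the operational overhead.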
What are APIYI's advantages over official APIs?
APIYI Advantages (without caching):

✅ Convenient Payment:
- Supports Alipay and WeChat Pay
- RMB pricing (favorable 1:7 exchange rate)
- No overseas credit card needed

✅ Price Discounts:
- First-time and tiered top-up bonuses (10%–20%)
- Overall discount of up to 20% off official prices

✅ Network Access:
- Direct connection from mainland China, no proxy needed
- China-optimized premium network with fast speeds

✅ Rich Model Selection:
- 200+ models behind a unified API format
- One-click model switching
- OpenAI SDK compatible

✅ Stability and Support:
- Account pool improves concurrency
- Automatic failover
- Professional technical support
Option 3: Hybrid Approach
Choose flexibly based on business scenarios:

Cache-Sensitive Scenarios
Use Official Direct API
- Long context RAG
- Fixed system prompts
- Multi-turn conversation apps
General Call Scenarios
Use APIYI
- Short prompt tasks
- Diverse requests
- Infrequent call scenarios
Models Supporting Caching
The following models' official APIs support cache billing (for reference):

| Model Provider | Cache Feature Name | Savings | Official Docs |
|---|---|---|---|
| Claude | Prompt Caching | 90% | docs.anthropic.com/en/docs/build-with-claude/prompt-caching |
| DeepSeek | Cache Prefix | 95% | api-docs.deepseek.com/quick_start/pricing |
| Kimi | Context Caching | 85% | platform.moonshot.cn/docs/pricing |
| Gemini | Context Caching | 75% | ai.google.dev/gemini-api/docs/caching |
Note: The documentation links above are in plain text; copy them into your browser to visit.
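For a quick estimate, the savings rates in the table above can be turned into effective cached-input prices. A small sketch; only Claude's $3/M normal price appears elsewhere in this document, so other providers' normal prices would need to be looked up from their pricing pages.

```python
# Savings rates taken from the provider table above.
CACHE_SAVINGS = {"Claude": 0.90, "DeepSeek": 0.95, "Kimi": 0.85, "Gemini": 0.75}

def cached_input_price(provider, normal_price_per_m):
    """Effective cached-input price per million tokens, given the provider's
    advertised cache savings rate."""
    return normal_price_per_m * (1 - CACHE_SAVINGS[provider])

# Claude Sonnet 4: $3/M normal input -> $0.30/M cached, matching the earlier example
claude_cached = cached_input_price("Claude", 3.00)
```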
Frequently Asked Questions
Why does the relay station use an account pool mechanism?
Advantages of Account Pool:
- Improved Concurrency: single accounts are subject to API rate limits (e.g., OpenAI's RPM/TPM); pooling multiple accounts exceeds any single account's limits
- Enhanced Stability: when one account has issues, traffic automatically switches to others, avoiding service interruptions
- Cost Optimization: different accounts may have different pricing or quotas; flexible scheduling reduces costs
- Risk Mitigation: spreading requests across multiple accounts reduces the risk of any single account being throttled or banned
Can you bind my API Key to a fixed upstream account?
Currently not supported.

Reasons:
- Binding to fixed accounts loses account pool advantages (concurrency, stability)
- Single account rate limits may not meet your concurrency needs
- Technically complex and increases operational costs
Will APIYI support caching in the future?
We understand that caching matters for certain business scenarios.

Technical Challenges:
- Need to completely change account pool allocation mechanism
- Need to track each user’s cache state
- Need to ensure consecutive requests use same upstream account
Possible Future Approaches:
- Provide a "fixed account mode" option (opt-in feature)
- Let users choose whether to enable caching (sacrificing some concurrency)
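One way a "fixed account mode" could keep consecutive requests on the same upstream account is deterministic key-based routing: hash the user's API key to pick one account and always reuse it. A minimal sketch under stated assumptions; the account names and routing scheme are hypothetical, not APIYI's implementation.

```python
import hashlib

# Hypothetical upstream account pool
ACCOUNTS = ["upstream-A", "upstream-B", "upstream-C"]

def pick_account(user_api_key: str) -> str:
    """Map a user key deterministically to one upstream account, so all of
    that user's requests land where their prompt cache lives.
    Trade-off: the user is then capped by that single account's rate limits."""
    digest = hashlib.sha256(user_api_key.encode("utf-8")).digest()
    return ACCOUNTS[int.from_bytes(digest[:8], "big") % len(ACCOUNTS)]
```

Because the mapping is a pure function of the key, every call from the same user hits the same account and its cache, which is precisely the concurrency sacrifice the option above describes.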
How do I determine if my business needs caching?
Typical Signals You Need Caching:

✅ Your system prompt exceeds 5,000 tokens
✅ Each request includes large amounts of repeated context (like RAG documents)
✅ Daily call count exceeds 1000 times
✅ Calculated monthly cache savings exceed $50

Typical Signals You Don't Need Caching:

❌ System prompt under 1,000 tokens
❌ Request content is diverse, rarely repeated
❌ Call frequency is low (under 100 times per day)
❌ You care more about payment convenience and network stability

Evaluation Method:
- Review your current API call logs
- Calculate average input tokens per request
- Calculate cacheable portions (like system prompts, fixed context)
- Use above formula to calculate potential savings
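The evaluation steps above can be sketched as a small log-analysis helper. The $50 threshold comes from the recommendation earlier in this page; the log format (per-call total and cacheable token counts) is an assumption for illustration.

```python
def evaluate_caching(calls, normal_price_per_m, cache_price_per_m,
                     days=30, threshold_usd=50.0):
    """calls: one day's log as a list of (total_input_tokens, cacheable_tokens).
    Returns (projected monthly savings in USD, recommendation string)."""
    cacheable = sum(c for _, c in calls)  # total cacheable tokens that day
    daily = cacheable * (normal_price_per_m - cache_price_per_m) / 1_000_000
    monthly = daily * days
    advice = "switch to official API" if monthly >= threshold_usd else "stay on APIYI"
    return monthly, advice

# 1,000 calls/day, each with a cacheable 5,000-token system prompt,
# at Claude Sonnet 4 prices ($3/M normal, $0.30/M cached)
monthly, advice = evaluate_caching([(6000, 5000)] * 1000, 3.00, 0.30)
```

With the numbers from the worked example, the projected savings land at roughly $405/month, comfortably above the threshold.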
Related Documentation
Top-up Promotions
Learn about APIYI’s top-up bonuses, enjoy 20% off without caching
Model Selection Guide
Learn how to choose the right model, optimize costs and performance
API Concurrency Limits
Learn about APIYI’s concurrency capabilities and rate limits
Call Log Query
View your API call logs, analyze token consumption
Summary
APIYI does not support cache billing because:
- ✅ The relay station uses an account pool mechanism to improve concurrency and stability
- ❌ Caching is account-bound; a prompt cached on one account cannot be hit from another
- Option 1: Use official direct API (suitable for high-frequency, long context scenarios)
- Option 2: Evaluate cache benefits, weigh costs (consider switching if monthly savings $50+)
- Option 3: Hybrid approach (official API for cache scenarios, APIYI for others)
Even without caching, APIYI still offers:
- 💰 Top-up bonuses of up to 20% off
- 💳 Convenient payment (Alipay/WeChat)
- 🌐 Domestic direct connection, no proxy needed
- 🚀 200+ models unified interface
Contact Us
Enterprise WeChat
Email Inquiry
Customer Service: [email protected]
Business Cooperation: [email protected]
