Short Answer

APIYI currently does not support cache billing. This is because APIYI uses a distributed account pool relay station model, where requests are distributed across multiple upstream accounts, while caching is account-specific and cannot be shared across accounts.
Important Notice: If your business heavily relies on caching features (such as context caching for DeepSeek, Kimi, etc.), we recommend using the official API directly.

Why Doesn’t APIYI Support Caching?

How the Relay Station Works

As an AI model relay platform, APIYI uses the following architecture to improve concurrency and service stability:

Account Pool Mechanism

Multiple Upstream Account Pools
APIYI maintains multiple upstream accounts (OpenAI, Claude, etc.) and intelligently distributes requests across them.

Load Balancing

Dynamic Request Distribution
Each API call may be assigned to a different upstream account, which improves concurrent processing capacity.

How Caching Works

Large language model caching mechanisms (like Prompt Caching) are account-specific:
  1. First Request: The user’s request is routed to upstream Account A, and the upstream API (like OpenAI) caches the prompt in Account A’s cache space.
  2. Cache Billing: The upstream API bills Account A at the cache rate (usually 50%-90% cheaper than normal input).
  3. Subsequent Requests: If subsequent requests still use Account A, cache hits occur and the discounted cache pricing applies.

Why Can’t APIYI Support Caching?

Core Issue: The relay station’s account pool mechanism conflicts with cache’s account-binding characteristics
Scenario Example:
1st Request:
- User request → APIYI → Assigned to Upstream Account A
- Upstream Account A caches the prompt, charged $0.10

2nd Request (same prompt):
- User request → APIYI → Assigned to Upstream Account B
- Upstream Account B has no cache, needs reprocessing, charged $1.00 (no cache discount)

Result: Cache miss, user cannot benefit from cache pricing
Why Cache Fails:
  • Cache is account-specific, not user or API Key specific
  • APIYI backend has multiple accounts distributing requests, cannot guarantee consecutive requests use the same upstream account
  • Even if first request establishes cache on Account A, second request may be assigned to Account B, causing cache miss
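The effect can be illustrated with a small simulation. This is a minimal sketch, not APIYI's actual scheduler: it assumes each request is routed to an upstream account chosen uniformly at random and that caches do not expire during the run. The names `simulate_hit_rate`, `num_accounts`, and `num_requests` are hypothetical.

```python
import random


def simulate_hit_rate(num_accounts: int, num_requests: int, seed: int = 0) -> float:
    """Estimate the cache hit rate for one prompt sent repeatedly.

    Assumption (not APIYI's real scheduler): every request goes to an
    upstream account picked uniformly at random, and each account keeps
    its own cache for the whole run.
    """
    rng = random.Random(seed)
    warmed_accounts = set()  # accounts that have already cached the prompt
    hits = 0
    for _ in range(num_requests):
        account = rng.randrange(num_accounts)
        if account in warmed_accounts:
            hits += 1  # this account saw the prompt before: cache hit
        else:
            warmed_accounts.add(account)  # first visit: miss, cache is written
    return hits / num_requests


if __name__ == "__main__":
    # A single account (official API) vs. a pool of 50 accounts, 5 requests each.
    print(simulate_hit_rate(num_accounts=1, num_requests=5))
    print(simulate_hit_rate(num_accounts=50, num_requests=5))
```

With a single account, every request after the first hits the cache. With a pool, a hit requires landing on an account that has already seen the prompt, which becomes unlikely when requests are few relative to the number of accounts or when caches expire between calls.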

What If I Need Caching Features?

If your business specifically requires caching (e.g., long context, repeated prompts), we recommend:

Option 1: Official Direct API

Use Official Website APIs
  • Use OpenAI, Claude, DeepSeek, etc. official APIs directly
  • Ensures all requests use the same account
  • Can properly benefit from cache billing discounts
Note: Official APIs require handling:
  • Overseas credit card payment
  • Network access restrictions
  • Account registration barriers

Option 2: Evaluate Cache Benefits

Before switching to official APIs, evaluate cache benefits:
High-Benefit Scenarios:
  • 📄 Long System Prompts: If your system prompt is lengthy (thousands of tokens) and reused in every request
  • 📚 Long Context RAG: Retrieval-Augmented Generation (RAG) scenarios with large document content in each request
  • 🔁 Repeated Calls: Frequently calling identical or similar prompts in short timeframes
  • 💬 Multi-turn Conversations: Long conversation history passed repeatedly
Low-Benefit Scenarios:
  • 💬 Short Prompts: Very short system prompts (dozens of tokens)
  • 🔀 Diverse Requests: Each request has different prompts
  • Infrequent Calls: Long intervals between requests (cache may expire)
Cache Savings Formula:
Per-Request Savings = (Normal Input Price - Cache Input Price) × Cached Tokens Count

Monthly Savings = Per-Request Savings × Daily Cache Hit Count × 30 Days

Prices here are per token (the per-million-token price divided by 1,000,000).
Example (Claude Sonnet 4):
Scenario                  | Normal Input Price | Cache Input Price | Savings
Claude Sonnet 4           | $3/M tokens        | $0.30/M tokens    | 90%
System prompt 5000 tokens | $0.015             | $0.0015           | Save $0.0135
1000 calls/day            | $15/day            | $1.5/day          | Monthly save $405
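The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the formula above: the function and parameter names are hypothetical, prices are in USD per million tokens, and every call is assumed to be a cache hit, as in the example.

```python
def per_request_savings(normal_price_per_m: float,
                        cache_price_per_m: float,
                        cached_tokens: int) -> float:
    """Savings in USD for one request whose cached prefix has `cached_tokens` tokens."""
    price_gap_per_token = (normal_price_per_m - cache_price_per_m) / 1_000_000
    return price_gap_per_token * cached_tokens


def monthly_savings(savings_per_request: float,
                    cache_hits_per_day: int,
                    days: int = 30) -> float:
    """Savings in USD over `days` days at a steady number of cache hits per day."""
    return savings_per_request * cache_hits_per_day * days


if __name__ == "__main__":
    # Claude Sonnet 4 figures from the example: $3/M normal, $0.30/M cached.
    per_call = per_request_savings(3.0, 0.30, cached_tokens=5000)
    print(f"{per_call:.4f}")                          # 0.0135
    print(f"{monthly_savings(per_call, 1000):.2f}")   # 405.00
```

Real savings will be lower if some calls miss the cache, for example after the cache expires.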
Evaluation Recommendations:
  • Switch if monthly savings exceed the official API’s additional costs and operational overhead
  • Continue using APIYI if monthly savings are less than $50 (no payment or network hassles)
APIYI Advantages (without caching):
Convenient Payment:
  • Supports Alipay, WeChat Pay
  • RMB pricing (1:7 favorable exchange rate)
  • No overseas credit card needed
Top-up Bonuses:
  • First-time + tiered bonuses (10%-20%)
  • Overall discount up to 20% off official prices
No Network Restrictions:
  • Domestic direct connection, no proxy needed
  • China-optimized premium network, fast speeds
Unified Interface:
  • 200+ models with unified API format
  • One-click model switching
  • OpenAI SDK compatible
Stable & Reliable:
  • Account pool improves concurrency
  • Automatic failover switching
  • Professional technical support
See Top-up Promotions for details

Option 3: Hybrid Approach

Choose flexibly based on business scenarios:

Cache-Sensitive Scenarios

Use Official Direct API
  • Long context RAG
  • Fixed system prompts
  • Multi-turn conversation apps

General Call Scenarios

Use APIYI
  • Short prompt tasks
  • Diverse requests
  • Infrequent call scenarios
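One way to apply the hybrid approach is to classify each workload once and send it to the matching backend. The sketch below is an assumption, not an APIYI feature: the `Workload` type, the `choose_backend` function, and the rule that both thresholds must be exceeded are hypothetical. The threshold values are borrowed from the signals listed in this FAQ (system prompt over 5000 tokens, more than 1000 calls per day).

```python
from dataclasses import dataclass

# Hypothetical thresholds, taken from the signals mentioned in this FAQ.
CACHE_PREFIX_TOKENS = 5000
CACHE_CALLS_PER_DAY = 1000


@dataclass
class Workload:
    name: str
    fixed_prefix_tokens: int  # system prompt plus fixed context reused verbatim
    calls_per_day: int


def choose_backend(workload: Workload) -> str:
    """Return "official" for cache-sensitive workloads, otherwise "apiyi"."""
    cache_sensitive = (
        workload.fixed_prefix_tokens > CACHE_PREFIX_TOKENS
        and workload.calls_per_day > CACHE_CALLS_PER_DAY
    )
    return "official" if cache_sensitive else "apiyi"


if __name__ == "__main__":
    rag = Workload("long-context-rag", fixed_prefix_tokens=12000, calls_per_day=3000)
    short = Workload("short-prompt-task", fixed_prefix_tokens=80, calls_per_day=200)
    print(choose_backend(rag))    # official
    print(choose_backend(short))  # apiyi
```

Because the decision is made per workload rather than per request, each workload keeps talking to one backend, so a workload sent to the official API stays on a single account and can benefit from caching.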

Models Supporting Caching

The following model official APIs support cache billing (for reference):
Model Provider | Cache Feature Name | Savings | Official Docs
Claude         | Prompt Caching     | 90%     | docs.anthropic.com/en/docs/build-with-claude/prompt-caching
DeepSeek       | Cache Prefix       | 95%     | api-docs.deepseek.com/quick_start/pricing
Kimi           | Context Caching    | 85%     | platform.moonshot.cn/docs/pricing
Gemini         | Context Caching    | 75%     | ai.google.dev/gemini-api/docs/caching
Note: The documentation links above are plain text. Copy them into your browser to open them.

Frequently Asked Questions

Advantages of Account Pool:
  1. Improved Concurrency: Single accounts have API rate limits (like OpenAI’s RPM/TPM), multiple accounts can exceed single account limits
  2. Enhanced Stability: When one account has issues, automatically switch to others, avoiding service interruptions
  3. Cost Optimization: Different accounts may have different pricing or quotas, flexible scheduling reduces costs
  4. Risk Mitigation: Distributing requests across multiple accounts reduces risk of single account throttling or banning
This is the core competitiveness of relay platforms and the foundation for APIYI’s high concurrency and stable service.
Binding requests to a fixed upstream account is currently not supported.
Reasons:
  • Binding to fixed accounts loses account pool advantages (concurrency, stability)
  • Single account rate limits may not meet your concurrency needs
  • Technically complex and increases operational costs
If you truly need a fixed account (like for caching), we recommend using official APIs directly.
We understand caching’s importance for certain business scenarios.
Technical Challenges:
  • Need to completely change account pool allocation mechanism
  • Need to track each user’s cache state
  • Need to ensure consecutive requests use same upstream account
Possible Solutions:
  • Provide “fixed account mode” option (optional feature)
  • Users can choose whether to enable caching (sacrificing some concurrency)
This feature is currently under evaluation. Updates will be announced in AI Radar.
If you have strong caching requirements, please contact our business team to discuss custom solutions.
Typical Signals Needing Caching:
  • ✅ Your system prompt exceeds 5000 tokens
  • ✅ Each request includes large amounts of repeated context (like RAG documents)
  • ✅ Daily call count exceeds 1000 times
  • ✅ Calculated monthly cache savings exceeds $50
Typical Signals Not Needing Caching:
  • ❌ System prompt under 1000 tokens
  • ❌ Request content is diverse, rarely repeated
  • ❌ Call frequency is low (under 100 times per day)
  • ❌ More concerned with payment convenience and network stability
Evaluation Method:
  1. Review your current API call logs
  2. Calculate average input tokens per request
  3. Calculate cacheable portions (like system prompts, fixed context)
  4. Use above formula to calculate potential savings
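The four steps above can be sketched as a short script over exported call logs. This is an illustration only: the record keys `input_tokens` and `cacheable_tokens` and the function name are hypothetical (they are not an APIYI log format), prices are in USD per million tokens, and every call is assumed to hit the cache, so the result is an upper bound.

```python
from statistics import mean


def estimate_monthly_savings(logs, normal_price_per_m, cache_price_per_m, days_covered):
    """Estimate cache savings from a list of call-log records.

    Each record is a dict with two hypothetical keys:
      input_tokens     - total input tokens of the call
      cacheable_tokens - tokens in the reused prefix (system prompt, fixed context)
    """
    if not logs or days_covered <= 0:
        raise ValueError("need at least one log record and a positive day count")
    avg_input = mean(r["input_tokens"] for r in logs)          # step 2
    avg_cacheable = mean(r["cacheable_tokens"] for r in logs)  # step 3
    calls_per_day = len(logs) / days_covered
    per_request = (normal_price_per_m - cache_price_per_m) * avg_cacheable / 1_000_000
    return {
        "avg_input_tokens": avg_input,
        "cacheable_share": avg_cacheable / avg_input if avg_input else 0.0,
        "calls_per_day": calls_per_day,
        "monthly_savings_usd": per_request * calls_per_day * 30,  # step 4
    }


if __name__ == "__main__":
    # One day of made-up logs: 1000 calls, 5000 of 6000 input tokens cacheable.
    sample_logs = [{"input_tokens": 6000, "cacheable_tokens": 5000}] * 1000
    report = estimate_monthly_savings(sample_logs, 3.0, 0.30, days_covered=1)
    print(f"{report['monthly_savings_usd']:.2f}")  # 405.00
```

The resulting figure can then be compared against the $50 threshold mentioned above.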

Top-up Promotions

Learn about APIYI’s top-up bonuses, enjoy 20% off without caching

Model Selection Guide

Learn how to choose the right model, optimize costs and performance

API Concurrency Limits

Learn about APIYI’s concurrency capabilities and rate limits

Call Log Query

View your API call logs, analyze token consumption

Summary

APIYI does not support cache billing because:
  • ✅ Relay station uses account pool mechanism to improve concurrency and stability
  • ❌ Caching is account-bound, cannot hit across accounts
If you need caching features:
  • Option 1: Use official direct API (suitable for high-frequency, long context scenarios)
  • Option 2: Evaluate cache benefits, weigh costs (consider switching if monthly savings $50+)
  • Option 3: Hybrid approach (official API for cache scenarios, APIYI for others)
APIYI Advantages (non-cache scenarios):
  • 💰 Top-up bonuses, up to 20% off official prices
  • 💳 Convenient payment (Alipay/WeChat)
  • 🌐 Domestic direct connection, no proxy needed
  • 🚀 200+ models unified interface
For more questions, please contact us!

Contact Us

Enterprise WeChat

Scan the QR code or click to contact support for caching feature consultation and technical support.

Email Inquiry

Customer Service: [email protected]
Business Cooperation: [email protected]