Does APIYI Support Cache Billing?

Short Answer

APIYI currently does not support cache billing. This is because APIYI uses a distributed account pool relay station model, where requests are distributed across multiple upstream accounts, while caching is account-specific and cannot be shared across accounts.

Important Notice: If your business heavily relies on caching features (such as context caching for DeepSeek, Kimi, etc.), we recommend using the official API directly.

Why Doesn’t APIYI Support Caching?

How the Relay Station Works

As an AI model relay platform, APIYI uses the following architecture to improve concurrency and service stability:

Account Pool Mechanism

Multiple Upstream Account PoolsAPIYI maintains multiple upstream accounts (OpenAI, Claude, etc.), intelligently distributing requests across different accounts

Load Balancing

Dynamic Request DistributionEach API call may be assigned to a different upstream account, improving concurrent processing capacity

How Caching Works

Large language model caching mechanisms (like Prompt Caching) are account-specific:

First Request

User sends request through Account A, upstream API (like OpenAI) caches the prompt to Account A’s cache space

Cache Billing

Upstream API bills Account A for caching (usually 50%-90% cheaper than normal input)

Subsequent Requests

If subsequent requests still use Account A, cache hits occur and discounted cache pricing applies

Why Can’t APIYI Support Caching?

Core Issue: The relay station’s account pool mechanism conflicts with cache’s account-binding characteristics

Scenario Example:

1st Request:
- User request → APIYI → Assigned to Upstream Account A
- Upstream Account A caches the prompt, charged $0.10

2nd Request (same prompt):
- User request → APIYI → Assigned to Upstream Account B
- Upstream Account B has no cache, needs reprocessing, charged $1.00 (no cache discount)

Result: Cache miss, user cannot benefit from cache pricing

Why Cache Fails:

Cache is account-specific, not user or API Key specific
APIYI backend has multiple accounts distributing requests, cannot guarantee consecutive requests use the same upstream account
Even if first request establishes cache on Account A, second request may be assigned to Account B, causing cache miss

What If I Need Caching Features?

Option 1: Use Official Direct API (Recommended)

If your business specifically requires caching (e.g., long context, repeated prompts), we recommend:

Official Direct API

Use Official Website APIs

Use OpenAI, Claude, DeepSeek, etc. official APIs directly
Ensures all requests use the same account
Can properly benefit from cache billing discounts

Note: Official APIs require handling:

Overseas credit card payment
Network access restrictions
Account registration barriers

Option 2: Evaluate Cache Benefits

Before switching to official APIs, evaluate cache benefits:

Which scenarios have significant cache benefits?

High-Benefit Scenarios:

📄 Long System Prompts: If your system prompt is lengthy (thousands of tokens) and reused in every request
📚 Long Context RAG: Retrieval-Augmented Generation (RAG) scenarios with large document content in each request
🔁 Repeated Calls: Frequently calling identical or similar prompts in short timeframes
💬 Multi-turn Conversations: Long conversation history passed repeatedly

Low-Benefit Scenarios:

💬 Short Prompts: Very short system prompts (dozens of tokens)
🔀 Diverse Requests: Each request has different prompts
⏰ Infrequent Calls: Long intervals between requests (cache may expire)

How to calculate cache savings?

Cache Savings Formula:

Per-Request Savings = (Normal Input Price - Cache Input Price) × Cached Tokens Count

Monthly Savings = Per-Request Savings × Cache Hit Count × 30 Days

Example (Claude Sonnet 4):

Scenario	Normal Input Price	Cache Input Price	Savings
Claude Sonnet 4	$3/M tokens	$0.30/M tokens	90%
System prompt 5000 tokens	$0.015	$0.0015	Save $0.0135
1000 calls/day	$15/day	$1.5/day	Monthly save $405

Evaluation Recommendations:

Switch if monthly savings exceed official API’s additional costs and operational overhead
Continue using APIYI if monthly savings less than $50 (no payment/network hassles)

What are APIYI's advantages over official APIs?

APIYI Advantages (without caching):✅ Convenient Payment:

Supports Alipay, WeChat Pay
RMB pricing (1:7 favorable exchange rate)
No overseas credit card needed

✅ Top-up Bonuses:

First-time + tiered bonuses (10%-20%)
Overall discount up to 20% off official prices

✅ No Network Restrictions:

Domestic direct connection, no proxy needed
China-optimized premium network, fast speeds

✅ Unified Interface:

200+ models with unified API format
One-click model switching
OpenAI SDK compatible

✅ Stable & Reliable:

Account pool improves concurrency
Automatic failover switching
Professional technical support

See Top-up Promotions for details

Option 3: Hybrid Approach

Choose flexibly based on business scenarios:

Cache-Sensitive Scenarios

Use Official Direct API

Long context RAG
Fixed system prompts
Multi-turn conversation apps

General Call Scenarios

Use APIYI

Short prompt tasks
Diverse requests
Infrequent call scenarios

Models Supporting Caching

The following model official APIs support cache billing (for reference):

Model Provider	Cache Feature Name	Savings	Official Docs
Claude	Prompt Caching	90%	`docs.anthropic.com/en/docs/build-with-claude/prompt-caching`
DeepSeek	Cache Prefix	95%	`api-docs.deepseek.com/quick_start/pricing`
Kimi	Context Caching	85%	`platform.moonshot.cn/docs/pricing`
Gemini	Context Caching	75%	`ai.google.dev/gemini-api/docs/caching`

Note: Above documentation links are in plain text format. Please copy manually to browser to access.

Frequently Asked Questions

Why does the relay station use an account pool mechanism?

Advantages of Account Pool:

Improved Concurrency: Single accounts have API rate limits (like OpenAI’s RPM/TPM), multiple accounts can exceed single account limits
Enhanced Stability: When one account has issues, automatically switch to others, avoiding service interruptions
Cost Optimization: Different accounts may have different pricing or quotas, flexible scheduling reduces costs
Risk Mitigation: Distributing requests across multiple accounts reduces risk of single account throttling or banning

This is the core competitiveness of relay platforms and the foundation for APIYI’s high concurrency and stable service.

Can you bind my API Key to a fixed upstream account?

Currently not supported.Reasons:

Binding to fixed accounts loses account pool advantages (concurrency, stability)
Single account rate limits may not meet your concurrency needs
Technically complex and increases operational costs

If you truly need a fixed account (like for caching), we recommend using official APIs directly.

Will APIYI support caching in the future?

We understand caching’s importance for certain business scenarios.Technical Challenges:

Need to completely change account pool allocation mechanism
Need to track each user’s cache state
Need to ensure consecutive requests use same upstream account

Possible Solutions:

Provide “fixed account mode” option (optional feature)
Users can choose whether to enable caching (sacrificing some concurrency)

This feature is currently under evaluation. Updates will be announced in AI Radar.If you have strong caching requirements, please contact our business team to discuss custom solutions.

How do I determine if my business needs caching?

Typical Signals Needing Caching:✅ Your system prompt exceeds 5000 tokens ✅ Each request includes large amounts of repeated context (like RAG documents) ✅ Daily call count exceeds 1000 times ✅ Calculated monthly cache savings exceeds $50Typical Signals Not Needing Caching:❌ System prompt under 1000 tokens ❌ Request content is diverse, rarely repeated ❌ Call frequency is low (under 100 times per day) ❌ More concerned with payment convenience and network stabilityEvaluation Method:

Review your current API call logs
Calculate average input tokens per request
Calculate cacheable portions (like system prompts, fixed context)
Use above formula to calculate potential savings

Top-up Promotions

Learn about APIYI’s top-up bonuses, enjoy 20% off without caching

Model Selection Guide

Learn how to choose the right model, optimize costs and performance

API Concurrency Limits

Learn about APIYI’s concurrency capabilities and rate limits

Call Log Query

View your API call logs, analyze token consumption

Summary

APIYI does not support cache billing because:

✅ Relay station uses account pool mechanism to improve concurrency and stability
❌ Caching is account-bound, cannot hit across accounts

If you need caching features:

Option 1: Use official direct API (suitable for high-frequency, long context scenarios)
Option 2: Evaluate cache benefits, weigh costs (consider switching if monthly savings $50+)
Option 3: Hybrid approach (official API for cache scenarios, APIYI for others)

APIYI Advantages (non-cache scenarios):

💰 Top-up bonuses from 20% off
💳 Convenient payment (Alipay/WeChat)
🌐 Domestic direct connection, no proxy needed
🚀 200+ models unified interface

For more questions, please contact us!

Contact Us

Enterprise WeChat

Scan QR code or Click to contact supportCaching feature consultation, technical support

Email Inquiry

Customer Service: [email protected]Business Cooperation: [email protected]

产品基础

基础 API

视频 API

图片 API

多模态理解 API

文本 API

Does APIYI Support Cache Billing?

Short Answer

Why Doesn’t APIYI Support Caching?

How the Relay Station Works

Account Pool Mechanism

Load Balancing

How Caching Works

Why Can’t APIYI Support Caching?

What If I Need Caching Features?

Option 1: Use Official Direct API (Recommended)

Official Direct API

Option 2: Evaluate Cache Benefits

Option 3: Hybrid Approach

Cache-Sensitive Scenarios

General Call Scenarios

Models Supporting Caching

Frequently Asked Questions

Top-up Promotions

Model Selection Guide

API Concurrency Limits

Call Log Query

Summary

Contact Us

Enterprise WeChat

Email Inquiry

产品基础

基础 API

视频 API

图片 API

多模态理解 API

文本 API

​Short Answer

​Why Doesn’t APIYI Support Caching?

​How the Relay Station Works

Account Pool Mechanism

Load Balancing

​How Caching Works

​Why Can’t APIYI Support Caching?

​What If I Need Caching Features?

​Option 1: Use Official Direct API (Recommended)

Official Direct API

​Option 2: Evaluate Cache Benefits

​Option 3: Hybrid Approach

Cache-Sensitive Scenarios

General Call Scenarios

​Models Supporting Caching

​Frequently Asked Questions

​Related Documentation

Top-up Promotions

Model Selection Guide

API Concurrency Limits

Call Log Query

​Summary

​Contact Us

Enterprise WeChat

Email Inquiry

Short Answer

Why Doesn’t APIYI Support Caching?

How the Relay Station Works

How Caching Works

Why Can’t APIYI Support Caching?

What If I Need Caching Features?

Option 1: Use Official Direct API (Recommended)

Option 2: Evaluate Cache Benefits

Option 3: Hybrid Approach

Models Supporting Caching

Frequently Asked Questions

Related Documentation

Summary

Contact Us