Short Answer
Yes. APIYI’s Claude, OpenAI, Gemini, DeepSeek, Qwen, and Grok channels all support cache billing: cache-related request parameters are forwarded upstream as-is, cache-hit fields come back to you untouched, and the billing dashboard lists cached usage as separate line items at the official discount rates — no middleware-specific adaptation needed in your code. Claude and OpenAI cache hits are stable and reliable (both have dedicated guides on this site — linked below). DeepSeek, Qwen, and Grok work well too. Gemini’s implicit caching is supported, but its hit rate is mediocre — don’t build your cost budget around Gemini caching.The Three Channels at a Glance
| OpenAI (gpt-5 series) | Claude | Gemini | |
|---|---|---|---|
| Trigger | Fully automatic, zero code | Manual cache_control markers | Implicit caching, auto-enabled |
| Minimum threshold | 1024 tokens | 1024–4096 tokens by model | 4096 (3 series) / 2048 (2.5 series) |
| Write fee | Free | 1.25× (5 min) / 2× (1 hour) | Free |
| Hit price | 0.1× of input | 0.1× of input | Per Google’s official discount |
| Real-world experience | ✅ Stable hits | ✅ Stable hits | ⚠️ Mediocre hit rate |
| Full guide | OpenAI Cache Billing | Claude Cache Billing | Gemini Cache Billing |
Channel Notes
OpenAI: fully automatic, zero effort
Keep a stable prefix of at least 1024 tokens and hits happen automatically — the matched portion bills at 10% of the input price, with no write fee, so the 2nd request is already pure savings. How to write requests that hit and how to useprompt_cache_key: see the OpenAI Prompt Caching Billing Guide.
Claude: manual markers, biggest savings
Addcache_control to the content blocks you want cached; hits bill at 0.1× (writes cost 1.25× / 2×). Essential for Claude Code, Cline, Cursor, and other heavy workloads. Note it only works in the Anthropic native format (/v1/messages) — calling Claude through the OpenAI-compatible format gets no cache discount. See the Claude Prompt Caching Billing Guide.
Gemini: supported, but keep expectations low
APIYI auto-enables implicit context caching for the Gemini native format, and hits bill at Google’s official discount. In practice, however, Gemini’s cache hit rate is clearly below Claude / OpenAI (upstream implicit caching behavior is not controllable). Our advice:- Treat the cache discount as a nice-to-have bonus — estimate costs at the uncached price
- For cache-sensitive workloads with long, frequent prefixes, prefer the OpenAI or Claude channels
Other Channels: DeepSeek / Qwen / Grok
Caching on these channels is fully automatic (no markers needed), works normally through APIYI, and performs well in practice:| Channel | Trigger | Hit discount (official) |
|---|---|---|
| DeepSeek | Automatic, prefix matching | Hits save over 90% — the steepest discount of the lot |
| Qwen | Implicit caching, auto-enabled, prefix of at least 1024 tokens | Hits bill at the official discounted rate |
| Grok (xAI) | Automatic, prefix matching | Hits save roughly 75% or more (varies by model) |
How to Confirm a Hit
Check the cache fields in the responseusage:
| Channel | Hit field |
|---|---|
OpenAI /v1/chat/completions | usage.prompt_tokens_details.cached_tokens |
OpenAI /v1/responses | usage.input_tokens_details.cached_tokens |
Claude /v1/messages | usage.cache_read_input_tokens |
| Gemini native format | usageMetadata.cachedContentTokenCount |
| DeepSeek | usage.prompt_cache_hit_tokens / prompt_cache_miss_tokens |
| Qwen / Grok (OpenAI-compatible format) | usage.prompt_tokens_details.cached_tokens |
Things to Watch
- Caches are isolated per model: switching models (even within the same series) shares nothing
- For vendors not listed above (Kimi, etc.), go by the cache fields actually returned in your call logs
- Official mechanism details:
platform.openai.com/docs/guides/prompt-caching,docs.claude.com/en/docs/build-with-claude/prompt-caching,ai.google.dev/gemini-api/docs/caching,api-docs.deepseek.com/quick_start/pricing,docs.x.ai/developers/advanced-api-usage/prompt-caching
Related Docs
OpenAI Cache Billing Guide
Automatic caching: 1024-token threshold, 90%-off hits, prompt_cache_key routing
Claude Cache Billing Guide
Where to place cache_control, break-even math, multi-turn techniques
Model Multipliers
How console group multipliers convert to USD prices
Call Logs
Inspect per-request token usage and cache billing details
Contact Us
WeCom Support
Support: [email protected]Business: [email protected]
