Skip to main content

Short Answer

Yes. APIYI’s Claude, OpenAI, Gemini, DeepSeek, Qwen, and Grok channels all support cache billing: cache-related request parameters are forwarded upstream as-is, cache-hit fields come back to you untouched, and the billing dashboard lists cached usage as separate line items at the official discount rates — no middleware-specific adaptation needed in your code. Claude and OpenAI cache hits are stable and reliable (both have dedicated guides on this site — linked below). DeepSeek, Qwen, and Grok work well too. Gemini’s implicit caching is supported, but its hit rate is mediocre — don’t build your cost budget around Gemini caching.

The Three Channels at a Glance

OpenAI (gpt-5 series)ClaudeGemini
TriggerFully automatic, zero codeManual cache_control markersImplicit caching, auto-enabled
Minimum threshold1024 tokens1024–4096 tokens by model4096 (3 series) / 2048 (2.5 series)
Write feeFree1.25× (5 min) / 2× (1 hour)Free
Hit price0.1× of input0.1× of inputPer Google’s official discount
Real-world experience✅ Stable hits✅ Stable hits⚠️ Mediocre hit rate
Full guideOpenAI Cache BillingClaude Cache BillingGemini Cache Billing

Channel Notes

OpenAI: fully automatic, zero effort

Keep a stable prefix of at least 1024 tokens and hits happen automatically — the matched portion bills at 10% of the input price, with no write fee, so the 2nd request is already pure savings. How to write requests that hit and how to use prompt_cache_key: see the OpenAI Prompt Caching Billing Guide.

Claude: manual markers, biggest savings

Add cache_control to the content blocks you want cached; hits bill at 0.1× (writes cost 1.25× / 2×). Essential for Claude Code, Cline, Cursor, and other heavy workloads. Note it only works in the Anthropic native format (/v1/messages) — calling Claude through the OpenAI-compatible format gets no cache discount. See the Claude Prompt Caching Billing Guide.

Gemini: supported, but keep expectations low

APIYI auto-enables implicit context caching for the Gemini native format, and hits bill at Google’s official discount. In practice, however, Gemini’s cache hit rate is clearly below Claude / OpenAI (upstream implicit caching behavior is not controllable). Our advice:
  • Treat the cache discount as a nice-to-have bonus — estimate costs at the uncached price
  • For cache-sensitive workloads with long, frequent prefixes, prefer the OpenAI or Claude channels

Other Channels: DeepSeek / Qwen / Grok

Caching on these channels is fully automatic (no markers needed), works normally through APIYI, and performs well in practice:
ChannelTriggerHit discount (official)
DeepSeekAutomatic, prefix matchingHits save over 90% — the steepest discount of the lot
QwenImplicit caching, auto-enabled, prefix of at least 1024 tokensHits bill at the official discounted rate
Grok (xAI)Automatic, prefix matchingHits save roughly 75% or more (varies by model)
Raising hit rates works the same way as with OpenAI: stable content first, volatile content last — keep timestamps and random IDs out of the start of the prompt. The “stable prefix” playbook in the OpenAI Cache Billing Guide applies directly.

How to Confirm a Hit

Check the cache fields in the response usage:
ChannelHit field
OpenAI /v1/chat/completionsusage.prompt_tokens_details.cached_tokens
OpenAI /v1/responsesusage.input_tokens_details.cached_tokens
Claude /v1/messagesusage.cache_read_input_tokens
Gemini native formatusageMetadata.cachedContentTokenCount
DeepSeekusage.prompt_cache_hit_tokens / prompt_cache_miss_tokens
Qwen / Grok (OpenAI-compatible format)usage.prompt_tokens_details.cached_tokens
A value above 0 means a hit. In the dashboard call logs, cached usage appears as separate discounted line items you can verify directly.

Things to Watch

Caching follows the call format: calling Claude models through the OpenAI-compatible format (/v1/chat/completions) cannot get Claude’s cache discount — use the native /v1/messages format for heavy Claude usage.
  • Caches are isolated per model: switching models (even within the same series) shares nothing
  • For vendors not listed above (Kimi, etc.), go by the cache fields actually returned in your call logs
  • Official mechanism details: platform.openai.com/docs/guides/prompt-caching, docs.claude.com/en/docs/build-with-claude/prompt-caching, ai.google.dev/gemini-api/docs/caching, api-docs.deepseek.com/quick_start/pricing, docs.x.ai/developers/advanced-api-usage/prompt-caching

OpenAI Cache Billing Guide

Automatic caching: 1024-token threshold, 90%-off hits, prompt_cache_key routing

Claude Cache Billing Guide

Where to place cache_control, break-even math, multi-turn techniques

Model Multipliers

How console group multipliers convert to USD prices

Call Logs

Inspect per-request token usage and cache billing details

Contact Us

WeCom Support

WeCom support QR codeScan to add, or contact supportCache billing questions and technical support

Email