Examples use the endpoint
https://api.apiyi.com and your APIYI token. Models referenced: gpt-5.4-mini, deepseek-v4-pro, gemini-3.5-flash, claude-sonnet-4-6.Core principle: maintain history yourself
One sentence covers it: the model is stateless; you (the client) maintain the history and resend all of it every turn.OpenAI compatible mode (works across models)
The most universal approach, endpoint/v1/chat/completions. History lives in the messages array, each entry carrying a role (system / user / assistant). Switching the model string lets the same code drive different models (gpt, deepseek, claude, gemini…).
Handling history for reasoning models
Reasoning models likedeepseek-v4-pro return an extra reasoning_content (the chain of thought) field.
For more on parsing reasoning-model responses, see Reasoning Model Output.
OpenAI native format (Responses API)
Endpoint/v1/responses. For multi-turn, pass the full history as the input array (each entry with role / content) — the same self-managed approach as compatible mode:
Gemini native format
Endpoint/v1beta/models/{model}:generateContent. History lives in the contents array. Note the roles are user / model (not assistant), and each entry’s content goes in parts.
Gemini 3-series responses attach a
thoughtSignature to parts. For plain text multi-turn, passing back just text is enough to retain context (and cheaper on tokens); only scenarios needing strict reasoning continuity, like function calling, require passing thoughtSignature back verbatim — the official SDK handles this automatically. See Gemini Native Calls and Function Calling.Anthropic native format
Endpoint/v1/messages. History lives in the messages array with roles user / assistant; content can be a plain string. Note that max_tokens is required.
Comparison of the four formats
| Aspect | OpenAI compatible | OpenAI native (Responses) | Gemini native | Anthropic native |
|---|---|---|---|---|
| Endpoint | /v1/chat/completions | /v1/responses | /v1beta/…:generateContent | /v1/messages |
| History field | messages | input | contents | messages |
| Roles | system/user/assistant | user/assistant | user/model | user/assistant |
| Content form | content string | content string | parts: [{text}] | content string |
| History owner | you | you | you | you |
| Server-side state | none | ⚠️ unavailable | none | none |
| Cross-model | ✅ change model | OpenAI only | Gemini only | Claude only |
FAQ
Does a longer conversation cost more?
Does a longer conversation cost more?
Yes. Every turn resends the full history, so input tokens grow with the number of turns and cost rises accordingly. The main way to save is context caching: an identical history prefix automatically hits the cache rate (far below list price). See OpenAI caching, Claude caching, Gemini caching.
How many turns should I keep? What if I exceed the context window?
How many turns should I keep? What if I exceed the context window?
No hard rule, but longer history is costlier and can exceed the model’s context window. Common strategies: (1) sliding window — keep only the last N turns; (2) summary compression — condense earlier turns into a paragraph in the system prompt; (3) always keep the system instruction plus the most recent turns. Balance against how much “memory” your use case needs.
Where does the system / system instruction go?
Where does the system / system instruction go?
OpenAI compatible and Anthropic: at the front of the conversation (compatible uses
role:"system"; Anthropic uses the top-level system field or the first message). Gemini: use config.system_instruction. The system instruction only needs to be set once — no need to re-append it each turn.Should I pass a reasoning model's thinking (reasoning_content) back?
Should I pass a reasoning model's thinking (reasoning_content) back?
No. The thinking is an intermediate product of the turn; keep only the final
content in history (for Gemini, only text). Passing thinking back wastes tokens and some upstreams reject it. Gemini’s thoughtSignature in function-calling is the exception — the official SDK handles it automatically.Can the server remember the conversation so I don't resend history?
Can the server remember the conversation so I don't resend history?
On APIYI this is not recommended. OpenAI Responses’
previous_response_id is not guaranteed to work through the gateway (tested: no memory). Use client-side self-managed history everywhere — it’s the most stable and consistent across models.Related links
- Call basics: OpenAI Compatible Mode · OpenAI Native Calls · Gemini Native Calls · Claude API Basics
- Response parsing: OpenAI Handling Responses · Reasoning Model Output · Claude Streaming & Responses · Gemini Streaming & Responses
- Models & pricing: Models & Pricing Overview
- Get / manage tokens:
https://api.apiyi.com/token