LLMs have no memory of their own — a model doesn’t remember what you said a moment ago. A “multi-turn conversation” really just means sending the full conversation history with every request. This guide explains how each of the four call formats on APIYI maintains that history, and the pitfalls to watch for.
Examples use the endpoint https://api.apiyi.com and your APIYI token. Models referenced: gpt-5.4-mini, deepseek-v4-pro, gemini-3.5-flash, claude-sonnet-4-6.

Core principle: maintain history yourself

One sentence covers it: the model is stateless; you (the client) maintain the history and resend all of it every turn.
Turn 1: send [user Q1]                              → get [reply 1]
Turn 2: send [user Q1, reply 1, user Q2]            → get [reply 2]
Turn 3: send [user Q1, reply 1, user Q2, reply 2, user Q3] → get [reply 3]
On each new turn, append the previous turn's user message and model reply to the end of the history array, add the new user message, and send the whole array. The formats differ only in what the history array is called, how roles are named, and how message content is wrapped.
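To make the turn diagram concrete, here is a minimal offline sketch. fake_model and run_conversation are hypothetical names used only for illustration: fake_model stands in for a real API call and labels each reply with its turn number, so you can see how many messages go out on every turn.

```python
def fake_model(history):
    """Hypothetical stand-in for an API call: replies with the turn number."""
    turn = sum(1 for m in history if m["role"] == "user")
    return f"reply {turn}"


def run_conversation(user_inputs, model=fake_model):
    history = []      # the client owns the history
    sent_sizes = []   # number of messages sent on each turn
    for text in user_inputs:
        history.append({"role": "user", "content": text})
        sent_sizes.append(len(history))
        reply = model(list(history))  # resend a copy of the full history
        history.append({"role": "assistant", "content": reply})
    return history, sent_sizes


history, sizes = run_conversation(["Q1", "Q2", "Q3"])
print(sizes)  # [1, 3, 5]
```

The sizes match the diagram: one message on turn 1, three on turn 2, five on turn 3.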
On APIYI, always use the “maintain history yourself” approach. Do not rely on any server-side conversation state (such as OpenAI Responses’ previous_response_id) — it is not guaranteed to work through the gateway, as detailed in the OpenAI native section below.

OpenAI compatible mode (works across models)

This is the most universal approach, using the endpoint /v1/chat/completions. History lives in the messages array, and each entry carries a role (system / user / assistant). Because only the model string changes, the same code can drive different models (gpt, deepseek, claude, gemini…).
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.apiyi.com/v1")

messages = [{"role": "system", "content": "You are a friendly assistant."}]

def chat(user_input, model="gpt-5.4-mini"):
    messages.append({"role": "user", "content": user_input})
    resp = client.chat.completions.create(model=model, messages=messages)
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})  # append reply to history
    return reply

print(chat("My name is Alice and I'm 28. Please remember."))
print(chat("How old am I, and what is my age plus 5?"))   # remembers → 28, 33
One codebase, many models: change model to deepseek-v4-pro, claude-sonnet-4-6, gemini-3.5-flash, or any other model — the multi-turn logic stays identical. See the Models & Pricing Overview.

Handling history for reasoning models

Reasoning models such as deepseek-v4-pro return an extra field, reasoning_content, which holds the chain of thought.
Keep only content in the history — do not pass reasoning_content back. The thinking is just an intermediate product of the current turn; passing it back wastes tokens and violates upstream rules (DeepSeek’s direct API even returns a 400 for it). When appending to history, take only content:
messages.append({"role": "assistant", "content": resp.choices[0].message.content})
# do NOT include resp.choices[0].message.reasoning_content
For more on parsing reasoning-model responses, see Reasoning Model Output.
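The rule above can be captured in a small helper so the chain of thought never leaks into history. This is a sketch under assumptions: to_history_entry and strip_reasoning are hypothetical names, and the raw dict below is an invented example shaped like a chat-completions JSON body, not a captured APIYI response.

```python
def to_history_entry(message):
    """Turn an assistant message into a history entry, keeping only content.

    Accepts either a raw JSON dict or an SDK message object. Any
    reasoning_content (chain of thought) is deliberately dropped.
    """
    if isinstance(message, dict):
        content = message.get("content")
    else:
        content = getattr(message, "content", None)
    return {"role": "assistant", "content": content or ""}


def strip_reasoning(history):
    """Return a copy of history with reasoning_content removed from every entry."""
    return [{k: v for k, v in m.items() if k != "reasoning_content"} for m in history]


# Hypothetical raw response, shaped like a chat-completions JSON body
raw = {"choices": [{"message": {"role": "assistant",
                                "content": "The answer is 42.",
                                "reasoning_content": "Let me think..."}}]}
print(to_history_entry(raw["choices"][0]["message"]))
# {'role': 'assistant', 'content': 'The answer is 42.'}
```

With the SDK you would call to_history_entry(resp.choices[0].message). The sketch covers plain-text replies only; a message carrying tool calls needs those fields preserved, which this helper does not do.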

OpenAI native format (Responses API)

Endpoint /v1/responses. For multi-turn, pass the full history as the input array (each entry with role / content) — the same self-managed approach as compatible mode:
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.apiyi.com/v1")

resp = client.responses.create(
    model="gpt-5.4-mini",
    input=[
        {"role": "user", "content": "Remember the codeword: purple elephant."},
        {"role": "assistant", "content": "Got it, the codeword is purple elephant."},
        {"role": "user", "content": "What's the codeword?"},
    ],
)
print(resp.output_text)   # The codeword is: purple elephant.
Do not rely on server-side state like previous_response_id / conversation / store. Tested through the APIYI gateway: passing previous_response_id does not error (returns 200), but the next turn does not remember the previous one, and GET /v1/responses/{id} is unavailable. So on APIYI, use the Responses API with self-managed history (the input array) as shown above.
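The self-managed pattern can be wrapped in a small helper so every call to /v1/responses resends the whole input array and never passes previous_response_id. This is a sketch: responses_chat is a hypothetical name, and client is assumed to be an openai.OpenAI instance configured as in the example above (any object exposing responses.create(model=..., input=...) that returns something with output_text behaves the same way).

```python
def responses_chat(client, history, user_input, model="gpt-5.4-mini"):
    """Run one turn against /v1/responses using client-side history only.

    history is a list of {"role", "content"} dicts owned by the caller;
    it is sent in full as `input` every turn (no previous_response_id).
    """
    history.append({"role": "user", "content": user_input})
    resp = client.responses.create(model=model, input=history)
    reply = resp.output_text
    history.append({"role": "assistant", "content": reply})
    return reply


# Usage (requires network access and a valid key):
# from openai import OpenAI
# client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.apiyi.com/v1")
# history = []
# responses_chat(client, history, "Remember the codeword: purple elephant.")
# print(responses_chat(client, history, "What's the codeword?"))
```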

Gemini native format

Endpoint /v1beta/models/{model}:generateContent. History lives in the contents array. Note the roles are user / model (not assistant), and each entry’s content goes in parts.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY",
                      http_options={"base_url": "https://api.apiyi.com"})

contents = [
    {"role": "user", "parts": [{"text": "Remember the codeword: purple elephant."}]},
    {"role": "model", "parts": [{"text": "Got it: purple elephant."}]},
    {"role": "user", "parts": [{"text": "What's the codeword?"}]},
]
resp = client.models.generate_content(model="gemini-3.5-flash", contents=contents)
print(resp.text)   # The codeword is: purple elephant.
Even simpler: the official google-genai SDK’s client.chats.create(...) maintains the contents history for you — just call send_message, no manual stitching.
Gemini 3-series responses attach a thoughtSignature to parts. For plain text multi-turn, passing back just text is enough to retain context (and cheaper on tokens); only scenarios needing strict reasoning continuity, like function calling, require passing thoughtSignature back verbatim — the official SDK handles this automatically. See Gemini Native Calls and Function Calling.
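If your application already stores history in the OpenAI-style shape used earlier, a small converter can produce Gemini's contents array. This is an illustrative sketch, not part of any SDK: to_gemini is a hypothetical helper that renames assistant to model, wraps each string in parts, and moves system entries out so they can be passed as config.system_instruction. It handles plain-text turns only (no tool calls, images, or thought signatures).

```python
def to_gemini(messages):
    """Convert OpenAI-style messages into (system_instruction, contents)."""
    system_parts = []
    contents = []
    for m in messages:
        if m["role"] == "system":
            system_parts.append(m["content"])  # Gemini takes this via config
            continue
        if m["role"] not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {m['role']}")
        role = "model" if m["role"] == "assistant" else "user"
        contents.append({"role": role, "parts": [{"text": m["content"]}]})
    system_instruction = "\n".join(system_parts) or None
    return system_instruction, contents


# Possible usage with the google-genai SDK (network required):
# from google.genai import types
# system, contents = to_gemini(messages)
# resp = client.models.generate_content(
#     model="gemini-3.5-flash",
#     contents=contents,
#     config=types.GenerateContentConfig(system_instruction=system),
# )
```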

Anthropic native format

Endpoint /v1/messages. History lives in the messages array with roles user / assistant; content can be a plain string. Note that max_tokens is required.
import requests

def chat(messages):
    r = requests.post(
        "https://api.apiyi.com/v1/messages",
        headers={
            "content-type": "application/json",
            "anthropic-version": "2023-06-01",
            "x-api-key": "YOUR_API_KEY",
        },
        json={"model": "claude-sonnet-4-6", "max_tokens": 200, "messages": messages},
        timeout=60,
    )
    r.raise_for_status()  # surface HTTP errors instead of a KeyError on "content"
    return "".join(b["text"] for b in r.json()["content"] if b["type"] == "text")

messages = [{"role": "user", "content": "Remember the codeword: purple elephant."}]
reply = chat(messages)
messages.append({"role": "assistant", "content": reply})       # append reply
messages.append({"role": "user", "content": "What's the codeword?"})
print(chat(messages))   # The codeword is: purple elephant.
You can also use the official anthropic SDK by pointing base_url at https://api.apiyi.com. The response is a content block array — parsing details in Claude Streaming & Responses.
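When history is kept in the OpenAI-style shape (including a system entry), it has to be reshaped for /v1/messages, because Anthropic's messages array accepts only user and assistant roles and the system prompt goes in a top-level system field. The sketch below assumes that shape; to_anthropic_body is a hypothetical helper, and the default max_tokens of 1024 is an arbitrary illustrative value.

```python
def to_anthropic_body(messages, model="claude-sonnet-4-6", max_tokens=1024):
    """Build a /v1/messages request body from OpenAI-style history.

    System entries move to the top-level "system" field; max_tokens is
    always included because the endpoint requires it.
    """
    system = []
    turns = []
    for m in messages:
        if m["role"] == "system":
            system.append(m["content"])
        elif m["role"] in ("user", "assistant"):
            turns.append({"role": m["role"], "content": m["content"]})
        else:
            raise ValueError(f"unsupported role: {m['role']}")
    body = {"model": model, "max_tokens": max_tokens, "messages": turns}
    if system:
        body["system"] = "\n".join(system)
    return body


# Possible usage (network required):
#   requests.post("https://api.apiyi.com/v1/messages", headers=..., json=to_anthropic_body(history))
# or with the anthropic SDK:
#   client.messages.create(**to_anthropic_body(history))
```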

Comparison of the four formats

| Aspect | OpenAI compatible | OpenAI native (Responses) | Gemini native | Anthropic native |
|---|---|---|---|---|
| Endpoint | /v1/chat/completions | /v1/responses | /v1beta/…:generateContent | /v1/messages |
| History field | messages | input | contents | messages |
| Roles | system/user/assistant | user/assistant | user/model | user/assistant |
| Content form | content string | content string | parts: [{text}] | content string |
| History owner | you | you | you | you |
| Server-side state | none | ⚠️ unavailable | none | none |
| Cross-model | ✅ change model | OpenAI only | Gemini only | Claude only |
Choosing: want one codebase across vendors → prefer OpenAI compatible mode; need a vendor’s native-only features (Gemini thought signatures / code execution, Claude thinking blocks & caching, OpenAI built-in tools) → use that native format.

FAQ

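Does a multi-turn conversation cost more?
Yes. Every turn resends the full history, so input tokens grow with the number of turns and cost rises accordingly. The main way to save is context caching: when consecutive requests share an identical history prefix, that prefix can be billed at the cached-input rate, far below the list price. Depending on the vendor, caching is applied automatically or has to be enabled explicitly. See OpenAI caching, Claude caching, Gemini caching.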
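A rough calculation shows why cost climbs faster than the turn count. The sketch assumes, purely for illustration, that every user message costs the same number of tokens and every reply costs the same number; real counts vary per message. Under that assumption turn k sends k user messages and k - 1 replies, so the total over n turns is user_tokens · n(n+1)/2 + reply_tokens · n(n-1)/2, which grows roughly with the square of n. input_tokens_per_turn is a hypothetical helper name.

```python
def input_tokens_per_turn(n_turns, user_tokens, reply_tokens, system_tokens=0):
    """Approximate input tokens sent on each turn when the full history is resent.

    Simplifying assumption: every user message costs user_tokens and every
    reply costs reply_tokens. Turn k sends the system prompt, k user
    messages and k - 1 replies.
    """
    return [system_tokens + k * user_tokens + (k - 1) * reply_tokens
            for k in range(1, n_turns + 1)]


per_turn = input_tokens_per_turn(10, user_tokens=50, reply_tokens=200)
print(per_turn[0], per_turn[-1], sum(per_turn))  # 50 2300 11750
```

In this example the tenth turn alone sends 2,300 input tokens, and the ten turns together send 11,750, even though each new question is only 50 tokens.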
How long should the history be?
There is no hard rule, but longer history costs more and can exceed the model’s context window. Common strategies: (1) sliding window: keep only the last N turns; (2) summary compression: condense earlier turns into a paragraph in the system prompt; (3) always keep the system instruction plus the most recent turns. Balance these against how much “memory” your use case needs.
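Strategies (1) and (3) can be combined in a few lines. This is a sketch assuming OpenAI-style history with any system entries at the front; sliding_window is a hypothetical helper name. It counts a turn as starting at a user message, so the trimmed history never begins with an orphaned assistant reply.

```python
def sliding_window(messages, max_turns):
    """Keep system messages plus the last max_turns user-initiated turns."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # indices in `rest` where a turn (a user message) begins
    starts = [i for i, m in enumerate(rest) if m["role"] == "user"]
    if len(starts) > max_turns:
        rest = rest[starts[-max_turns]:]
    return system + rest


history = [{"role": "system", "content": "S"}]
for i in range(1, 5):
    history += [{"role": "user", "content": f"Q{i}"},
                {"role": "assistant", "content": f"A{i}"}]
history.append({"role": "user", "content": "Q5"})
print([m["content"] for m in sliding_window(history, 2)])  # ['S', 'Q4', 'A4', 'Q5']
```

Call it on the history right before each request rather than overwriting the stored copy, so you can still summarize the dropped turns later if you switch to strategy (2).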
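Where do I put the system prompt?
OpenAI compatible: at the front of the messages array with role:"system". Anthropic: in the top-level system field (its messages array accepts only user and assistant), or fold the instructions into the first user message. Gemini: use config.system_instruction. Set the system instruction once instead of appending a new copy every turn; because the model is stateless, it is still sent with every request, either as the first history entry or as the request field.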
Should I pass thinking or reasoning content back into the history?
No. The thinking is an intermediate product of the turn; keep only the final content in history (for Gemini, only text). Passing thinking back wastes tokens, and some upstreams reject it. The exception is Gemini’s thoughtSignature in function calling, which the official SDK handles automatically.
Can I rely on server-side conversation state instead?
On APIYI this is not recommended. OpenAI Responses’ previous_response_id is not guaranteed to work through the gateway (in testing, the next turn had no memory of the previous one). Use client-side, self-managed history everywhere; it is the most stable approach and behaves the same across models.