The request side (base_url, auth, switching models) is covered in Compatible Mode Calls. This page is purely about the response side: how to parse what comes back.
Two modes, one endpoint
The same/v1/chat/completions endpoint; only the stream flag changes the shape:
stream: false (default) | stream: true | |
|---|---|---|
| Shape | A single JSON object | An SSE stream (many data: lines) |
| Top-level type | chat.completion | chat.completion.chunk |
| Get the text | choices[0].message.content | Accumulate each choices[0].delta.content |
| Use case | Backends, batch jobs, full result | Chat UIs, token-by-token rendering |
Non-streaming response
Stable structure — just readchoices[0].message.content:
Streaming response (SSE)
Streaming pushes chunks as Server-Sent Events, one per line asdata: {...}, ending with data: [DONE]:
delta.content:
Integration notes: a few differences, handled uniformly
Streaming details vary slightly between models, but following the rules below lets one code path cover them all.| Difference | What you see | Uniform handling |
|---|---|---|
| Final-chunk choices | May be [] empty, or non-empty | Check choices is non-empty before reading delta |
finish_reason mid-value | Usually null; Claude uses "" (empty string) | Detect end with finish_reason === "stop" |
usage location | Empty-choices chunk / non-empty chunk / same chunk as stop | Try all three; record whenever present |
| Chunk granularity | Per-token (OpenAI) or per-sentence (Gemini/Claude) | Irrelevant — just accumulate |
| First role-declaration chunk | Some send an empty-content chunk declaring role | Skip when content is empty; don’t treat as text |
| Vendor-private fields | obfuscation, system_fingerprint, first_token_return_time, etc. | Ignore — never depend on them |
Robust reference parser
When you handle the raw SSE yourself (no SDK), this covers every difference above:Reasoning models (grok, qwen, glm, etc.) first stream
delta.reasoning_content (the chain of thought), then delta.content (the answer). The parser above reads only content, so the thinking is skipped automatically. To display the thinking, see Reasoning Model Output.Usage and billing
usagecomes back inline in non-streaming responses; in streaming it arrives in a trailing chunk (location per the table above — “record whenever present”).- Field breakdowns differ: the OpenAI family adds
completion_tokens_details, Gemini/Claude addinput_tokens/output_tokens, reasoning models addreasoning_tokens. Rely on the three standard fields:prompt_tokens/completion_tokens/total_tokens.
Related links
- Same group: Compatible Mode Calls · Reasoning Model Output · Native Calls
- Models & pricing: Models & Pricing Overview
- Get / manage tokens:
https://api.apiyi.com/token