/v1beta generateContent), the response uses Google’s candidates / parts structure, different from OpenAI compatible mode. This page explains how to parse both non-streaming (generateContent) and streaming (streamGenerateContent).
The request side (base_url is
https://api.apiyi.com without /v1, x-goog-api-key auth, thinking_level control) is covered in the Gemini Native Format Guide. This page is purely about the response side. Examples use the lightweight model gemini-3.1-flash-lite.Non-streaming response
Endpoint…:generateContent. The answer lives in candidates[0].content.parts[]:
parts and concatenating each text:
finishReason is uppercase STOP (not OpenAI’s lowercase stop); other values include MAX_TOKENS and SAFETY. A part may contain only thoughtSignature and no text, so filter with if "text" in p when iterating, or you’ll hit a KeyError.thoughtSignature
Gemini 3-series models attach athoughtSignature (encrypted reasoning state) to parts — in testing, even the lightweight gemini-3.1-flash-lite returns it.
- Single turn: not needed; ignore it.
- Multi-turn / function calling: pass the previous response’s
thoughtSignatureback verbatim in the next turn’scontentsso the model can continue its reasoning chain. The officialgoogle-genaiSDK handles this automatically; when hand-writing REST, don’t drop the field. See Gemini Function Calling.
Streaming response (SSE)
Endpoint…:streamGenerateContent. Each line is data: {...}, and each chunk’s increment is in candidates[0].content.parts[0].text:
usageMetadata is present in every chunk and is cumulative (candidatesTokenCount grows with output) — just take the last chunk’s value; no manual summing needed.Key differences from OpenAI compatible mode
| Aspect | Gemini native (/v1beta) | OpenAI compatible (/v1/chat/completions) |
|---|---|---|
| base_url | https://api.apiyi.com (no /v1) | https://api.apiyi.com/v1 |
| Auth header | x-goog-api-key | Authorization: Bearer |
| Answer location | candidates[0].content.parts[].text | choices[0].message.content |
| Stream increment | each chunk’s parts[].text | choices[0].delta.content |
| Stream terminator | finishReason == "STOP", no [DONE] | data: [DONE] |
| Finish reason | uppercase STOP / MAX_TOKENS | lowercase stop |
| Thought signature | ✅ thoughtSignature (pass back across turns) | ❌ not exposed |
| usage | usageMetadata (cumulative each stream chunk) | usage (once, at stream tail) |
Usage and billing
thoughtsTokenCount(thinking tokens) is billed at the output rate; usethinking_levelto cap it and save cost.- For the cache-hit field (
cachedContentTokenCount) discount, see Gemini Cache Billing. - The full field reference is in the “Usage fields” section of the Gemini Native Format Guide.
Related links
- Same group: Gemini Native Format Guide · Multimodal & Code Execution · Function Calling
- Compatible-format counterpart: OpenAI Compatible Mode: Handling Responses
- Get / manage tokens:
https://api.apiyi.com/token