/v1/messages), the response is completely different from OpenAI compatible mode: the answer is a typed content block array, and streaming uses Anthropic’s named-event SSE protocol. This page explains how to parse both modes.
The request side (endpoint,
anthropic-version header, x-api-key auth, effort / thinking params) is covered in Claude API Basics and the Claude Effort & Thinking Guide. This page is purely about the response side. Examples use the lightweight model claude-haiku-4-5-20251001.Non-streaming response
The top level is amessage object, and the answer lives in the content array, split into blocks by type:
content array — you can’t read a single string field like in OpenAI:
stop_reason values: end_turn (normal), max_tokens (cut off by max_tokens — the text may be empty; raise the limit), stop_sequence, tool_use (wants to call a tool). With thinking on, the content array gains a type: "thinking" block placed before the text block.Streaming response (named-event SSE)
Claude streaming uses the Anthropic event protocol: each message has anevent: name plus a data: payload, and you dispatch by event type rather than treating every chunk identically as in OpenAI.
| Event | Role |
|---|---|
message_start | Message skeleton; usage.input_tokens and initial output_tokens are here |
content_block_start | A content block begins (index + block type text / thinking) |
content_block_delta | Increment; text is delta.text where delta.type == "text_delta" |
content_block_stop | The current block ends |
message_delta | Final stop_reason + the cumulative output_tokens are here |
message_stop | The whole message ends (no [DONE]; this event is the terminator) |
text_delta inside content_block_delta:
With thinking (adaptive thinking) on, a
type: "thinking" block appears first; its increments are thinking_delta, and a signature_delta (thinking-block signature) appears before the block closes. To display thinking, render thinking_delta and text_delta separately. See the Claude Effort & Thinking Guide.Key differences from OpenAI compatible mode
| Aspect | Claude native (/v1/messages) | OpenAI compatible (/v1/chat/completions) |
|---|---|---|
| Answer location | content block array, typed | choices[0].message.content string |
| Streaming protocol | Named events (event: + data:) | Homogeneous chunk objects |
| Stream terminator | message_stop event, no [DONE] | data: [DONE] |
| Increment field | content_block_delta.delta.text | choices[0].delta.content |
| usage fields | input_tokens / output_tokens (split across message_start and message_delta) | prompt_tokens / completion_tokens / total_tokens |
| Finish reason | stop_reason (end_turn, etc.) | finish_reason (stop, etc.) |
max_tokens | Required | Optional |
Usage and billing
- Non-streaming:
usagecomes back with the result, includinginput_tokens,output_tokens,cache_creation_input_tokens,cache_read_input_tokens. - Streaming:
input_tokensis inmessage_start, and the finaloutput_tokensis inmessage_delta— merge both. - For the cache-hit field (
cache_read_input_tokens) discount and usage, see Claude Cache Billing.
Related links
- Same group: Claude API Basics · Claude Cache Billing · Claude Effort & Thinking Guide
- Compatible-format counterpart: OpenAI Compatible Mode: Handling Responses
- Get / manage tokens:
https://api.apiyi.com/token