output_config.effort (effort level) and thinking (adaptive thinking).
For channels, billing, and basic onboarding, see the Claude API Basics page first.
Applicable models: Claude Opus 4.8 / 4.7 / 4.6, Sonnet 4.6, etc. This page uses Opus 4.8 as the example.
Request structure
Endpoint and headers
| Header | Value | Notes |
|---|---|---|
content-type | application/json | Fixed |
anthropic-version | 2023-06-01 | Anthropic native version header, required |
x-api-key | your-apiyi-key | Anthropic native auth |
When API易 routes to Bedrock, the client still uses the Anthropic native format (
x-api-key + /v1/messages); the gateway handles translation to Bedrock’s bedrock-2023-05-31 internally. You do not need to set anthropic_version: bedrock-2023-05-31.Minimal request body
effort levels
effort controls how many tokens Claude is willing to spend to produce a result, trading off thoroughness against speed/cost. It affects all token consumption: the answer, tool calls, and extended thinking.
Request body with effort
Level overview
| Level | Description | Typical use |
|---|---|---|
low | Cheapest. Major token savings, slightly lower capability. | Simple tasks, high concurrency, sub-agents |
medium | Balanced. Moderate token savings. | A reasonable default for most agentic workflows |
high | Default. High capability. | Complex reasoning, hard coding, quality-sensitive tasks |
xhigh | Long-horizon extended capability, between high and max. | Long coding / agentic tasks (over 30 minutes) |
max | Unconstrained peak capability. | Truly frontier problems, deepest reasoning |
Adaptive thinking
Opus 4.7 / 4.8 use adaptive thinking: the model decides when and how much to think, with effort controlling depth.thinking.type: "adaptive"— enables adaptive thinking (omit it and the model won’t think).thinking.display: "summarized"— returns thinking summary blocks in the response; drop it if you don’t need to surface them.- How effort relates to thinking:
high/xhigh/maxalmost always think deeply;low/mediummay skip thinking on simple problems.
Parsing the response
The responsecontent is an array of blocks, distinguished by type:
usage field:
If
stop_reason is max_tokens, the output was truncated by max_tokens (thinking can easily fill the budget at high effort), and the answer text may be empty — just raise max_tokens.Full runnable example
Notes for the Bedrock route
| Item | Notes |
|---|---|
output_config | Must be passed through. If the gateway has a delete output_config override rule, effort gets silently dropped (returns 200 but has no effect). |
effort position | Inside the top-level output_config, never inside thinking. |
| Beta header | Not needed for effort; not needed for adaptive thinking either. |
temperature / top_p | Opus 4.7 / 4.8 with adaptive thinking should use default sampling; the gateway typically strips these two params for these models — that’s expected, and the client need not set them. |
| Invalid effort values | Bedrock degrades gracefully on unknown values (returns 200), no 400. So you can’t tell whether effort passes through by “does invalid error out” — check whether token counts diverge across levels instead. |
References
- Anthropic — Effort docs:
platform.claude.com/docs/en/build-with-claude/effort - AWS Bedrock — Adaptive thinking:
docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-adaptive-thinking.html - AWS Bedrock — Claude Opus 4.8:
docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-8.html