This page covers how to call Claude with the Anthropic native Messages API (routed through the API易 gateway to AWS Bedrock), and the correct usage of output_config.effort (effort level) and thinking (adaptive thinking). For channels, billing, and basic onboarding, see the Claude API Basics page first.
Applicable models: Claude Opus 4.8 / 4.7 / 4.6, Sonnet 4.6, etc. This page uses Opus 4.8 as the example.

Request structure

Endpoint and headers

POST https://api.apiyi.com/v1/messages

| Header | Value | Notes |
|---|---|---|
| content-type | application/json | Fixed |
| anthropic-version | 2023-06-01 | Anthropic native version header, required |
| x-api-key | your-apiyi-key | Anthropic native auth |

When API易 routes to Bedrock, the client still uses the Anthropic native format (x-api-key + /v1/messages); the gateway handles translation to Bedrock’s bedrock-2023-05-31 internally. You do not need to set anthropic_version: bedrock-2023-05-31.

Minimal request body

{
  "model": "claude-opus-4-8",
  "max_tokens": 16000,
  "messages": [
    { "role": "user", "content": "Your question" }
  ]
}

effort levels

effort controls how many tokens Claude is willing to spend to produce a result, trading off thoroughness against speed/cost. It affects all token consumption: the answer, tool calls, and extended thinking.
Key rules
  1. effort must go in a top-level standalone output_config object — not inside thinking. Misplacing it raises a ValidationException / 400.
  2. No beta header needed. Effort is now open to all supported models; anthropic-beta: effort-2025-11-24 is no longer required.
  3. The default is high; setting "high" behaves the same as omitting effort entirely.

Request body with effort

{
  "model": "claude-opus-4-8",
  "max_tokens": 16000,
  "output_config": {
    "effort": "medium"
  },
  "messages": [
    { "role": "user", "content": "Analyze the tradeoffs of microservices vs. a monolith" }
  ]
}
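
As a minimal sketch of the rules above, the helper below builds a request body and places effort in a top-level output_config. The name build_body and the EFFORT_LEVELS set are illustrative assumptions, not part of any SDK; the level names are taken from this page and may not match what a given model accepts.

```python
from typing import Any, Dict, Optional

# Effort levels as listed on this page (an assumption for local validation only).
EFFORT_LEVELS = {"low", "medium", "high", "xhigh", "max"}


def build_body(model: str, prompt: str, effort: Optional[str] = None,
               max_tokens: int = 16000) -> Dict[str, Any]:
    """Hypothetical helper: build a Messages API body with effort in output_config."""
    body: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        if effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown effort level: {effort!r}")
        # Rule 1: effort goes in a top-level output_config, never inside thinking.
        body["output_config"] = {"effort": effort}
    # Rule 3: omitting effort leaves the default (high) in place.
    return body
```

Validating the level locally is a design choice for this sketch: per the Bedrock notes on this page, an unknown value may not produce a 400, so a typo could otherwise go unnoticed.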

Level overview

| Level | Description | Typical use |
|---|---|---|
| low | Cheapest. Major token savings, slightly lower capability. | Simple tasks, high concurrency, sub-agents |
| medium | Balanced. Moderate token savings. | A reasonable default for most agentic workflows |
| high | Default. High capability. | Complex reasoning, hard coding, quality-sensitive tasks |
| xhigh | Long-horizon extended capability, between high and max. | Long coding / agentic tasks (over 30 minutes) |
| max | Unconstrained peak capability. | Truly frontier problems, deepest reasoning |

Opus 4.8 recommendation: start coding / agentic work at xhigh, use high for other intelligence-sensitive tasks, and only drop to medium / low once your evals confirm quality holds.

When running xhigh / max, set max_tokens high (64k as a starting point) to leave the model room for thinking + output.
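
The recommendation can be encoded as a small lookup, shown here as a sketch rather than a fixed policy. The function name starting_config and the task labels "coding" and "agentic" are hypothetical; the 64000 figure follows the starting point suggested above and 16000 mirrors the examples on this page.

```python
def starting_config(task: str) -> dict:
    """Hypothetical helper: pick a starting effort and max_tokens for a task type."""
    # Coding / agentic work starts at xhigh; other tasks start at high.
    effort = "xhigh" if task in ("coding", "agentic") else "high"
    # xhigh / max need extra room for thinking plus output; these are not API limits.
    max_tokens = 64000 if effort in ("xhigh", "max") else 16000
    return {"max_tokens": max_tokens, "output_config": {"effort": effort}}
```

The returned dict can be merged into a request body; lower the effort only after evals confirm that quality holds.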

Adaptive thinking

Opus 4.7 / 4.8 use adaptive thinking: the model decides when and how much to think, with effort controlling depth.
{
  "model": "claude-opus-4-8",
  "max_tokens": 16000,
  "thinking": {
    "type": "adaptive",
    "display": "summarized"
  },
  "output_config": {
    "effort": "xhigh"
  },
  "messages": [
    { "role": "user", "content": "Walk through and pinpoint the root cause of this production bug" }
  ]
}
  • thinking.type: "adaptive" — enables adaptive thinking (omit it and the model won’t think).
  • thinking.display: "summarized" — returns thinking summary blocks in the response; drop it if you don’t need to surface them.
  • How effort relates to thinking: high / xhigh / max almost always think deeply; low / medium may skip thinking on simple problems.
Opus 4.7 / 4.8 do not support thinking.type: "enabled" + budget_tokens (it returns 400). Use adaptive + effort instead.
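
A pre-flight check can catch both mistakes before the request leaves the client. This is a sketch under the assumptions stated on this page (it applies to Opus 4.7 / 4.8); check_thinking_config is a hypothetical name, and the check only inspects the request dict.

```python
def check_thinking_config(body: dict) -> list:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    thinking = body.get("thinking") or {}
    # Legacy extended-thinking config: rejected with a 400 per this page.
    if thinking.get("type") == "enabled" or "budget_tokens" in thinking:
        problems.append(
            'thinking.type "enabled" / budget_tokens is not supported; '
            'use {"type": "adaptive"} plus output_config.effort'
        )
    # effort must sit in the top-level output_config, not inside thinking.
    if "effort" in thinking:
        problems.append("effort belongs in top-level output_config, not in thinking")
    return problems
```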

Parsing the response

The response content is an array of blocks, distinguished by type:
for block in data["content"]:
    if block["type"] == "thinking":
        print("[Thinking summary]", block["thinking"])
    elif block["type"] == "text":
        print("[Answer]", block["text"])
Token usage is in the usage field:
{
  "usage": {
    "input_tokens": 164,
    "output_tokens": 11056,
    "service_tier": "standard"
  }
}
If stop_reason is max_tokens, the output was truncated by max_tokens (thinking can easily fill the budget at high effort), and the answer text may be empty — just raise max_tokens.
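
One way to handle truncation is to retry with a larger budget, sketched below. The send argument is a hypothetical callable that posts a body and returns the parsed JSON response (for example, a thin wrapper around requests.post); the doubling strategy and the 64000 ceiling are assumptions for this sketch, not gateway limits.

```python
from typing import Callable


def answer_text(data: dict) -> str:
    """Join the text blocks of a response; empty if thinking used the whole budget."""
    return "".join(
        block.get("text", "")
        for block in data.get("content", [])
        if block.get("type") == "text"
    )


def send_with_retry(send: Callable[[dict], dict], body: dict,
                    ceiling: int = 64000) -> dict:
    """Resend with a doubled max_tokens while the response is truncated."""
    body = dict(body)  # shallow copy so the caller's dict is not modified
    while True:
        data = send(body)
        truncated = data.get("stop_reason") == "max_tokens"
        if not truncated or body["max_tokens"] >= ceiling:
            return data
        body["max_tokens"] = min(body["max_tokens"] * 2, ceiling)
```

Each retry is billed as a full request, so for xhigh / max it is usually cheaper to start with a high max_tokens than to rely on retries.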

Full runnable example

import os
import requests
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.getenv("APIYI_API_KEY")
BASE_URL = "https://api.apiyi.com"

resp = requests.post(
    f"{BASE_URL}/v1/messages",
    headers={
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
        "x-api-key": API_KEY,
    },
    json={
        "model": "claude-opus-4-8",
        "max_tokens": 16000,
        "thinking": {"type": "adaptive", "display": "summarized"},
        "output_config": {"effort": "xhigh"},
        "messages": [
            {"role": "user", "content": "Analyze the tradeoffs of microservices vs. a monolith"}
        ],
    },
    timeout=300,
)

data = resp.json()
print("status:", resp.status_code, "| usage:", data.get("usage"))
for block in data.get("content", []):
    if block.get("type") == "thinking":
        print("\n[Thinking summary]\n", block.get("thinking", ""))
    elif block.get("type") == "text":
        print("\n[Answer]\n", block.get("text", ""))

The same call with curl (this variant omits thinking.display):
curl https://api.apiyi.com/v1/messages \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "x-api-key: your-apiyi-key" \
  -d '{
    "model": "claude-opus-4-8",
    "max_tokens": 16000,
    "thinking": { "type": "adaptive" },
    "output_config": { "effort": "xhigh" },
    "messages": [{ "role": "user", "content": "Analyze the tradeoffs of microservices vs. a monolith" }]
  }'

Notes for the Bedrock route

| Item | Notes |
|---|---|
| output_config | Must be passed through. If a gateway override rule deletes output_config, effort is silently dropped: the request returns 200 but effort has no effect. |
| effort position | Inside the top-level output_config, never inside thinking. |
| Beta header | Not needed for effort; not needed for adaptive thinking either. |
| temperature / top_p | Opus 4.7 / 4.8 with adaptive thinking should use default sampling. The gateway typically strips these two params for these models; that is expected, and the client need not set them. |
| Invalid effort values | Bedrock degrades gracefully on unknown values: it returns 200, not 400. Sending an invalid value therefore cannot tell you whether effort is passed through; check whether token counts diverge across levels instead. |
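
The token-count check from the last row can be sketched as follows. Here send is a hypothetical callable that posts a body and returns the parsed JSON response, and the 1.5 ratio is an arbitrary threshold chosen for illustration. Token counts vary from run to run, so treat the result as a hint and repeat it over several prompts.

```python
from typing import Callable, Dict, Sequence


def output_tokens_by_effort(send: Callable[[dict], dict], base_body: dict,
                            levels: Sequence[str] = ("low", "high")) -> Dict[str, int]:
    """Send the same request at each effort level and record usage.output_tokens."""
    counts = {}
    for level in levels:
        body = {**base_body, "output_config": {"effort": level}}
        counts[level] = send(body).get("usage", {}).get("output_tokens", 0)
    return counts


def counts_diverge(counts: Dict[str, int], min_ratio: float = 1.5) -> bool:
    """Heuristic: True if the largest count is at least min_ratio times the smallest."""
    smallest, largest = min(counts.values()), max(counts.values())
    return largest >= min_ratio * max(smallest, 1)
```

If the counts stay close together across levels on a prompt that clearly benefits from deeper reasoning, a gateway rule may be dropping output_config.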

References

  • Anthropic — Effort docs: platform.claude.com/docs/en/build-with-claude/effort
  • AWS Bedrock — Adaptive thinking: docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-adaptive-thinking.html
  • AWS Bedrock — Claude Opus 4.8: docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-8.html