Call /v1/responses through APIYI: state management, reasoning controls, built-in tools, and semantic streaming events. OpenAI’s recommended endpoint for new projects.
/v1/responses is OpenAI’s current flagship native endpoint. In OpenAI’s own words: “While Chat Completions remains supported, Responses is recommended for all new projects.” APIYI fully supports this endpoint — just point base_url at https://api.apiyi.com/v1.This page is based on the official OpenAI documentation (developers.openai.com/api/docs, as of June 2026). All examples are copy-paste ready.
Compared with Chat Completions, OpenAI cites three hard numbers:
Better reasoning: the same reasoning model scores about 3% higher on SWE-bench via Responses (reasoning state persists across turns)
Cheaper input: cache utilization is 40%–80% higher than Chat Completions (OpenAI internal testing), which directly cuts your input bill
More tools: built-in tools like web_search and code_interpreter are Responses-only
When Chat Completions is still the right choice: you rely on existing frameworks (LangChain and most clients default to /v1/chat/completions), or you want one codebase that also calls Claude, Gemini, and other non-OpenAI models — see Compatible Mode.
What is deprecated is the Assistants API (scheduled to shut down on August 26, 2026 (UTC)), not Chat Completions. Both endpoints remain supported long-term; new features simply land on Responses first.
curl https://api.apiyi.com/v1/responses \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_API_KEY" \ -d '{ "model": "gpt-5.4", "input": "Introduce yourself in one sentence", "instructions": "You are a concise assistant" }'
Prefer response.output_text over hand-written output[0].content[0].text — for reasoning models, the first item in output is often a reasoning item, not a message, so hard-coded indexing breaks.
output is an array of items. The three common types: reasoning (reasoning summary), message (text reply), and function_call (a function call request). A trimmed example:
Create a conversation object and attach requests to the same conversation. Not subject to the 30-day response retention window — good for long-lived sessions.
store defaults to true: response objects are retained server-side for 30 days so previous_response_id can reference them. For data-residency-sensitive workloads, pass store: false explicitly — but that response can no longer be chained.
Chaining does not reduce input billing: all prior context pulled in via previous_response_id is still billed as input tokens in full. Long conversations save money through cache discounts (the historical prefix auto-hits the 0.1× cache rate), not through chaining itself — see Cache Billing.
Built-in tools are a Responses-only capability — declare them in tools and OpenAI executes them server-side:
Tool
type
Description
Web search
web_search
Model searches the web autonomously
File search
file_search
Query uploaded vector stores
Code interpreter
code_interpreter
Run Python in a sandbox
Computer use
computer_use
Drive a virtual desktop
Remote MCP
mcp
Connect to remote MCP servers
Image generation
image_generation
Inline image generation
Tool search
tool_search
Dynamic retrieval over large tool sets (gpt-5.4 and later)
Minimal web_search example:
response = client.responses.create( model="gpt-5.4", input="What are today's major AI news stories?", tools=[{"type": "web_search"}])print(response.output_text)
Built-in tools execute on OpenAI’s side; pass-through support per tool on the APIYI channel should be confirmed by testing. Custom function calling is fully supported — see Function Calling.
gpt-5.4-pro and gpt-5.5-pro are deep-reasoning models for professional workloads ($30 / $180 per million tokens, svip group only) and are, in practice, available via /v1/responses only. A single request can take minutes — pair them with background: true:
# Submit a background taskresponse = client.responses.create( model="gpt-5.4-pro", input="Do a deep review of this architecture proposal: ...", background=True)# Poll for the resultimport timewhile response.status in ("queued", "in_progress"): time.sleep(10) response = client.responses.retrieve(response.id)print(response.output_text)
Pro models are expensive and slow — the trade is “minutes of waiting for a more reliable answer”. For everyday development use gpt-5.4 / gpt-5.5; don’t reach for Pro without a clear deep-reasoning need.