developers.openai.com/api/docs/guides/function-calling, as of June 2026). Examples for both endpoints are copy-paste ready.
The Full Call Loop
Model returns a call
When the model decides to call, it returns the function name and JSON arguments
Execute locally
Your code parses the arguments and actually runs the function (query a DB, hit an external API…)
Key Format Differences Between the Two Endpoints
Same feature, different field formats on/v1/chat/completions vs /v1/responses — the most common integration trap:
| Chat Completions | Responses | |
|---|---|---|
| Tool definition | Nested: {"type": "function", "function": {name, parameters, ...}} | Flat: {"type": "function", "name": ..., "parameters": ...} |
| Call output | message.tool_calls[] (with id) | Top-level output item: {"type": "function_call", "call_id", "name", "arguments"} |
| Result return | {"role": "tool", "tool_call_id": ..., "content": ...} | {"type": "function_call_output", "call_id": ..., "output": ...} |
| strict mode | Set "strict": true explicitly | Server normalizes schemas to strict where possible |
Full Example: Chat Completions
A weather lookup through the complete define → call → execute → return loop:Full Example: Responses
Note the three differences: tool definitions are flat, calls come back as top-levelfunction_call items, and results return as function_call_output. With previous_response_id, the second request doesn’t need to resend the full history:
strict Mode (Structured Outputs)
strict: true guarantees the model’s arguments conform exactly to your JSON Schema — no hallucinated or missing fields. Three requirements:
- The schema must include
"additionalProperties": false - Every field must appear in
required(express optionality with"type": ["string", "null"]) - Only the supported JSON Schema subset (primitive types, enum, arrays, nested objects, …)
parallel_tool_calls and tool_choice
Parallel calls
parallel_tool_calls defaults to on: the model may request several functions in one turn (e.g. weather for Beijing and Shanghai simultaneously). Execute each, then return all results before the next request — every result must pair with its call_id (responses) or tool_call_id (chat).
tool_choice strategies
| Value | Behavior |
|---|---|
"auto" (default) | Model decides whether and what to call |
"required" | Must call at least one function |
{"type": "function", "name": "get_weather"} | Force a specific function |
"none" | No calls — text only |
allowed_tools subsets
When you have many tools but want to expose only some this turn, use theallowed_tools form of tool_choice to restrict the callable subset — it doesn’t modify the tools list itself, so it doesn’t break the stable prefix for caching:
Function Calls in Streaming
Chat Completions: assemble by index
Function arguments stream in fragments. Accumulate thearguments string per index, then json.loads after the stream ends:
Responses: listen for semantic events
response.function_call_arguments.delta events carry argument increments, and response.function_call_arguments.done delivers the complete arguments — no manual index assembly.
Best Practices and Pitfalls
Writing good tool definitions:- Names and descriptions are written for the model: spell out “when to call me”, e.g.
"Get real-time weather; call only when the user explicitly asks about weather" - Narrow parameters with enum: if values are enumerable, don’t use free-form strings — it eliminates most hallucinated arguments
- Keep tool definitions early in the prompt and stable: tools participate in the cache prefix; stable definitions mean 90%-off input (see Cache Billing)
- Cap your agent loop: set a max number of rounds so the model can’t burn money cycling call → return → call
| Symptom | Fix |
|---|---|
arguments isn’t valid JSON | Turn on strict: true — solves it at the root |
| Model calls a nonexistent function | Tighten with tool_choice; check whether descriptions mislead |
call_id mismatch after parallel calls | Every result must pair one-to-one with its call_id / tool_call_id — one missing pair fails the request |
| Parameter errors from mixed formats | Check the difference table above; match definition shape (nested/flat) to the endpoint |
Model Support and Selection
The entire gpt-5 series supports function calling. By scenario:| Scenario | Recommended model | Why |
|---|---|---|
| Everyday agents / tool use | gpt-5.4 ($2.50 / $15.00 per 1M) | Best capability-to-cost balance |
| High-frequency lightweight routing | gpt-5.4-mini ($0.75 / $4.50 per 1M) | Cheap; plenty for simple dispatch |
| Complex multi-step reasoning agents | gpt-5.5 ($5.00 / $30.00 per 1M) | Steadier on long planning chains |
Related Links
- This group: Native Calls · Compatible Mode · Cache Billing
- Get / manage tokens:
https://api.apiyi.com/token - Official OpenAI docs:
developers.openai.com/api/docs/guides/function-calling