/v1/chat/completions is the de facto standard interface of the LLM industry — virtually every framework, client, and SDK supports it out of the box. Through APIYI, this single endpoint reaches OpenAI, Claude, Gemini, DeepSeek, and 400+ models in total; switching models is just swapping a string.
Which endpoint to pick: if you use existing frameworks or clients, or want one codebase across multiple vendors, use compatible mode (this page). If you need built-in tools (web search, code interpreter), state management, or Pro-series models, use Native Calls (/v1/responses). OpenAI's official stance on Chat Completions: it stays supported long-term, but Responses is recommended for new projects.
Quick Start
curl https://api.apiyi.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-5.4",
"messages": [
{"role": "user", "content": "Introduce yourself in one sentence"}
]
}'
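To see the raw request and response shape without an SDK, the same call can be sketched with Python's standard library. This is a minimal sketch that assumes the response follows the standard Chat Completions JSON layout (`choices[0].message.content`); `build_request`, `extract_reply`, and `chat` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://api.apiyi.com/v1/chat/completions"


def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_reply(payload: dict) -> str:
    """Pull the assistant text out of a Chat Completions response body."""
    return payload["choices"][0]["message"]["content"]


def chat(api_key: str, model: str, prompt: str) -> str:
    """Send the request and return the reply text (performs a network call)."""
    with urllib.request.urlopen(build_request(api_key, model, prompt), timeout=60) as resp:
        return extract_reply(json.load(resp))
```

For example, `chat("YOUR_API_KEY", "gpt-5.4", "Introduce yourself in one sentence")`. For real projects the official SDKs below are the better choice, since they add retries, streaming helpers, and typed responses.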
One Interface, Every Provider
This is the biggest payoff of compatible mode: switching providers means changing the model string, with no other code changes.
from openai import OpenAI
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.apiyi.com/v1")
def ask(message: str, model: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}]
    )
    return response.choices[0].message.content
print(ask("Explain quantum entanglement", "gpt-5.4"))               # OpenAI
print(ask("Explain quantum entanglement", "claude-sonnet-4-6"))     # Anthropic
print(ask("Explain quantum entanglement", "gemini-3-pro-preview"))  # Google
print(ask("Explain quantum entanglement", "deepseek-chat"))         # DeepSeek
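Because every provider sits behind the same call, a cross-provider fallback chain is just a loop over model names. A minimal sketch, assuming you want to try the next model when one fails; `ask_with_fallback` is a hypothetical helper, and the exception types that trigger a fallback are passed in (for example `(openai.APIError,)`) so the function itself stays SDK-agnostic.

```python
from typing import Sequence


def ask_with_fallback(
    client,
    message: str,
    models: Sequence[str],
    retry_on: tuple[type[BaseException], ...],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first success."""
    last_error: BaseException | None = None
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": message}],
            )
            return model, response.choices[0].message.content
        except retry_on as exc:  # only the listed error types move on to the next model
            last_error = exc
    raise RuntimeError(f"all models failed: {list(models)}") from last_error
```

For example: `ask_with_fallback(client, "Explain quantum entanglement", ["gpt-5.4", "claude-sonnet-4-6"], retry_on=(openai.APIError,))`. The SDK's own automatic retries still run on each model before the loop moves on.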
Full model names and prices: Models & Pricing. Note: calling Claude through the compatible format forfeits Claude’s Prompt Cache discount — for heavy Claude usage, use Claude Native Calls.
SDK Setup per Language
Every official SDK supports a custom base_url — configure once and go.
Python
from openai import OpenAI, AsyncOpenAI
# Synchronous client
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.apiyi.com/v1"
)
# Async client
async_client = AsyncOpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.apiyi.com/v1"
)
Or use environment variables for zero in-code config:
export OPENAI_API_KEY="YOUR_API_KEY"
export OPENAI_BASE_URL="https://api.apiyi.com/v1"
from openai import OpenAI
client = OpenAI() # reads the environment automatically
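The async client pays off when you send many independent requests. A minimal sketch, assuming the `async_client` configured above; `ask_many` is a hypothetical helper, and the default concurrency limit of 5 is an arbitrary example value to stay under your rate limit, not an APIYI quota.

```python
import asyncio
from typing import Sequence


async def ask_many(async_client, prompts: Sequence[str], model: str = "gpt-5.4", limit: int = 5) -> list[str]:
    """Run one chat completion per prompt concurrently, at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def ask_one(prompt: str) -> str:
        async with semaphore:
            response = await async_client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(ask_one(p) for p in prompts))
```

Usage: `replies = asyncio.run(ask_many(async_client, ["Summarize A", "Summarize B"]))`.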
Node.js / TypeScript
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
baseURL: 'https://api.apiyi.com/v1'
});
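// top-level await requires an ES module (.mjs or "type": "module")
const response = await openai.chat.completions.create({
  model: 'gpt-5.4-mini',
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(response.choices[0].message.content);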
.NET
dotnet add package OpenAI
using OpenAI;
using OpenAI.Chat;
var client = new OpenAIClient(
new System.ClientModel.ApiKeyCredential("YOUR_API_KEY"),
new OpenAIClientOptions { Endpoint = new Uri("https://api.apiyi.com/v1") }
);
var chatClient = client.GetChatClient("gpt-5.4");
var response = await chatClient.CompleteChatAsync("Hello!");
Console.WriteLine(response.Value.Content[0].Text);
Go
Use the official OpenAI Go SDK (github.com/openai/openai-go):
go get github.com/openai/openai-go
package main
import (
"context"
"fmt"
"github.com/openai/openai-go"
"github.com/openai/openai-go/option"
)
func main() {
client := openai.NewClient(
option.WithAPIKey("YOUR_API_KEY"),
option.WithBaseURL("https://api.apiyi.com/v1"),
)
completion, err := client.Chat.Completions.New(context.TODO(), openai.ChatCompletionNewParams{
Model: "gpt-5.4",
Messages: []openai.ChatCompletionMessageParamUnion{
openai.UserMessage("Hello!"),
},
})
if err != nil {
panic(err)
}
fmt.Println(completion.Choices[0].Message.Content)
}
Java
Use the official OpenAI Java SDK (com.openai:openai-java):
<dependency>
<groupId>com.openai</groupId>
<artifactId>openai-java</artifactId>
<version>LATEST</version> <!-- pin a concrete release from Maven Central for reproducible builds -->
</dependency>
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.chat.completions.ChatCompletion;
import com.openai.models.chat.completions.ChatCompletionCreateParams;
OpenAIClient client = OpenAIOkHttpClient.builder()
.apiKey("YOUR_API_KEY")
.baseUrl("https://api.apiyi.com/v1")
.build();
ChatCompletionCreateParams params = ChatCompletionCreateParams.builder()
.model("gpt-5.4")
.addUserMessage("Hello!")
.build();
ChatCompletion completion = client.chat().completions().create(params);
System.out.println(completion.choices().get(0).message().content().orElse(""));
Legacy projects on third-party libraries (Go’s sashabaranov/go-openai, Java’s theokanning packages) still work after changing the base_url, but we recommend migrating to the official SDKs above — third-party libraries lag on new parameters such as reasoning_effort.
Common Features
Streaming
stream = client.chat.completions.create(
model="gpt-5.4",
messages=[{"role": "user", "content": "Write a short poem about autumn"}],
stream=True
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
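If you also need the full text afterwards (for logging or storing history), accumulate the deltas as they arrive. A minimal sketch; `stream_reply` is a hypothetical helper. It passes `stream_options={"include_usage": True}`, which in OpenAI's API adds a final chunk with an empty `choices` list and a `usage` object; whether every upstream model returns usage through APIYI is an assumption to verify.

```python
def stream_reply(client, prompt: str, model: str = "gpt-5.4"):
    """Stream a reply to stdout; return (full_text, usage_or_None)."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        stream_options={"include_usage": True},
    )
    parts: list[str] = []
    usage = None
    for chunk in stream:
        # The usage chunk, if any, arrives last with an empty choices list
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        if chunk.choices and chunk.choices[0].delta.content:
            piece = chunk.choices[0].delta.content
            parts.append(piece)
            print(piece, end="", flush=True)
    print()
    return "".join(parts), usage
```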
Reasoning control
On Chat Completions, use the top-level reasoning_effort parameter (different from the nested form on Responses):
response = client.chat.completions.create(
model="gpt-5.4",
messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational"}],
reasoning_effort="high" # none / low / medium / high / xhigh
)
gpt-5 series reasoning models do not accept temperature or top_p on this endpoint; passing them returns an error.
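When one code path serves several providers, it can help to build the request arguments per model so sampling parameters never reach a model that rejects them. A minimal sketch; `build_chat_params` is a hypothetical helper, and treating every model name that starts with `gpt-5` as a reasoning model is a simplifying assumption, so check the model list for exceptions.

```python
from typing import Optional


def build_chat_params(
    model: str,
    messages: list[dict],
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    reasoning_effort: Optional[str] = None,
) -> dict:
    """Return kwargs for client.chat.completions.create(**params)."""
    params: dict = {"model": model, "messages": messages}
    is_reasoning = model.startswith("gpt-5")  # assumption: name-prefix heuristic
    if is_reasoning:
        # Reasoning models: keep reasoning_effort, drop temperature/top_p
        if reasoning_effort is not None:
            params["reasoning_effort"] = reasoning_effort
    else:
        # Other models: pass sampling parameters through unchanged
        if temperature is not None:
            params["temperature"] = temperature
        if top_p is not None:
            params["top_p"] = top_p
    return params
```

With this, `client.chat.completions.create(**build_chat_params("gpt-5.4", messages, temperature=0.7, reasoning_effort="high"))` sends only `reasoning_effort`, while the same call for `deepseek-chat` sends only `temperature`.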
Image input
response = client.chat.completions.create(
model="gpt-5.4",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]
}
]
)
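For a local file instead of a public URL, OpenAI's Chat Completions format also accepts a base64 data URL in `image_url.url`. A minimal sketch; `image_message` is a hypothetical helper, and data-URL support on non-OpenAI models routed through APIYI is an assumption to check per model.

```python
import base64
import mimetypes
from pathlib import Path


def image_message(path: str, question: str) -> dict:
    """Build a user message with text plus an inline base64-encoded image."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

Usage: `client.chat.completions.create(model="gpt-5.4", messages=[image_message("photo.png", "What's in this image?")])`.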
Embeddings
response = client.embeddings.create(
model="text-embedding-3-small",
input="Text to embed"
)
embedding = response.data[0].embedding
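`input` also accepts a list of strings, returning one vector per item; a common next step is comparing vectors by cosine similarity. A minimal sketch; `embed` and `cosine_similarity` are hypothetical helpers written in plain Python to stay dependency-free (NumPy is the usual choice at scale).

```python
import math
from typing import Sequence


def embed(client, texts: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    """Embed several texts in one request; results follow input order."""
    response = client.embeddings.create(model=model, input=texts)
    # Sort by each item's index defensively, then keep just the vectors
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```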
Error Handling and Retries
The official SDKs retry automatically (2 retries by default, on connection errors, 408, 409, 429, and 5xx responses), so prefer that over hand-rolled loops:
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.apiyi.com/v1",
max_retries=3, # built-in exponential backoff
timeout=60.0
)
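Client-level settings apply to every call. For a single slow request (say, a long reasoning task), openai-python also lets you override them per call with `client.with_options(...)`, which returns a copy of the client. A minimal sketch; `ask_slow` is a hypothetical helper, and the 5 retries and 300 s timeout are arbitrary example values.

```python
def ask_slow(client, prompt: str, model: str = "gpt-5.4") -> str:
    """Send one request with a longer timeout and more retries than the client default."""
    patient = client.with_options(max_retries=5, timeout=300.0)  # a copy; `client` is unchanged
    response = patient.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```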
For finer control, catch by exception type:
from openai import (
    APIError,
    APIConnectionError,
    AuthenticationError,
    RateLimitError,
    InternalServerError,
)
try:
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError:
    print("Invalid API key, or base_url is not pointing at APIYI")
except RateLimitError:
    print("Rate limited: retry later")
except APIConnectionError:
    print("Connection error: check network/proxy")
except InternalServerError:
    print("Upstream error: worth retrying")
except APIError as e:
    print(f"API error: {e}")
Capability Boundaries of Compatible Mode
| Capability | Compatible mode | Notes |
|---|---|---|
| Chat / streaming / multimodal input | ✅ | Fully supported |
| Function calling (FC) | ✅ | See Function Calling |
| Prompt cache discount | ✅ | Automatic for OpenAI models; see Cache Billing |
| Built-in tools (web search, code interpreter, …) | ❌ | Native Calls only |
| State management (previous_response_id) | ❌ | Native only; here you assemble history yourself |
| verbosity output control | ❌ | Native only |
| Pro-series models (gpt-5.4-pro, …) | ❌ | In practice, native calls only |
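Since compatible mode has no server-side state, multi-turn chat means resending the history yourself on every call. A minimal sketch; the `Conversation` class is hypothetical, and keeping only the last `max_turns` exchanges is one simple way to bound prompt size (older context is dropped entirely).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Conversation:
    """Client-side chat history, resent with every request."""

    client: object
    model: str = "gpt-5.4"
    system: Optional[str] = None
    max_turns: int = 10  # number of past user/assistant pairs to resend
    history: list = field(default_factory=list)

    def send(self, text: str) -> str:
        """Send one user turn with the trimmed history; record it only on success."""
        user_msg = {"role": "user", "content": text}
        # history holds complete pairs, so this slice always starts at a user message
        past = self.history[-2 * self.max_turns:] if self.max_turns > 0 else []
        prefix = [{"role": "system", "content": self.system}] if self.system else []
        response = self.client.chat.completions.create(
            model=self.model, messages=prefix + past + [user_msg]
        )
        reply = response.choices[0].message.content
        self.history += [user_msg, {"role": "assistant", "content": reply}]
        return reply
```

Usage: `chat = Conversation(client, system="Be concise")`, then call `chat.send(...)` once per user turn.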
Migrating from OpenAI Direct
Already on OpenAI's official service? Migration needs no changes to your request code; pick either option:
- Option 1: change base_url and key where the client is created
# Before
client = OpenAI(api_key="sk-...")
# After
client = OpenAI(
    api_key="YOUR_APIYI_KEY",
    base_url="https://api.apiyi.com/v1"
)
- Option 2: change environment variables only (code untouched)
export OPENAI_API_KEY="YOUR_APIYI_KEY"
export OPENAI_BASE_URL="https://api.apiyi.com/v1"
Method calls, parameter formats, and response structures all stay identical.
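After switching via environment variables, a quick startup check confirms they point at APIYI; if OPENAI_BASE_URL is missing, the SDK silently falls back to OpenAI's own endpoint. A minimal sketch; `migration_problems` is a hypothetical helper that only inspects the two variables above and makes no network call.

```python
import os
from typing import Mapping, Optional

EXPECTED_BASE_URL = "https://api.apiyi.com/v1"


def migration_problems(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return human-readable problems with the OpenAI env vars; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    base_url = env.get("OPENAI_BASE_URL", "")
    if base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append(f"OPENAI_BASE_URL is {base_url!r}, expected {EXPECTED_BASE_URL!r}")
    return problems
```

A reasonable pattern is to fail fast at startup whenever `migration_problems()` returns a non-empty list.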