Kimi K2.5 is Moonshot AI’s native multimodal flagship, released on January 27, 2026. It focuses on Visual Coding and autonomous Agent Swarm orchestration, with a 256K context window offered at no premium. API易 integrates it via an Alibaba Cloud official-transfer channel for production-grade stability. The base group rate is 0.88× of official pricing, and stacking deposit bonuses (from $100 deposit → $10 free and up) brings the effective cost below 80% of official pricing.
Kimi K2.5 is live on API易: Alibaba Cloud official-transfer channel, OpenAI-compatible endpoint, model ID kimi-k2.5. Unlike Kimi’s official site, Thinking mode must be explicitly enabled via enable_thinking: true in the request body — by default the model runs in Instant mode.

Key Advantages

256K Context

256K tokens at no premium — fit an entire mid-size codebase or long document in a single call.

Thinking Mode

Enable deep reasoning with enable_thinking: true — built for complex planning, root-cause analysis, and agents.

Native Multimodal + Visual Coding

Understands images and code natively — excels at turning UI mockups, screenshots, and diagrams into runnable code.

Stable Alibaba Cloud Transfer

Routed through Alibaba Cloud’s official-transfer channel — enterprise-grade SLA under high concurrency.
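The multimodal capability above can be exercised with OpenAI-style image content parts. A minimal sketch, assuming this channel accepts standard `image_url` parts with base64 data URLs; `image_message` is a hypothetical helper name, not part of any SDK:

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> list:
    """Build an OpenAI-style multimodal message with an inline base64 image."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode()
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }]

# Visual-coding call sketch (needs a live key, so not executed here):
# client.chat.completions.create(
#     model="kimi-k2.5",
#     messages=image_message("Convert this mockup to React + Tailwind.",
#                            open("mockup.png", "rb").read()))
```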

Model Info

| Parameter | Value |
| --- | --- |
| Model ID | kimi-k2.5 |
| Context Window | 256,000 tokens |
| Modes | Instant / Thinking / Agent / Agent Swarm |
| Thinking Toggle | enable_thinking: true in request body (default false) |
| Input | Text + Image (native multimodal) |
| Output | Text |
| Streaming | ✅ Supported |
| Function Calling / Tool Use | ✅ Supported |
| Channel | Alibaba Cloud Official Transfer |
Kimi’s built-in $web_search tool is currently incompatible with Thinking mode. Per Moonshot’s guidance, disable enable_thinking when you need the web_search tool. This limitation matches the official platform.
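This exclusivity is easy to guard against in client code. A sketch: `build_payload` is a hypothetical helper, and the `builtin_function` / `$web_search` tool shape follows Moonshot's official platform; confirm it is passed through unchanged on this channel.

```python
def build_payload(messages: list, use_web_search: bool = False,
                  thinking: bool = False) -> dict:
    """Build a kimi-k2.5 request body, enforcing web_search/Thinking exclusivity."""
    if use_web_search and thinking:
        raise ValueError("$web_search and enable_thinking are mutually exclusive")
    payload = {"model": "kimi-k2.5", "messages": messages,
               "enable_thinking": thinking}
    if use_web_search:
        # Tool shape as documented on Moonshot's official platform.
        payload["tools"] = [{"type": "builtin_function",
                             "function": {"name": "$web_search"}}]
    return payload
```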

Pricing

| Item | Official | API易 Group (0.88×) | With Deposit Bonus (approx.) |
| --- | --- | --- | --- |
| Input | $0.60 / 1M tokens | $0.528 / 1M tokens | ~$0.48 / 1M tokens |
| Output | $2.50 / 1M tokens | $2.20 / 1M tokens | ~$2.00 / 1M tokens |
| Cache Hit (Input) | $0.10 / 1M tokens | $0.088 / 1M tokens | |
Pricing notes: API易 uses a 0.88× multiplier (88% of official list price) as the base group rate. Stacking onboarding / bulk deposit bonuses (e.g. $100 deposit → $10 free and up) brings the effective cost below 80% of official. See Deposit Promotions for details.

How to Enable Thinking Mode

The biggest difference from Kimi’s official site is that API易 defaults to Instant mode — you must explicitly enable Thinking via enable_thinking in the request body:
| Use Case | enable_thinking | Notes |
| --- | --- | --- |
| Daily chat / fast responses | false (default) | Instant mode, lowest latency |
| Complex reasoning / code planning / RCA | true | Thinking mode, emits reasoning trace |
| Agent with web_search | false | Official limitation: web_search and Thinking are mutually exclusive |

cURL Example (Thinking enabled)

curl --location 'https://api.apiyi.com/v1/chat/completions' \
  --header "Authorization: Bearer sk-xxxx" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kimi-k2.5",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is 1+1?"
      }
    ],
    "enable_thinking": true
  }'

How to Call

Endpoint

https://api.apiyi.com/v1/chat/completions

Basic Usage (Instant Mode)

curl -X POST "https://api.apiyi.com/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      {"role": "user", "content": "Introduce yourself in one sentence."}
    ]
  }'

Advanced Usage (Thinking Mode)

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apiyi.com/v1"
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Analyze the time complexity of this code and suggest optimizations."}
    ],
    extra_body={
        "enable_thinking": True  # non-standard OpenAI param; the Python SDK requires extra_body
    }
)

print(response.choices[0].message.content)
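When Thinking mode is on, Moonshot's platform returns the reasoning trace alongside the final answer. Assuming this channel passes it through as `reasoning_content` (verify against a live response; the field name may differ), a small helper separates the two. `split_answer` is a hypothetical name:

```python
def split_answer(message) -> tuple:
    """Return (reasoning, answer) from a chat completion message.

    Assumption: the Thinking trace is exposed as `reasoning_content`,
    as on Moonshot's official platform; returns None if absent.
    """
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning, message.content

# Usage sketch:
# reasoning, answer = split_answer(response.choices[0].message)
# if reasoning:
#     print("--- reasoning trace ---\n", reasoning)
# print(answer)
```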

Streaming

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Write a short poem about spring."}],
    stream=True,
    extra_body={"enable_thinking": True}
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Request Parameters

| Name | Type | Required | Notes |
| --- | --- | --- | --- |
| model | string | Yes | Must be kimi-k2.5 |
| messages | array | Yes | Conversation messages |
| enable_thinking | boolean | No | Enable Thinking mode; default false |
| stream | boolean | No | Stream output |
| temperature | number | No | Sampling temperature, 0–2 |
| max_tokens | integer | No | Max output tokens |
| tools | array | No | Function / tool list |
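A minimal `tools` payload for the function-calling support listed above, using the standard OpenAI schema. `get_weather` and its parameters are illustrative only; the call itself is sketched in comments because it needs a live key:

```python
# Hypothetical tool definition for illustration; get_weather is not a real API.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Call sketch (not executed here):
# response = client.chat.completions.create(
#     model="kimi-k2.5",
#     messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
#     tools=TOOLS)
# call = response.choices[0].message.tool_calls[0]
# print(call.function.name, call.function.arguments)
```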

Response Format

{
  "id": "chatcmpl-xxxxxxxx",
  "object": "chat.completion",
  "created": 1706300000,
  "model": "kimi-k2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 12,
    "total_tokens": 36
  }
}
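The `usage` block above maps directly onto the pricing table. A rough cost estimator at the 0.88× group rate (rates hard-coded from the table; `estimate_cost` is a hypothetical helper, and the cached-input handling assumes cached tokens are reported separately from `prompt_tokens`):

```python
# USD per 1M tokens at the API易 group rate (0.88×), from the pricing table.
RATES = {"input": 0.528, "output": 2.20, "cached_input": 0.088}

def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate USD cost of one call; Thinking traces bill as output tokens."""
    uncached = prompt_tokens - cached_tokens
    return (uncached * RATES["input"]
            + cached_tokens * RATES["cached_input"]
            + completion_tokens * RATES["output"]) / 1_000_000

# The sample response above used 24 prompt + 12 completion tokens.
print(f"${estimate_cost(24, 12):.8f}")
```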

Best Practices

  1. Switch modes per task: Leave Instant mode on for daily chat and short generations; set enable_thinking: true for complex reasoning, code review, and agent planning.
  2. Use the 256K context: Fit a mid-size repo, full product docs, or long meeting transcripts in one call — at no premium.
  3. Multimodal visual coding: Send UI screenshots / design mockups and let K2.5 “read → plan → code” in one shot.
  4. Stretch the savings: Stack the $100+ deposit bonus with the 0.88× group rate — effective cost drops below 80% of official.
  5. Mind the web_search caveat: Disable enable_thinking if you need Moonshot’s built-in $web_search tool.
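Best practice #2 can be sketched as a simple repo packer that fills the 256K window. A heuristic sketch: the 4-characters-per-token estimate is a rough rule of thumb rather than Kimi's real tokenizer, and `pack_repo` is a hypothetical helper, not part of any SDK.

```python
import os

def pack_repo(root: str, budget_tokens: int = 200_000) -> str:
    """Concatenate source files under root into one prompt, within a token budget."""
    parts, used = [], 0
    for dirpath, _, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith((".py", ".md", ".ts", ".json")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            cost = len(text) // 4  # crude chars-per-token estimate
            if used + cost > budget_tokens:
                return "\n\n".join(parts)
            parts.append(f"### {path}\n{text}")
            used += cost
    return "\n\n".join(parts)

# Usage sketch: send the packed repo as one user message.
# messages = [{"role": "user", "content": pack_repo("./my-project") + "\n\nReview this codebase."}]
```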

FAQ

Why isn't Thinking mode taking effect?

Thinking mode is off by default. Make sure the request body includes "enable_thinking": true. With the OpenAI Python SDK, pass it inside extra_body; with the Node.js SDK you can pass it as a top-level field.

Is this the same model as on Kimi's official platform?

Yes — it's the same upstream model, routed through Alibaba Cloud's official-transfer channel. The only difference is that Thinking mode is off by default and must be opted into via enable_thinking.

How do I get the discounted rate?

When creating an API token in the API易 console, assign it to a group that includes Kimi K2.5 — billing automatically applies the 0.88× multiplier. Combined with deposit bonuses, total cost drops further. See Deposit Promotions.

Does kimi-k2.5 support function calling?

Yes. Pass standard OpenAI-style tools definitions. Note that the official $web_search built-in tool is mutually exclusive with Thinking mode — use them in separate calls.

How are thinking tokens billed?
Thinking traces count as output tokens and are billed normally. Complex tasks may produce significantly more output tokens, so enable Thinking only when you need the deeper reasoning.

See also:

API Manual: complete API usage guide
Deposit Promotions: stack bonuses to drive the price down further
Model Info: browse all available models and groups
Use Cases: client integration walkthroughs