Kimi K2.5 is live on API易: Alibaba Cloud official-transfer channel, OpenAI-compatible endpoint, model ID kimi-k2.5. Unlike Kimi's official site, Thinking mode must be explicitly enabled via enable_thinking: true in the request body; by default the model runs in Instant mode.
Key Advantages
256K Context
256K tokens at no premium — fit an entire mid-size codebase or long document in a single call.
Thinking Mode
Enable deep reasoning with enable_thinking: true, built for complex planning, root-cause analysis, and agents.
Native Multimodal + Visual Coding
Understands images and code natively — excels at turning UI mockups, screenshots, and diagrams into runnable code.
Stable Alibaba Cloud Transfer
Routed through Alibaba Cloud’s official-transfer channel — enterprise-grade SLA under high concurrency.
Model Info
| Parameter | Value |
|---|---|
| Model ID | kimi-k2.5 |
| Context Window | 256,000 tokens |
| Modes | Instant / Thinking / Agent / Agent Swarm |
| Thinking Toggle | enable_thinking: true in request body (default false) |
| Input | Text + Image (native multimodal) |
| Output | Text |
| Streaming | ✅ Supported |
| Function Calling / Tool Use | ✅ Supported |
| Channel | Alibaba Cloud Official Transfer |
Pricing
| Item | Official | API易 Group (0.88×) | With Deposit Bonus (approx.) |
|---|---|---|---|
| Input | $0.60 / 1M tokens | $0.528 / 1M tokens | ~$0.48 / 1M tokens |
| Output | $2.50 / 1M tokens | $2.20 / 1M tokens | ~$2.00 / 1M tokens |
| Cache Hit (Input) | $0.10 / 1M tokens | $0.088 / 1M tokens | — |
Pricing notes: API易 uses a 0.88× multiplier (88% of official list price) as the base group rate. Stacking onboarding / bulk deposit bonuses (e.g. $100 deposit → $10 free and up) brings the effective cost below 80% of official. See Deposit Promotions for details.
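To see how the discounts compose, here is a minimal cost sketch in Python using the list prices from the table above; the deposit-bonus discount varies by tier, so it is left as a parameter rather than hard-coded:

```python
def kimi_k25_cost(input_tokens: int, output_tokens: int, multiplier: float = 0.88) -> float:
    """Estimate USD cost of one kimi-k2.5 call at API易's group rate.

    Official list prices (per 1M tokens): input $0.60, output $2.50.
    `multiplier` is the discount factor; 0.88 is the base group rate,
    and stacking deposit bonuses pushes it lower.
    """
    INPUT_RATE, OUTPUT_RATE = 0.60, 2.50
    official = (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000
    return official * multiplier

# 1M input + 200K output: $1.10 at list price, about $0.968 at the 0.88x rate
print(kimi_k25_cost(1_000_000, 200_000))
```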
How to Enable Thinking Mode
The biggest difference from Kimi's official site is that API易 defaults to Instant mode; you must explicitly enable Thinking via enable_thinking in the request body:
| Use Case | enable_thinking | Notes |
|---|---|---|
| Daily chat / fast responses | false (default) | Instant mode, lowest latency |
| Complex reasoning / code planning / RCA | true | Thinking mode, emits reasoning trace |
| Agent with web_search | false | Official limitation: web_search vs thinking are mutually exclusive |
cURL Example (Thinking enabled)
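A sketch of such a request. The base URL below is a placeholder; substitute the endpoint from your API易 console, and export APIYI_API_KEY with your token before running:

```shell
#!/bin/sh
# Placeholder base URL: replace with the endpoint shown in your API易 console.
BASE_URL="${APIYI_BASE_URL:-https://your-apiyi-endpoint/v1}"

# Thinking mode is opt-in on API易: note "enable_thinking": true in the body.
BODY='{
  "model": "kimi-k2.5",
  "messages": [{"role": "user", "content": "Walk through the root cause of this bug step by step."}],
  "enable_thinking": true
}'

echo "$BODY"

# Only fire the request when a key is configured.
if [ -n "$APIYI_API_KEY" ]; then
  curl -s "$BASE_URL/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $APIYI_API_KEY" \
    -d "$BODY"
fi
```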
How to Call
Endpoint
Basic Usage (Instant Mode)
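A minimal stdlib sketch of an Instant-mode call, assuming the standard OpenAI-compatible /chat/completions path; the base URL is a placeholder to replace with the endpoint from your API易 console:

```python
import json
import os
import urllib.request

# Placeholder base URL: replace with the endpoint from your API易 console.
BASE_URL = os.environ.get("APIYI_BASE_URL", "https://your-apiyi-endpoint/v1")
API_KEY = os.environ.get("APIYI_API_KEY", "")

# Instant mode is the default on API易: no enable_thinking flag needed.
payload = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Give me a one-line summary of HTTP/2."}],
}

request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)

# Only send when a key is configured.
if API_KEY:
    with urllib.request.urlopen(request) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
```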
Advanced Usage (Thinking Mode)
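The same sketch with Thinking enabled; the only change is the enable_thinking field in the body. (With the OpenAI Python SDK, pass it via extra_body instead, since the SDK rejects unknown top-level arguments.)

```python
import json
import os
import urllib.request

# Placeholder base URL: replace with the endpoint from your API易 console.
BASE_URL = os.environ.get("APIYI_BASE_URL", "https://your-apiyi-endpoint/v1")
API_KEY = os.environ.get("APIYI_API_KEY", "")

# Thinking mode is opt-in on API易: set enable_thinking explicitly.
payload = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Plan a migration from REST to gRPC for this service."}],
    "enable_thinking": True,
}

# Only send when a key is configured.
if API_KEY:
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(request) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```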
Streaming
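With stream: true the endpoint emits OpenAI-style server-sent events, one `data: {...}` line per chunk, terminated by `data: [DONE]`. A parsing sketch, assuming the standard chunk shape, demonstrated here on canned lines rather than a live connection:

```python
import json

def iter_stream_chunks(lines):
    """Yield delta text from OpenAI-style SSE lines ('data: {...}')."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        data = line[len("data: "):]
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]

# Offline demo with canned lines shaped like a kimi-k2.5 stream:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
print("".join(iter_stream_chunks(sample)))  # prints "Hello"
```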
Request Parameters
| Name | Type | Required | Notes |
|---|---|---|---|
| model | string | Yes | Must be kimi-k2.5 |
| messages | array | Yes | Conversation messages |
| enable_thinking | boolean | No | Enable Thinking mode; default false |
| stream | boolean | No | Stream output |
| temperature | number | No | Sampling temperature, 0–2 |
| max_tokens | integer | No | Max output tokens |
| tools | array | No | Function / tool list |
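The tools parameter takes standard OpenAI-style function definitions. A sketch of a request payload with one tool; the get_weather function here is purely illustrative:

```python
import json

# Illustrative tool definition in the OpenAI function-calling schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "What's the weather in Hangzhou?"}],
    "tools": tools,
}
print(json.dumps(payload, ensure_ascii=False, indent=2))
```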
Response Format
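The response follows the OpenAI chat-completions schema. A trimmed, illustrative shape (field values are made up; whether the Thinking trace surfaces as a reasoning_content field is an assumption to verify against the API Manual):

```json
{
  "id": "chatcmpl-xxxx",
  "object": "chat.completion",
  "model": "kimi-k2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "…reasoning trace (Thinking mode only)…",
        "content": "Final answer text."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 123, "completion_tokens": 456, "total_tokens": 579}
}
```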
Best Practices
- Switch modes per task: Leave Instant mode on for daily chat and short generations; set enable_thinking: true for complex reasoning, code review, and agent planning.
- Use the 256K context: Fit a mid-size repo, full product docs, or long meeting transcripts in one call, at no premium.
- Multimodal visual coding: Send UI screenshots / design mockups and let K2.5 “read → plan → code” in one shot.
- Stretch the savings: Stack the $100+ deposit bonus with the 0.88× group rate — effective cost drops below 80% of official.
- Mind the web_search caveat: Disable enable_thinking if you need Moonshot's built-in $web_search tool.
FAQ
Why isn't my request using Thinking mode?
Thinking mode is off by default. Make sure the request body includes "enable_thinking": true. With the OpenAI Python SDK, pass it inside extra_body; with the Node.js SDK you can pass it as a top-level field.
Is API易's Kimi K2.5 the same model as Moonshot's?
Yes, it's the same upstream model, routed through Alibaba Cloud's official-transfer channel. The only difference is that Thinking mode is off by default and must be opted into via enable_thinking.
How does the 0.88× group rate work?
When creating an API token in the API易 console, assign it to a group that includes Kimi K2.5 — billing automatically applies the 0.88× multiplier. Combined with deposit bonuses, total cost drops further. See Deposit Promotions.
Does it support function calling / tool use?
Yes. Pass standard OpenAI-style tools definitions. Note that the official $web_search built-in tool is mutually exclusive with Thinking mode; use them in separate calls.
Does Thinking mode cost extra?
Thinking traces count as output tokens and are billed normally. Complex tasks may produce significantly more output tokens, so enable it only when you need the deeper reasoning.
Related Resources
API Manual
Complete API usage guide
Deposit Promotions
Stack bonuses to drive the price down further
Model Info
Browse all available models and groups
Use Cases
Client integration walkthroughs