Key Highlights
- Two models launched: deepseek-v4-pro (1.6T total / 49B active) and deepseek-v4-flash (284B total / 13B active), both MoE
- 1M context: Full 1,000,000-token context across the family, powered by a new Hybrid Attention architecture + DSA sparse attention
- Open-source SOTA: V4-Pro is the current best open-source model on Agentic Coding; scores 80.6 on SWE-Verified, matching Claude (80.8) and Gemini (80.6)
- Tunable thinking: Supports a reasoning_effort parameter (high / max); official guidance recommends max for complex agent scenarios
- Dual API compatibility: Works with both the OpenAI ChatCompletions and Anthropic endpoints
- Friendly pricing: Flash at $0.14 in / $0.28 out per 1M tokens; Pro at $1.74 in / $3.48 out — same as official
- Recharge bonus: Stackable with API易 recharge promotions for an effective ~15% discount off the official list price
The version currently live on API易 is the Aliyun official relay channel. Release date: 2026-04-24 (official preview). Source: DeepSeek docs at api-docs.deepseek.com/zh-cn/news/news260424.

Background
A full year after DeepSeek-R1 shook the industry, DeepSeek returned on April 24, 2026 with its V4 preview release, launching the performance flagship V4-Pro and the speed/cost-optimized V4-Flash simultaneously. The headline technical advance in V4 is the Hybrid Attention architecture: attention is compressed along the token dimension and combined with DSA sparse attention, making long-context inference both efficient and accurate. Paired with a 1M-token context window, this generation is purpose-built for agents and long-horizon reasoning. DeepSeek is candid about its positioning versus closed frontier models: V4-Pro trails only Gemini-Pro-3.1 on world knowledge, and the overall gap with GPT-5.4 / Gemini-Pro-3.1 is “about 3 to 6 months”, the strongest catch-up yet from the open-source camp.

Deep Dive
The Two New Models
deepseek-v4-pro
Performance flagship. 1.6T total params / 49B active, MoE, 1M context. Built for complex agents, coding, math, STEM, and competition-grade code. Agentic Coding is open-source SOTA.
deepseek-v4-flash
Speed + economy. 284B total / 13B active, MoE, 1M context. For high-throughput, latency-sensitive, cost-conscious workloads like chat, text ops, and batch tasks.
Benchmark Highlights
Based on official and third-party evaluations:

| Dimension | DeepSeek-V4-Pro | Competitor reference |
|---|---|---|
| SWE-Verified (real software engineering) | 80.6 | Claude 80.8 / Gemini 80.6 |
| Agentic Coding | Open-source SOTA | Approaches Claude Opus 4.5 |
| World knowledge | Open-source leader | Only behind Gemini-Pro-3.1 |
| Math / STEM / Competition code | Beats every public open-source model | — |
| Overall gap vs. GPT-5.4 / Gemini-Pro-3.1 | ~3-6 months | — |
Architecture & Specs
Hybrid Attention Architecture
- Token-level compression: A new attention mechanism compresses along the token axis, drastically lowering long-context inference cost
- DSA sparse attention: Combined with sparse attention for better long-range dependency modeling
- MoE experts: V4-Pro activates ~3% (49B/1.6T); V4-Flash activates ~4.6% (13B/284B)
- 1M context: Full 1,000,000 tokens across the family — ideal for agents, repo-scale code, and long documents
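The MoE activation ratios quoted above follow directly from the spec-sheet numbers; a two-line check:

```python
# Active-parameter ratios implied by the spec sheet (total vs. active params).
pro_ratio = 49e9 / 1.6e12    # V4-Pro: 49B active of 1.6T total
flash_ratio = 13e9 / 284e9   # V4-Flash: 13B active of 284B total

print(round(pro_ratio * 100, 1), round(flash_ratio * 100, 1))  # ≈3.1% and ≈4.6%
```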
Thinking Modes & reasoning_effort
V4 supports both non-thinking and thinking modes. In thinking mode, reasoning_effort is tunable:
- high: standard deep reasoning, for most complex tasks
- max: maximum reasoning budget, officially recommended for complex agent scenarios
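The announcement names the parameter but not its exact placement in the request body; as a sketch, assuming reasoning_effort rides as a top-level field alongside the usual ChatCompletions keys:

```python
# Assumption: reasoning_effort is a top-level request field (parameter name
# from the announcement; exact placement may differ in the final API docs).
payload = {
    "model": "deepseek-v4-pro",
    "messages": [
        {"role": "user", "content": "Plan a multi-step refactor of this repo."},
    ],
    "reasoning_effort": "max",  # or "high" for standard deep reasoning
}
```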
In Practice
Recommended Use Cases
Agents & tool use
V4-Pro with reasoning_effort=max is the strongest open-source agent base today: a great fit for Claude Code, Cline, and custom agent pipelines
Repo-scale coding
SWE-Verified 80.6 + 1M context: load a mid-to-large repo in a single call
Long-document analysis
Reports, legal docs, papers — 1M context + compressed attention keep costs friendly
High-throughput economy
V4-Flash at $0.14 / 1M tokens input: ideal for support bots, classification, translation
Quickstart (OpenAI-compatible)
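A minimal stdlib-only sketch of an OpenAI-compatible chat call against V4-Pro. The base URL, env-var names, and prompt are placeholders (assumptions, not from the announcement); the request only fires when a key is configured, so the snippet is safe to paste and adapt.

```python
import json
import os
import urllib.request

# Placeholders: substitute your gateway's real base URL and key variable.
BASE_URL = os.environ.get("APIYI_BASE_URL", "https://your-gateway.example/v1")
API_KEY = os.environ.get("APIYI_API_KEY", "")

payload = {
    "model": "deepseek-v4-pro",
    "messages": [
        {"role": "system", "content": "You are a senior code reviewer."},
        {"role": "user", "content": "Review this diff for concurrency bugs."},
    ],
}

if API_KEY:  # only hit the network when a key is actually set
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
```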
Economy Mode (Flash)
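For high-throughput economy work, the same endpoint with deepseek-v4-flash and no thinking budget. A sketch of batch ticket classification (ticket texts and env-var names are illustrative; the network call is again gated on a key):

```python
import json
import os
import urllib.request

BASE_URL = os.environ.get("APIYI_BASE_URL", "https://your-gateway.example/v1")
API_KEY = os.environ.get("APIYI_API_KEY", "")

tickets = ["Refund not received", "App crashes on login", "How do I export data?"]

# One cheap Flash request per ticket; classification needs no thinking mode.
payloads = [
    {
        "model": "deepseek-v4-flash",
        "messages": [
            {"role": "system",
             "content": "Classify the ticket as billing, bug, or question. "
                        "Reply with one word."},
            {"role": "user", "content": ticket},
        ],
    }
    for ticket in tickets
]

def post(payload: dict) -> str:
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if API_KEY:
    for ticket, payload in zip(tickets, payloads):
        print(ticket, "->", post(payload))
```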
Anthropic Endpoint
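The same model is reachable via the Anthropic-compatible endpoint, which uses the Messages API shape: max_tokens is required, the system prompt is a top-level field, and auth goes through x-api-key plus an anthropic-version header. Base URL and env vars are placeholders:

```python
import json
import os
import urllib.request

BASE_URL = os.environ.get("APIYI_ANTHROPIC_BASE_URL", "https://your-gateway.example")
API_KEY = os.environ.get("APIYI_API_KEY", "")

# Anthropic Messages API shape: top-level system prompt, required max_tokens.
payload = {
    "model": "deepseek-v4-pro",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [
        {"role": "user", "content": "Summarize the V4 release in two sentences."},
    ],
}

if API_KEY:
    req = urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={
            "x-api-key": API_KEY,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["content"][0]["text"])
```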
Best Practices
- Model choice: Default to Flash; switch to Pro for agents / complex code / reasoning-heavy tasks
- Thinking effort: Disable thinking for simple tasks; use reasoning_effort=max for heavy agent work
- Long context: 1M is great, but input tokens are billed, so pre-filter before feeding
- Streaming: Thinking mode may emit many intermediate tokens — stream on the client for better UX
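Streaming responses arrive as server-sent events; a small client-side parser makes the intermediate tokens easy to surface. A sketch, assuming the usual OpenAI-style "data: " chunk framing (the reasoning_content field name for thinking tokens is an assumption based on DeepSeek's earlier reasoning models):

```python
import json

def parse_sse_line(line: str):
    """Extract the text delta from one SSE line of a streaming chat response.

    Returns None for keep-alives, the [DONE] sentinel, and empty deltas.
    """
    if not line.startswith("data: "):
        return None  # comment / keep-alive line
    data = line[len("data: "):]
    if data == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    # reasoning_content is an assumed field name for thinking-mode tokens.
    return delta.get("content") or delta.get("reasoning_content")
```

Feed each line of the HTTP response body through this function and flush non-None results to the UI as they arrive.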
Pricing & Availability
Price Sheet (USD / 1M tokens)
| Model | Billing | Prompt (input) | Completion (output) | Prompt multiplier | Completion multiplier |
|---|---|---|---|---|---|
| deepseek-v4-flash | Pay-as-you-go - Chat | $0.1400 | $0.2800 | 0.07 | 2.0000 |
| deepseek-v4-pro | Pay-as-you-go - Chat | $1.7400 | $3.4800 | 0.87 | 2.0000 |
API易’s list price exactly matches DeepSeek’s official pricing — no markup. The channel is currently Aliyun official relay, with stability on par with direct-to-official access.
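To make the price sheet concrete, here is the list-price cost of a single large call (1M tokens in, 100K out) on each model, plus the effective Pro price under the ~15% recharge discount mentioned below (the discount figure is promotional and approximate):

```python
def cost(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    """USD cost of one call at per-1M-token prices."""
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

flash = cost(1_000_000, 100_000, 0.14, 0.28)  # 0.14 + 0.028 = $0.168
pro = cost(1_000_000, 100_000, 1.74, 3.48)    # 1.74 + 0.348 = $2.088

print(round(flash, 3), round(pro, 3), round(pro * 0.85, 3))
```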
Stack With Recharge Promotions
Recharge promotions bring the effective cost down to roughly 85% of the official list price. See: Recharge Promotions
View the latest recharge bonus tiers — larger top-ups earn higher bonus ratios
Summary & Recommendations
DeepSeek V4 is the strongest submission from the open-source camp in the past year:
- ✅ Best open-source agent / coding model: V4-Pro is now the most capable open agent base, delivering Claude-Sonnet-class performance at a fraction of the cost
- ✅ Best cost-performance for long docs: Flash at $0.14 / 1M tokens input + 1M context is arguably the price-performance ceiling for long-doc workloads
- ✅ Frictionless migration: OpenAI + Anthropic dual-endpoint support; change base_url and model, keep the rest
- A/B your existing DeepSeek-V3 / R1 traffic onto V4-Flash
- Upgrade agent / coding tasks to V4-Pro with reasoning_effort=max
- Stack API易 recharge bonuses to cut another ~15% off the cost
Sources & dates
- DeepSeek official: api-docs.deepseek.com/zh-cn/news/news260424
- Third-party reports & reviews: simonwillison.net/2026/Apr/24/deepseek-v4/, thenextweb.com, felloai.com/deepseek-v4/, techxplore.com, digitalapplied.com
- Data retrieved: 2026-04-24