Highlights
- Aliyun official relay: qwen3.6-max-preview and qwen3.6-flash are routed through Alibaba Cloud Bailian’s official channel — same stability as direct API access
- Max tops coding leaderboards: Qwen3.6-Max-Preview claims #1 on six coding benchmarks (SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, SciCode), AIME 2025 93%, GPQA 86%, LiveCodeBench 79%
- Flash with 1M multimodal context: Qwen3.6-Flash is a 35B-A3B MoE supporting text / image / video input, native 256K context expandable to 1M tokens
- List price matches official: Max at $1.28 in / $7.68 out, Flash at $0.17 in / $1.02 out per 1M tokens
- Recharge promo ~15% off: List price equals the official rate; APIYI’s recharge bonus brings the effective price down to roughly 85% of list
- Billing: Pay-as-you-go - Chat, no resource pack required
Routing is via Alibaba Cloud Bailian’s official relay. Models are based on the Qwen3.6 family released in April 2026. Max-Preview is a preview build still under iteration; Flash is GA. Sources: Alibaba Cloud Bailian docs
help.aliyun.com/zh/model-studio/models, Qwen team blog qwen.ai/blog. Data retrieval date: 2026-04-27.
Background
Qwen3.6 is the next-generation model family from Alibaba’s Tongyi Qianwen team, released across four tiers: Max (flagship), Plus (balanced), Flash (speed-first), and 35B-A3B (open-weight, local). Max-Preview debuted on Qwen Studio on 2026-04-20, and now ships alongside Flash through APIYI’s Aliyun official-relay group. Two practical wins for Chinese-context workloads: first, Max-Preview pushes domestic Coding/Agent benchmarks to a new high, posting 58.4 on SWE-bench Pro vs. the previously leading GLM-5.1’s 56.6. Second, Flash collapses the cost of high-frequency multimodal long-context workflows — at $0.17 per million input tokens, image / video + long-context pipelines finally have economics that scale.
Deep dive
Core features
Max-Preview · coding flagship
A new ceiling for domestic coding models. #1 on SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, SciCode — built for Agents and repo-scale tasks.
Flash · speed-first multimodal
35B-A3B MoE / 1M context. Native text / image / video input, 256K base context expandable to 1M tokens, roughly 1/8 the unit price of Max.
Aliyun official relay
Same stability as official access. Routed through Alibaba Cloud Bailian’s official channel; auth and rate-limit policies match the official portal, with low domestic latency.
Pay-as-you-go Chat
No resource pack required. Chat endpoint, billed per call. Combined with APIYI’s recharge bonus, the effective unit price lands around 85% of list.
Performance highlights (Qwen3.6-Max-Preview)
Numbers below come from the Qwen team’s official blog and third-party public benchmarks:

| Benchmark | Qwen3.6-Max-Preview | GLM-5.1 | Qwen3.6-Plus |
|---|---|---|---|
| SWE-bench Pro (real-world SWE) | 58.4 | 56.6 | — |
| LiveCodeBench | 79% | — | — |
| AIME 2025 (math olympiad) | 93% | — | — |
| GPQA (science reasoning) | 86% | — | — |
| Terminal-Bench 2.0 | #1 | — | — |
| Coding benchmarks topped | 6 | — | — |
Specs
Model parameters
Qwen3.6-Max-Preview
- Model ID: qwen3.6-max-preview
- Architecture: dense large model (exact params undisclosed)
- Context: 262K tokens
- Input modalities: text
- Billing: pay-as-you-go - Chat
- Channel: Aliyun official relay
Qwen3.6-Flash
- Model ID: qwen3.6-flash
- Architecture: MoE, 35B total / 3B active (35B-A3B)
- Context: 256K base, expandable to 1M tokens
- Input modalities: text / image / video
- Billing: pay-as-you-go - Chat (tiered above 256K)
- Channel: Aliyun official relay
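The "tiered above 256K" note can be made concrete with a small sketch. The thresholds below are taken from the spec list (256K base, 1M max); the actual tier boundaries and tier prices are defined by Alibaba Cloud's billing rules, not by this code.

```python
# Illustrative check of which billing band a Flash call falls into.
# Thresholds follow the spec list above; exact tier pricing is not
# published in this article, so only the band name is returned.

BASE_CONTEXT = 256_000   # tokens billed at the base rate
MAX_CONTEXT = 1_000_000  # Flash's expandable context ceiling

def billing_band(total_tokens: int) -> str:
    """Return 'base' or 'tiered' for a given call size."""
    if total_tokens > MAX_CONTEXT:
        raise ValueError("exceeds Flash's 1M context window")
    return "tiered" if total_tokens > BASE_CONTEXT else "base"

print(billing_band(200_000))  # base
print(billing_band(500_000))  # tiered
```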
Use cases
Recommended scenarios
Coding Agent driver
Use qwen3.6-max-preview as the primary model in Cursor / Claude Code / similar Agent workflows — SWE-bench Pro performance is on par with GPT-5 / Claude Opus flagships.
High-volume multimodal
Run image / video understanding, long-document summarization, and bulk translation on qwen3.6-flash. $0.17 / $1.02 per 1M tokens makes “scale-out” workflows finally pencil out.
Long-context retrieval
Flash’s 256K → 1M expandable window suits “post-RAG full-document synthesis” pipelines, avoiding semantic breaks from chunking.
Domestic compliance first
The Aliyun relay channel is friendly to China-domestic compliance and data-residency-sensitive scenarios — a strong domestic alternative to GPT / Claude.
Code example
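A minimal sketch of calling qwen3.6-flash through an OpenAI-compatible Chat Completions endpoint, using only Python’s standard library. The base URL and API key below are placeholders, not APIYI’s actual values — substitute your own from the APIYI console.

```python
import json
import urllib.request

# Placeholder endpoint and key -- replace with your real APIYI values.
API_URL = "https://your-relay-host.example/v1/chat/completions"
API_KEY = "sk-your-key"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("qwen3.6-flash", "Summarize this document in three bullets.")
# With real credentials, send the request and read the reply:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
```

Swap the model ID to qwen3.6-max-preview for coding / Agent calls; the request shape is identical.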
Best practices
- Task routing: default to Flash for general dialog and classification; escalate to Max-Preview only for coding / complex reasoning / Agent orchestration. This preserves quality while minimizing cost.
- Preview canary: Max-Preview is still iterating. For critical paths, run a canary with A/B comparison before flipping main traffic.
- Multimodal batching: Flash supports a 1M context, but extreme single-call lengths trigger tiered pricing. Slice ultra-long video first, then feed in chunks within 256K to control per-call cost.
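The task-routing rule above can be sketched as a simple dispatcher. The model IDs come from this article; the task labels are illustrative, not a fixed taxonomy.

```python
# Default to cheap Flash; escalate only the heavy task types
# named in the best-practices list to Max-Preview.

HEAVY_TASKS = {"coding", "complex_reasoning", "agent"}

def pick_model(task_type: str) -> str:
    """Route heavy tasks to Max-Preview, everything else to Flash."""
    if task_type in HEAVY_TASKS:
        return "qwen3.6-max-preview"
    return "qwen3.6-flash"

print(pick_model("coding"))          # qwen3.6-max-preview
print(pick_model("classification"))  # qwen3.6-flash
```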
Pricing & availability
Listed prices
| Model | Billing | Input | Output |
|---|---|---|---|
| qwen3.6-max-preview | Pay-as-you-go - Chat | $1.28 / 1M tokens | $7.68 / 1M tokens |
| qwen3.6-flash | Pay-as-you-go - Chat | $0.17 / 1M tokens | $1.02 / 1M tokens |
Listed prices match Alibaba Cloud’s official rates. APIYI’s current recharge bonus brings the effective unit price to roughly 85% of list.
Stack with APIYI’s recharge promo
Recharge promotion details: /en/faq/recharge-promotions

Effective unit prices after the recharge bonus (~15% off):

| Model | Effective input | Effective output |
|---|---|---|
| qwen3.6-max-preview | ≈ $1.088 / 1M | ≈ $6.528 / 1M |
| qwen3.6-flash | ≈ $0.1445 / 1M | ≈ $0.867 / 1M |
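The effective prices in the table are simply list price × 0.85 (the ~15% recharge bonus). A quick sketch to verify the arithmetic:

```python
# List prices (USD per 1M tokens) from the pricing table above,
# as (input, output) pairs.
LIST_PRICES = {
    "qwen3.6-max-preview": (1.28, 7.68),
    "qwen3.6-flash": (0.17, 1.02),
}

def effective_price(model: str, factor: float = 0.85) -> tuple:
    """Apply the recharge-bonus factor to a model's list prices."""
    inp, out = LIST_PRICES[model]
    return round(inp * factor, 4), round(out * factor, 4)

print(effective_price("qwen3.6-max-preview"))  # (1.088, 6.528)
print(effective_price("qwen3.6-flash"))        # (0.1445, 0.867)
```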
Wrap-up
Qwen3.6-Max-Preview and Qwen3.6-Flash are the latest reinforcements to APIYI’s Aliyun official-relay group: Max raises the bar for domestic coding/reasoning, Flash crushes the unit price of multimodal long-context workloads. Together they cover the full demand curve from heavy Agent workflows down to high-frequency dispatch.
Sources: Alibaba Cloud Bailian docs (
help.aliyun.com/zh/model-studio/models), Qwen team blog (qwen.ai/blog), and Qwen3.6-Max-Preview evaluation report. Max-Preview release date: 2026-04-20. Article data retrieval date: 2026-04-27 (UTC+8).