Highlights

  • Aliyun official relay: qwen3.6-max-preview and qwen3.6-flash are routed through Alibaba Cloud Bailian’s official channel — same stability as direct API access
  • Max tops coding leaderboards: Qwen3.6-Max-Preview claims #1 on six coding benchmarks (SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, SciCode), AIME 2025 93%, GPQA 86%, LiveCodeBench 79%
  • Flash with 1M multimodal context: Qwen3.6-Flash is a 35B-A3B MoE supporting text / image / video input, native 256K context expandable to 1M tokens
  • List price matches official: Max at $1.28 in / $7.68 out, Flash at $0.17 in / $1.02 out per 1M tokens
  • Recharge promo ~15% off: List price equals the official rate; APIYI’s recharge bonus brings the effective price down to roughly 85% of list
  • Billing: Pay-as-you-go - Chat, no resource pack required
Routing is via Alibaba Cloud Bailian’s official relay. Models are based on the Qwen3.6 family released in April 2026. Max-Preview is a preview build still under iteration; Flash is GA. Sources: Alibaba Cloud Bailian docs help.aliyun.com/zh/model-studio/models, Qwen team blog qwen.ai/blog. Data retrieval date: 2026-04-27.

Background

Qwen3.6 is the next-generation model family from Alibaba’s Tongyi Qianwen team, released across four tiers: Max (flagship), Plus (balanced), Flash (speed-first), and 35B-A3B (open-weight, local). Max-Preview debuted on Qwen Studio on 2026-04-20, and now ships alongside Flash through APIYI’s Aliyun official-relay group. Two practical wins for Chinese-context workloads: first, Max-Preview pushes domestic Coding/Agent benchmarks to a new high, posting 58.4 on SWE-bench Pro vs. previously-leading GLM-5.1’s 56.6. Second, Flash collapses the cost of high-frequency multimodal long-context workflows — at $0.17 per million input tokens, image / video + long-context pipelines finally have economics that scale.

Deep dive

Core features

Max-Preview · coding flagship

A new ceiling for domestic coding models: #1 on SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode. Built for Agents and repo-scale tasks.

Flash · speed-first multimodal

35B-A3B MoE / 1M context: native text / image / video input, 256K base context expandable to 1M, at roughly 1/8 the unit price of Max.

Aliyun official relay

Same stability as direct official access: routed through Alibaba Cloud Bailian's official channel; auth and rate-limit policies match the official portal, with low domestic latency.

Pay-as-you-go Chat

No resource pack required: Chat endpoint, billed per call. Combined with APIYI's recharge bonus, the effective unit price lands at around 85% of list.

Performance highlights (Qwen3.6-Max-Preview)

Numbers below come from the Qwen team’s official blog and third-party public benchmarks:
| Benchmark | Qwen3.6-Max-Preview | GLM-5.1 | Qwen3.6-Plus |
|---|---|---|---|
| SWE-bench Pro (real-world SWE) | 58.4 | 56.6 | — |
| LiveCodeBench | 79% | — | — |
| AIME 2025 (math olympiad) | 93% | — | — |
| GPQA (science reasoning) | 86% | — | — |
| Terminal-Bench 2.0 | #1 | — | — |
| Coding benchmarks topped | 6 | — | — |
Max-Preview is explicitly labeled Preview — the Qwen team has stated future revisions will keep gaining. For critical paths, run a small canary + A/B before flipping main traffic.

Specs

Model parameters

Qwen3.6-Max-Preview
  • Model ID: qwen3.6-max-preview
  • Architecture: dense large model (exact params undisclosed)
  • Context: 262K tokens
  • Input modalities: text
  • Billing: pay-as-you-go - Chat
  • Channel: Aliyun official relay
Qwen3.6-Flash
  • Model ID: qwen3.6-flash
  • Architecture: MoE, 35B total / 3B active (35B-A3B)
  • Context: 256K base, expandable to 1M tokens
  • Input modalities: text / image / video
  • Billing: pay-as-you-go - Chat (tiered above 256K)
  • Channel: Aliyun official relay
Qwen3.6-Flash uses tiered pricing on Alibaba Cloud: the total input-token count of a single request determines the unit-price tier for that entire request. APIYI's listed price corresponds to the base tier; for super-long requests above 256K, check the actual billed amount on your receipt.
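The tiering rule above can be sketched as a small cost estimator. This is a minimal illustration, not official billing logic: the 256K boundary comes from this article, but the above-tier rate here is an assumed placeholder — consult Alibaba Cloud Bailian's pricing page for the real tier rates.

```python
# Illustrative estimator for per-request tiered input pricing.
# Assumption: the whole request bills at one rate, chosen by its total
# input-token count (as described above). high_rate is a placeholder.

BASE_TIER_LIMIT = 256_000  # tokens; requests at or below this bill at the base rate

def flash_input_cost(input_tokens: int,
                     base_rate: float = 0.17,   # $ per 1M tokens, list base tier
                     high_rate: float = 0.34):  # $ per 1M tokens, ASSUMED above-tier rate
    """Estimate the input cost of one qwen3.6-flash request."""
    rate = base_rate if input_tokens <= BASE_TIER_LIMIT else high_rate
    return input_tokens / 1_000_000 * rate

print(f"${flash_input_cost(100_000):.4f}")  # base tier
print(f"${flash_input_cost(800_000):.4f}")  # above 256K: whole request at the higher rate
```

The point to notice is that one token over the boundary moves the entire request to the higher tier, which is why the best-practices section below recommends keeping single calls within 256K.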

Use cases

Coding Agent driver

Use qwen3.6-max-preview as the primary model in Cursor / Claude Code / similar Agent workflows — SWE-bench Pro performance is on par with GPT-5 / Claude Opus flagships.

High-volume multimodal

Run image / video understanding, long-document summarization, and bulk translation on qwen3.6-flash. $0.17 / $1.02 per 1M tokens makes “scale-out” workflows finally pencil out.

Long-context retrieval

Flash’s 256K → 1M expandable window suits “post-RAG full-document synthesis” pipelines, avoiding semantic breaks from chunking.

Domestic compliance first

The Aliyun relay channel is friendly to China-domestic compliance and data-residency-sensitive scenarios — a strong domestic alternative to GPT / Claude.

Code example

import openai

client = openai.OpenAI(
    api_key="your-api-key",
    base_url="https://api.apiyi.com/v1"
)

# Coding flagship: run an Agent task on Max-Preview
resp = client.chat.completions.create(
    model="qwen3.6-max-preview",
    messages=[
        {"role": "system", "content": "You are a senior Python engineer. Return changes as a unified diff."},
        {"role": "user", "content": "Add type hints and fix any latent bugs in this snippet ..."}
    ]
)
print(resp.choices[0].message.content)

# Speed-first multimodal: run image + text on Flash
resp = client.chat.completions.create(
    model="qwen3.6-flash",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Describe the key information in this image."},
            {"type": "image_url", "image_url": {"url": "https://your-image-url.png"}}
        ]}
    ]
)
print(resp.choices[0].message.content)

Best practices

  • Task routing: default to Flash for general dialog and classification; escalate to Max-Preview only for coding, complex reasoning, or Agent orchestration. This preserves quality while minimizing cost.
  • Preview canary: Max-Preview is still iterating. For critical paths, run a canary with A/B comparison before flipping main traffic.
  • Multimodal batching: Flash supports a 1M context, but extreme single-call lengths trigger tiered pricing. Slice ultra-long video first, then feed in chunks within 256K to control per-call cost.
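The "Flash by default, Max-Preview on escalation" rule can be sketched as a tiny router. The keyword heuristic below is purely illustrative — a production router would use explicit task labels or a classifier rather than substring matching.

```python
# Minimal routing sketch: default to qwen3.6-flash, escalate coding /
# Agent / complex-reasoning tasks to qwen3.6-max-preview.
# ESCALATION_HINTS is an assumed, illustrative heuristic.

ESCALATION_HINTS = ("diff", "refactor", "traceback", "agent", "repo", "unit test")

def pick_model(task_text: str) -> str:
    text = task_text.lower()
    if any(hint in text for hint in ESCALATION_HINTS):
        return "qwen3.6-max-preview"  # coding / Agent / complex reasoning
    return "qwen3.6-flash"            # general dialog, classification, multimodal

print(pick_model("Summarize this meeting transcript"))
print(pick_model("Refactor the repo's auth module"))
```

Plugging the returned model ID into the `client.chat.completions.create` calls from the code example above is all that changes between the two tiers.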

Pricing & availability

Listed prices

| Model | Billing | Input | Output |
|---|---|---|---|
| qwen3.6-max-preview | Pay-as-you-go - Chat | $1.28 / 1M tokens | $7.68 / 1M tokens |
| qwen3.6-flash | Pay-as-you-go - Chat | $0.17 / 1M tokens | $1.02 / 1M tokens |
Listed prices match Alibaba Cloud’s official rates. APIYI’s current recharge bonus brings the effective unit price to roughly 85% of list.

Stack with APIYI’s recharge promo

Recharge promotion details: /en/faq/recharge-promotions

Effective unit prices after the recharge bonus (~15% off):
| Model | Effective input | Effective output |
|---|---|---|
| qwen3.6-max-preview | ≈ $1.088 / 1M | ≈ $6.528 / 1M |
| qwen3.6-flash | ≈ $0.1445 / 1M | ≈ $0.867 / 1M |
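The effective figures are just list price × 0.85, which a few lines of arithmetic can sanity-check. The 0.85 multiplier reflects the ~15% recharge bonus described in this article; the list prices are taken from the table above.

```python
# Sanity-check the effective prices: a ~15% recharge bonus means an
# effective rate of roughly 85% of list, per the article.

LIST_PRICES = {  # $ per 1M tokens: (input, output), from the listed-prices table
    "qwen3.6-max-preview": (1.28, 7.68),
    "qwen3.6-flash": (0.17, 1.02),
}

def effective(list_price: float, discount: float = 0.85) -> float:
    """Apply the recharge-bonus multiplier to a list price."""
    return round(list_price * discount, 4)

for model, (inp, out) in LIST_PRICES.items():
    print(f"{model}: ~${effective(inp)} in / ~${effective(out)} out per 1M tokens")
```

Running this reproduces the table's figures, e.g. $1.088 / $6.528 for Max-Preview.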

Wrap-up

Qwen3.6-Max-Preview and Qwen3.6-Flash are the latest reinforcements to APIYI’s Aliyun official-relay group: Max raises the bar for domestic coding/reasoning, Flash crushes the unit price of multimodal long-context workloads. Together they cover the full demand curve from heavy Agent workflows down to high-frequency dispatch.
Recommended strategy: Flash by default + Max-Preview on escalation. Route routine dialog and multimodal batches to Flash; escalate to Max-Preview for coding, Agent, and complex reasoning. Stack with the recharge bonus for a roughly 85%-of-list effective rate — the best price/performance combo currently available in the Aliyun official-relay channel.
Sources: Alibaba Cloud Bailian docs (help.aliyun.com/zh/model-studio/models), Qwen team blog (qwen.ai/blog), and Qwen3.6-Max-Preview evaluation report. Max-Preview release date: 2026-04-20. Article data retrieval date: 2026-04-27 (UTC+8).