Key Highlights

  • Flash Beats Pro: scores 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, and 1656 Elo on GDPval-AA, each ahead of Gemini 3.1 Pro
  • ~4x Faster: ~289 tokens/sec output, roughly 4x faster than comparable frontier models
  • ~Half the Cost: About 50% cheaper than Gemini 3.1 Pro on input and output
  • 1M Context: 1M-token input window, 64K-token output, native multimodal input
  • Default in Google Products: Already default in Gemini App, AI Mode Search and Antigravity
  • Live Now: Available on API易 from May 20, 2026 at official Google pricing, with up to 20% recharge bonus

Background

On May 19, 2026, Google announced the Gemini 3.5 family at Google I/O 2026, starting with the Flash variant. Unusually, Google led with Flash rather than Pro, and Gemini 3.5 Flash outperforms Google’s own flagship, Gemini 3.1 Pro (released February 2026), on most coding and agentic benchmarks.

According to Google’s official numbers, Gemini 3.5 Flash leads Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2% vs 70.3%), MCP Atlas (83.6% vs 78.2%), Finance Agent v2 (57.9% vs 43.0%), and GDPval-AA (1656 vs 1314 Elo). It also runs roughly 4x faster than comparable frontier models and costs about half as much as 3.1 Pro.

Google has made 3.5 Flash the default model across the Gemini App, AI Mode in Search, and Google Antigravity. Gemini 3.5 Pro is expected in June 2026. API易 integrated the model on May 20, 2026, the day after the announcement, at the same prices as Google’s official rates.

Key Features

🏆 Smarter than 3.1 Pro

Leads Gemini 3.1 Pro on Terminal-Bench 2.1, MCP Atlas, Finance Agent v2, and GDPval-AA — especially strong on tool use and agentic workflows.

⚡ ~4x Faster

~289 output tokens/sec, around 4x faster than other frontier models — ideal for high-throughput and real-time apps.

🧠 Native Multimodal

1M-token input, 64K-token output. Accepts text, images, audio and video. Leads on CharXiv Reasoning at 84.2%.

💰 ~Half the Price

About 50% cheaper than Gemini 3.1 Pro, plus extra savings via API易 recharge bonuses.

Benchmarks

| Benchmark | Gemini 3.5 Flash | Gemini 3.1 Pro | Notes |
| --- | --- | --- | --- |
| Terminal-Bench 2.1 | 76.2% | 70.3% | Terminal-style agent coding |
| MCP Atlas | 83.6% | 78.2% | Tool/MCP use (also beats Claude Opus 4.7 & GPT-5.5) |
| Finance Agent v2 | 57.9% | 43.0% | Finance agentic workflows |
| GDPval-AA | 1656 Elo | 1314 Elo | General task Elo |
| CharXiv Reasoning | 84.2% | - | Chart understanding |

Source: Google official blog (blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/) and Google DeepMind model card (published May 19, 2026).

Specs

| Item | Gemini 3.5 Flash |
| --- | --- |
| API model | gemini-3.5-flash (no preview suffix) |
| Context window | 1,000,000 tokens |
| Max output | 64,000 tokens |
| Input modalities | Text, image, audio, video |
| Output | Text |
| Reasoning | Thinking with encrypted reasoning context preserved across API calls |
| Tooling | Structured outputs, multimodal function responses, combined tool use |
| Excluded | Computer Use (other Gemini 3 capabilities are inherited) |
| Channels | Gemini API, AI Studio, Vertex AI, Antigravity, API易 |
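
The input and output limits above can be checked on the client before a request is sent. The following is only a minimal sketch: `check_request` is a hypothetical helper, and it assumes you already have token counts from your own tokenizer or usage estimate, since the source does not describe how to count tokens.

```python
# Limits taken from the specs table above.
MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 64_000


def check_request(input_tokens: int, requested_output_tokens: int) -> int:
    """Validate the input size and return a safe output-token cap.

    Hypothetical helper: both token counts are supplied by the caller.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"input of {input_tokens} tokens exceeds the "
            f"{MAX_INPUT_TOKENS}-token context window"
        )
    # Clamp instead of failing: asking for more than 64K output tokens
    # cannot yield more than the model's maximum anyway.
    return min(requested_output_tokens, MAX_OUTPUT_TOKENS)
```

The returned value could be passed as the output-token cap of a request; oversized inputs are rejected early instead of failing at the API.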

Code Example

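from openai import OpenAI

# Point the standard OpenAI SDK at the API易 endpoint.
client = OpenAI(
    api_key="your-apiyi-api-key",  # replace with your own API易 key
    base_url="https://api.apiyi.com/v1",
)

# Send a single-turn chat request to Gemini 3.5 Flash.
response = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[
        {"role": "user", "content": "Plan an agent that automates daily sales reporting."}
    ],
)

# Print the text of the first completion choice.
print(response.choices[0].message.content)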
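
Because the model accepts image input, a request can also carry an image next to the text prompt. The sketch below builds such a message in the OpenAI-style `image_url` content format. Whether the API易 endpoint accepts this format for gemini-3.5-flash is an assumption not confirmed by the source, and `build_image_message` is a hypothetical helper.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message carrying text plus an inline base64 image.

    Assumption: the endpoint accepts OpenAI-style `image_url` content parts.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Usage (needs the `client` from the example above and a real image file):
# with open("chart.png", "rb") as f:
#     message = build_image_message("Summarize this chart.", f.read())
# response = client.chat.completions.create(
#     model="gemini-3.5-flash", messages=[message]
# )
```

The network call is left commented out so the helper can be tried without an API key.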

Pricing

| Item | Price | Notes |
| --- | --- | --- |
| Input Tokens | $1.50 / 1M tokens | Global region |
| Output Tokens | $9.00 / 1M tokens | Includes thinking tokens |
| Cached Input | $0.15 / 1M tokens | Prompt cache hit |
| Non-global region | $1.65 / $9.90 per 1M tokens | Input / Output |

Pricing on API易 matches Google’s official rates. With the recharge bonus program (see Recharge Promotions), the effective cost drops to roughly $1.20 input / $7.20 output per 1M tokens, which is up to 20% off.
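
To see how these rates translate into a per-request bill, here is a rough estimator using the global-region prices above. It is a sketch under stated assumptions: `estimate_cost` is a hypothetical helper, cached tokens are assumed to be billed at the cache-hit rate instead of the normal input rate, and the recharge bonus is modelled as a flat discount, which is how the figures above present it.

```python
# USD per 1M tokens, global region, from the pricing table above.
INPUT_PRICE = 1.50
OUTPUT_PRICE = 9.00          # output price includes thinking tokens
CACHED_INPUT_PRICE = 0.15


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    discount: float = 0.0,
) -> float:
    """Estimate the cost of one request in USD.

    cached_tokens: part of input_tokens assumed to be billed at the cache-hit rate.
    discount: fraction off the list price (0.20 means 20% off); treating the
    recharge bonus as a flat discount is an assumption.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    fresh_tokens = input_tokens - cached_tokens
    list_price = (
        fresh_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    ) / 1_000_000
    return list_price * (1.0 - discount)
```

For example, `estimate_cost(1_000_000, 0, discount=0.20)` reproduces the $1.20 effective input price quoted above.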

Availability

  • ✅ Gemini API / AI Studio (Google)
  • ✅ Vertex AI (enterprise)
  • ✅ Gemini App, AI Mode Search, Google Antigravity (default model)
  • ✅ API易 (stable direct access, up to 20% recharge bonus) ⭐ Recommended

Recommendation

Gemini 3.5 Flash isn’t just another iteration: it’s the first time a Flash model has beaten Google’s current Pro flagship, Gemini 3.1 Pro, on most coding and agentic benchmarks, while costing about half as much and running ~4x faster than comparable frontier models. For agentic, tool-using, and high-throughput workloads, it’s the most obvious upgrade target available today.

Who should switch now?

  • Agent / MCP developers: best-in-class MCP Atlas score
  • Coding products: Terminal-Bench score beats 3.1 Pro at about half the price
  • High-concurrency apps: ~4x output speed materially cuts latency and cost
  • Long-context workloads: full 1M-token window retained
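
The latency point can be made concrete with simple arithmetic on the ~289 tokens/sec figure. This is a back-of-the-envelope sketch only: `generation_seconds` is a hypothetical helper, and it ignores time to first token, network overhead, and thinking time, none of which the source quantifies.

```python
OUTPUT_TOKENS_PER_SEC = 289.0  # approximate output speed quoted in this article


def generation_seconds(output_tokens: int, tokens_per_sec: float = OUTPUT_TOKENS_PER_SEC) -> float:
    """Return the approximate streaming time for a response of the given length."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return output_tokens / tokens_per_sec


# Compare a 2,000-token answer at ~289 tokens/sec with a model running 4x slower.
fast = generation_seconds(2_000)
slow = generation_seconds(2_000, OUTPUT_TOKENS_PER_SEC / 4)
print(f"{fast:.1f}s vs {slow:.1f}s")
```

Under these assumptions a 2,000-token answer takes about 7 seconds, versus about 28 seconds for a model running at a quarter of the speed.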

When to keep using other models

  • Need Computer Use? Stay on Gemini 3 Pro / 3.1 Pro for now
  • Ultra-low-cost chatbots may still compare Gemini 2.5 Flash-Lite — but for agentic tasks, 3.5 Flash wins on cost/quality

Get Started

  1. Sign up: api.apiyi.com
  2. Add credits: Up to 20% off via recharge bonuses
  3. Read the docs: Gemini API Guide
  4. Switch the model: Change model to gemini-3.5-flash

Sources:
  • Google blog (Gemini 3.5 launch): blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/
  • Google DeepMind model card: deepmind.google/models/model-cards/gemini-3-5-flash/
  • Gemini API pricing: ai.google.dev/gemini-api/docs/pricing
  • Data retrieved: May 20, 2026