Documentation Index
Fetch the complete documentation index at: https://docs.apiyi.com/llms.txt
Use this file to discover all available pages before exploring further.
Key Highlights
- Now Generally Available: Google announced GA for Gemini 3.1 Flash Lite on May 8, 2026 (UTC+8) — production-ready
- Model Identifier Updated:
gemini-3.1-flash-lite-preview→gemini-3.1-flash-lite. Preview users should plan migration - Massive Speed Boost: 64% faster output than 2.5 Flash (381.9 vs 232.3 tokens/sec) and 2.5× faster time-to-first-token
- Official Pricing Parity: $0.25 per 1M input / $1.50 per 1M output — identical to Google’s official rates
- Stackable Top-up Promo: APIYi top-up bonuses bring effective price down to 79–85% of list (up to 21% off)
Background
On March 3, 2026, Google launched Gemini 3.1 Flash Lite Preview, targeting the “high-throughput agents + ultra-low latency” niche. During the two-month preview, agent-heavy customers like Latitude, Cartwheel, Whering, and HubX gave strong feedback on instruction-following precision, latency, unit cost, and multimodal stability. On May 8, 2026 (UTC+8), Google formally announced general availability. The model identifier dropped the-preview suffix and became gemini-3.1-flash-lite. With the API contract, behavior, and billing rules now stabilized, it’s safe to wire into production.
APIYi has synced GA via the official direct-connection (proxy) channel. Pricing matches Google’s official rates exactly, and stacking APIYi’s top-up bonus brings the effective price below list — making this one of the cheapest reliable ways to consume the Gemini 3.1 lightweight tier.
Detailed Analysis
What changed from Preview to GA
Model Identifier
- Old:
gemini-3.1-flash-lite-preview - New:
gemini-3.1-flash-lite - Old name still works, migration recommended
API Stability
- Frozen interface contract
- Stable rate limits and billing
- Safe for production traffic
Performance Polish
- Higher sustained output speed
- Lower TTFT
- More stable function calling & structured output
Ecosystem Maturity
- Batch API & Caching production-ready
- Thinking levels available in production
- Full multimodal input stable
Performance highlights (GA benchmarks)
Per Artificial Analysis and Google’s official numbers:| Metric | Gemini 3.1 Flash Lite | Gemini 2.5 Flash | Delta |
|---|---|---|---|
| Output speed (tokens/sec) | 381.9 | 232.3 | +64% |
| Time-to-first-token | 2.5× faster than 2.5 Flash | baseline | -60% |
| GPQA Diamond | 86.9% | — | Tier-leading |
| MMMU Pro (multimodal reasoning) | 76.8% | — | Tier-leading |
| Arena Elo | 1432 | — | — |
| Artificial Analysis Intelligence Index | 34 (median in price tier: 21) | — | Well above |
Technical specs
| Spec | Value |
|---|---|
| Model name | gemini-3.1-flash-lite |
| Context window | 1,048,576 tokens (1M+) |
| Max output | 65,536 tokens (64K) |
| Input modalities | Text, image, video, audio, PDF |
| Output modality | Text |
| Knowledge cutoff | January 2025 |
| Latest update | May 2026 |
| Thinking | ✅ adjustable levels |
| Function calling | ✅ |
| Structured output | ✅ |
| Code execution | ✅ |
| File search / URL context | ✅ |
| Search Grounding / Maps Grounding | ✅ |
| Batch API / Caching / Flex / Priority | ✅ |
| Channel | APIYi official direct connection |
Real-World Use Cases
Recommended scenarios
Production Agent Pipelines
- Tool calling / routing / multi-step orchestration
- High-concurrency lightweight decision nodes
- SLA-sensitive agent tasks needing stable APIs
High-Throughput Data Processing
- Tabular / form / PDF structured extraction
- Bulk content moderation, classification, tagging
- Mass log summarization & normalization
Low-Latency Interaction
- Real-time translation & interpretation assist
- UI generation, dashboard composition
- First-touch customer support, intent detection
Lightweight Multimodal Tasks
- Image / video understanding
- Audio transcription + key info extraction
- PDF parsing & field extraction
Code examples
Calling the GA build via APIYi:Best practices
Production rollout tips
- Smooth migration from Preview: replace
gemini-3.1-flash-lite-previewwithgemini-3.1-flash-lite; shadow-traffic compare before cutover - Thinking levels on demand: keep Thinking off for routing/classification to maximize speed; turn on for multi-step reasoning
- Prefer structured output: pair with
response_format={"type": "json_object"}for robust downstream parsing - Batch + Cache: route high-volume jobs through Batch API; enable Caching on repeated context for an extra 90% off cached input
- Watch verbosity: Flash Lite tends to be talkative — set
max_tokensexplicitly on cost-sensitive endpoints
Pricing & Availability
APIYi official-direct pricing
Identical to Google's official rates
| Type | Price |
|---|---|
| Text / image / video input | $0.250 per 1M tokens |
| Output | $1.500 per 1M tokens |
| Cached input | $0.025 per 1M tokens (~10% of list) |
- Official direct-connection (proxy) channel, stable
- Pricing identical to Google’s official rates
- Stack with Batch API for further savings
Stackable top-up promotion (79–85% of list)
APIYi runs ongoing top-up bonus promotions. Stacking these on top of the official-direct pricing brings effective cost for Gemini 3.1 Flash Lite down to 79–85% of list:| Tier | Bonus | Effective price |
|---|---|---|
| Starter | +18% | ~85% of list |
| Pro | +22% | ~82% of list |
| High-throughput | +27% | ~79% of list |
Summary & Recommendations
GA Gemini 3.1 Flash Lite pushes “speed / price / multimodality / agent capability” to the top of its tier all at once:- 64% faster output / 2.5× faster TTFT vs 2.5 Flash — visibly snappier on long agent loops
- GPQA Diamond 86.9% / MMMU Pro 76.8% — first-tier on reasoning and multimodal at this price
- $0.25 / $1.50 per 1M tokens, plus APIYi top-up promo down to ~79% of list
- Stable contract post-GA — production-ready
- Teams on Preview: cut over to the GA model name to lock in a stable contract
- High-volume agent teams: route routing / tool-calling / extraction nodes through Flash Lite; combine with Batch + Cache to drive unit cost to the floor
- Multimodal lightweight teams: cover text, image, video, audio, PDF with one model — less SDK sprawl
Sources & data freshness
- Google official blog:
blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-lite/ - Google GA announcement:
cloud.google.com/blog/products/ai-machine-learning/gemini-3-1-flash-lite-is-now-generally-available - Model docs:
ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-lite - Benchmarks: Artificial Analysis (
artificialanalysis.ai/models/gemini-3-1-flash-lite-preview) - Data captured: May 9, 2026 (UTC+8)
model field at gemini-3.1-flash-lite, and you’re live with a stable GA API, official-parity pricing, and top-up bonuses on top.