TL;DR
APIYI’s Gemini native endpoint fully supports Google’s official web search: use/v1beta generateContent with the google_search tool. gemini-3.5-flash, gemini-3.1-flash-lite, and gemini-3.1-pro-preview were all verified to genuinely search the web and return up-to-date, source-cited information. A default-group key works out of the box — no special activation needed.
Real-world availability (test data, 2026-06-11)
| Model | Web result | groundingMetadata | Searches per Q&A | Latency |
|---|---|---|---|---|
| gemini-3.5-flash | ✅ Real same-week news, multi-query cross-checking | ✅ Complete | 4–7 | 24–45s |
| gemini-3.1-flash-lite | ✅ Real same-week news | ✅ (occasionally missing, see notes) | 2 | ~5s |
| gemini-3.1-pro-preview | ✅ Real same-week news, precise search after deep thinking | ✅ | 1 | ~45s |
Quick start
cURL
Python (google-genai SDK)
How to verify the search actually ran
On success,candidates[0].groundingMetadata contains the fields below; if they are absent, no search happened:
| Field | Meaning |
|---|---|
webSearchQueries | Array of search queries the model actually executed (array length = number of searches) |
groundingChunks | Retrieved sources (URI + title) |
groundingSupports | Mapping between answer text segments and sources (startIndex/endIndex) |
searchEntryPoint | HTML/CSS for rendering the required Google Search Suggestions |
Billing (important)
Web search incurs a tool-call fee, made up of two parts:| Item | Price | Notes |
|---|---|---|
| Tool-call fee | $14 / 1,000 searches ($0.014 per search) | Tool name: google_search; billed by the number of searches actually executed, i.e. the length of groundingMetadata.webSearchQueries — one question may trigger multiple searches (measured: pro-preview 1, flash-lite 2, 3.5-flash 4–7) |
| Model token fee | Standard model price | Unlike OpenAI web search, retrieved content is NOT injected as input tokens (promptTokenCount stays nearly identical, 31–43 tokens measured); the bulk of the cost is thinking + output tokens (one deep web-grounded Q&A on 3.5-flash consumed 3,500–4,900 thought tokens, billed at the output rate) |
Reference total cost per web-grounded Q&A (search fee + tokens): flash-lite ≈ $0.03; 3.5-flash ≈ $0.08–0.16; 3.1-pro-preview ≈ $0.06. To control cost, constrain search behavior in the prompt (e.g. “search at most 2 times”) or pick a model that searches less.
Notes
- You must use the native endpoint: all search declarations on the OpenAI-compatible mode are silently ignored without errors. For OpenAI-SDK projects, switch to the google-genai SDK (
base_urlset tohttps://api.apiyi.com, without/v1). - Treat groundingMetadata as the source of truth: in testing, flash-lite occasionally (1 out of 4 runs) returned no groundingMetadata. For strict scenarios, validate the field’s presence and retry when missing.
- Give thinking models enough
maxOutputTokens(at least 4096 recommended): 3.5-flash / 3.1-pro-preview consume 1,900–4,900 thinking tokens when grounding; a small limit truncates the answer. - Both
{"google_search": {}}and camelCase{"googleSearch": {}}work; the legacygoogle_search_retrievalbelongs to the Gemini 1.5 era — usegoogle_searchfor all current models. - Web search can be combined with other tools such as URL Context (official Google docs:
ai.google.dev/gemini-api/docs/google-search).
FAQ
Q: How do I confirm the answer really used the web? A: Check thatcandidates[0].groundingMetadata exists, webSearchQueries is non-empty, and groundingChunks contains source URIs. Answer text alone without these fields means the model answered from training data.
Q: Do I need a different group or a special key?
A: No. For Gemini models, a default-group key can call web search directly (same as OpenAI web search; unlike Claude native search, which requires the ClaudeOfficial beta group).
Q: How do I see the search count, and does it vary by model?
A: Count the length of groundingMetadata.webSearchQueries. For the same question, it varies a lot: pro-preview 1, flash-lite 2, 3.5-flash 4–7.
Q: Which models are supported?
A: gemini-3.5-flash, gemini-3.1-flash-lite, and gemini-3.1-pro-preview are verified. Other Gemini 2.5+ models should also support the google_search tool in principle — run the verification check from the FAQ above before relying on it.
Related Docs
Gemini Native Calls
google-genai SDK setup, streaming, thinking control
Gemini Function Calling
Custom tool calls, composable with web search