
In addition to the OpenAI-compatible format, APIYI also lets you call the API using Gemini’s official native format. This means you can migrate existing Gemini code without changes, or interact with APIYI using Google’s official Gemini SDK and its native request bodies.

Advantages

  • Seamless compatibility: Use Gemini’s official request and response structures directly, with no conversion required.
  • Full feature coverage: Supports all native Gemini features, including multimodal input (text, image, video), function calling, code execution, and more.
  • Reasoning capabilities: Full support for chain-of-thought reasoning in the Gemini 2.5 series.
  • Easy migration: If you already have a Gemini project, you can switch to APIYI quickly and enjoy a more flexible service.

Configuration and usage

To use the Gemini native format, send API requests to the /v1beta/ endpoints.
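
For example, the non-streaming generateContent endpoint follows Google’s published REST path. A minimal sketch using the requests library (the path and header match Google’s standard Gemini REST format; the model name and prompt are placeholders):
import requests

# Gemini-native REST call; the path mirrors Google's published format:
# POST {base}/v1beta/models/{model}:generateContent
url = "https://api.apiyi.com/v1beta/models/gemini-2.5-flash:generateContent"
headers = {
    "x-goog-api-key": "YOUR_API_KEY",  # your APIYI key
    "Content-Type": "application/json",
}
payload = {"contents": [{"parts": [{"text": "Hello, Gemini!"}]}]}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])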

Environment setup

We recommend using Google’s latest official google-genai Python SDK (the unified Gen AI SDK). The legacy google-generativeai package was deprecated on November 30, 2025.
First, make sure the google-genai library is installed:
pip install google-genai

Basic configuration

Configure the APIYI service endpoint:
from google import genai

# Configure the APIYI service
client = genai.Client(
    api_key="YOUR_API_KEY",  # Your APIYI key
    http_options={"base_url": "https://api.apiyi.com"}
)

# Generate content with a Gemini model
response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents='Your prompt here'
)
print(response.text)

Basic text generation

Non-streaming response

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Send the request
response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents="Tell me a science fiction story about artificial intelligence."
)

# Print the result
print(response.text)

Streaming response

For long text generation, use streaming responses for a better user experience:
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Streaming generation
response = client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents="Please write a detailed article about quantum computing."
)

# Print the output as it arrives
for chunk in response:
    print(chunk.text, end='', flush=True)

Reasoning with the Gemini 2.5 series

The Gemini 2.5 series supports powerful chain-of-thought reasoning that can expose the model’s thought process.

Reasoning model types

  • gemini-2.5-flash: Hybrid reasoning model. You can adjust the reasoning depth with the thinking_budget parameter (range: 0-16384 tokens).
  • gemini-2.5-pro: Pure reasoning model. Chain-of-thought reasoning is always on and cannot be disabled.

Controlling the reasoning budget

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Configure the reasoning budget (nested under thinking_config in the google-genai SDK)
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="How would you design an efficient distributed caching system? Please analyze the technical options in detail.",
    config={
        "thinking_config": {"thinking_budget": 8192},  # Reasoning budget: 0-16384
        "temperature": 1.0                             # Temperature range: 0-2
    }
)

print(response.text)
Reasoning budget notes:
  • thinking_budget=0: Reasoning disabled, fastest response (see the typed-config sketch after this list).
  • thinking_budget=1024-8192: Medium reasoning depth, balances speed and quality.
  • thinking_budget=16384: Maximum reasoning depth, best for complex problems.
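
If you prefer typed configuration objects to raw dicts, the same options can be expressed with the SDK’s types module. A minimal sketch, here disabling reasoning entirely for a latency-sensitive call:
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options=types.HttpOptions(base_url="https://api.apiyi.com")
)

# thinking_budget=0 turns reasoning off for the fastest response
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Summarize the CAP theorem in two sentences.",
    config=types.GenerateContentConfig(
        temperature=1.0,
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    )
)
print(response.text)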

Revealing the thought process

If you want to see the model’s thought process (thinking tokens), set include_thoughts=True:
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Enable thought-process output (include_thoughts also nests under thinking_config)
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Analyze the time complexity of the following code: def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)",
    config={
        "thinking_config": {
            "thinking_budget": 8192,
            "include_thoughts": True  # Return the thought process
        }
    }
)

# Iterate over all parts, including the thought process
for part in response.candidates[0].content.parts:
    if part.thought:
        print(f"💭 Thought process: {part.text}")
    else:
        print(f"📝 Final answer: {part.text}")
Billing note: Thinking tokens produced during reasoning count toward the output token cost. Using a high reasoning budget may increase your bill.

Multimodal input

Gemini models can process images, audio, video, and other media types.

Image processing

from google import genai
from PIL import Image

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Load a local image
img = Image.open('path/to/your/image.jpg')

# Multimodal request
response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=[
        "Describe this image in detail, including the main elements, colors, and composition.",
        img
    ]
)

print(response.text)

Video processing

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Upload a video file (large files may take a moment to finish processing after upload)
video_file = client.files.upload(file='path/to/video.mp4')

# Analyze the video content
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        "Summarize the main content and key points of this video.",
        video_file
    ]
)

print(response.text)

Audio processing

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Upload an audio file
audio_file = client.files.upload(file='path/to/audio.mp3')

# Transcribe and analyze the audio
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        "Transcribe this audio and summarize the main topics.",
        audio_file
    ]
)

print(response.text)

Media resolution optimization

To save on token costs, you can adjust the resolution of media files:
from google import genai
from PIL import Image

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Process the image at a lower resolution to reduce cost
img = Image.open('large_image.jpg')

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=[
        "What is the subject of this image?",
        img
    ],
    config={
        "response_modalities": ["TEXT"],
        "media_resolution": "MEDIA_RESOLUTION_LOW"  # LOW | MEDIUM | HIGH
    }
)

print(response.text)
Media file limits:
  • File size: under 20 MB.
  • Supported formats: images (JPG, PNG, WebP), audio (MP3, WAV), video (MP4, MOV).
  • Upload methods: use client.files.upload() or pass a PIL Image object directly; raw bytes also work, as sketched below.
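
Besides file uploads and PIL images, you can pass raw bytes inline. A sketch assuming a local JPEG that stays under the 20 MB limit:
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Read the image and send it inline as bytes (keep it under the 20 MB limit)
with open('photo.jpg', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type='image/jpeg'),
        "What is in this photo?"
    ]
)
print(response.text)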

Code execution

Gemini models can execute Python code automatically, which makes them ideal for data analysis scenarios.
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Data analysis example with the code execution tool enabled
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="""
Suppose I have the following sales data:
- Product A: 100 units at $50 each
- Product B: 200 units at $30 each
- Product C: 150 units at $40 each

Please calculate:
1. Total revenue
2. Average unit price
3. Draw a bar chart showing revenue distribution
""",
    config={'tools': [{'code_execution': {}}]}
)

# Inspect the executed code and its results (Part fields default to None,
# so test the values themselves rather than hasattr)
for part in response.candidates[0].content.parts:
    if part.executable_code:
        print(f"Executed code:\n{part.executable_code.code}")
    if part.code_execution_result:
        print(f"Execution result:\n{part.code_execution_result.output}")
    if part.text:
        print(f"Analysis:\n{part.text}")
Code execution limits:
  • Only Python code is supported.
  • The execution environment is a sandbox with no network or file system access.
  • Execution time is subject to a timeout.

Function calling (tool use)

The Gemini native format fully supports function calling, which lets the model call external tools.

Defining tools

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Define a weather lookup tool
tools = [
    {
        "function_declarations": [
            {
                "name": "get_current_weather",
                "description": "Get the current weather for a given city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, for example: Beijing, Shanghai"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "Temperature unit"
                        }
                    },
                    "required": ["location"]
                }
            }
        ]
    }
]

Automatic tool calling

from google import genai
from google.genai import types

# Configure the tool-calling mode (reuses `client` and `tools` from above)
user_prompt = "What's the weather in Beijing right now? What's the temperature in Celsius?"

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=user_prompt,
    config={
        'tools': tools,
        'tool_config': {'function_calling_config': {'mode': 'AUTO'}}
    }
)

# Check whether a tool needs to be called
function_call = response.candidates[0].content.parts[0].function_call

if function_call:
    print(f"Calling tool: {function_call.name}")
    print(f"Arguments: {dict(function_call.args)}")

    # Call your real weather API here
    def get_current_weather(location, unit="celsius"):
        # Replace this with a real weather API call
        return {
            "location": location,
            "temperature": 22,
            "unit": unit,
            "condition": "sunny"
        }

    # Get the tool execution result
    weather_data = get_current_weather(**dict(function_call.args))

    # Return the result to the model, together with the conversation so far
    # (the original question and the model's function-call turn)
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=[
            types.Content(role="user", parts=[types.Part(text=user_prompt)]),
            response.candidates[0].content,  # the model's function-call turn
            types.Content(
                role="user",
                parts=[
                    types.Part(
                        function_response=types.FunctionResponse(
                            name=function_call.name,
                            response=weather_data
                        )
                    )
                ]
            )
        ],
        config={'tools': tools}
    )

    print(f"Final answer: {response.text}")
Tool-calling modes:
  • mode: 'AUTO': The model decides whether to call a tool (recommended).
  • mode: 'ANY': The model is forced to call a tool (see the sketch after this list).
  • mode: 'NONE': Tool calling is disabled.
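
For example, to force a call to a specific tool, switch the mode to 'ANY' and optionally restrict the candidates with allowed_function_names. A sketch reusing the client and tools defined above:
# Force the model to call one of the declared tools
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Check the current weather in Shanghai.",
    config={
        'tools': tools,
        'tool_config': {
            'function_calling_config': {
                'mode': 'ANY',
                'allowed_function_names': ['get_current_weather']  # optional allow-list
            }
        }
    }
)
print(response.candidates[0].content.parts[0].function_call)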

Context caching

APIYI automatically enables implicit context caching for the Gemini native format, which can significantly reduce the cost of repeated conversations.

Caching mechanism

  • Enabled automatically: No manual configuration is required; context is cached automatically.
  • Cache pricing: Cached tokens are billed at 25% of the normal input price (a worked example follows this list).
  • Expiration: Caches expire automatically after a certain period.
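
To make the 25% rate concrete, here is a back-of-the-envelope sketch; the per-token price is made up purely for the arithmetic:
# Hypothetical prices, for illustration only
input_price_per_1k = 0.10   # assumed price per 1K input tokens
prompt_tokens = 10_000      # total prompt tokens in the request
cached_tokens = 8_000       # portion served from cache (billed at 25%)

normal_cost = prompt_tokens / 1000 * input_price_per_1k
cached_cost = ((prompt_tokens - cached_tokens) / 1000 * input_price_per_1k
               + cached_tokens / 1000 * input_price_per_1k * 0.25)
print(f"Without cache: ${normal_cost:.2f}")  # $1.00
print(f"With cache:    ${cached_cost:.2f}")  # $0.40
# With 80% of the prompt cached at a 75% discount, input cost drops by 0.8 * 75% = 60%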

Detecting cache hits

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Send a request
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Hello, please introduce quantum computing."
)

# Check for cache hits (the field is always present; test its value)
usage = response.usage_metadata
if usage.cached_content_token_count:
    print(f"Cached tokens: {usage.cached_content_token_count}")
    print(f"Input tokens: {usage.prompt_token_count}")
    print(f"Output tokens: {usage.candidates_token_count}")
Caching benefits:
  • For long-context conversations, you can save up to 75% on input token costs.
  • Suitable for multi-turn dialogues, document analysis, code review, and similar scenarios.
  • Cache hits are not guaranteed; actual savings depend on the scenario.

Token usage tracking

Every API call returns detailed token usage information.

Retrieving usage statistics

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Explain the basic principles of machine learning."
)

# Get usage metadata
usage = response.usage_metadata

print(f"Prompt tokens: {usage.prompt_token_count}")
print(f"Output tokens: {usage.candidates_token_count}")
print(f"Total tokens: {usage.total_token_count}")

# If cache hits occurred
if usage.cached_content_token_count:
    print(f"Cached tokens: {usage.cached_content_token_count}")

# Reasoning tokens (Gemini 2.5 series)
if usage.thoughts_token_count:
    print(f"Reasoning tokens: {usage.thoughts_token_count}")

Token field reference

| Field | Description | Billing |
| --- | --- | --- |
| prompt_token_count | Number of input prompt tokens | Billed at the input price |
| candidates_token_count | Number of output content tokens | Billed at the output price |
| cached_content_token_count | Number of tokens served from cache | Billed at 25% of the input price |
| thoughts_token_count | Number of tokens used during reasoning | Billed at the output price |
| total_token_count | Total token count | N/A |

Things to keep in mind

API key

Make sure you use your APIYI key, not a Google AI Studio key.

Endpoint configuration

The Gemini native format uses https://api.apiyi.com as the base URL, which is compatible with Google’s official REST API format.

Model names

Use Gemini’s official model names directly, such as gemini-3-pro-preview and gemini-2.5-flash.

Multimodal support

Full support for Gemini’s official multimodal data format — you can pass images, video, and audio directly.
Key limits:
  • Media files must be smaller than 20 MB.
  • Code execution supports only Python and runs in a sandbox.
  • Reasoning tokens add to your output cost, so set thinking_budget carefully.

Compared with the OpenAI-compatible format

| Feature | Gemini native format | OpenAI-compatible format |
| --- | --- | --- |
| Endpoint | https://api.apiyi.com | https://api.apiyi.com/v1/chat/completions |
| SDK | google-genai | openai |
| Reasoning control | thinking_budget (0-16384) | reasoning_effort (low/medium/high) |
| Thought process | include_thoughts=True | Not supported |
| Code execution | tools=[{'code_execution': {}}] | Not supported |
| Media upload | client.files.upload() | Base64 encoding |
| Cache detection | cached_content_token_count | No dedicated field |
If you need to call other types of models (such as the OpenAI series) or use the OpenAI-compatible format, see the Using OpenAI Official SDK documentation.
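For orientation, a comparable request through the OpenAI-compatible endpoint looks roughly like this (a sketch using the standard openai Python client; see the linked page for the authoritative details):
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",  # your APIYI key
    base_url="https://api.apiyi.com/v1"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello, Gemini!"}]
)
print(response.choices[0].message.content)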

Complete example

Below is an end-to-end example that combines several features:
from google import genai
from PIL import Image

# Configuration
client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Define tools
tools = [{
    "function_declarations": [{
        "name": "search_database",
        "description": "Search the product database",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search keyword"}
            },
            "required": ["query"]
        }
    }]
}]

# Multimodal input + tool calling + streaming output
img = Image.open('product.jpg')

response = client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents=[
        "What product is in this image? Please search the database for similar products and recommend some to me.",
        img
    ],
    config={
        'tools': tools,
        'tool_config': {'function_calling_config': {'mode': 'AUTO'}},
        'thinking_config': {'thinking_budget': 4096, 'include_thoughts': False},
        'temperature': 0.7
    }
)

# Handle the streaming response, keeping the last chunk for usage stats
last_chunk = None
for chunk in response:
    last_chunk = chunk
    if chunk.text:
        print(chunk.text, end='', flush=True)

    # Check for tool calls
    if chunk.candidates and chunk.candidates[0].content.parts:
        for part in chunk.candidates[0].content.parts:
            if part.function_call:
                print(f"\n[Calling tool: {part.function_call.name}]")

# Usage metadata is reported on the final chunk of the stream
if last_chunk and last_chunk.usage_metadata:
    print(f"\n\nToken usage: {last_chunk.usage_metadata.total_token_count}")

Getting help

If you run into issues with the Gemini native format:
  • Read the API reference for detailed specifications.
  • Check the model list for available Gemini models.
  • Contact technical support for help.