
In addition to the OpenAI-compatible format, APIYI also lets you call the API using Gemini’s official native format. This means you can migrate existing Gemini code without changes, or interact with APIYI using Google’s official Gemini SDK and its native request bodies.

Advantages

  • Seamless compatibility: Use Gemini’s official request and response structures directly, with no conversion required.
  • Full feature coverage: Supports all native Gemini features, including multimodal input (text, image, video), function calling, code execution, and more.
  • Reasoning capabilities: Full support for chain-of-thought reasoning in the Gemini 2.5 series.
  • Easy migration: If you already have a Gemini project, you can switch to APIYI quickly and enjoy a more flexible service.

Configuration and usage

To use the Gemini native format, send API requests to the /v1beta/ endpoints.
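
For example, the non-streaming generateContent endpoint follows Google’s published REST path. A minimal sketch using the requests library (the path and header match Google’s standard Gemini REST format; the model name and prompt are placeholders):
import requests

# Gemini-native REST call; the path mirrors Google's published format:
# POST {base}/v1beta/models/{model}:generateContent
url = "https://api.apiyi.com/v1beta/models/gemini-2.5-flash:generateContent"
headers = {
    "x-goog-api-key": "YOUR_API_KEY",  # your APIYI key
    "Content-Type": "application/json",
}
payload = {"contents": [{"parts": [{"text": "Hello, Gemini!"}]}]}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])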

Environment setup

We recommend using Google’s latest official google-genai Python SDK (the unified Gen AI SDK). The legacy google-generativeai package was deprecated on November 30, 2025.
First, make sure the google-genai library is installed:
pip install google-genai

Basic configuration

Configure the APIYI service endpoint:
from google import genai

# Configure the APIYI service
client = genai.Client(
    api_key="YOUR_API_KEY",  # Your APIYI key
    http_options={"base_url": "https://api.apiyi.com"}
)

# Generate content with a Gemini model
response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents='Your prompt here'
)
print(response.text)

Basic text generation

Non-streaming response

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Send the request
response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents="Tell me a science fiction story about artificial intelligence."
)

# Print the result
print(response.text)

Streaming response

For long text generation, use streaming responses for a better user experience:
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Streaming generation
response = client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents="Please write a detailed article about quantum computing."
)

# Print the output as it arrives
for chunk in response:
    print(chunk.text, end='', flush=True)

Reasoning with the Gemini 2.5 series

The Gemini 2.5 series supports powerful chain-of-thought reasoning that can expose the model’s thought process.

Reasoning model types

  • gemini-2.5-flash: Hybrid reasoning model. You can adjust the reasoning depth with the thinking_budget parameter (range: 0-16384 tokens).
  • gemini-2.5-pro: Pure reasoning model. Chain-of-thought reasoning is always on and cannot be disabled.

Controlling the reasoning budget

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Configure the reasoning budget (nested under thinking_config in the google-genai SDK)
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="How would you design an efficient distributed caching system? Please analyze the technical options in detail.",
    config={
        "thinking_config": {"thinking_budget": 8192},  # Reasoning budget: 0-16384
        "temperature": 1.0                             # Temperature range: 0-2
    }
)

print(response.text)
Reasoning budget notes:
  • thinking_budget=0: Reasoning disabled, fastest response (see the typed-config sketch after this list).
  • thinking_budget=1024-8192: Medium reasoning depth, balances speed and quality.
  • thinking_budget=16384: Maximum reasoning depth, best for complex problems.
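
If you prefer typed configuration objects to raw dicts, the same options can be expressed with the SDK’s types module. A minimal sketch, here disabling reasoning entirely for a latency-sensitive call:
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options=types.HttpOptions(base_url="https://api.apiyi.com")
)

# thinking_budget=0 turns reasoning off for the fastest response
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Summarize the CAP theorem in two sentences.",
    config=types.GenerateContentConfig(
        temperature=1.0,
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    )
)
print(response.text)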

Revealing the thought process

If you want to see the model’s thought process (thinking tokens), set include_thoughts=True:
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Enable thought-process output (include_thoughts also nests under thinking_config)
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Analyze the time complexity of the following code: def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)",
    config={
        "thinking_config": {
            "thinking_budget": 8192,
            "include_thoughts": True  # Return the thought process
        }
    }
)

# Iterate over all parts, including the thought process
for part in response.candidates[0].content.parts:
    if part.thought:
        print(f"💭 Thought process: {part.text}")
    else:
        print(f"📝 Final answer: {part.text}")
Billing note: Thinking tokens produced during reasoning count toward the output token cost. Using a high reasoning budget may increase your bill.

Multimodal input

Gemini models can process images, audio, video, and other media types.

Image processing

from google import genai
from PIL import Image

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Load a local image
img = Image.open('path/to/your/image.jpg')

# Multimodal request
response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=[
        "Describe this image in detail, including the main elements, colors, and composition.",
        img
    ]
)

print(response.text)

Video processing

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Upload a video file (large files may take a moment to finish processing after upload)
video_file = client.files.upload(file='path/to/video.mp4')

# Analyze the video content
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        "Summarize the main content and key points of this video.",
        video_file
    ]
)

print(response.text)

Audio processing

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Upload an audio file
audio_file = client.files.upload(file='path/to/audio.mp3')

# Transcribe and analyze the audio
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        "Transcribe this audio and summarize the main topics.",
        audio_file
    ]
)

print(response.text)

Media resolution optimization

To save on token costs, you can adjust the resolution of media files:
from google import genai
from PIL import Image

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Process the image at a lower resolution to reduce cost
img = Image.open('large_image.jpg')

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=[
        "What is the subject of this image?",
        img
    ],
    config={
        "response_modalities": ["TEXT"],
        "media_resolution": "MEDIA_RESOLUTION_LOW"  # LOW | MEDIUM | HIGH
    }
)

print(response.text)
Media file limits:
  • File size: under 20 MB.
  • Supported formats: images (JPG, PNG, WebP), audio (MP3, WAV), video (MP4, MOV).
  • Upload methods: use client.files.upload() or pass a PIL Image object directly; raw bytes also work, as sketched below.
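
Besides file uploads and PIL images, you can pass raw bytes inline. A sketch assuming a local JPEG that stays under the 20 MB limit:
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Read the image and send it inline as bytes (keep it under the 20 MB limit)
with open('photo.jpg', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type='image/jpeg'),
        "What is in this photo?"
    ]
)
print(response.text)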

Code execution

Gemini models can execute Python code automatically, which makes them ideal for data analysis scenarios.
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Data analysis example with the code execution tool enabled
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="""
Suppose I have the following sales data:
- Product A: 100 units at $50 each
- Product B: 200 units at $30 each
- Product C: 150 units at $40 each

Please calculate:
1. Total revenue
2. Average unit price
3. Draw a bar chart showing revenue distribution
""",
    config={'tools': [{'code_execution': {}}]}
)

# Inspect the executed code and its results (Part fields default to None,
# so test the values themselves rather than hasattr)
for part in response.candidates[0].content.parts:
    if part.executable_code:
        print(f"Executed code:\n{part.executable_code.code}")
    if part.code_execution_result:
        print(f"Execution result:\n{part.code_execution_result.output}")
    if part.text:
        print(f"Analysis:\n{part.text}")
Code execution limits:
  • Only Python code is supported.
  • The execution environment is a sandbox with no network or file system access.
  • Execution time is subject to a timeout.

Function calling (tool use)

The Gemini native format fully supports function calling, which lets the model call external tools.

Defining tools

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Define a weather lookup tool
tools = [
    {
        "function_declarations": [
            {
                "name": "get_current_weather",
                "description": "Get the current weather for a given city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, for example: Beijing, Shanghai"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "Temperature unit"
                        }
                    },
                    "required": ["location"]
                }
            }
        ]
    }
]

Automatic tool calling

from google import genai
from google.genai import types

# Configure the tool-calling mode (reuses `client` and `tools` from above)
user_prompt = "What's the weather in Beijing right now? What's the temperature in Celsius?"

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=user_prompt,
    config={
        'tools': tools,
        'tool_config': {'function_calling_config': {'mode': 'AUTO'}}
    }
)

# Check whether a tool needs to be called
function_call = response.candidates[0].content.parts[0].function_call

if function_call:
    print(f"Calling tool: {function_call.name}")
    print(f"Arguments: {dict(function_call.args)}")

    # Call your real weather API here
    def get_current_weather(location, unit="celsius"):
        # Replace this with a real weather API call
        return {
            "location": location,
            "temperature": 22,
            "unit": unit,
            "condition": "sunny"
        }

    # Get the tool execution result
    weather_data = get_current_weather(**dict(function_call.args))

    # Return the result to the model, together with the conversation so far
    # (the original question and the model's function-call turn)
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=[
            types.Content(role="user", parts=[types.Part(text=user_prompt)]),
            response.candidates[0].content,  # the model's function-call turn
            types.Content(
                role="user",
                parts=[
                    types.Part(
                        function_response=types.FunctionResponse(
                            name=function_call.name,
                            response=weather_data
                        )
                    )
                ]
            )
        ],
        config={'tools': tools}
    )

    print(f"Final answer: {response.text}")
Tool-calling modes:
  • mode: 'AUTO': The model decides whether to call a tool (recommended).
  • mode: 'ANY': The model is forced to call a tool (see the sketch after this list).
  • mode: 'NONE': Tool calling is disabled.
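
For example, to force a call to a specific tool, switch the mode to 'ANY' and optionally restrict the candidates with allowed_function_names. A sketch reusing the client and tools defined above:
# Force the model to call one of the declared tools
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Check the current weather in Shanghai.",
    config={
        'tools': tools,
        'tool_config': {
            'function_calling_config': {
                'mode': 'ANY',
                'allowed_function_names': ['get_current_weather']  # optional allow-list
            }
        }
    }
)
print(response.candidates[0].content.parts[0].function_call)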

Context caching

APIYI automatically enables implicit context caching for the Gemini native format, which can significantly reduce the cost of repeated conversations.

Caching mechanism

  • Enabled automatically: No manual configuration is required; context is cached automatically.
  • Cache pricing: Cached tokens are billed at 25% of the normal input price (a worked example follows this list).
  • Expiration: Caches expire automatically after a certain period.
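
To make the 25% rate concrete, here is a back-of-the-envelope sketch; the per-token price is made up purely for the arithmetic:
# Hypothetical prices, for illustration only
input_price_per_1k = 0.10   # assumed price per 1K input tokens
prompt_tokens = 10_000      # total prompt tokens in the request
cached_tokens = 8_000       # portion served from cache (billed at 25%)

normal_cost = prompt_tokens / 1000 * input_price_per_1k
cached_cost = ((prompt_tokens - cached_tokens) / 1000 * input_price_per_1k
               + cached_tokens / 1000 * input_price_per_1k * 0.25)
print(f"Without cache: ${normal_cost:.2f}")  # $1.00
print(f"With cache:    ${cached_cost:.2f}")  # $0.40
# With 80% of the prompt cached at a 75% discount, input cost drops by 0.8 * 75% = 60%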

Detecting cache hits

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Send a request
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Hello, please introduce quantum computing."
)

# Check for cache hits (the field is always present; test its value)
usage = response.usage_metadata
if usage.cached_content_token_count:
    print(f"Cached tokens: {usage.cached_content_token_count}")
    print(f"Input tokens: {usage.prompt_token_count}")
    print(f"Output tokens: {usage.candidates_token_count}")
Caching benefits:
  • For long-context conversations, you can save up to 75% on input token costs.
  • Suitable for multi-turn dialogues, document analysis, code review, and similar scenarios.
  • Cache hits are not guaranteed; actual savings depend on the scenario.

Token usage tracking

Every API call returns detailed token usage information.

Retrieving usage statistics

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents="Explain the basic principles of machine learning."
)

# Get usage metadata
usage = response.usage_metadata

print(f"Prompt tokens: {usage.prompt_token_count}")
print(f"Output tokens: {usage.candidates_token_count}")
print(f"Total tokens: {usage.total_token_count}")

# If cache hits occurred
if usage.cached_content_token_count:
    print(f"Cached tokens: {usage.cached_content_token_count}")

# Reasoning tokens (Gemini 2.5 series)
if usage.thoughts_token_count:
    print(f"Reasoning tokens: {usage.thoughts_token_count}")

Token field reference

| Field | Description | Billing |
| --- | --- | --- |
| prompt_token_count | Number of input prompt tokens | Billed at the input price |
| candidates_token_count | Number of output content tokens | Billed at the output price |
| cached_content_token_count | Number of tokens served from cache | Billed at 25% of the input price |
| thoughts_token_count | Number of tokens used during reasoning | Billed at the output price |
| total_token_count | Total token count | N/A |

Things to keep in mind

API key

Make sure you use your APIYI key, not a Google AI Studio key.

Endpoint configuration

The Gemini native format uses https://api.apiyi.com as the base URL, which is compatible with Google’s official REST API format.

Model names

Use Gemini’s official model names directly, such as gemini-3-pro-preview and gemini-2.5-flash.

Multimodal support

Full support for Gemini’s official multimodal data format — you can pass images, video, and audio directly.
Key limits:
  • Media files must be smaller than 20 MB.
  • Code execution supports only Python and runs in a sandbox.
  • Reasoning tokens add to your output cost, so set thinking_budget carefully.

Compared with the OpenAI-compatible format

| Feature | Gemini native format | OpenAI-compatible format |
| --- | --- | --- |
| Endpoint | https://api.apiyi.com | https://api.apiyi.com/v1/chat/completions |
| SDK | google-genai | openai |
| Reasoning control | thinking_budget (0-16384) | reasoning_effort (low/medium/high) |
| Thought process | include_thoughts=True | Not supported |
| Code execution | tools=[{'code_execution': {}}] | Not supported |
| Media upload | client.files.upload() | Base64 encoding |
| Cache detection | cached_content_token_count | No dedicated field |
If you need to call other types of models (such as the OpenAI series) or use the OpenAI-compatible format, see the Using OpenAI Official SDK documentation.
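For orientation, a comparable request through the OpenAI-compatible endpoint looks roughly like this (a sketch using the standard openai Python client; see the linked page for the authoritative details):
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",  # your APIYI key
    base_url="https://api.apiyi.com/v1"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello, Gemini!"}]
)
print(response.choices[0].message.content)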

Complete example

Below is an end-to-end example that combines several features:
from google import genai
from PIL import Image

# Configuration
client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.apiyi.com"}
)

# Define tools
tools = [{
    "function_declarations": [{
        "name": "search_database",
        "description": "Search the product database",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search keyword"}
            },
            "required": ["query"]
        }
    }]
}]

# Multimodal input + tool calling + streaming output
img = Image.open('product.jpg')

response = client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents=[
        "What product is in this image? Please search the database for similar products and recommend some to me.",
        img
    ],
    config={
        'tools': tools,
        'tool_config': {'function_calling_config': {'mode': 'AUTO'}},
        'thinking_config': {'thinking_budget': 4096, 'include_thoughts': False},
        'temperature': 0.7
    }
)

# Handle the streaming response, keeping the last chunk for usage stats
last_chunk = None
for chunk in response:
    last_chunk = chunk
    if chunk.text:
        print(chunk.text, end='', flush=True)

    # Check for tool calls
    if chunk.candidates and chunk.candidates[0].content.parts:
        for part in chunk.candidates[0].content.parts:
            if part.function_call:
                print(f"\n[Calling tool: {part.function_call.name}]")

# Usage metadata is reported on the final chunk of the stream
if last_chunk and last_chunk.usage_metadata:
    print(f"\n\nToken usage: {last_chunk.usage_metadata.total_token_count}")

Getting help

If you run into issues with the Gemini native format:
  • Read the API reference for detailed specifications.
  • Check the model list for available Gemini models.
  • Contact technical support for help.