code_execution tool that runs Python in a sandbox. Examples below assume the client setup from Native Calls.
Image Understanding
Pass a PIL Image directly and the SDK handles encoding, or wrap raw image bytes with types.Part.from_bytes.
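Both input styles can be sketched as two small functions. This assumes the google-genai SDK and Pillow are installed, that `client` is the `genai.Client` from the Native Calls setup, and that `gemini-2.5-flash` is the model (an assumption; substitute your own). The function names are hypothetical, and the SDK import sits inside the bytes function only so the PIL path can run without it.

```python
from pathlib import Path

MODEL = "gemini-2.5-flash"  # assumption: use the model from your own setup


def describe_pil_image(client, image, prompt="Describe this image in one sentence."):
    """Send a PIL Image as-is; the SDK encodes it before the request."""
    response = client.models.generate_content(model=MODEL, contents=[image, prompt])
    return response.text


def describe_image_file(client, path, mime_type="image/jpeg", prompt="Describe this image."):
    """Send raw image bytes wrapped in a Part with an explicit MIME type."""
    # Imported here so describe_pil_image needs no SDK import of its own.
    from google.genai import types

    part = types.Part.from_bytes(data=Path(path).read_bytes(), mime_type=mime_type)
    response = client.models.generate_content(model=MODEL, contents=[part, prompt])
    return response.text
```

With Pillow installed, `describe_pil_image(client, Image.open("photo.jpg"))` sends the image without manual encoding; the file name is illustrative.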
Audio Understanding
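Audio is sent as bytes through types.Part.from_bytes. The sketch below also estimates a WAV clip's token cost before sending it, assuming the rate of 32 tokens per second of audio stated in Google's audio documentation at the time of writing (check the current docs before budgeting). Function names and the model name are hypothetical; the SDK import is deferred so the estimate runs on the standard library alone.

```python
import wave
from pathlib import Path

MODEL = "gemini-2.5-flash"  # assumption: substitute your model
TOKENS_PER_AUDIO_SECOND = 32  # assumption: from Google's audio docs; verify


def wav_duration_seconds(path):
    """Read the WAV header with the standard library and return the duration."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_audio_tokens(seconds):
    """Rough input-token estimate for an audio clip of the given length."""
    return round(seconds * TOKENS_PER_AUDIO_SECOND)


def summarize_audio(client, path, mime_type="audio/wav", prompt="Summarize this recording."):
    """Send audio bytes inline and return the model's answer."""
    # Deferred import so the helpers above work without the SDK installed.
    from google.genai import types

    part = types.Part.from_bytes(data=Path(path).read_bytes(), mime_type=mime_type)
    response = client.models.generate_content(model=MODEL, contents=[prompt, part])
    return response.text
```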
Video Understanding
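Video also goes through types.Part.from_bytes. Google's video docs describe referring to specific moments with MM:SS timestamps in the prompt; the sketch below assumes that convention and adds a hypothetical helper to format them. The model name and function names are assumptions, and the SDK import is deferred so the timestamp helper runs without it.

```python
from pathlib import Path

MODEL = "gemini-2.5-flash"  # assumption: substitute your model


def mm_ss(seconds):
    """Format a position in the video as MM:SS, e.g. 75 -> '01:15'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def ask_about_moment(client, path, seconds, question, mime_type="video/mp4"):
    """Ask a question about one point in a video sent inline as bytes."""
    # Deferred import so mm_ss can be used without the SDK installed.
    from google.genai import types

    part = types.Part.from_bytes(data=Path(path).read_bytes(), mime_type=mime_type)
    prompt = f"At {mm_ss(seconds)}, {question}"
    response = client.models.generate_content(model=MODEL, contents=[part, prompt])
    return response.text
```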
Cost Control with media_resolution
Token consumption for media scales with resolution. For "rough look" tasks such as classification or presence checks, a lower resolution noticeably reduces cost.
| Level | Use for |
|---|---|
| LOW | Classification, coarse recognition (cheapest) |
| MEDIUM | General description and understanding (balanced default) |
| HIGH | OCR, small text, detail-dense tasks |
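One way to apply the table is to choose a level per task and pass it through the request config. This sketch assumes the google-genai enum `types.MediaResolution` (members such as `MEDIA_RESOLUTION_LOW`) set via `GenerateContentConfig(media_resolution=...)`; the task labels, helper names, and model name are hypothetical.

```python
MODEL = "gemini-2.5-flash"  # assumption: substitute your model

# Hypothetical task labels mapped to the levels in the table above.
RESOLUTION_BY_TASK = {
    "classify": "MEDIA_RESOLUTION_LOW",
    "describe": "MEDIA_RESOLUTION_MEDIUM",
    "ocr": "MEDIA_RESOLUTION_HIGH",
}


def resolution_for(task):
    """Return the enum member name for a task, defaulting to the balanced MEDIUM level."""
    return RESOLUTION_BY_TASK.get(task, "MEDIA_RESOLUTION_MEDIUM")


def run_media_task(client, task, media, prompt):
    """Send one media item with a resolution chosen from the task label."""
    # Deferred import so resolution_for can be used without the SDK installed.
    from google.genai import types

    config = types.GenerateContentConfig(
        media_resolution=types.MediaResolution[resolution_for(task)]
    )
    response = client.models.generate_content(
        model=MODEL, contents=[media, prompt], config=config
    )
    return response.text
```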
Supported Formats
| Type | Formats | How to pass |
|---|---|---|
| Images | JPG, PNG, WebP | PIL Image or Part.from_bytes |
| Audio | MP3, WAV | Part.from_bytes |
| Video | MP4, MOV | Part.from_bytes |
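When sending bytes, the MIME type must match the file. A minimal lookup for the formats in the table follows, assuming the MIME strings listed in Google's Gemini docs at the time of writing (for example `audio/mp3` and `video/mov`); verify them against the current docs. Python's `mimetypes` module may return different strings for the same extensions, which is why this uses an explicit table. The helper name is hypothetical.

```python
from pathlib import Path

# MIME types for the formats in the Supported Formats table.
# Assumption: strings follow Google's Gemini docs; verify against the current list.
GEMINI_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".mp4": "video/mp4",
    ".mov": "video/mov",
}


def gemini_mime_type(path):
    """Return the MIME type to pass to Part.from_bytes, or raise for unlisted formats."""
    suffix = Path(path).suffix.lower()
    try:
        return GEMINI_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"format not in the supported table: {path}") from None
```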
Code Execution
Declare the code_execution tool and the model writes Python, runs it in a sandbox, and answers based on the result. This suits calculations and data analysis.
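A minimal sketch of enabling the tool and reading the response, assuming the google-genai types `Tool(code_execution=ToolCodeExecution())` and the response part fields `text`, `executable_code`, and `code_execution_result`. The helper names and model name are hypothetical; the parser uses `getattr` so it can be checked without the SDK.

```python
MODEL = "gemini-2.5-flash"  # assumption: substitute your model


def split_parts(parts):
    """Separate response parts into prose, generated code, and sandbox output."""
    text, code, output = [], [], []
    for part in parts:
        if getattr(part, "text", None):
            text.append(part.text)
        if getattr(part, "executable_code", None):
            code.append(part.executable_code.code)
        if getattr(part, "code_execution_result", None):
            output.append(part.code_execution_result.output)
    return text, code, output


def solve_with_code(client, question):
    """Ask a question with the code_execution tool enabled."""
    # Deferred import so split_parts can be tested on its own.
    from google.genai import types

    config = types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())]
    )
    response = client.models.generate_content(model=MODEL, contents=question, config=config)
    return split_parts(response.candidates[0].content.parts)
```

Inspecting the generated code alongside its output makes it easy to check how the model reached its answer.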
Code execution limits: Python only; the sandbox has no network or filesystem access; execution time is capped. To call your own external services, use Function Calling.
Related Links
- This group: Native Calls · Cache Billing · Function Calling
- Use cases: Video Understanding · Vision Understanding
- Official Google docs: ai.google.dev/gemini-api/docs/vision