Video Understanding API

APIYI provides video understanding through advanced multimodal models such as Gemini 2.5 Pro. Requests use the unified OpenAI-compatible API format, so you can implement video content recognition, scene description, action analysis, and similar functions with a standard OpenAI client.

🎬 Intelligent Video Analysis: Supports video analysis tasks such as scene understanding, action recognition, and content summarization, enabling AI to truly “understand” video content.

🌟 Core Features

🎯 Top Model Support: Leading multimodal video understanding models like Gemini 2.5 Pro
📹 Flexible Input: Supports Base64 encoded video files
🌏 Chinese Optimization: Perfect support for Chinese scene understanding and content description
⚡ Professional Analysis: Deep understanding of video content, actions, scenes, and context
💰 Cost-Effective: Powerful capabilities at reasonable pricing

📋 Supported Video Understanding Models

Model Name	Model ID	Features	Recommended Scenarios
Gemini 2.5 Pro ⭐	`gemini-2.5-pro`	Ultra-long context, strong video understanding	Complex video content analysis
Gemini 2.5 Flash	`gemini-2.5-flash`	Fast speed, high cost-performance	Quick video analysis

🚀 Quick Start

1. Basic Example - Local Video Base64 Encoding

from openai import OpenAI
import base64

def gemini_video_test(video_path, question, model="gemini-2.5-pro"):
    """Video understanding function"""
    client = OpenAI(
        api_key="YOUR_API_KEY",  # Replace with your API Key
        base_url="https://api.apiyi.com/v1"
    )

    # Read the local video file and convert it to Base64
    with open(video_path, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
        video_url = f"data:video/mp4;base64,{video_b64}"

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": video_url
                        },
                        "mime_type": "video/mp4",
                    }
                ]
            }
        ],
        temperature=0.2,
        max_tokens=4096
    )

    return response.choices[0].message.content

# Usage example
if __name__ == "__main__":
    video_path = "./demo.mp4"  # Local video file path
    question = "Please describe the content of this video in detail"

    result = gemini_video_test(video_path, question)
    print(result)

File Size Limit: Keep each video file under 20 MB for the best processing performance and response speed.
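A size check before encoding can catch oversized files early. The following is a minimal sketch, not part of the APIYI API: `check_video_size` and `base64_length` are hypothetical helper names, and treating "20 MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
import os

# 20 MB guideline from this page, interpreted as binary megabytes (assumption).
MAX_VIDEO_BYTES = 20 * 1024 * 1024


def base64_length(raw_size: int) -> int:
    """Length in characters of the Base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def check_video_size(video_path: str, limit: int = MAX_VIDEO_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds limit."""
    size = os.path.getsize(video_path)
    if size > limit:
        raise ValueError(
            f"{video_path} is {size / 1_048_576:.1f} MB, above the "
            f"{limit / 1_048_576:.0f} MB guideline; compress or split it first"
        )
    return size
```

Calling `check_video_size(video_path)` before `open(video_path, "rb")` avoids reading and encoding a file that is too large. Base64 output is roughly one third larger than the raw file, which `base64_length` makes explicit when estimating the request body size.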

2. Complete Example - With Result Saving

from openai import OpenAI
import base64
import json
from datetime import datetime
import os

def gemini_test(question, model="gemini-2.5-pro"):
    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.apiyi.com/v1"
    )

    # Encode the local video as a Base64 data URL
    VIDEO_PATH = "./demo.mp4"   # Local file, ≤20 MB recommended
    with open(VIDEO_PATH, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
        video_url = f"data:video/mp4;base64,{video_b64}"

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": video_url
                        },
                        "mime_type": "video/mp4",
                    }
                ]
            }
        ],
        temperature=0.2,
        max_tokens=4096
    )

    return response.choices[0].message.content

if __name__ == "__main__":
    print("Starting video understanding test...")

    # Run video understanding
    question = "Please describe the content of this video"
    result = gemini_test(question)

    # Generate timestamp
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    # Get current script directory
    current_dir = os.path.dirname(os.path.abspath(__file__))

    # Save as txt file
    txt_filename = os.path.join(current_dir, f"video_analysis_{timestamp}.txt")
    with open(txt_filename, "w", encoding="utf-8") as f:
        f.write("=" * 60 + "\n")
        f.write("Video Understanding Analysis Results\n")
        f.write("=" * 60 + "\n")
        f.write(f"Analysis Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Question: {question}\n")
        f.write("=" * 60 + "\n\n")
        f.write(result)
        f.write("\n\n" + "=" * 60 + "\n")

    # Save as json file
    json_filename = os.path.join(current_dir, f"video_analysis_{timestamp}.json")
    data = {
        "timestamp": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        "question": question,
        "model": "gemini-2.5-pro",
        "video_file": "demo.mp4",
        "result": result
    }
    with open(json_filename, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

    # Console output
    print("\nVideo Understanding Result:")
    print(result)
    print("\nResults saved to:")
    print(f"  - TXT file: {txt_filename}")
    print(f"  - JSON file: {json_filename}")

3. Using requests Library Example

import requests
import base64

def analyze_video_with_requests(video_path, question):
    """Video analysis using requests library"""

    # Read and encode video
    with open(video_path, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
        video_url = f"data:video/mp4;base64,{video_b64}"

    url = "https://api.apiyi.com/v1/chat/completions"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "gemini-2.5-pro",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": video_url},
                        "mime_type": "video/mp4"
                    }
                ]
            }
        ],
        "temperature": 0.2,
        "max_tokens": 4096
    }

    # Set an explicit timeout so a stalled upload does not hang forever
    response = requests.post(url, headers=headers, json=payload, timeout=300)

    if response.status_code == 200:
        return response.json()['choices'][0]['message']['content']
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

# Usage example
result = analyze_video_with_requests("./demo.mp4", "Please describe the content of this video")
print(result)

🎯 Common Use Cases

1. Video Content Summary

prompt = """
Please analyze this video and provide a detailed summary, including:
1. Main content and theme of the video
2. Key scenes and important moments
3. People or objects that appear
4. Overall atmosphere and style of the video
5. Suitable application scenarios or target audience
"""

2. Educational Video Analysis

prompt = """
This is an educational video, please analyze:
1. Teaching topic and knowledge points
2. Explanation steps and process
3. Teaching methods and tools used
4. Key points and difficult content
5. Recommended learning takeaways
"""

3. Surveillance Video Analysis

prompt = """
Analyze this surveillance video:
1. Time period and location information (if visible)
2. Number of people and their activities
3. Any abnormal behaviors or events
4. Environmental changes
5. Key content that requires attention
"""

4. Marketing Video Evaluation

prompt = """
Evaluate the effectiveness of this marketing video:
1. Core selling points and message delivery
2. Visual presentation and production quality
3. Target audience positioning
4. Emotional resonance points
5. Improvement suggestions
"""

5. Sports Action Analysis

prompt = """
Analyze the sports actions in the video:
1. Sport type and action category
2. Standardization of technical movements
3. Key action points
4. Possible issues
5. Improvement recommendations
"""

💡 Best Practices

Video Preprocessing Recommendations

Format Support: Mainstream video formats like MP4, AVI, MOV
File Size: Recommend single video under 20MB
Duration: Shorter video clips will get more precise analysis
Resolution: Moderate resolution is sufficient, higher may increase processing time
Encoding: Use efficient encoding formats like H.264
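The examples on this page hardcode `video/mp4` in both the data URL and the `mime_type` field. For the other formats listed above, a sketch like the following could pick the MIME type from the file extension. `video_to_data_url` and `VIDEO_MIME_TYPES` are hypothetical names, and whether the gateway accepts `video/quicktime` or `video/x-msvideo` is an assumption this page does not confirm; MP4 remains the safest choice.

```python
import base64
from pathlib import Path

# Extension-to-MIME mapping for the formats listed above.
VIDEO_MIME_TYPES = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".avi": "video/x-msvideo",
}


def video_to_data_url(video_path: str) -> tuple[str, str]:
    """Return (data_url, mime_type) for a local video file."""
    suffix = Path(video_path).suffix.lower()
    if suffix not in VIDEO_MIME_TYPES:
        raise ValueError(f"Unsupported video extension: {suffix!r}")
    mime_type = VIDEO_MIME_TYPES[suffix]
    encoded = base64.b64encode(Path(video_path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}", mime_type
```

The returned `mime_type` can then be passed alongside the data URL in the `image_url` content part, in place of the hardcoded `"video/mp4"`.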

Prompt Optimization Tips

# ❌ Not Recommended: Vague prompt
prompt = "Look at this video"

# ✅ Recommended: Specific and clear prompt
prompt = """
Please analyze this video in detail from the following aspects:
1. Video Theme: Overall content and main information
2. Scene Description: Environment, location, time and other background information
3. Subject Analysis: People, objects and their behaviors that appear
4. Action Recognition: Key actions and event sequences
5. Emotional Tone: Emotions and atmosphere conveyed by the video
6. Application Suggestions: Suitable usage scenarios and target audience
"""

Parameter Tuning Recommendations

# More accurate analysis
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages,
    temperature=0.2,      # Lower randomness, improve accuracy
    max_tokens=4096,      # Sufficient output length
)

# More creative descriptions
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages,
    temperature=0.7,      # Increase creativity
    max_tokens=2048,
)
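The two calls above assume that `client` and `messages` already exist. As a sketch, `messages` could be built by a small helper that mirrors the request shape used in the Quick Start examples. `build_video_messages` is a hypothetical name, not part of any SDK.

```python
def build_video_messages(question, video_url, mime_type="video/mp4", system_prompt=None):
    """Build the messages list in the shape used by the Quick Start examples."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                # The examples on this page send video through an image_url part
                "type": "image_url",
                "image_url": {"url": video_url},
                "mime_type": mime_type,
            },
        ],
    })
    return messages
```

With this helper, both parameter presets can share one `messages` value, for example `messages = build_video_messages(prompt, video_url)`.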

🔧 Advanced Features

1. Error Handling and Retry Mechanism

import base64
import time
from openai import OpenAI

def analyze_video_with_retry(video_path, question, max_retries=3):
    """Video analysis with retry mechanism"""
    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.apiyi.com/v1"
    )

    with open(video_path, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
        video_url = f"data:video/mp4;base64,{video_b64}"

    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gemini-2.5-pro",
                messages=[
                    {
                        "role": "user",
                        "content": [
                            {"type": "text", "text": question},
                            {
                                "type": "image_url",
                                "image_url": {"url": video_url},
                                "mime_type": "video/mp4"
                            }
                        ]
                    }
                ],
                temperature=0.2,
                max_tokens=4096
            )
            return response.choices[0].message.content

        except Exception as e:
            print(f"Attempt {attempt + 1}/{max_retries} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise

    return None
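The function above retries on every exception, including errors that a retry cannot fix, such as an invalid API key. A more selective variant might look like the sketch below. `call_with_backoff` is a hypothetical helper; which exception types are worth retrying depends on the client library and is left to the caller through `retry_on`.

```python
import random
import time


def call_with_backoff(fn, max_retries=3, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds or max_retries attempts have failed."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time up to the capped exponential delay
            sleep(random.uniform(0, delay))
```

Exceptions outside `retry_on` propagate immediately. The random jitter spreads out retries when several workers fail at the same moment, and the `sleep` parameter exists so the helper can be tested without real waiting.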

2. Batch Video Analysis

import os
import glob

def batch_analyze_videos(video_dir, question):
    """Batch analyze all MP4 videos in a folder (uses gemini_video_test from the Quick Start)"""
    video_files = glob.glob(os.path.join(video_dir, "*.mp4"))
    results = {}

    for video_file in video_files:
        print(f"Analyzing video: {os.path.basename(video_file)}")
        try:
            result = gemini_video_test(video_file, question)
            results[video_file] = result
        except Exception as e:
            print(f"Analysis failed: {e}")
            results[video_file] = f"Error: {str(e)}"

    return results

# Usage example
results = batch_analyze_videos("./videos", "Please describe the main content of this video")
for video, analysis in results.items():
    print(f"\n{video}:\n{analysis}\n")
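The loop above processes one video at a time. Because each request spends most of its time waiting on the network, a thread pool can overlap several requests. This is a sketch only: `batch_analyze_concurrently` is a hypothetical helper, the `analyze` callable is supplied by the caller, and this page does not state what level of concurrency the service allows, so `max_workers=3` is an arbitrary default.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def batch_analyze_concurrently(video_files, analyze, max_workers=3):
    """Run analyze(path) for each path in parallel threads.

    Returns {path: result} on success and {path: "Error: ..."} on failure.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, path): path for path in video_files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as e:
                # Record the failure and keep processing the other videos
                results[path] = f"Error: {e}"
    return results
```

A call such as `batch_analyze_concurrently(video_files, lambda p: gemini_video_test(p, question))` would reuse the Quick Start function. Each worker holds one Base64-encoded video in memory, so keep `max_workers` small for large files.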

3. Multi-turn Conversation for In-depth Analysis

import base64
from openai import OpenAI

def interactive_video_analysis(video_path):
    """Interactive video analysis"""
    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.apiyi.com/v1"
    )

    # Read video
    with open(video_path, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
        video_url = f"data:video/mp4;base64,{video_b64}"

    messages = []

    # Initial analysis
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": "Please analyze the content of this video"},
            {
                "type": "image_url",
                "image_url": {"url": video_url},
                "mime_type": "video/mp4"
            }
        ]
    })

    response = client.chat.completions.create(
        model="gemini-2.5-pro",
        messages=messages,
        temperature=0.2,
        max_tokens=4096
    )

    assistant_message = response.choices[0].message.content
    print(f"AI: {assistant_message}\n")
    messages.append({"role": "assistant", "content": assistant_message})

    # Continue asking questions
    while True:
        user_question = input("Your question (type 'quit' to exit): ")
        if user_question.lower() == 'quit':
            break

        messages.append({
            "role": "user",
            "content": [{"type": "text", "text": user_question}]
        })

        response = client.chat.completions.create(
            model="gemini-2.5-pro",
            messages=messages,
            temperature=0.2,
            max_tokens=4096
        )

        assistant_message = response.choices[0].message.content
        print(f"AI: {assistant_message}\n")
        messages.append({"role": "assistant", "content": assistant_message})

📊 Model Comparison

Model	Video Understanding	Response Speed	Context Length	Price
Gemini 2.5 Pro	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	Ultra-long	$$
Gemini 2.5 Flash	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	Long	$

🚨 Important Notes

File Size: Keep each video file under 20 MB for the best performance.
Privacy Protection: Do not upload videos that contain sensitive or private information.
Compliant Usage: Follow applicable laws and regulations; do not use the service for illegal purposes.
Result Verification: AI analysis results are for reference only; important decisions require manual review.
Cost Control: Video analysis consumes more tokens than text requests, so monitor your usage.
Video Format: Make sure the video format is supported (MP4 has the best compatibility).

💰 Cost Optimization Recommendations

Video Preprocessing: Compress videos before uploading to reduce file size.
Precise Questions: Ask clear, specific questions to avoid repeating an analysis.
Model Selection: Choose the model that fits the task (Flash vs. Pro).
Segmented Analysis: For long videos, consider processing them in segments.
Cache Results: For videos that are analyzed repeatedly, cache earlier results.
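The caching recommendation can be sketched with a small JSON file keyed by the video's content hash, the model, and the question. Everything here is illustrative: `cached_analysis`, `file_sha256`, and the `video_cache.json` file name are hypothetical, and `analyze` stands for any function with the `(video_path, question, model)` signature, such as `gemini_video_test` from the Quick Start.

```python
import hashlib
import json
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_analysis(video_path, question, model, analyze, cache_path="video_cache.json"):
    """Return a cached result for (video content, model, question), calling analyze on a miss."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)

    question_hash = hashlib.sha256(question.encode("utf-8")).hexdigest()
    key = f"{file_sha256(video_path)}:{model}:{question_hash}"
    if key not in cache:
        # Cache miss: run the real analysis once and persist the result
        cache[key] = analyze(video_path, question, model)
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f, ensure_ascii=False, indent=2)
    return cache[key]
```

Hashing the file content rather than its path means a renamed copy of the same video still hits the cache, while an edited video under the same name does not.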

💡 Pro Tip: Video understanding is particularly well suited to automated content moderation, video summarization, educational video analysis, and similar scenarios. We recommend testing with Gemini 2.5 Flash first, then choosing the appropriate model for your needs once you have confirmed the results.
