APIYI provides video understanding capabilities, using multimodal AI models such as Gemini 2.5 Pro to analyze video content in depth. Through a unified, OpenAI-compatible API format, you can implement video content recognition, scene description, action analysis, and similar functions.
🎬 Intelligent Video Analysis
Supports various video analysis tasks including scene understanding, action recognition, content summarization, and more, enabling AI to truly “understand” video content.
🌟 Core Features
- 🎯 Top Model Support: Leading multimodal video understanding models like Gemini 2.5 Pro
- 📹 Flexible Input: Supports Base64-encoded video files
- 🌏 Chinese Optimization: Supports Chinese scene understanding and Chinese-language content descriptions
- ⚡ Professional Analysis: Deep understanding of video content, actions, scenes, and context
- 💰 Cost-Effective: Powerful capabilities at reasonable pricing
📋 Supported Video Understanding Models
| Model Name | Model ID | Features | Recommended Scenarios |
|---|---|---|---|
| Gemini 2.5 Pro ⭐ | gemini-2.5-pro | Ultra-long context, strong video understanding | Complex video content analysis |
| Gemini 2.5 Flash | gemini-2.5-flash | Fast speed, high cost-performance | Quick video analysis |
🚀 Quick Start
1. Basic Example - Local Video Base64 Encoding
from openai import OpenAI
import base64


def gemini_video_test(video_path, question, model="gemini-2.5-pro"):
    """Video understanding function"""
    client = OpenAI(
        api_key="YOUR_API_KEY",  # Replace with your API Key
        base_url="https://api.apiyi.com/v1"
    )

    # Read local video file and convert to Base64
    with open(video_path, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
    video_url = f"data:video/mp4;base64,{video_b64}"

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": video_url
                        },
                        "mime_type": "video/mp4",
                    }
                ]
            }
        ],
        temperature=0.2,
        max_tokens=4096
    )
    return response.choices[0].message.content


# Usage example
if __name__ == "__main__":
    video_path = "./demo.mp4"  # Local video file path
    question = "Please describe the content of this video in detail"
    result = gemini_video_test(video_path, question)
    print(result)
File Size Limit: Keep each video file under 20MB for the best processing performance and response speed. Note that Base64 encoding grows the request payload to roughly 4/3 of the original file size.
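Because the video is sent inline as Base64, it can help to check the file size before encoding. The following is a minimal sketch, not part of the APIYI SDK: `MAX_VIDEO_BYTES`, `base64_encoded_size`, and `check_video_size` are hypothetical names, and the sketch assumes the 20MB recommendation refers to the raw file on disk, which this page does not state explicitly.

```python
import os

# Assumption: the 20MB recommendation applies to the raw file size.
MAX_VIDEO_BYTES = 20 * 1024 * 1024


def base64_encoded_size(raw_bytes: int) -> int:
    """Return the length of the padded Base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def check_video_size(video_path: str, limit: int = MAX_VIDEO_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds limit."""
    size = os.path.getsize(video_path)
    if size > limit:
        raise ValueError(
            f"{video_path} is {size} bytes, above the {limit}-byte limit; "
            f"the Base64 payload would be about {base64_encoded_size(size)} bytes"
        )
    return size
```

Calling `check_video_size("./demo.mp4")` before `gemini_video_test(...)` fails fast on oversized files instead of uploading a large payload first.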
2. Complete Example - With Result Saving
from openai import OpenAI
import base64
import json
from datetime import datetime
import os


def gemini_test(question, model="gemini-2.5-pro"):
    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.apiyi.com/v1"
    )

    VIDEO_PATH = "./demo.mp4"  # Local file, ≤20 MB recommended
    with open(VIDEO_PATH, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
    video_url = f"data:video/mp4;base64,{video_b64}"

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": video_url
                        },
                        "mime_type": "video/mp4",
                    }
                ]
            }
        ],
        temperature=0.2,
        max_tokens=4096
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print("Starting video understanding test...")

    # Run video understanding
    question = "Please describe the content of this video"
    result = gemini_test(question)

    # Generate timestamp
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    # Get current script directory
    current_dir = os.path.dirname(os.path.abspath(__file__))

    # Save as txt file
    txt_filename = os.path.join(current_dir, f"video_analysis_{timestamp}.txt")
    with open(txt_filename, "w", encoding="utf-8") as f:
        f.write("=" * 60 + "\n")
        f.write("Video Understanding Analysis Results\n")
        f.write("=" * 60 + "\n")
        f.write(f"Analysis Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Question: {question}\n")
        f.write("=" * 60 + "\n\n")
        f.write(result)
        f.write("\n\n" + "=" * 60 + "\n")

    # Save as json file
    json_filename = os.path.join(current_dir, f"video_analysis_{timestamp}.json")
    data = {
        "timestamp": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        "question": question,
        "model": "gemini-2.5-pro",
        "video_file": "demo.mp4",
        "result": result
    }
    with open(json_filename, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

    # Console output
    print("\nVideo Understanding Result:")
    print(result)
    print("\nResults saved to:")
    print(f" - TXT file: {txt_filename}")
    print(f" - JSON file: {json_filename}")
3. Using requests Library Example
import requests
import base64


def analyze_video_with_requests(video_path, question):
    """Video analysis using requests library"""
    # Read and encode video
    with open(video_path, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
    video_url = f"data:video/mp4;base64,{video_b64}"

    url = "https://api.apiyi.com/v1/chat/completions"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    payload = {
        "model": "gemini-2.5-pro",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": video_url},
                        "mime_type": "video/mp4"
                    }
                ]
            }
        ],
        "temperature": 0.2,
        "max_tokens": 4096
    }

    response = requests.post(url, headers=headers, json=payload)
    if response.status_code == 200:
        return response.json()['choices'][0]['message']['content']
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


# Usage example
result = analyze_video_with_requests("./demo.mp4", "Please describe the content of this video")
print(result)
🎯 Common Use Cases
1. Video Content Summary
prompt = """
Please analyze this video and provide a detailed summary, including:
1. Main content and theme of the video
2. Key scenes and important moments
3. People or objects that appear
4. Overall atmosphere and style of the video
5. Suitable application scenarios or target audience
"""
2. Educational Video Analysis
prompt = """
This is an educational video, please analyze:
1. Teaching topic and knowledge points
2. Explanation steps and process
3. Teaching methods and tools used
4. Key points and difficult content
5. Recommended learning takeaways
"""
3. Surveillance Video Analysis
prompt = """
Analyze this surveillance video:
1. Time period and location information (if visible)
2. Number of people and their activities
3. Any abnormal behaviors or events
4. Environmental changes
5. Key content that requires attention
"""
4. Marketing Video Evaluation
prompt = """
Evaluate the effectiveness of this marketing video:
1. Core selling points and message delivery
2. Visual presentation and production quality
3. Target audience positioning
4. Emotional resonance points
5. Improvement suggestions
"""
5. Sports Action Analysis
prompt = """
Analyze the sports actions in the video:
1. Sport type and action category
2. Standardization of technical movements
3. Key action points
4. Possible issues
5. Improvement recommendations
"""
💡 Best Practices
Video Preprocessing Recommendations
- Format Support: Mainstream video formats like MP4, AVI, MOV
- File Size: Recommend single video under 20MB
- Duration: Shorter video clips will get more precise analysis
- Resolution: Moderate resolution is sufficient, higher may increase processing time
- Encoding: Use efficient encoding formats like H.264
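The examples on this page hardcode `video/mp4` in both the data URL and the `mime_type` field, which would mislabel an AVI or MOV file. A minimal sketch of deriving the MIME type from the file extension with the standard-library `mimetypes` module follows. The helper names `guess_video_mime` and `build_video_data_url` are hypothetical, and whether the API accepts MIME types other than `video/mp4` is an assumption this page does not confirm.

```python
import base64
import mimetypes


def guess_video_mime(video_path: str, default: str = "video/mp4") -> str:
    """Guess a video MIME type from the file extension, falling back to default."""
    mime, _ = mimetypes.guess_type(video_path)
    if mime and mime.startswith("video/"):
        return mime
    return default


def build_video_data_url(video_path: str) -> tuple[str, str]:
    """Return (data_url, mime_type) for a local video file."""
    mime = guess_video_mime(video_path)
    with open(video_path, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{video_b64}", mime
```

The returned pair can replace the hardcoded `video_url` and `"mime_type"` values in the request payload. Converting the file to MP4 first remains the safer route, since MP4 is the format this page describes as most compatible.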
Prompt Optimization Tips
# ❌ Not Recommended: Vague prompt
prompt = "Look at this video"
# ✅ Recommended: Specific and clear prompt
prompt = """
Please analyze this video in detail from the following aspects:
1. Video Theme: Overall content and main information
2. Scene Description: Environment, location, time and other background information
3. Subject Analysis: People, objects and their behaviors that appear
4. Action Recognition: Key actions and event sequences
5. Emotional Tone: Emotions and atmosphere conveyed by the video
6. Application Suggestions: Suitable usage scenarios and target audience
"""
Parameter Tuning Recommendations
# More accurate analysis
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages,
    temperature=0.2,  # Lower randomness, improve accuracy
    max_tokens=4096,  # Sufficient output length
)

# More creative descriptions
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages,
    temperature=0.7,  # Increase creativity
    max_tokens=2048,
)
🔧 Advanced Features
1. Error Handling and Retry Mechanism
import base64
import time
from openai import OpenAI


def analyze_video_with_retry(video_path, question, max_retries=3):
    """Video analysis with retry mechanism"""
    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.apiyi.com/v1"
    )

    with open(video_path, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
    video_url = f"data:video/mp4;base64,{video_b64}"

    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gemini-2.5-pro",
                messages=[
                    {
                        "role": "user",
                        "content": [
                            {"type": "text", "text": question},
                            {
                                "type": "image_url",
                                "image_url": {"url": video_url},
                                "mime_type": "video/mp4"
                            }
                        ]
                    }
                ],
                temperature=0.2,
                max_tokens=4096
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Attempt {attempt + 1}/{max_retries} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
            else:
                raise  # Out of retries: re-raise the last error
    return None  # Only reached when max_retries <= 0
2. Batch Video Analysis
import os
import glob

# Reuses gemini_video_test() from the Basic Example in Quick Start


def batch_analyze_videos(video_dir, question):
    """Batch analyze all videos in a folder"""
    video_files = glob.glob(os.path.join(video_dir, "*.mp4"))
    results = {}
    for video_file in video_files:
        print(f"Analyzing video: {os.path.basename(video_file)}")
        try:
            result = gemini_video_test(video_file, question)
            results[video_file] = result
        except Exception as e:
            print(f"Analysis failed: {e}")
            results[video_file] = f"Error: {str(e)}"
    return results


# Usage example
results = batch_analyze_videos("./videos", "Please describe the main content of this video")
for video, analysis in results.items():
    print(f"\n{video}:\n{analysis}\n")
3. Multi-turn Conversation for In-depth Analysis
import base64
from openai import OpenAI


def interactive_video_analysis(video_path):
    """Interactive video analysis"""
    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.apiyi.com/v1"
    )

    # Read video
    with open(video_path, "rb") as f:
        video_b64 = base64.b64encode(f.read()).decode()
    video_url = f"data:video/mp4;base64,{video_b64}"

    messages = []

    # Initial analysis
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": "Please analyze the content of this video"},
            {
                "type": "image_url",
                "image_url": {"url": video_url},
                "mime_type": "video/mp4"
            }
        ]
    })
    response = client.chat.completions.create(
        model="gemini-2.5-pro",
        messages=messages,
        temperature=0.2,
        max_tokens=4096
    )
    assistant_message = response.choices[0].message.content
    print(f"AI: {assistant_message}\n")
    messages.append({"role": "assistant", "content": assistant_message})

    # Continue asking questions; the video stays in the message history
    while True:
        user_question = input("Your question (type 'quit' to exit): ")
        if user_question.lower() == 'quit':
            break
        messages.append({
            "role": "user",
            "content": [{"type": "text", "text": user_question}]
        })
        response = client.chat.completions.create(
            model="gemini-2.5-pro",
            messages=messages,
            temperature=0.2,
            max_tokens=4096
        )
        assistant_message = response.choices[0].message.content
        print(f"AI: {assistant_message}\n")
        messages.append({"role": "assistant", "content": assistant_message})
📊 Model Comparison
| Model | Video Understanding | Response Speed | Context Length | Price |
|---|---|---|---|---|
| Gemini 2.5 Pro | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Ultra-long | $$ |
| Gemini 2.5 Flash | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Long | $ |
🚨 Important Notes
- File Size: Keep each video file under 20MB for best performance
- Privacy Protection: Do not upload videos containing sensitive or private information
- Compliant Usage: Follow applicable laws and regulations; do not use the service for illegal purposes
- Result Verification: AI analysis results are for reference only; important decisions require manual review
- Cost Control: Video analysis consumes more tokens than text-only requests, so plan usage accordingly
- Video Format: Make sure the video format is supported (MP4 has the best compatibility)
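To keep an eye on token consumption, the token counts can be read from each response. The sketch below assumes the endpoint returns the OpenAI-style `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`), which this page does not confirm. `summarize_usage` is a hypothetical helper that returns `None` values when the field is absent.

```python
USAGE_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def summarize_usage(response) -> dict:
    """Extract token counts from a chat completion response, if present."""
    usage = getattr(response, "usage", None)
    # Missing usage object or missing fields are reported as None.
    return {field: getattr(usage, field, None) for field in USAGE_FIELDS}
```

For example, `print(summarize_usage(response))` right after `client.chat.completions.create(...)` logs the cost of each call.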
💰 Cost Optimization Recommendations
- Video Preprocessing: Compress videos appropriately before uploading to reduce file size
- Precise Questions: Use clear questions to avoid repeated analysis
- Model Selection: Choose appropriate model based on needs (Flash vs Pro)
- Segmented Analysis: For long videos, consider processing in segments
- Cache Results: For repeatedly analyzed videos, cache previous results
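The caching recommendation can be sketched as a small on-disk cache keyed by a hash of the video bytes, the question, and the model. Everything here is an illustrative assumption rather than an APIYI feature: `cache_key`, `cached_analysis`, and the `.video_cache` directory are hypothetical, and `analyze` stands for any callable with the signature `(video_path, question, model)`, such as `gemini_video_test` from the Quick Start.

```python
import hashlib
import json
import os


def cache_key(video_path: str, question: str, model: str) -> str:
    """Hash the video contents together with the question and model name."""
    digest = hashlib.sha256()
    with open(video_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    digest.update(b"\0" + question.encode("utf-8"))
    digest.update(b"\0" + model.encode("utf-8"))
    return digest.hexdigest()


def cached_analysis(video_path, question, analyze,
                    model="gemini-2.5-pro", cache_dir=".video_cache"):
    """Return a cached result if one exists; otherwise call analyze and store it."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, cache_key(video_path, question, model) + ".json")
    if os.path.exists(cache_file):
        with open(cache_file, "r", encoding="utf-8") as f:
            return json.load(f)["result"]
    result = analyze(video_path, question, model)
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump({"question": question, "model": model, "result": result},
                  f, ensure_ascii=False)
    return result
```

Hashing the file contents rather than the path means a renamed copy of the same video still hits the cache, while an edited file at the same path does not.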
💡 Pro Tip: Video understanding is particularly well suited to automated content moderation, video summarization, educational video analysis, and similar scenarios. We recommend testing with Gemini 2.5 Flash first, then choosing the model that fits your needs once you have confirmed the results.