APIYI provides image understanding through a range of advanced multimodal AI models. Through a unified OpenAI API format, you can implement image recognition, scene description, OCR text recognition, and other functions.
🔍 Intelligent Visual Analysis
Supports various visual tasks including object recognition, scene understanding, text extraction, sentiment analysis, and more, enabling AI to truly “understand” images.
🌟 Core Features
- 🎯 Multi-Model Support: Top multimodal models like the Gemini 3, GPT-5, and Claude 4 series
- 📸 Flexible Input: Supports URL links and Base64 encoded images
- 🌏 Chinese Optimization: Strong support for Chinese scene understanding and text recognition
- ⚡ Fast Response: High-performance inference that returns results within seconds
- 💰 Cost Control: Multiple model options to meet different budget requirements
📋 Supported Vision Models
The models below are the currently recommended mainstream multimodal options. Model IDs may change with new releases, so always defer to the console.
| Model Name | Model ID | Features | Recommended Scenarios |
|---|---|---|---|
| Gemini 3.1 Pro Preview ⭐ | gemini-3.1-pro-preview | Strongest multimodal reasoning, rich detail | Complex image/scene analysis |
| Gemini 3.5 Flash 🔥 | gemini-3.5-flash | Fast and low-cost, best value | Real-time recognition, batch processing |
| GPT-5.5 ⭐ | gpt-5.5 | Strong all-round vision understanding, stable | General image understanding |
| Claude Opus 4.7 | claude-opus-4-7 | Deep understanding, precise descriptions | Professional image+text analysis |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | Rivals Opus, high cost-performance | Cost-effective recognition |
| GPT-4o | gpt-4o | Classic multimodal, mature and stable | General scenarios |
| Gemini 2.5 Flash | gemini-2.5-flash | Ultra-fast and cheap, GA release | Large-batch simple recognition |
Most chat models now accept image input, so the table above lists common recommendations rather than the full set. Mainstream models, including the GPT-5, Gemini 3, and Claude 4 series as well as Grok 4, Qwen, GLM, and Kimi, mostly accept image input.
🚀 Quick Start
1. Basic Example - Image URL
import requests

url = "https://api.apiyi.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "model": "gpt-5.5",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please describe this image in detail"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
}
response = requests.post(url, headers=headers, json=payload)
result = response.json()
print(result['choices'][0]['message']['content'])
2. Local Image Example - Base64 Encoding
import base64
import requests

def image_to_base64(image_path):
    """Convert local image to base64 encoding"""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Read local image
base64_image = image_to_base64("path/to/your/image.jpg")
url = "https://api.apiyi.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "model": "gemini-3.1-pro-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze all text content in this image"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
}
response = requests.post(url, headers=headers, json=payload)
print(response.json()['choices'][0]['message']['content'])
3. Advanced Example - Multi-Image Comparison
import requests

url = "https://api.apiyi.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "model": "gemini-3.1-pro-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please compare the differences between these two images:"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image1.jpg"}
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image2.jpg"}
                }
            ]
        }
    ],
    "max_tokens": 1000
}
response = requests.post(url, headers=headers, json=payload)
print(response.json()['choices'][0]['message']['content'])
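The multi-image request above follows a simple pattern: one text part followed by one image part per URL. A minimal sketch of that pattern is below; the helper name `build_image_content` is hypothetical and not part of any APIYI SDK.

```python
def build_image_content(prompt, image_urls):
    """Build a `content` list: one text part, then one image_url part per URL."""
    content = [{"type": "text", "text": prompt}]
    for image_url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return content


# Same request body as the multi-image example, built from a list of URLs.
content = build_image_content(
    "Please compare the differences between these two images:",
    ["https://example.com/image1.jpg", "https://example.com/image2.jpg"],
)
payload = {
    "model": "gemini-3.1-pro-preview",
    "messages": [{"role": "user", "content": content}],
    "max_tokens": 1000,
}
```

The resulting `payload` can be passed to `requests.post(..., json=payload)` exactly as in the example above.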
4. cURL Example (Command Line)
Image URL method:
curl https://api.apiyi.com/v1/chat/completions \
  -H "Authorization: Bearer $APIYI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.1-pro-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Please describe this image in detail" },
          { "type": "image_url", "image_url": { "url": "https://example.com/image.jpg" } }
        ]
      }
    ]
  }'
Local image Base64 method (encode the image to Base64, then embed it in the request body):
# 1. Convert local image to base64 (macOS / Linux)
BASE64_IMAGE=$(base64 -i path/to/your/image.jpg | tr -d '\n')
# 2. Pass the image content via a data URI
curl https://api.apiyi.com/v1/chat/completions \
  -H "Authorization: Bearer $APIYI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Analyze all text content in this image" },
          { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,'"$BASE64_IMAGE"'" } }
        ]
      }
    ]
  }'
Base64 is about 1.33x the size of the original image. For large images, prefer the image URL method to keep the request body small and the response fast.
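The 1.33x figure comes from Base64 emitting 4 characters for every 3 input bytes. The sketch below illustrates that arithmetic and builds a data URI whose MIME type is guessed from the file name instead of being hard-coded to `image/jpeg`. The helper names `to_data_uri` and `base64_length` are hypothetical, and guessing the type from the extension is an assumption of this sketch, not an APIYI requirement.

```python
import base64
import mimetypes


def to_data_uri(image_bytes, filename):
    """Encode raw image bytes as a data URI, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def base64_length(raw_size):
    """Length of the Base64 text for raw_size bytes: 4 characters per 3-byte group, rounded up."""
    return 4 * ((raw_size + 2) // 3)
```

For example, a 3,000,000-byte image becomes 4,000,000 Base64 characters before any JSON overhead, which is why the URL method is preferable for large files.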
GPT-5 series parameter differences: if you switch the examples to a GPT-5 series model such as gpt-5.5 / gpt-5.4, note that:
- Use max_completion_tokens instead of max_tokens
- temperature only supports 1 (leave it at the default and do not pass other values)
- Do not pass the top_p parameter
The Gemini and Claude series have no such restrictions and work normally with max_tokens, temperature, etc.
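One way to apply these parameter differences is to build the request body through a small helper that picks the right keys per model family. The sketch below is illustrative only: `build_payload` is a hypothetical name, and detecting the GPT-5 series by the `gpt-5` model ID prefix is an assumption that should be checked against the console's model list.

```python
def build_payload(model, messages, max_output_tokens=None, temperature=None, top_p=None):
    """Build a chat payload, adapting sampling and length parameters to the model family."""
    payload = {"model": model, "messages": messages}
    # Assumption: every GPT-5 series model ID starts with "gpt-5".
    is_gpt5 = model.startswith("gpt-5")

    if max_output_tokens is not None:
        key = "max_completion_tokens" if is_gpt5 else "max_tokens"
        payload[key] = max_output_tokens

    # GPT-5 series: leave temperature at its default and never send top_p.
    if not is_gpt5:
        if temperature is not None:
            payload["temperature"] = temperature
        if top_p is not None:
            payload["top_p"] = top_p
    return payload
```

With this helper, switching an example between `gpt-5.5` and `gemini-3.1-pro-preview` only requires changing the `model` argument.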
🎯 Common Use Cases
1. Product Recognition and Analysis
prompt = """
Please analyze this product image, including:
1. Product type and brand
2. Main features and selling points
3. Suitable target audience
4. Suggested marketing copy
"""
2. Document OCR Recognition
prompt = """
Please extract all text content from the image and organize it in the original format.
If there are tables, please present them in Markdown table format.
"""
3. Medical Imaging Assistance
prompt = """
This is a medical imaging picture, please:
1. Describe basic image information (such as imaging type, body part, etc.)
2. Label visible anatomical structures
3. Note: For reference only, not for diagnostic purposes
"""
4. Security Surveillance Analysis
prompt = """
Analyze the surveillance footage to identify:
1. Number of people and their positions in the scene
2. Any abnormal behavior
3. Environmental safety hazards
4. Timestamp information (if visible)
"""
💡 Best Practices
Image Preprocessing Recommendations
- Format Support: Mainstream formats such as JPEG, PNG, GIF, and WebP
- Size Limit: Keep each image under 20 MB (recommended)
- Resolution: Higher-resolution images generally give better recognition results
- Compression: Apply moderate compression to improve transfer speed
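The format and size recommendations above can be checked locally before a request is sent. The following is a minimal sketch using only the standard library; the function names are hypothetical, the extension list is an assumed mapping of the formats named above, and the 20 MB threshold mirrors the recommendation rather than a documented hard limit.

```python
import os

# Assumed extensions for the formats listed above (JPEG, PNG, GIF, WebP).
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # 20 MB recommendation


def find_image_problems(filename, size_bytes):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the {MAX_IMAGE_BYTES}-byte recommendation")
    return problems


def check_image_file(path):
    """Run the same checks against a file on disk."""
    return find_image_problems(path, os.path.getsize(path))
```

Checking the extension does not verify the actual file contents, so this is a cheap first filter rather than full validation.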
Prompt Optimization
# ❌ Not Recommended: Vague prompt
prompt = "What is this"
# ✅ Recommended: Specific and clear prompt
prompt = """
Please analyze this image from the following aspects:
1. Main Objects: Identify main objects or people in the image
2. Scene Environment: Describe the shooting location and environmental features
3. Color Composition: Analyze color scheme and composition characteristics
4. Emotional Atmosphere: Emotions or atmosphere conveyed by the image
5. Possible Uses: What scenarios this image is suitable for
"""
Error Handling
import time

import requests
from requests.exceptions import RequestException

API_KEY = "YOUR_API_KEY"  # Replace with your actual key

def analyze_image_with_retry(image_url, prompt, max_retries=3):
    """Image analysis function with retry mechanism"""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://api.apiyi.com/v1/chat/completions",
                headers={
                    "Authorization": f"Bearer {API_KEY}",
                    "Content-Type": "application/json"
                },
                json={
                    "model": "gpt-5.5",
                    "messages": [{
                        "role": "user",
                        "content": [
                            {"type": "text", "text": prompt},
                            {"type": "image_url", "image_url": {"url": image_url}}
                        ]
                    }]
                },
                timeout=30
            )
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                print(f"Rate limited, waiting to retry... (attempt {attempt + 1}/{max_retries})")
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                print(f"Error: {response.status_code} - {response.text}")
        except RequestException as e:
            print(f"Request exception: {e}")
    # Every attempt failed
    return None
🔧 Advanced Features
1. Streaming Output
For lengthy analyses, streaming output provides a better user experience:
payload = {
    "model": "gpt-5.5",
    "messages": [...],
    "stream": True
}
response = requests.post(url, headers=headers, json=payload, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
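The loop above prints the raw stream lines. In the OpenAI streaming format, each line is a server-sent event of the form `data: {json}`, the generated text sits in `choices[0].delta.content`, and the stream ends with `data: [DONE]`. The sketch below extracts just the text, assuming APIYI follows that format as part of its unified OpenAI API; `iter_stream_text` is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style streaming lines (bytes or str)."""
    for raw_line in lines:
        if not raw_line:
            continue  # skip keep-alive blank lines
        line = raw_line.decode("utf-8") if isinstance(raw_line, bytes) else raw_line
        if not line.startswith("data:"):
            continue  # ignore non-data fields such as comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Usage with the streaming response from the example above:
# for text in iter_stream_text(response.iter_lines()):
#     print(text, end="", flush=True)
```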
2. Multi-turn Conversation
Maintain context for in-depth analysis:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What animal is this?"},
            # Use a full image URL or a Base64 data URI here
            {"type": "image_url", "image_url": {"url": "https://example.com/animal.jpg"}}
        ]
    },
    {
        "role": "assistant",
        "content": "This is a Golden Retriever."
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "How old does it look? How is its health condition?"}]
    }
]
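In a running conversation, the history grows by appending the model's previous reply and the next question to the same list, so the image only needs to be sent in the first user message. A minimal sketch, where `add_turn` is a hypothetical helper name:

```python
def add_turn(messages, assistant_reply, follow_up):
    """Return a new history with the assistant's reply and the next user question appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": [{"type": "text", "text": follow_up}]},
    ]


first_turn = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What animal is this?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/animal.jpg"}},
        ],
    }
]
history = add_turn(
    first_turn,
    "This is a Golden Retriever.",
    "How old does it look? How is its health condition?",
)
```

Because `add_turn` returns a new list, `first_turn` is left unchanged and can be reused for a different follow-up.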
3. Combined with Function Calling
tools = [
    {
        "type": "function",
        "function": {
            "name": "save_image_analysis",
            "description": "Save image analysis results to database",
            "parameters": {
                "type": "object",
                "properties": {
                    "objects": {"type": "array", "items": {"type": "string"}},
                    "scene": {"type": "string"},
                    "text_content": {"type": "string"}
                }
            }
        }
    }
]
payload = {
    "model": "gpt-5.5",
    "messages": messages,
    "tools": tools,
    "tool_choice": "auto"
}
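When the model decides to call the tool, the OpenAI response format returns the call under `choices[0].message.tool_calls`, with the arguments as a JSON-encoded string. The sketch below reads those calls, assuming APIYI returns the same structure; `extract_tool_calls` is a hypothetical helper, and what you do with the parsed arguments (for example, writing them to a database) is up to your application.

```python
import json


def extract_tool_calls(response_json):
    """Return (function_name, arguments_dict) pairs from a chat completion response."""
    message = response_json["choices"][0]["message"]
    calls = []
    for tool_call in message.get("tool_calls") or []:
        function = tool_call["function"]
        # Arguments arrive as a JSON string and must be decoded before use.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls
```

An empty list means the model answered in plain text instead of calling a tool, in which case the reply is in `message["content"]` as usual.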
| Model | Response Speed | Recognition Accuracy | Chinese Support | Price |
|---|---|---|---|---|
| Gemini 3.1 Pro Preview | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | $$ |
| Gemini 3.5 Flash | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $ |
| GPT-5.5 | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | $$ |
| Claude Sonnet 4.6 | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $$ |
| Gemini 2.5 Flash | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $ |
🚨 Important Notes
- Privacy Protection: Do not upload images containing sensitive information
- Compliant Usage: Follow relevant laws and regulations; do not use the service for illegal purposes
- Result Verification: AI analysis results are for reference only; important decisions require manual review
- Cost Control: Choose models reasonably to avoid unnecessary expenses
💡 Pro Tip: Test first with cost-effective models like Gemini 3.5 Flash or Gemini 2.5 Flash, then switch to advanced models like Gemini 3.1 Pro or GPT-5.5 for production once you’ve confirmed quality. For more available models, see Popular Models or the console model list.