Vision Understanding (Image Recognition) API

APIYI provides image understanding through several vision-capable AI models. Using a unified OpenAI-compatible API format, you can implement image recognition, scene description, OCR text recognition, and similar functions.

🔍 Intelligent Visual Analysis: Supports visual tasks including object recognition, scene understanding, text extraction, and sentiment analysis, so the model can interpret the content of an image rather than only label it.

🌟 Core Features

🎯 Multi-Model Support: Top vision models like GPT-4o, Gemini 2.5 Pro, Claude, etc.
📸 Flexible Input: Supports URL links and Base64 encoded images
🌏 Chinese Optimization: Strong support for Chinese scene understanding and text recognition
⚡ Fast Response: High-performance inference, with results typically returned within seconds
💰 Cost Control: Multiple model options to meet different budget requirements

📋 Supported Vision Models

Model Name	Model ID	Features	Recommended Scenarios
GPT-4o ⭐	`gpt-4o`	Strong comprehensive abilities, accurate recognition	General image understanding
GPT-4.1 Mini	`gpt-4.1-mini`	Lightweight and fast, low cost	Batch processing
Gemini 2.5 Pro ⭐	`gemini-2.5-pro`	Ultra-long context, rich details	Complex scene analysis
Gemini 2.5 Flash	`gemini-2.5-flash`	Extremely fast, high cost-performance	Real-time applications
Claude 3.5 Sonnet	`claude-3-5-sonnet`	Deep understanding, accurate descriptions	Professional analysis

🚀 Quick Start

1. Basic Example - Image URL

import requests

url = "https://api.apiyi.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please describe this image in detail"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
result = response.json()
print(result['choices'][0]['message']['content'])

2. Local Image Example - Base64 Encoding

import base64
import requests

def image_to_base64(image_path):
    """Convert local image to base64 encoding"""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Read local image
base64_image = image_to_base64("path/to/your/image.jpg")

url = "https://api.apiyi.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "gemini-2.5-pro",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze all text content in this image"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json()['choices'][0]['message']['content'])
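
The example above hard-codes `image/jpeg` in the data URI, which would mislabel a PNG or WebP file. A minimal sketch of one way to derive the MIME type from the file extension using only the standard library; `image_to_data_uri` is a hypothetical helper name, not part of the APIYI API:

```python
import base64
import mimetypes


def image_to_data_uri(image_path, default_mime="image/jpeg"):
    """Return a data URI for a local image (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to the default
    # when the extension is unknown or is not an image type.
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used directly as the `url` value of an `image_url` content part.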

3. Advanced Example - Multi-Image Comparison

import requests

url = "https://api.apiyi.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please compare the differences between these two images:"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image1.jpg"}
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image2.jpg"}
                }
            ]
        }
    ],
    "max_tokens": 1000
}

response = requests.post(url, headers=headers, json=payload)
print(response.json()['choices'][0]['message']['content'])
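
When the number of images varies, the `content` list can be assembled programmatically instead of written out by hand. A small sketch, assuming the same content-part structure as the payload above; `build_vision_content` is a hypothetical helper name:

```python
def build_vision_content(prompt, image_urls):
    """Build a `content` list: one text part followed by one part per image."""
    content = [{"type": "text", "text": prompt}]
    for image_url in image_urls:
        # Each image becomes its own image_url part, in the order given.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return content
```

The result would replace the literal `content` list, for example `{"role": "user", "content": build_vision_content("Compare these images:", urls)}`.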

🎯 Common Use Cases

1. Product Recognition and Analysis

prompt = """
Please analyze this product image, including:
1. Product type and brand
2. Main features and selling points
3. Suitable target audience
4. Suggested marketing copy
"""

2. Document OCR Recognition

prompt = """
Please extract all text content from the image and organize it in the original format.
If there are tables, please present them in Markdown table format.
"""

3. Medical Imaging Assistance

prompt = """
This is a medical image. Please:
1. Describe basic image information (such as imaging type, body part, etc.)
2. Label visible anatomical structures
3. Note: For reference only, not for diagnostic purposes
"""

4. Security Surveillance Analysis

prompt = """
Analyze the surveillance footage to identify:
1. Number of people and their positions in the scene
2. Any abnormal behavior
3. Environmental safety hazards
4. Timestamp information (if visible)
"""

💡 Best Practices

Image Preprocessing Recommendations

Format Support: Mainstream formats like JPEG, PNG, GIF, WebP
Size Limit: Keep each image under 20 MB (recommended)
Resolution: Higher-resolution images generally give better recognition results
Compression: Apply moderate compression to improve transfer speed
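
The recommendations above can be checked locally before an image is uploaded. A minimal sketch, assuming the format is judged by file extension and that "20MB" means 20 × 1024 × 1024 bytes (the source does not say which definition applies); `check_image` and both constants are hypothetical names:

```python
import os

# Extensions for the formats listed above (assumed mapping).
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
# Assumption: the 20 MB recommendation is read as 20 * 1024 * 1024 bytes.
MAX_BYTES = 20 * 1024 * 1024


def check_image(path):
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_BYTES}-byte recommendation")
    return problems
```

An extension check does not inspect the file contents, so a renamed file would still pass; treat this as a quick pre-flight filter only.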

Prompt Optimization

# ❌ Not Recommended: Vague prompt
prompt = "What is this"

# ✅ Recommended: Specific and clear prompt
prompt = """
Please analyze this image from the following aspects:
1. Main Objects: Identify main objects or people in the image
2. Scene Environment: Describe the shooting location and environmental features
3. Color Composition: Analyze color scheme and composition characteristics
4. Emotional Atmosphere: Emotions or atmosphere conveyed by the image
5. Possible Uses: What scenarios this image is suitable for
"""

Error Handling

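import time  # Needed for the backoff sleep below

import requests
from requests.exceptions import RequestException

API_KEY = "YOUR_API_KEY"  # Replace with your actual key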

def analyze_image_with_retry(image_url, prompt, max_retries=3):
    """Image analysis function with retry mechanism"""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://api.apiyi.com/v1/chat/completions",
                headers={
                    "Authorization": f"Bearer {API_KEY}",
                    "Content-Type": "application/json"
                },
                json={
                    "model": "gpt-4o",
                    "messages": [{
                        "role": "user",
                        "content": [
                            {"type": "text", "text": prompt},
                            {"type": "image_url", "image_url": {"url": image_url}}
                        ]
                    }]
                },
                timeout=30
            )

            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                print(f"Rate limited, waiting to retry... (attempt {attempt + 1}/{max_retries})")
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                print(f"Error: {response.status_code} - {response.text}")

        except RequestException as e:
            print(f"Request exception: {e}")

    return None
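
Because `analyze_image_with_retry` returns `None` once the retries are exhausted, callers need a guard before indexing into the result. A minimal sketch, assuming the OpenAI-style response shape used in the Quick Start examples; `extract_text` is a hypothetical helper name:

```python
def extract_text(result):
    """Return the first choice's message content, or None if it is unavailable."""
    if not result:
        return None  # Covers the None returned after all retries fail
    choices = result.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")
```

A caller could then write `text = extract_text(analyze_image_with_retry(url, prompt))` and handle the `None` case in one place.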

🔧 Advanced Features

1. Streaming Output

For lengthy analyses, streaming output provides a better user experience:

payload = {
    "model": "gpt-4o",
    "messages": [...],
    "stream": True
}

response = requests.post(url, headers=headers, json=payload, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
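
The loop above prints the raw streamed lines. To show only the generated text, each line has to be parsed. The sketch below assumes the stream follows OpenAI's server-sent-events convention (`data: {json}` lines, a final `data: [DONE]`, and text under `choices[0].delta.content`); that framing is inferred from the unified OpenAI format and is not confirmed by this page. `parse_sse_line` is a hypothetical helper name:

```python
import json


def parse_sse_line(raw_line):
    """Return the text delta carried by one streamed line, or None if there is none."""
    line = raw_line.decode("utf-8") if isinstance(raw_line, bytes) else raw_line
    line = line.strip()
    if not line.startswith("data:"):
        return None  # Blank keep-alive lines and comments carry no payload
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None  # End-of-stream marker
    try:
        chunk = json.loads(data)
    except json.JSONDecodeError:
        return None  # Skip anything that is not valid JSON
    if not isinstance(chunk, dict):
        return None
    choices = chunk.get("choices") or []
    if not choices:
        return None
    delta = choices[0].get("delta") or {}
    return delta.get("content")
```

Inside the `iter_lines()` loop, calling `text = parse_sse_line(line)` and printing `text` with `end=""` whenever it is not `None` would display the answer as it arrives.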

2. Multi-turn Conversation

Maintain context for in-depth analysis:

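messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What animal is this?"},
            # Use a full URL or a Base64 data URI; a bare filename is not reachable by the API
            {"type": "image_url", "image_url": {"url": "https://example.com/animal.jpg"}}
        ]
    },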
    {
        "role": "assistant",
        "content": "This is a Golden Retriever."
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "How old does it look? How is its health condition?"}]
    }
]
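
In a running application the history is usually extended after each response rather than written out in full. A small sketch of that bookkeeping, assuming the message structure shown above; `add_turn` is a hypothetical helper name:

```python
def add_turn(messages, assistant_reply, follow_up):
    """Return a new history with the assistant reply and the next user question appended."""
    # List concatenation builds a new list, so the caller's history is left unmodified.
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": [{"type": "text", "text": follow_up}]},
    ]
```

The image only needs to appear in the first user message; later turns can refer to it through the retained context.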

3. Combined with Function Calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "save_image_analysis",
            "description": "Save image analysis results to database",
            "parameters": {
                "type": "object",
                "properties": {
                    "objects": {"type": "array", "items": {"type": "string"}},
                    "scene": {"type": "string"},
                    "text_content": {"type": "string"}
                }
            }
        }
    }
]

payload = {
    "model": "gpt-4o",
    "messages": messages,
    "tools": tools,
    "tool_choice": "auto"
}
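
When the model decides to call `save_image_analysis`, the arguments have to be read out of the response. The sketch below assumes the response follows OpenAI's tool-calling shape (`message.tool_calls[].function.name` plus a JSON string in `function.arguments`), which this page does not confirm for every model; `extract_tool_calls` is a hypothetical helper name:

```python
import json


def extract_tool_calls(result):
    """Return a list of (function_name, arguments_dict) pairs from a response."""
    message = result["choices"][0]["message"]
    calls = []
    # tool_calls is absent or null when the model answers in plain text.
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # Arguments arrive as a JSON-encoded string and must be decoded.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls
```

Each returned pair can be dispatched to your own implementation, for example a function that writes `objects`, `scene`, and `text_content` to a database.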

📊 Performance Comparison

Model	Response Speed	Recognition Accuracy	Chinese Support	Price
GPT-4o	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	$$
Gemini 2.5 Pro	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	$$
Gemini 2.5 Flash	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	$
GPT-4.1 Mini	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	$

🚨 Important Notes

Privacy Protection: Do not upload images containing sensitive information
Compliant Usage: Follow relevant laws and regulations; do not use the API for illegal purposes
Result Verification: AI analysis results are for reference only; important decisions require manual review
Cost Control: Choose models reasonably to avoid unnecessary expenses

💡 Pro Tip: Test with GPT-4.1 Mini or Gemini 2.5 Flash first, then switch to a more capable model for production once you have confirmed the results.
