Gemini Embedding 2 Preview: First Natively Multimodal Embedding Model
Google’s first natively multimodal embedding model, Gemini Embedding 2 Preview, is now on APIYI — a unified vector space for text/image/video/audio/PDF, MTEB English #1 at 68.32, 3072-dim MRL, text at $0.20/M tokens.
On March 10, 2026, Google officially launched Gemini Embedding 2 Preview — the first natively multimodal embedding model in the Gemini series. Unlike the text-only text-embedding-004 and gemini-embedding-001, Gemini Embedding 2 maps text, images, video, audio, and PDF documents into a single unified vector space, enabling true cross-modal semantic retrieval.

This means you can search for relevant images using text, or retrieve matching documents using an image — all modalities share the same vector representation without separate processing.

APIYI has launched gemini-embedding-2-preview, accessible via the OpenAI-compatible /v1/embeddings endpoint.
Gemini Embedding 2’s vector space is incompatible with previous versions. You cannot mix embeddings from different model versions — migration requires regenerating all embeddings.
```python
from openai import OpenAI

# Point the client at your provider's OpenAI-compatible endpoint
client = OpenAI(api_key="YOUR_API_KEY")  # set base_url if routing through a proxy provider

texts = [
    "Latest trends in artificial intelligence",
    "Machine learning applications in healthcare",
    "How large language models work",
]

response = client.embeddings.create(
    model="gemini-embedding-2-preview",
    input=texts,
    dimensions=1536,  # MRL: request a truncated dimension
)

for i, data in enumerate(response.data):
    print(f"Text {i}: {len(data.embedding)} dimensions")
```
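Once the vectors come back, the standard retrieval step is ranking candidates by cosine similarity. A minimal sketch with placeholder vectors (the `cosine_similarity` helper and the toy vectors below are illustrative, not part of the API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for response.data[i].embedding
query_vec = [0.1, 0.3, 0.5]
doc_vecs = {
    "healthcare_ml": [0.1, 0.29, 0.52],
    "cooking_recipes": [0.9, -0.2, 0.1],
}

# Rank stored documents by similarity to the query vector
ranked = sorted(doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]), reverse=True)
print(ranked[0])  # healthcare_ml
```

In a cross-modal setup the same ranking works unchanged: because all modalities share one vector space, a text query vector can be compared directly against image or PDF vectors.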
Text pricing is slightly higher than OpenAI’s text-embedding-3 series, but Gemini Embedding 2 is the only model supporting unified 5-modality embeddings — no additional models needed for cross-modal retrieval.
APIYI offers deposit bonuses — the more you deposit, the bigger the bonus. Combined with the model’s competitive pricing, your effective cost is even lower.
Gemini Embedding 2 Preview is the most powerful embedding model available today and the industry’s first natively multimodal embedding model. It tops the MTEB English leaderboard while supporting unified vector representations across five modalities, opening entirely new possibilities for cross-modal retrieval.

Core Advantages:
Multimodal Unity: Text/image/video/audio/PDF share one vector space — one model for all retrieval
Performance Leader: #1 on MTEB English at 68.32, with major leads in classification, retrieval, and clustering
Flexible Dimensions: MRL supports 128–3072, balance precision vs. cost as needed
Extended Input: 8192 tokens, 4x the previous generation
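With MRL, a full-length embedding can be truncated to a shorter prefix and re-normalized for cheaper storage and faster search. A minimal sketch (the `mrl_truncate` helper and the toy 8-dim vector are illustrative; real vectors would be up to 3072-dim):

```python
import math

def mrl_truncate(embedding, dims):
    """Keep the first `dims` components of an MRL embedding and re-normalize.

    Matryoshka-trained vectors are designed so that leading prefixes remain
    usable on their own; re-normalizing to unit length keeps cosine
    comparisons well-behaved after truncation.
    """
    prefix = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

# Toy 8-dim vector standing in for a 3072-dim embedding
full = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
short = mrl_truncate(full, 4)
print(len(short))  # 4
```

Alternatively, the `dimensions` parameter in the embeddings request returns an already-truncated vector, so client-side truncation is only needed when you store full vectors and downsize later.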
Usage Recommendations:
Cross-modal retrieval: Gemini Embedding 2 is the only choice — and the best one
Text-only + lowest cost: text-embedding-3-small remains the cheapest option
Text-only + high accuracy: Gemini Embedding 2 at 768-dim already surpasses text-embedding-3-large
RAG scenarios: 8192 token input + flexible dimensions, ideal for large document chunking and retrieval
Retrieval systems seeking the highest embedding quality
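For the RAG scenario above, documents still need to be split to fit the 8192-token window before embedding. A minimal chunking sketch using a rough words-to-tokens ratio as a stand-in for a real tokenizer (the ratio, overlap, and helper name are assumptions, not part of the API):

```python
def chunk_by_tokens(text, max_tokens=8192, overlap=256, tokens_per_word=1.3):
    """Split text into overlapping chunks that fit an embedding model's window.

    Approximates token counts via a words-to-tokens ratio; swap in a real
    tokenizer for production use.
    """
    max_words = int(max_tokens / tokens_per_word)
    step = max_words - int(overlap / tokens_per_word)
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = "word " * 20000  # a long document, ~20k words
chunks = chunk_by_tokens(doc)
print(len(chunks))
```

Each chunk can then be passed to the embeddings endpoint; the overlap preserves context across chunk boundaries at retrieval time.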
Sources: Google official blog (blog.google), Google AI Developer docs (ai.google.dev), MTEB leaderboard. Gemini Embedding 2 Preview launched March 10, 2026. Data retrieved: March 31, 2026.