What an embedding is
An embedding is a fixed-length dense vector (e.g., 1536 or 3072 numbers) that encodes the meaning of text. Semantically similar fragments sit close together in this space even if they share no words. That is the basis of semantic search: instead of keyword matching we look for nearby vectors.
import os
from openai import OpenAI
client = OpenAI()
resp = client.embeddings.create(
model="text-embedding-3-large",
input=["How do I reset my password?", "Recovering account access"],
dimensions=1536, # 3-large is natively 3072, can be truncated
)
v1 = resp.data[0].embedding
v2 = resp.data[1].embedding
print(len(v1)) # 1536
Similarity metrics
- Cosine similarity — the angle between vectors, ignores magnitude. The default for text embeddings.
- Dot product — accounts for both angle and magnitude. If vectors are L2-normalized, dot product == cosine. Most OpenAI/Cohere embeddings are normalized, so the metrics coincide and dot is faster.
- Euclidean (L2) — straight-line distance. For normalized vectors it is monotonically related to cosine, but for text in production you almost always use cosine or dot.
import numpy as np
def cosine(a, b):
a, b = np.array(a), np.array(b)
return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(cosine(v1, v2), 3)) # high similarity: different words, same meaning
Which model to pick
| Model | Dimensions | Notes |
|---|---|---|
| text-embedding-3-large | up to 3072 (truncatable) | best OpenAI quality, supports Matryoshka truncation |
| text-embedding-3-small | up to 1536 | cheaper, quality close to ada-002 |
| text-embedding-ada-002 | 1536 (fixed) | legacy, not truncatable, not needed for new code |
| Cohere Embed v3 | 1024 | strong multilingual, has input_type (search_document/search_query) |
| all-MiniLM-L6-v2 | 384 | local, free, sentence-transformers, for prototypes and private data |
The key feature of text-embedding-3-*: the dimensions parameter truncates the vector (Matryoshka Representation Learning) — you can shrink dimensionality with minimal quality loss and save memory and search cost. ada-002 has no such option.
Dimensions: the trade-off
Higher dimensionality means better accuracy but more storage and slower search. 3072 → 256 cuts memory ~12x and recall typically drops by single-digit percent. For most production RAG, 1024–1536 is a reasonable balance.