Embeddings: Basic Usage
Obtain vector representations of text via the OpenAI API. We cover the text-embedding-3-small model, exponential backoff with tenacity, and best practices for batch processing.
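To make the idea of exponential backoff concrete, here is a minimal standard-library sketch of how randomized, exponentially growing delays can be generated. It illustrates the general technique only and is not a reproduction of tenacity's internal algorithm; the function name backoff_delays and its parameters are hypothetical.

```python
import random
from typing import Optional


def backoff_delays(
    retries: int,
    min_wait: float = 1.0,
    max_wait: float = 20.0,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return one randomized delay (in seconds) per retry."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        # The upper bound doubles after every failed attempt, capped at max_wait.
        ceiling = min(max_wait, min_wait * 2 ** attempt)
        # Jitter: choose a random point between min_wait and the current ceiling
        # so that many clients retrying together do not all hit the API at once.
        delays.append(rng.uniform(min_wait, ceiling))
    return delays


# Six attempts in total leave room for at most five waits between them.
delays = backoff_delays(5, rng=random.Random(0))
print([round(d, 2) for d in delays])
```

The cap matters as much as the doubling: without max_wait, a long outage would push the delay into minutes.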
Write a get_embedding function with @retry, obtain embeddings for 5 different phrases, and compute cosine similarity between them. Compare semantically close and distant pairs.
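The comparison step of the task can be tried without any API calls. The sketch below scores every pair among five items and sorts them from most to least similar. The labels and 3-dimensional vectors are made up for illustration and are not real embeddings; rank_pairs is a hypothetical helper name.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_pairs(labels, vectors):
    """Score every unordered pair and sort from most to least similar."""
    scored = [
        (cosine_similarity(vectors[i], vectors[j]), labels[i], labels[j])
        for i, j in combinations(range(len(labels)), 2)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Made-up toy vectors standing in for real embeddings (assumption).
toy_labels = ["cat", "feline", "python", "java", "weather"]
toy_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.9, 0.3],
    [0.1, 0.8, 0.4],
    [0.2, 0.1, 0.9],
]

ranking = rank_pairs(toy_labels, toy_vectors)
for score, left, right in ranking:
    print(f"{left:8s} {right:8s} {score:.3f}")
```

With real embeddings, the pairs at the top of the ranking should be the semantically close ones, and the pairs at the bottom the distant ones.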
Copy and adapt to your context. Text in angle brackets should be replaced.
from tenacity import retry, wait_random_exponential, stop_after_attempt
from openai import OpenAI
import numpy as np

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()


# Retry with randomized exponential backoff (waits capped at 20 s),
# giving up after 6 attempts.
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def get_embedding(text: str, model="text-embedding-3-small") -> list[float]:
    return client.embeddings.create(
        input=[text], model=model
    ).data[0].embedding


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


texts = [
    "The cat sat on the mat",
    "A feline rested on the rug",
    "Python is a programming language",
]
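# One API call per text, which is fine for a handful of inputs.
embeddings = [get_embedding(t) for t in texts]
print(cosine_similarity(embeddings[0], embeddings[1]))  # close in meaning: ~0.92 (exact value varies by model)
print(cosine_similarity(embeddings[0], embeddings[2]))  # unrelated topics: ~0.6 (exact value varies by model)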
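The template above sends one request per text. For larger inputs, the embeddings endpoint also accepts a list of strings in a single request, so texts can be sent in chunks. The following is a minimal sketch under stated assumptions: embed_in_batches, make_openai_embed_batch, and the default batch_size of 100 are hypothetical choices, not values from the lesson, and the offline stand-in fake_embed_batch exists only so the chunking logic can be run without network access.

```python
from typing import Callable, Sequence

EmbedBatch = Callable[[Sequence[str]], list[list[float]]]


def embed_in_batches(
    texts: Sequence[str], embed_batch: EmbedBatch, batch_size: int = 100
) -> list[list[float]]:
    """Embed texts chunk by chunk, preserving the input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        batch_vectors = embed_batch(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embedder returned a different number of vectors")
        vectors.extend(batch_vectors)
    return vectors


def make_openai_embed_batch(client, model="text-embedding-3-small") -> EmbedBatch:
    """Wrap an existing OpenAI client as a batch embedder (not exercised below)."""
    def embed_batch(batch: Sequence[str]) -> list[list[float]]:
        response = client.embeddings.create(input=list(batch), model=model)
        # Sort by index so vectors stay aligned with the order of the inputs.
        ordered = sorted(response.data, key=lambda item: item.index)
        return [item.embedding for item in ordered]
    return embed_batch


# Offline stand-in (assumption): one number per text, recording batch sizes.
calls: list[int] = []


def fake_embed_batch(batch: Sequence[str]) -> list[list[float]]:
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(
    ["a", "bb", "ccc", "dddd", "eeeee"], fake_embed_batch, batch_size=2
)
print(calls)
```

In real use, embed_in_batches(texts, make_openai_embed_batch(client)) would replace the list comprehension, and the @retry decorator could wrap the inner call so that each batch is retried independently. Check the API's current per-request limits before choosing a batch size.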