OpenAI Embeddings: Complete Guide · Lesson 2

Semantic Text Search

Build a search engine over product reviews: embed query → cosine similarity against all documents → top-N results. Full working example on the Amazon Fine Food Reviews dataset.
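The pipeline in this summary can be tried without any API calls. The sketch below is a minimal illustration, not the lesson's implementation: the three-dimensional vectors are hypothetical stand-ins for real embeddings, and top_n is a helper name invented for this example.

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_n(query_vec, doc_vecs, n=2):
    # Score every document against the query, then return
    # (document index, score) pairs with the best match first.
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    order = np.argsort(scores)[::-1][:n]
    return [(int(i), scores[i]) for i in order]

# Hypothetical toy "embeddings"; real embedding vectors have far more dimensions.
docs = [
    np.array([1.0, 0.0, 0.0]),
    np.array([0.8, 0.6, 0.0]),
    np.array([0.0, 0.0, 1.0]),
]
query = np.array([0.6, 0.8, 0.0])

results = top_n(query, docs, n=2)
print(results)  # document 1 ranks first, then document 0
```

Document 1 wins because it points in nearly the same direction as the query, while document 2 is orthogonal to it and scores zero.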

25 min read · 3 questions in quiz · Ready prompt included

Practical exercise

What to do after this lesson

Implement search_reviews on your own small dataset (≥20 records). Test queries that share no words with the documents but are semantically relevant.
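One way to confirm that a test query really shares no words with your documents is a quick lexical check before running the search. The sketch below is one possible approach; the helper names (tokens, shared_words) and the sample reviews are hypothetical.

```python
import re

def tokens(text):
    # Lowercase the text and split it into a set of alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def shared_words(query, documents):
    # Map each document index to the sorted list of words it shares with the query.
    q = tokens(query)
    return {i: sorted(q & tokens(doc)) for i, doc in enumerate(documents)}

# Hypothetical reviews and query, for illustration only.
reviews = [
    "These noodles were bland and overcooked.",
    "Great coffee, arrived quickly.",
]
overlap = shared_words("tasteless pasta", reviews)
print(overlap)

# True when the query has no word in common with any document.
no_overlap = all(len(words) == 0 for words in overlap.values())
print(no_overlap)
```

If a semantic search still ranks the first review highly for "tasteless pasta", the match comes from meaning rather than from shared vocabulary. This check does not handle stemming or stop words, so treat it as a rough filter.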


Ready-to-use prompt

Template for this lesson

Copy and adapt to your context. Text in angle brackets should be replaced.

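import pandas as pd
import numpy as np
from ast import literal_eval  # for parsing embeddings stored as strings in a CSV
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_embedding(text, model="text-embedding-3-small"):
    # Embed a single string and return its vector as a list of floats.
    return client.embeddings.create(
        input=[text], model=model
    ).data[0].embedding

def cosine_similarity(a, b):
    # Cosine of the angle between vectors a and b.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def search_docs(df, query, n=3):
    # Embed the query once, score every row, and return the n best matches.
    # df.assign returns a new DataFrame, so the caller's df is not modified.
    q_emb = get_embedding(query)
    sims = df["embedding"].apply(lambda x: cosine_similarity(x, q_emb))
    return df.assign(sim=sims).sort_values("sim", ascending=False).head(n)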
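The template imports literal_eval but never calls it. It is typically needed when embeddings were saved to a CSV file, where each vector comes back as a string. The sketch below shows one way to parse such a column; the in-memory CSV and the column names text and embedding are assumptions made for this example.

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd

# Stand-in for a CSV file on disk; the contents and column names are hypothetical.
csv_text = (
    "text,embedding\n"
    '"good tea","[0.1, 0.2, 0.3]"\n'
    '"stale chips","[0.3, 0.2, 0.1]"\n'
)

df = pd.read_csv(io.StringIO(csv_text))

# Each cell holds a string such as "[0.1, 0.2, 0.3]".
# literal_eval turns it into a Python list, which np.array converts to a vector.
df["embedding"] = df["embedding"].apply(lambda s: np.array(literal_eval(s)))

print(type(df["embedding"].iloc[0]), df["embedding"].iloc[0].shape)
```

After this step, the embedding column holds NumPy arrays that can be passed directly to a cosine similarity function.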


The Idea Behind Semantic Search

Loading Data with Pre-computed Embeddings

Search Function

Example Queries