Semantic Text Search
Build a search engine over product reviews: embed the query → compute its cosine similarity against every document embedding → return the top-N results. A full working example uses the Amazon Fine Food Reviews dataset.
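The ranking step can be illustrated without any API calls. Cosine similarity is cos(a, b) = (a · b) / (‖a‖ ‖b‖), so every document can be scored against the query in one vectorized pass and then sorted. This is a minimal sketch that assumes made-up 3-dimensional vectors in place of real embeddings (models such as text-embedding-3-small return far longer vectors); the helper name `top_n_by_cosine` is hypothetical.

```python
import numpy as np

def top_n_by_cosine(query_vec, doc_matrix, n=3):
    """Return (indices, scores) of the n rows of doc_matrix most similar to query_vec."""
    query_vec = np.asarray(query_vec, dtype=float)
    doc_matrix = np.asarray(doc_matrix, dtype=float)
    # Cosine similarity: dot product divided by the product of the vector norms,
    # computed for all document rows at once
    scores = doc_matrix @ query_vec / (
        np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec)
    )
    # argsort sorts ascending, so reverse it to put the highest scores first
    order = np.argsort(scores)[::-1][:n]
    return order, scores[order]

# Toy "embeddings" (hypothetical values): three documents and one query
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
query = [1.0, 0.1, 0.0]

idx, sims = top_n_by_cosine(query, docs, n=2)
print(idx)  # [0 1]
```

The third document is orthogonal to the query (score 0), so it drops out of the top 2.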
Implement search_reviews on your own small dataset (at least 20 records), then test it with queries that share no words with the documents but are semantically related to them.
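To be sure a test query exercises semantic matching rather than keyword overlap, it helps to check mechanically that it shares no words with any document. A minimal sketch, assuming simple lowercase alphanumeric tokenization with no stemming (so "treat" and "treats" count as different words); the helper names `tokens` and `shares_no_words` are hypothetical.

```python
import re

def tokens(text):
    """Set of lowercased alphanumeric word tokens in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def shares_no_words(query, documents, stopwords=frozenset()):
    """True if the query has no non-stopword token in common with any document."""
    q = tokens(query) - stopwords
    return all(not (q & (tokens(doc) - stopwords)) for doc in documents)

reviews = [
    "The coffee was bitter and stale.",
    "My dog loves these treats!",
]
print(shares_no_words("pet snacks", reviews))          # True
print(shares_no_words("stale coffee beans", reviews))  # False
```

Passing a small stopword set (for example `{"my", "the"}`) lets queries reuse function words while still forbidding overlap on content words.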
Copy the code below and adapt it to your data.
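```python
import numpy as np
import pandas as pd
from ast import literal_eval  # parses embeddings saved as strings, e.g. in a CSV
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable by default

def get_embedding(text, model="text-embedding-3-small"):
    """Return the embedding vector (a list of floats) for a single string."""
    return client.embeddings.create(input=[text], model=model).data[0].embedding

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def search_reviews(df, query, n=3):
    """Return the n rows of df whose "embedding" column is most similar to query."""
    # If embeddings were loaded from a CSV, parse them first:
    # df["embedding"] = df["embedding"].apply(literal_eval)
    q_emb = get_embedding(query)
    sims = df["embedding"].apply(lambda x: cosine_similarity(x, q_emb))
    # assign() returns a new DataFrame, so the caller's df is not modified
    return df.assign(sim=sims).sort_values("sim", ascending=False).head(n)
```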
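Embeddings are often cached in a CSV so they need not be recomputed; each vector then comes back as a string such as "[0.1, 0.2]", which is presumably what the `literal_eval` import is for. The sketch below parses that column once into a NumPy matrix and scores all rows with a single matrix product instead of a per-row `apply`. It assumes the query embedding has already been computed (so it runs offline), and the names `load_embeddings` and `search_with_matrix` are hypothetical.

```python
from ast import literal_eval

import numpy as np
import pandas as pd

def load_embeddings(df, column="embedding"):
    """Parse embeddings stored as strings like "[0.1, 0.2]" into a 2-D float array."""
    return np.array(df[column].apply(literal_eval).tolist(), dtype=float)

def search_with_matrix(df, emb_matrix, query_emb, n=3):
    """Rank rows of df by cosine similarity to query_emb; df itself is not modified."""
    q = np.asarray(query_emb, dtype=float)
    sims = emb_matrix @ q / (np.linalg.norm(emb_matrix, axis=1) * np.linalg.norm(q))
    return df.assign(sim=sims).sort_values("sim", ascending=False).head(n)

# Hypothetical 2-D embeddings, as they would look after a CSV round trip
df = pd.DataFrame({
    "text": ["great coffee", "dog treats", "bitter tea"],
    "embedding": ["[1.0, 0.0]", "[0.0, 1.0]", "[0.8, 0.6]"],
})
emb = load_embeddings(df)
print(search_with_matrix(df, emb, [1.0, 0.0], n=2)["text"].tolist())
# ['great coffee', 'bitter tea']
```

If the embeddings are already unit length (OpenAI documents this for its embedding models), the division by the norms can be dropped and the plain dot product gives the same ranking.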