OpenAI RAG & Production Patterns · Lesson 2

Q&A via Search API and HyDE Re-ranking

Build a RAG pipeline on top of an external search API (NewsAPI): generate query variants with the model, retrieve articles, re-rank them with HyDE (Hypothetical Document Embeddings), and stream an answer with source citations.


Practical exercise

What to do after this lesson

Implement a Q&A pipeline on top of any public Search API (e.g., Wikipedia API or DuckDuckGo). Run 3 questions. Compare answer quality with HyDE versus without (direct keyword search).
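For the "without HyDE" arm of the comparison, one possible baseline is to rank articles by how many question words appear in the title and description. This is only a sketch: `keyword_rank`, its crude tokenizer, and the assumption that each article is a dict with `title` and `description` keys are illustrative choices, not something the lesson prescribes.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_rank(question: str, articles: list[dict], k: int = 5) -> list[tuple[dict, int]]:
    """Rank articles by the number of question words found in title + description."""
    q = tokens(question)
    scored = [
        (a, len(q & tokens(f"{a.get('title') or ''} {a.get('description') or ''}")))
        for a in articles
    ]
    # Highest overlap first; ties keep their original order (sorted is stable).
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Feeding the top results of `keyword_rank` and of the HyDE re-ranker into the same answer prompt keeps the comparison fair, since only the ranking step differs.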

Ready-to-use prompt

Template for this lesson

Copy and adapt to your context. Text in angle brackets should be replaced.

from openai import OpenAI
from numpy import dot
import json

client = OpenAI()

def json_gpt(prompt):
    r = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role":"user","content":prompt}],
        response_format={"type":"json_object"},
    )
    return json.loads(r.choices[0].message.content)

def get_embedding(text):
    return client.embeddings.create(
        model="text-embedding-ada-002", input=[text]
    ).data[0].embedding

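def hyde_rerank(question, articles):
    # JSON mode requires the word "JSON" to appear somewhere in the messages.
    hyp = json_gpt(
        f'Write a hypothetical answer to: {question}. '
        'Respond in JSON with the format: {"hypotheticalAnswer": "..."}'
    )["hypotheticalAnswer"]
    h_emb = get_embedding(hyp)
    # Embeddings are unit length, so the dot product is the cosine similarity.
    scored = [(a, dot(h_emb, get_embedding(a["title"]))) for a in articles]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:5]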

Step 1: Query Expansion

from openai import OpenAI
import json

client = OpenAI()
GPT_MODEL = "gpt-3.5-turbo"

def json_gpt(prompt: str) -> dict:
    response = client.chat.completions.create(
        model=GPT_MODEL,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

USER_QUESTION = "Who won the NBA championship? Who was the MVP?"

# JSON mode requires the word "JSON" to appear in the prompt.
QUERIES_INPUT = f"""Generate search queries for: {USER_QUESTION}
Respond in JSON. Format: {{"queries": ["q1", "q2", "q3"]}}"""

queries = json_gpt(QUERIES_INPUT)["queries"]
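Step 2 expects an `articles` list whose items carry `title`, `description`, and `url`, but the retrieval call itself is not shown. A minimal sketch of that step follows. It assumes NewsAPI's `/v2/everything` endpoint with `q`, `pageSize`, and `apiKey` query parameters and a response shaped like `{"articles": [...]}`; the helper names `fetch_newsapi` and `search_all` and the `NEWS_API_KEY` environment variable are hypothetical, so check the NewsAPI documentation before relying on it.

```python
import json
import os
import urllib.parse
import urllib.request


def fetch_newsapi(query: str, page_size: int = 20) -> list[dict]:
    """Fetch articles for one query. Endpoint and parameter names are assumptions."""
    params = urllib.parse.urlencode({
        "q": query,
        "pageSize": page_size,
        "apiKey": os.environ["NEWS_API_KEY"],  # hypothetical variable name
    })
    url = f"https://newsapi.org/v2/everything?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("articles", [])


def search_all(queries: list[str], fetch=fetch_newsapi) -> list[dict]:
    """Run every query variant and keep the first article seen for each URL."""
    seen: dict[str, dict] = {}
    for q in queries:
        for article in fetch(q):
            url = article.get("url")
            if url and url not in seen:
                seen[url] = article
    return list(seen.values())
```

With the `queries` list from Step 1, `articles = search_all(queries)` would produce the input for re-ranking. Deduplicating by URL matters because query variants tend to return overlapping results, and each duplicate would otherwise cost an extra embedding.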

Step 2: HyDE Re-ranking

HyDE (Hypothetical Document Embeddings): the model invents an ideal answer, then we find articles similar to that hypothetical answer rather than to the question itself.

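from numpy import dot

def embeddings(texts: list[str]) -> list[list[float]]:
    # One batched request returns one embedding per input text, in order.
    response = client.embeddings.create(
        model="text-embedding-ada-002", input=texts
    )
    return [d.embedding for d in response.data]

# JSON mode requires the word "JSON" to appear in the prompt.
HA_INPUT = f"""Generate a hypothetical answer to: {USER_QUESTION}
Use placeholders like NAME did X.
Respond in JSON. Format: {{"hypotheticalAnswer": "..."}}"""

hyp_answer = json_gpt(HA_INPUT)["hypotheticalAnswer"]
hyp_emb = embeddings([hyp_answer])[0]

# `articles` is the list of dicts returned by the search step;
# each needs "title" and "description" keys.
article_embs = embeddings([f"{a['title']} {a['description']}" for a in articles])

# Score each article against the hypothetical answer and keep the five best.
scores = [dot(hyp_emb, e) for e in article_embs]
top_articles = sorted(zip(articles, scores), key=lambda x: x[1], reverse=True)[:5]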

Step 3: Generate the Answer

formatted = [{"title": a["title"], "url": a["url"]} for a, _ in top_articles]

ANSWER_INPUT = f"Answer based on TOP_RESULTS: {formatted}\nQUESTION: {USER_QUESTION}"

stream = client.chat.completions.create(
    model=GPT_MODEL,
    messages=[{"role": "user", "content": ANSWER_INPUT}],
    stream=True,
)

# Print tokens as they arrive; the final chunk's delta has no content.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
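The Step 3 prompt passes titles and URLs as a raw Python list and does not ask for citations explicitly. One way to encourage cited answers is to number the sources and instruct the model to cite by number. The sketch below is illustrative only: `build_answer_prompt` and the prompt wording are assumptions, not the lesson's own code.

```python
def build_answer_prompt(question: str, ranked: list[tuple[dict, float]]) -> str:
    """Format re-ranked (article, score) pairs as numbered sources for citation."""
    lines = []
    for i, (article, _score) in enumerate(ranked, start=1):
        description = article.get("description") or ""  # may be missing or None
        entry = f"[{i}] {article['title']} ({article['url']})\n    {description}"
        lines.append(entry.rstrip())  # drop the empty second line if no description
    sources = "\n".join(lines)
    return (
        "Answer the question using only the sources below. "
        "Cite sources inline as [n] and list the cited URLs at the end.\n\n"
        f"SOURCES:\n{sources}\n\nQUESTION: {question}"
    )
```

Passing `build_answer_prompt(USER_QUESTION, top_articles)` as the message content in place of `ANSWER_INPUT` also gives the model the descriptions, not just the titles, to answer from.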

OpenAI embeddings are normalized to unit length, so the dot product equals the cosine similarity and no extra normalization is needed.
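The equality follows from the definition cos(a, b) = a·b / (|a| |b|): when both norms are 1, the denominator disappears. A quick check with toy vectors standing in for real embeddings (the numbers are made up, and the helpers are plain-Python stand-ins for `numpy.dot`):

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


raw_a, raw_b = [3.0, 4.0], [1.0, 2.0]
unit_a, unit_b = normalize(raw_a), normalize(raw_b)

# For unit vectors the two measures agree (up to floating-point error).
assert math.isclose(dot(unit_a, unit_b), cosine(raw_a, raw_b))
# For unnormalized vectors they generally differ.
assert not math.isclose(dot(raw_a, raw_b), cosine(raw_a, raw_b))
```

If embeddings from a different provider are swapped in, it is worth checking their norms first, since the shortcut only holds for unit-length vectors.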
