Q&A via Search API and HyDE Re-ranking
Build a RAG pipeline on top of an external Search API (NewsAPI): model-generated query variants, article retrieval, HyDE (Hypothetical Document Embeddings) re-ranking, and a streamed answer with source citations.
Architecture: Search → Re-rank → Answer
Instead of a vector database, we use an external Search API. The pipeline has three steps:
- Search: the model generates multiple query variants, and each variant is sent to the Search API
- Re-rank: order the returned articles by semantic similarity using HyDE and keep the top few
- Answer: generate a response with source links
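The three steps above can be sketched as one small function that wires interchangeable stages together. This is only an illustrative skeleton: `run_pipeline` and its four callable parameters are hypothetical names, not part of any library, and real implementations of each stage would call the model and the Search API.

```python
from typing import Callable

Article = dict  # assumed shape: {"title": ..., "description": ..., "url": ...}


def run_pipeline(
    question: str,
    expand: Callable[[str], list[str]],
    search: Callable[[str], list[Article]],
    rerank: Callable[[str, list[Article]], list[Article]],
    answer: Callable[[str, list[Article]], str],
    top_k: int = 5,
) -> str:
    """Search -> Re-rank -> Answer, with each stage passed in as a function."""
    # 1. Search: expand the question into query variants and collect all hits
    articles = [hit for query in expand(question) for hit in search(query)]
    # 2. Re-rank: order by relevance and keep only the best few
    best = rerank(question, articles)[:top_k]
    # 3. Answer: generate the final response from the selected sources
    return answer(question, best)
```

Passing the stages in as functions keeps the skeleton testable with stubs and makes it easy to swap one Search API for another.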
Step 1: Query Expansion
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
GPT_MODEL = "gpt-3.5-turbo"


def json_gpt(prompt: str) -> dict:
    """Send a prompt and parse the model's reply as JSON."""
    response = client.chat.completions.create(
        model=GPT_MODEL,
        messages=[{"role": "user", "content": prompt}],
        # JSON mode: the prompt itself must mention "JSON"
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


USER_QUESTION = "Who won the NBA championship? Who was the MVP?"
QUERIES_INPUT = f"""Generate search queries for: {USER_QUESTION}
Respond in JSON. Format: {{"queries": ["q1", "q2", "q3"]}}"""
queries = json_gpt(QUERIES_INPUT)["queries"]
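Step 2 expects an `articles` list, so each generated query still has to be sent to the Search API. Below is a minimal sketch, assuming NewsAPI's `/v2/everything` endpoint and an API key stored in a `NEWS_API_KEY` environment variable. That variable name and the helper names `build_search_url`, `search_news`, `dedupe_by_url`, and `search_all` are illustrative choices, not part of NewsAPI.

```python
import json
import os
import urllib.parse
import urllib.request

NEWS_API_URL = "https://newsapi.org/v2/everything"


def build_search_url(query: str, api_key: str, page_size: int = 20) -> str:
    """Build the request URL for one query variant."""
    params = {
        "q": query,
        "apiKey": api_key,
        "pageSize": page_size,
        "sortBy": "relevancy",
        "language": "en",
    }
    return f"{NEWS_API_URL}?{urllib.parse.urlencode(params)}"


def search_news(query: str, api_key: str) -> list[dict]:
    """Fetch articles for one query; returns [] if the response has none."""
    with urllib.request.urlopen(build_search_url(query, api_key), timeout=10) as resp:
        payload = json.load(resp)
    return payload.get("articles", [])


def dedupe_by_url(articles: list[dict]) -> list[dict]:
    """Keep the first occurrence of each URL, preserving order."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        url = article.get("url")
        if url and url not in seen:
            seen.add(url)
            unique.append(article)
    return unique


def search_all(queries: list[str], api_key: str) -> list[dict]:
    """Run every query variant and merge the results without duplicates."""
    results = []
    for query in queries:
        results.extend(search_news(query, api_key))
    return dedupe_by_url(results)


# Usage (needs network access and a valid key):
# articles = search_all(queries, os.environ["NEWS_API_KEY"])
```

Deduplicating by URL before re-ranking matters because several query variants often return the same article, which would otherwise take up more than one of the top slots.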
Step 2: HyDE Re-ranking
HyDE (Hypothetical Document Embeddings): the model invents an ideal answer, then we find articles similar to that hypothetical answer rather than to the question itself.
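To build intuition for why matching against a hypothetical answer can help, here is a deliberately contrived, offline sketch. It assumes a toy bag-of-words vector in place of a real embedding model, and the hypothetical answer is written by hand rather than generated; `toy_embed`, `similarity`, and `rank` are illustrative names only.

```python
import math
from collections import Counter


def toy_embed(text: str) -> dict[str, float]:
    """Stand-in embedding: a unit-length bag-of-words vector (NOT a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}


def similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def rank(reference: str, docs: list[str]) -> list[str]:
    """Order docs by similarity to the reference text, best first."""
    ref = toy_embed(reference)
    return sorted(docs, key=lambda d: similarity(ref, toy_embed(d)), reverse=True)


question = "who won the nba championship and who was the mvp"
# Hand-written here; in the real pipeline the model generates this text
hypothetical = "NAME won the nba championship and NAME was named finals mvp"

docs = [
    "who was the best nba player and who was the worst",                # question-like
    "nuggets won the nba championship and jokic was named finals mvp",  # answer-like
]

by_question = rank(question, docs)          # question-like document comes first
by_hypothetical = rank(hypothetical, docs)  # answer-like document comes first
```

With this toy vector the question ranks the question-like document first, while the hypothetical answer ranks the answer-like document first. Real embeddings depend far less on surface wording, so treat this only as an intuition for the mismatch HyDE targets.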
from numpy import dot


def embeddings(texts: list[str]) -> list[list[float]]:
    """Embed a batch of texts with a single API call."""
    response = client.embeddings.create(
        model="text-embedding-ada-002", input=texts
    )
    return [d.embedding for d in response.data]


HA_INPUT = f"""Generate a hypothetical answer to: {USER_QUESTION}
Use placeholders like NAME did X. Respond in JSON. Format: {{"hypotheticalAnswer": "..."}}"""
hyp_answer = json_gpt(HA_INPUT)["hypotheticalAnswer"]
hyp_emb = embeddings([hyp_answer])[0]

# `articles` is the list of search results (dicts with "title", "description", "url")
article_embs = embeddings(
    [f"{a['title']} {a.get('description') or ''}" for a in articles]
)
scores = [dot(hyp_emb, e) for e in article_embs]
top_articles = sorted(zip(articles, scores), key=lambda x: x[1], reverse=True)[:5]
Step 3: Generate the Answer
formatted = [
    {"title": a["title"], "description": a.get("description"), "url": a["url"]}
    for a, _ in top_articles
]
ANSWER_INPUT = (
    f"Answer based on TOP_RESULTS: {formatted}\n"
    f"Include the URLs of the sources you used.\nQUESTION: {USER_QUESTION}"
)
stream = client.chat.completions.create(
    model=GPT_MODEL,
    messages=[{"role": "user", "content": ANSWER_INPUT}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:  # skip chunks that carry no choices
        print(chunk.choices[0].delta.content or "", end="")
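One way to make the citations in the answer easier to check is to number the sources in the prompt and ask the model to cite by number. The sketch below is an optional variation, not the approach used above; `format_sources` and `build_answer_prompt` are hypothetical helper names, and the instruction wording is just one possibility.

```python
def format_sources(articles: list[dict]) -> str:
    """Render articles as a numbered list: [n] title, description, URL."""
    blocks = []
    for i, article in enumerate(articles, start=1):
        parts = [f"[{i}] {article['title']}"]
        if article.get("description"):  # the description may be missing or None
            parts.append(article["description"])
        parts.append(article["url"])
        blocks.append("\n".join(parts))
    return "\n\n".join(blocks)


def build_answer_prompt(question: str, articles: list[dict]) -> str:
    """Combine citation instructions, numbered sources, and the question."""
    return (
        "Answer the question using only the numbered sources below. "
        "Cite each claim inline as [n] and finish with the list of URLs you cited. "
        "If the sources do not contain the answer, say so.\n\n"
        f"SOURCES:\n{format_sources(articles)}\n\n"
        f"QUESTION: {question}"
    )
```

The resulting string can be passed as the user message in place of `ANSWER_INPUT`, for example `build_answer_prompt(USER_QUESTION, [a for a, _ in top_articles])`.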
OpenAI embeddings are returned normalized to unit length, so the dot product equals cosine similarity and no extra normalization is needed.
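The equivalence is easy to check with plain Python. This is a small worked example on hand-picked 2D vectors, not on real embeddings; the helper names are illustrative.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: list[float]) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the lengths."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a, b = [3.0, 4.0], [1.0, 0.0]

# Unnormalized: the dot product (3.0) differs from the cosine similarity (0.6)
raw_dot, raw_cos = dot(a, b), cosine(a, b)

# Unit length: the denominator is 1, so both measures agree
u, v = normalize(a), normalize(b)
same = math.isclose(dot(u, v), cosine(u, v))
```

For vectors from another embedding provider, checking that `norm(vector)` is close to 1 before relying on the dot product alone is a cheap safeguard.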
Implement a Q&A pipeline on top of any public Search API (e.g., Wikipedia API or DuckDuckGo). Run 3 questions. Compare answer quality with HyDE versus without (direct keyword search).
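As a possible starting point for the exercise, the sketch below adapts Wikipedia search results to the same `title` / `description` / `url` shape used for articles earlier. It assumes the MediaWiki Action API (`action=query`, `list=search`); the `USER_AGENT` value is a placeholder to replace with your own contact details, and `to_article` and `search_wikipedia` are illustrative names.

```python
import html
import json
import re
import urllib.parse
import urllib.request

WIKI_API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder: Wikimedia asks clients to send a descriptive User-Agent
USER_AGENT = "hyde-exercise/0.1 (contact: you@example.com)"


def to_article(hit: dict) -> dict:
    """Convert one search hit into the article shape used by the pipeline."""
    # Snippets contain HTML highlight tags and entities; strip and unescape them
    snippet = re.sub(r"<[^>]+>", "", hit.get("snippet", ""))
    return {
        "title": hit["title"],
        "description": html.unescape(snippet),
        "url": f"https://en.wikipedia.org/?curid={hit['pageid']}",
    }


def search_wikipedia(query: str, limit: int = 10) -> list[dict]:
    """Full-text search on English Wikipedia (needs network access)."""
    params = urllib.parse.urlencode(
        {"action": "query", "list": "search", "srsearch": query,
         "srlimit": limit, "format": "json"}
    )
    request = urllib.request.Request(
        f"{WIKI_API_URL}?{params}", headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        payload = json.load(resp)
    return [to_article(hit) for hit in payload.get("query", {}).get("search", [])]
```

For the comparison, the order returned by `search_wikipedia` can serve as the keyword-search baseline, and the HyDE-ranked order of the same results as the alternative.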
Copy and adapt to your context. Text in angle brackets should be replaced.
import json

from numpy import dot
from openai import OpenAI

client = OpenAI()


def json_gpt(prompt):
    # JSON mode requires the word "JSON" somewhere in the prompt
    r = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(r.choices[0].message.content)


def get_embedding(text):
    return client.embeddings.create(
        model="text-embedding-ada-002", input=[text]
    ).data[0].embedding


def hyde_rerank(question, articles):
    hyp = json_gpt(
        f"Hypothetical answer to: {question}. "
        'Respond in JSON. Format: {"hypotheticalAnswer": "..."}'
    )["hypotheticalAnswer"]
    h_emb = get_embedding(hyp)
    # One embedding call per article title; batch these for larger result sets
    scored = [(a, dot(h_emb, get_embedding(a["title"]))) for a in articles]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:5]

Common mistakes
- Searching by the raw question text. Users write questions, but good answers are phrased differently; HyDE addresses this mismatch.
- Not deduplicating articles returned by multiple query variants. The same article can occupy several context slots.
Best practices
- OpenAI embeddings are normalized, so the dot product equals cosine similarity and the normalization step can be skipped.
- Generate 5-10 query variants; coverage improves significantly compared to a single query.
When to use
- Answering questions against live data (news, documentation) where no pre-built vector database exists.
- Augmenting any existing search engine.
When not to use
- Closed corporate data without an external Search API.
- When exact citations are required: HyDE may select a semantically close but factually different source.