Q&A via Search API and HyDE Re-ranking
Build a RAG pipeline on top of an external search API (NewsAPI): model-generated query variants, article retrieval, HyDE (Hypothetical Document Embeddings) re-ranking, and a streamed answer with source citations.
Implement a Q&A pipeline on top of any public search API (e.g., Wikipedia API or DuckDuckGo). Run it on 3 questions. Compare answer quality with HyDE re-ranking against a baseline without it (direct keyword search).
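One possible shape for the retrieval step, sketched against the Wikipedia (MediaWiki) search endpoint using only the standard library. The helper names build_search_url, parse_search_results and search_all, the User-Agent string, and the choice to deduplicate by title are assumptions made for illustration, not part of the task statement. Each returned article carries a "title" for re-ranking and a "url" that can serve as the source citation.

```python
import json
import urllib.parse
import urllib.request

WIKI_API = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, limit=10):
    """Build a MediaWiki full-text search URL for one query string."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return WIKI_API + "?" + urllib.parse.urlencode(params)


def parse_search_results(payload):
    """Turn the API response into article dicts with a title and a source URL."""
    hits = payload.get("query", {}).get("search", [])
    return [
        {
            "title": h["title"],
            "url": "https://en.wikipedia.org/wiki/"
            + urllib.parse.quote(h["title"].replace(" ", "_")),
        }
        for h in hits
    ]


def search_all(queries, limit=10):
    """Run every query variant and merge the hits, dropping duplicate titles."""
    seen, merged = set(), []
    for q in queries:
        req = urllib.request.Request(
            build_search_url(q, limit),
            headers={"User-Agent": "hyde-qa-demo/0.1"},  # hypothetical client name
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            payload = json.load(resp)
        for article in parse_search_results(payload):
            if article["title"] not in seen:
                seen.add(article["title"])
                merged.append(article)
    return merged
```

Calling search_all with the model-generated query variants yields one merged candidate list; only that function touches the network, so the URL builder and the parser can be checked offline.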
Copy and adapt to your context. Text in angle brackets should be replaced.
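from openai import OpenAI
from numpy import dot
import json

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def json_gpt(prompt):
    """Send one prompt and parse the reply as a JSON object."""
    r = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        # JSON mode requires the word "JSON" to appear somewhere in the messages
        response_format={"type": "json_object"},
    )
    return json.loads(r.choices[0].message.content)

def get_embedding(text):
    """Return the embedding vector for a single string."""
    return client.embeddings.create(
        model="text-embedding-ada-002", input=[text]
    ).data[0].embedding

def hyde_rerank(question, articles):
    """Rank articles by similarity between their titles and a hypothetical answer."""
    hyp = json_gpt(
        f"Write a hypothetical answer to: {question}. "
        'Respond in JSON. Format: {"hypotheticalAnswer": "..."}'
    )["hypotheticalAnswer"]
    h_emb = get_embedding(hyp)
    # ada-002 embeddings are unit-length, so the dot product equals cosine similarity
    scored = [(a, dot(h_emb, get_embedding(a["title"]))) for a in articles]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:5]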
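For the "with HyDE versus without" comparison, the sketch below puts a keyword baseline next to an embedding-based ranking and measures how much the two top lists agree. It is a minimal illustration under stated assumptions: keyword_rank, embedding_rank and overlap are hypothetical helper names, the baseline simply counts shared words between question and title, and embed is any function mapping text to a vector (for example get_embedding), passed in so the logic can be exercised without API calls.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity; also safe for embeddings that are not unit-length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def keyword_rank(question, articles, top_k=5):
    """Baseline without HyDE: score each title by the question words it shares."""
    q_words = set(re.findall(r"\w+", question.lower()))
    scored = [
        (a, len(q_words & set(re.findall(r"\w+", a["title"].lower()))))
        for a in articles
    ]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]


def embedding_rank(hypothetical_answer, articles, embed, top_k=5):
    """HyDE-style ranking: compare titles to the embedded hypothetical answer."""
    h_emb = embed(hypothetical_answer)
    scored = [(a, cosine(h_emb, embed(a["title"]))) for a in articles]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]


def overlap(rank_a, rank_b):
    """Share of titles the two rankings have in common (0.0 to 1.0)."""
    titles_a = {a["title"] for a, _ in rank_a}
    titles_b = {a["title"] for a, _ in rank_b}
    union = titles_a | titles_b
    return len(titles_a & titles_b) / len(union) if union else 1.0
```

A low overlap for a question means the two methods surfaced different sources, which makes that question a good candidate for a side-by-side look at the final answers.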