RAG and working with documents · Lesson 2
Chunking, embeddings, retrieval
Where the typical RAG errors live and how to avoid them.
Chunking
- Start with 500-1000 tokens per chunk and tune from there.
- An overlap of 50-100 tokens between adjacent chunks reduces "meaning breaks," where a thought is cut in half at a chunk boundary.
- Logical boundaries (paragraph, section) are better than arbitrary ones.
- Save metadata: doc_id, page, section.
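The chunking advice above can be sketched as follows. This is a minimal illustration, not a reference implementation: tokens are approximated by whitespace-separated words, paragraphs are assumed to be separated by blank lines, and the names `Chunk` and `chunk_section` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    page: int
    section: str
    index: int  # position of the chunk within its section
    text: str


def chunk_section(doc_id, page, section, text, max_tokens=800, overlap=80):
    """Pack whole paragraphs into chunks and carry an overlap between them.

    Assumption: one "token" is one whitespace-separated word. A real
    pipeline would count tokens with the embedding model's tokenizer.
    A single paragraph longer than max_tokens is kept as one oversized chunk.
    """
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for words in paragraphs:
        # Close the current chunk if this paragraph would push it over the limit.
        if current and len(current) + len(words) > max_tokens:
            chunks.append(current)
            # Start the next chunk with the tail of the previous one.
            current = current[-overlap:] if overlap else []
        current = current + words
    if current:
        chunks.append(current)
    return [
        Chunk(doc_id, page, section, i, " ".join(words))
        for i, words in enumerate(chunks)
    ]
```

Splitting on paragraphs first keeps chunk boundaries on logical units, and every chunk carries the `doc_id`, `page`, and `section` it came from.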
Embeddings
- Current general-purpose embedding models work well: OpenAI text-embedding-3-large, Cohere embed v3, Voyage voyage-2.
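- One project = one embedding model. Vectors from different models live in different spaces and cannot be compared with each other.
- Cache embeddings: re-embedding text that has not changed wastes both money and indexing time.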
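One way to cache embeddings is to key them by the model name plus a hash of the chunk text, so that switching models never returns a stale vector. The sketch below is in-memory only, and `EmbeddingCache`, `embed_fn`, and `fake_embed` are hypothetical names; `fake_embed` stands in for a real embedding API call.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache keyed by model name plus a SHA-256 hash of the text."""

    def __init__(self, model_name, embed_fn):
        self.model_name = model_name
        self.embed_fn = embed_fn  # assumed callable: str -> list of floats
        self._store = {}
        self.misses = 0  # how many times embed_fn was actually called

    def _key(self, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{self.model_name}:{digest}"

    def get(self, text):
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self.embed_fn(text)
        return self._store[key]


def fake_embed(text):
    # Stand-in for a real embedding call (assumption for this sketch).
    return [float(len(text))]


cache = EmbeddingCache("text-embedding-3-large", fake_embed)
cache.get("hello")
cache.get("hello")  # served from the cache; fake_embed is not called again
```

In a real project the dictionary would likely be replaced by a persistent store, but the keying idea stays the same.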
Retrieval
- Ranking by cosine similarity alone is often noisy.
- Use hybrid search (BM25 + vector), so that exact keyword matches and semantic matches both count.
- A reranker (cohere-rerank, BGE) on a second pass significantly raises quality.
- top-K = 5-10 for most tasks.
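The lesson does not say how to merge the BM25 and vector results. One common option is reciprocal rank fusion (RRF), shown here as an assumption rather than the method this lesson prescribes. The function name, the constant `k=60`, and the chunk ids are illustrative; each input list is assumed to be chunk ids already ranked best-first by one retriever.

```python
def reciprocal_rank_fusion(rankings, k=60, top_k=10):
    """Fuse several ranked lists of chunk ids into one list.

    Each chunk scores 1 / (k + rank) for every list it appears in,
    so chunks ranked high by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by score, highest first; break ties by chunk id for determinism.
    fused = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return fused[:top_k]


bm25_hits = ["c1", "c2", "c3"]    # hypothetical BM25 ranking
vector_hits = ["c3", "c1", "c4"]  # hypothetical vector-search ranking
candidates = reciprocal_rank_fusion([bm25_hits, vector_hits], top_k=3)
```

The fused candidates are what a reranker would then reorder on the second pass before the top-K chunks go to the model.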
What we measure
- Recall@K: whether the chunk that is actually needed lands in the top-K results.
- Answer quality on an eval set.
- Latency.
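Recall@K can be computed in a few lines once there is an eval set of question and reference-chunk pairs. In this sketch `retrieve` is a hypothetical callable that returns chunk ids ranked best-first, and the toy data is made up purely for illustration.

```python
def recall_at_k(eval_set, retrieve, k=10):
    """Fraction of questions whose reference chunk appears in the top-k results.

    eval_set: list of (question, reference_chunk_id) pairs.
    retrieve: callable taking a question and returning ranked chunk ids.
    """
    if not eval_set:
        return 0.0
    hits = 0
    for question, reference_chunk_id in eval_set:
        if reference_chunk_id in retrieve(question)[:k]:
            hits += 1
    return hits / len(eval_set)


# Toy retriever backed by a fixed lookup table (illustrative data only).
toy_results = {
    "q1": ["a", "x"],
    "q2": ["x", "b"],
    "q3": ["x", "y"],
    "q4": ["d"],
}
toy_eval_set = [("q1", "a"), ("q2", "b"), ("q3", "c"), ("q4", "d")]
score_at_1 = recall_at_k(toy_eval_set, toy_results.get, k=1)
score_at_2 = recall_at_k(toy_eval_set, toy_results.get, k=2)
```

Running the same eval set against each chunking strategy and comparing the scores gives a direct, if rough, signal of which strategy retrieves better.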
Practical exercise
What to do after this lesson
Make an eval set of 30 "question — reference chunk" pairs. Run it on different chunking strategies. Measure Recall@10.
Ready-to-use prompt
Template for this lesson
Copy and adapt to your context. Text in angle brackets should be replaced.
Help me configure chunking.
Documents: <…>
Content type (legal / technical / marketing): <…>
Average paragraph length: <…>
Give me:
- A splitting strategy.
- Chunk size and overlap.
- Which metadata to save.
- How to validate.
Common mistakes
What people get wrong
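- Chunks too large: precision is lost, because each chunk mixes relevant text with unrelated text.
- Chunks too small: context is lost, because a chunk no longer makes sense on its own.
- Not saving metadata: results can no longer be filtered or traced back to their source.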
Pro tips
What works but no one documents
- An overlap of 50-100 tokens is a simple fix for "breaks" at chunk boundaries.
- A reranker almost always improves the result.
- Metadata = the ability to filter by source/tag.
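As a small illustration of the metadata point, chunks stored as dictionaries can be narrowed by exact-match fields before or after vector search. The field names and the helper `filter_by_metadata` are hypothetical. Most vector stores offer their own filter syntax, which this sketch does not try to reproduce.

```python
def filter_by_metadata(chunks, **required):
    """Keep only chunks whose metadata matches every required field exactly."""
    return [
        chunk
        for chunk in chunks
        if all(chunk.get(key) == value for key, value in required.items())
    ]


stored_chunks = [
    {"doc_id": "contract-1", "section": "Liability", "tag": "legal", "text": "..."},
    {"doc_id": "manual-7", "section": "Setup", "tag": "technical", "text": "..."},
    {"doc_id": "contract-1", "section": "Payment", "tag": "legal", "text": "..."},
]
legal_only = filter_by_metadata(stored_chunks, tag="legal")
one_section = filter_by_metadata(stored_chunks, doc_id="contract-1", section="Payment")
```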
When to use
Any RAG system.
When not to use
Systems that do not use RAG.
Quiz: 2 questions
1. What significantly raises retrieval quality?
2. What is best to store alongside a chunk?