LLM Engineer: From Local Setup to Production · Lesson 2
Embeddings and RAG Pipeline
Building RAG with sentence-transformers, FAISS/ChromaDB, chunking strategies, and a 15-line Python example.
Practical exercise
What to do after this lesson
Take any PDF, split it into 512-token chunks with a 50-token overlap, index the chunks with FAISS, and ask 3 questions. Compare the LLM's answer quality with and without RAG.
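The chunking step of the exercise can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `chunk_tokens` is hypothetical, and the synthetic token list stands in for the output of a real tokenizer (ideally the one used by your embedding model). The resulting chunks would then be embedded and added to the FAISS index.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the document.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Assumption: synthetic tokens replace real tokenizer output for the demo.
tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens, chunk_size=512, overlap=50)

for i, chunk in enumerate(chunks):
    print(i, len(chunk), chunk[0], chunk[-1])
```

With these settings each window advances by 462 tokens, so the last 50 tokens of one chunk are repeated at the start of the next.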
Ready-to-use prompt
Template for this lesson
Copy and adapt to your context. Placeholders in double curly braces should be replaced.
Answer the question using only the provided context. Context: {{context}}. Question: {{question}}. If the answer is not in the context, say so.
Common mistakes
What people get wrong
- Chunks too large (>1000 tokens): the LLM loses focus on the relevant part.
- No overlap: semantic connections across chunk boundaries are lost.
Pro tips
What works but no one documents
- Parent-document retrieval: index small chunks, but pass the larger parent document to the LLM.
- `model.encode(texts, batch_size=64)`: encoding in batches instead of one text at a time can speed up indexing by roughly 10×, depending on hardware and model.
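Parent-document retrieval can be illustrated with a small sketch. Everything here is an assumption made for the demo: the helper names (`build_child_index`, `retrieve_parent`, `overlap_score`) are hypothetical, the two documents are made up, and the word-overlap score is only a toy stand-in for the cosine similarity you would compute on sentence-transformers embeddings stored in FAISS or ChromaDB. The point is the bookkeeping: search over small chunks, then return the parent they came from.

```python
def split_words(text, size):
    """Cut text into consecutive pieces of `size` words (the small child chunks)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_child_index(parents, child_size=8):
    """Return (child_text, parent_id) pairs; only the children are searched."""
    index = []
    for parent_id, text in parents.items():
        for child in split_words(text, child_size):
            index.append((child, parent_id))
    return index


def overlap_score(query, text):
    # Toy similarity: number of shared lowercase words.
    # Assumption: a real pipeline would compare embedding vectors instead.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parent(query, index, parents):
    """Find the best-matching child chunk, then hand back its full parent."""
    _best_child, parent_id = max(index, key=lambda item: overlap_score(query, item[0]))
    return parents[parent_id]


# Made-up documents for the demo.
parents = {
    "doc_a": "FAISS builds an index over dense vectors. It supports exact and "
             "approximate nearest neighbour search on CPU and GPU.",
    "doc_b": "Chunk overlap repeats a few tokens between neighbouring chunks. "
             "This keeps sentences that cross a boundary readable.",
}

index = build_child_index(parents)
context = retrieve_parent("why does chunk overlap matter", index, parents)
print(context)
```

The match is made on an 8-word child chunk, but the LLM receives the whole parent text, including the second sentence that the matching chunk does not contain.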