LLM Engineer: From Local Setup to Production · Lesson 7
Inference Optimization: vLLM, Batching, Flash Attention
vLLM PagedAttention, continuous batching, Flash Attention 2, and GPTQ/AWQ quantization for production inference.