gokulaprasanth9697.g@gmail.com
A Deep Dive into RAG: What It Actually Does

Retrieval-Augmented Generation is one of the most practical LLM patterns to emerge in the last two years. The idea is simple: instead of relying solely on a model's parametric memory, you fetch relevant context at query time and stuff it into the prompt.

The pipeline

At a high level, indexing happens ahead of time: chunk the documents, embed the chunks, and index the embeddings. Then, at query time: embed the query, find its nearest neighbours, inject them into the prompt, and generate the answer. Each of those steps is a place things can go wrong.
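
To make those steps concrete, here is a minimal sketch in plain Python. It is a toy built on several assumptions. The "embedding" is a bag-of-words count vector rather than a learned model, the "index" is a list searched by brute force, and every function name (`chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`) is hypothetical. The final generation call is left out, since it depends on whichever LLM you use.

```python
import math
import re
from collections import Counter


def chunk(text, size=12):
    """Split text into chunks of at most `size` words (toy fixed-size chunking)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: bag-of-words counts. A real pipeline would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, size=12):
    """Index time: chunk every document and store (chunk_text, embedding) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, size)]


def retrieve(index, query, k=2):
    """Query time: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Inject the retrieved chunks into the prompt ahead of the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Made-up example documents, purely for illustration.
docs = [
    "The warranty covers battery defects for two years from purchase.",
    "Shipping takes five business days within the country.",
]
query = "How long does the warranty cover the battery?"

index = build_index(docs)
top = retrieve(index, query, k=1)
prompt = build_prompt(query, top)
# `prompt` would now be sent to an LLM to generate the answer (not shown).
```

Even at this scale the failure points are visible: the chunk size decides what counts as a retrievable unit, the embedding decides what "similar" means, and `k` decides how much context reaches the model.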
