Retrieval-augmented generation (RAG)
Retrieval-augmented generation (RAG) is an AI architecture that improves the accuracy of language model responses by retrieving relevant information from external data sources before generating a response. Instead of relying solely on what the model learned during training, RAG pulls current, verified information from a knowledge base or document store and uses it to ground the response.
In customer service, RAG addresses one of the biggest risks of using large language models: hallucination. Without RAG, an LLM might generate plausible-sounding but incorrect information about a company's policies, pricing, or procedures. With RAG, the LLM references the company's actual documentation when formulating responses.
The RAG process works in three steps:
Retrieval: When a customer asks a question, the system searches the knowledge base for relevant documents, articles, or data.
Augmentation: The retrieved information is included in the LLM's context alongside the customer's message.
Generation: The LLM generates a response grounded in the retrieved information rather than its parametric knowledge.
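The three steps can be sketched in a few lines of Python. Everything here is illustrative: the knowledge base is a toy list, the retriever ranks by simple word overlap (a real system would use vector search over embeddings), and the `generate` function stands in for an actual LLM call.

```python
# Toy knowledge base standing in for a company's documentation.
KNOWLEDGE_BASE = [
    "Refunds are available within 30 days of purchase.",
    "Standard shipping takes 5 to 7 business days.",
    "Premium plans include 24/7 phone support.",
]

def retrieve(question: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Step 1 (Retrieval): rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def augment(question: str, context: list[str]) -> str:
    """Step 2 (Augmentation): place retrieved text in the prompt context."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    """Step 3 (Generation): stand-in for an LLM call grounded in the prompt."""
    context = prompt.split("Context:\n")[1].split("\n\nQuestion:")[0]
    return "Based on our documentation: " + context

question = "How long does shipping take?"
prompt = augment(question, retrieve(question, KNOWLEDGE_BASE))
print(generate(prompt))
# → Based on our documentation: Standard shipping takes 5 to 7 business days.
```

The point of the sketch is the data flow: the model's answer is assembled from retrieved text, not from whatever the model happens to remember.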
RAG quality depends heavily on:
Knowledge base quality: RAG can only retrieve what exists. Gaps, outdated content, or conflicting articles in the knowledge base directly degrade response quality.
Retrieval accuracy: The system must find the right documents for the given question. Poor retrieval means the LLM generates responses grounded in irrelevant information.
Chunking strategy: How documents are split and indexed affects whether the retrieval system can find the specific paragraph that answers the customer's question.
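To make the chunking point concrete, here is one common splitting scheme: fixed-size windows of words with overlap, so a passage that straddles a boundary still appears whole in at least one chunk. The sizes are toy values chosen for illustration; production systems tune chunk size and overlap, and often split on sentence or section boundaries instead.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows of `size` words.

    Consecutive chunks share `overlap` words so that a passage near a
    boundary is fully contained in at least one chunk.
    """
    words = text.split()
    step = size - overlap  # how far each window advances
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]
```

A 100-word document with these defaults yields three chunks (words 0–39, 30–69, 60–99); if the answer to a question sits at word 35, both the first and second chunks contain it in full context.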
For CX teams, RAG is one of those architectural decisions that's invisible when it works and obvious when it doesn't. The practical question is not whether the AI uses RAG (most do), but how well the RAG pipeline is implemented and maintained.
Related terms: knowledge base, large language model, AI hallucinations



