Large language model (LLM)

A large language model (LLM) is a type of AI model trained on vast amounts of text data that can understand, generate, and reason about natural language. Models like GPT-4, Claude, Gemini, and Llama are examples. LLMs are the core technology enabling the current generation of AI customer service agents.

LLMs differ from earlier NLP approaches in a fundamental way: rather than only classifying or extracting information from text, they can generate contextually appropriate responses, reason through multi-step problems, and handle the ambiguity inherent in natural language. This is what lets modern AI agents hold open-ended conversations rather than follow scripted interactions.

In customer service applications, LLMs are used for:

  • Conversation handling: Understanding customer messages and generating appropriate responses

  • Reasoning: Breaking down complex requests into steps and determining the right course of action

  • Knowledge retrieval: Understanding which information is relevant to a given query and synthesizing it into a coherent answer

  • Summarization: Condensing long conversation histories or ticket notes into concise summaries for handoffs

Key considerations for CX leaders when evaluating LLM-based AI solutions:

  • Accuracy vs. fluency: LLMs can generate convincing-sounding responses that are factually wrong (hallucinations). How the surrounding system grounds and checks responses, for example by drawing on approved knowledge sources, often matters more than which model is used.

  • Cost: LLM inference has a per-token cost that scales with conversation length and volume. Any pricing or budgeting model should account for that growth.

  • Latency: Response generation time varies with model size and request complexity. In real-time chat, customers wait on every response, so latency directly affects the experience.

  • Data privacy: Understanding how customer data flows through LLM inference, whether conversations are used for model training, and where data is processed is critical for regulated industries.
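
To make the cost consideration above concrete, here is a minimal Python sketch of how per-token cost might grow with conversation length. It assumes a common setup in which every turn resends the full conversation history as input; caching or history truncation would change the numbers. The Pricing class, the conversation_cost function, and all prices and token counts are hypothetical illustrations, not any vendor's actual rates or API.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical prices in USD per 1,000 tokens; substitute a vendor's real rates.
    input_per_1k: float
    output_per_1k: float


def conversation_cost(turns, tokens_per_user_msg, tokens_per_reply,
                      system_tokens, pricing):
    """Estimate the cost of one conversation, assuming each turn
    resends the full history (system prompt plus all prior messages)."""
    input_tokens = 0
    output_tokens = 0
    history = system_tokens
    for _ in range(turns):
        history += tokens_per_user_msg   # the new customer message joins the context
        input_tokens += history          # the whole context is billed as input
        output_tokens += tokens_per_reply
        history += tokens_per_reply      # the reply becomes context for the next turn
    total = (input_tokens * pricing.input_per_1k
             + output_tokens * pricing.output_per_1k)
    return total / 1000


if __name__ == "__main__":
    # Illustrative numbers only.
    pricing = Pricing(input_per_1k=1.0, output_per_1k=2.0)
    for turns in (5, 10, 20):
        cost = conversation_cost(turns, 10, 20, 100, pricing)
        print(f"{turns} turns: ${cost:.2f}")
```

Under these assumptions, doubling the number of turns more than doubles the cost, because each new turn pays again for everything said before it. Multiplying the per-conversation figure by expected monthly volume gives a rough budget estimate.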

Related terms: AI agent, AI hallucinations, natural language processing