Context Window

The context window is the maximum amount of text, measured in tokens, that a large language model can process in a single interaction, including both the input provided and the output generated.

The context window determines how much information the AI can "see" at once. A customer service AI needs to hold the system prompt defining its behavior, the conversation history, retrieved knowledge articles, customer data, and any other relevant context, all while leaving room for generating a response. Longer conversations, more retrieved documents, and more customer data all consume the context window.
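To make the budgeting concrete, here is a minimal sketch of how prompt components add up against a window. The window size, reserved response budget, and the rough 4-characters-per-token estimate are all illustrative assumptions; a real system would use the model's own tokenizer and limits.

```python
# Sketch: how prompt components consume a context window.
# NOTE: CONTEXT_WINDOW, RESPONSE_BUDGET, and the ~4-chars-per-token
# estimate are illustrative assumptions, not any model's real values.

CONTEXT_WINDOW = 8192    # hypothetical model limit, in tokens
RESPONSE_BUDGET = 1024   # tokens reserved for the generated reply

def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token)."""
    return max(1, len(text) // 4)

components = {
    "system_prompt": "You are a helpful support agent..." * 10,
    "conversation_history": "Customer: ...\nAgent: ..." * 50,
    "retrieved_articles": "Article: How to reset your password..." * 40,
    "customer_data": "Plan: Pro. Signup date: 2023-01-15.",
}

used = sum(estimate_tokens(text) for text in components.values())
remaining = CONTEXT_WINDOW - RESPONSE_BUDGET - used
print(f"input tokens used: {used}, room left for more context: {remaining}")
```

Every component drawn from the same fixed budget is the core point: adding one more retrieved article means less room for history or for the response itself.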

Practical implications: If the context window fills up, older information must be dropped, so the AI might "forget" what the customer said at the start of a long conversation. Knowledge retrieval must be selective—you can't feed the AI your entire knowledge base, only the most relevant chunks. Complex workflows that reason across many data points need enough context to hold all the relevant information at once.
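One common mitigation for a filling window is to keep only the most recent conversation turns that fit a token budget. The sketch below illustrates that idea; the helper name, budget, and rough token estimate are assumptions for illustration.

```python
# Sketch: trim conversation history to fit a token budget by dropping
# the oldest turns first. Names and the ~4-chars-per-token estimate
# are illustrative assumptions.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined size fits `budget`."""
    kept: list[str] = []
    total = 0
    for message in reversed(messages):  # walk newest -> oldest
        cost = estimate_tokens(message)
        if total + cost > budget:
            break
        kept.append(message)
        total += cost
    return list(reversed(kept))         # restore chronological order

history = [f"turn {i}: " + "x" * 200 for i in range(20)]
trimmed = trim_history(history, budget=300)
print(f"kept {len(trimmed)} of {len(history)} turns")
```

This is exactly the "forgetting" behavior described above: the oldest turns are the ones sacrificed. More sophisticated systems summarize dropped turns instead of discarding them outright.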

Context window sizes have expanded dramatically with newer models, from thousands of tokens to hundreds of thousands. This shifts the constraint from "can the AI fit the context" to "can we efficiently find and structure the relevant context." Larger windows don't automatically mean better performance—the AI still needs help finding what matters in a sea of potentially relevant information.
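Finding "what matters" usually means ranking candidate chunks by relevance and including only the top few. The toy sketch below scores chunks by naive word overlap with the query; real systems use embeddings or a dedicated retriever, and the sample chunks are invented for illustration.

```python
# Toy sketch: select only the most relevant chunks instead of stuffing
# everything into a large window. Word overlap stands in for a real
# relevance model (embeddings, BM25, etc.); sample data is invented.

def score(query: str, chunk: str) -> int:
    """Naive relevance: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

chunks = [
    "How to reset your password in the account settings page.",
    "Shipping times for international orders.",
    "Refund policy for annual subscription plans.",
]
best = top_chunks("how do I reset my password", chunks, k=1)
print(best)
```

Even with a huge window, this selection step matters: irrelevant chunks dilute the model's attention and add cost, so ranking before stuffing remains worthwhile.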

Related terms: AI agent memory, Multi-turn conversation, Retrieval augmented generation