context window
A context window is the maximum span of tokens an LLM can jointly attend to in one pass, including both input (prompt) and output (generated content).
Because this window is finite, overly long inputs or requested responses may be truncated or refused, so systems often summarize, chunk, or retrieve relevant subsets of text to fit within the limit. Vendors specify the maximum window size for each model, and most count both prompt and response tokens toward that limit.
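In practice, an application might count tokens before sending a request and trim the prompt when it won't fit. The sketch below assumes the tiktoken tokenizer library; the 8,192-token window, the 1,024-token response reserve, and the "cl100k_base" encoding are illustrative choices, not tied to any specific model:

```python
# Minimal sketch: budgeting prompt tokens against a fixed context window.
# Window size, response reserve, and encoding name are illustrative assumptions.
import tiktoken

CONTEXT_WINDOW = 8_192   # Total tokens the model can attend to (assumed)
RESPONSE_BUDGET = 1_024  # Tokens reserved for the generated response (assumed)

encoding = tiktoken.get_encoding("cl100k_base")

def fits_in_window(prompt: str) -> bool:
    """Check whether the prompt leaves room for the reserved response."""
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + RESPONSE_BUDGET <= CONTEXT_WINDOW

def truncate_to_budget(prompt: str) -> str:
    """Drop tokens from the end of the prompt so prompt + response fit."""
    budget = CONTEXT_WINDOW - RESPONSE_BUDGET
    tokens = encoding.encode(prompt)
    return encoding.decode(tokens[:budget])
```

Real systems usually prefer summarizing or retrieving only the relevant chunks over blind truncation, since truncation can drop exactly the context the model needs.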
Some modern models support very large context windows of hundreds of thousands or even millions of tokens, though in practice performance often degrades over long contexts. To mitigate this degradation, models may use extended or scaled positional encodings, compression, retrieval, or redesigned attention mechanisms.
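As a rough illustration of positional-encoding scaling, one published approach (position interpolation for rotary embeddings) rescales positions in a longer sequence so they fall back into the range the model saw during training. The toy sketch below shows only that rescaling idea; the dimensions and lengths are made up, and it isn't any vendor's implementation:

```python
# Toy sketch of position interpolation for rotary positional embeddings.
# All names and numbers are illustrative assumptions.
import numpy as np

def rotary_angles(position: float, dim: int, base: float = 10_000.0) -> np.ndarray:
    """Rotary embedding angles for one token position."""
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    return position * inv_freq

def interpolated_angles(position: int, dim: int,
                        trained_len: int, target_len: int) -> np.ndarray:
    """Scale the position so a longer sequence maps onto the trained range."""
    scale = trained_len / target_len  # e.g., 4_096 / 16_384 = 0.25
    return rotary_angles(position * scale, dim)

# Position 10,000 in a 16k window is treated like position 2,500 during training.
print(interpolated_angles(10_000, dim=8, trained_len=4_096, target_len=16_384))
```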
By Leodanis Pozo Ramos • Updated Oct. 15, 2025