Context, Memory, and Tokens

We unpack what context is, how tokens are counted, and why the model 'forgets' the start of a conversation.

The context window

Context is everything the model sees when it generates an answer: the system prompt, the conversation history, the documents you attached. The size of this "window" is measured in tokens.

As a rule of thumb, one token is about 0.75 of an English word, so a long document of 100,000 words comes to roughly 130,000-150,000 tokens. A model with a smaller context window simply can't "see" the entire document: it doesn't fit.
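The arithmetic behind that estimate can be sketched in a few lines. This is only an approximation built on the 0.75 words-per-token rule of thumb; real tokenizers give different counts depending on the model and the text. The function names and the `reserve_tokens` parameter are made up for this illustration.

```python
import math

# Rule of thumb from this lesson; an assumption, not an exact tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text, given a word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_window(word_count: int, window_tokens: int, reserve_tokens: int = 0) -> bool:
    """Check whether the estimated tokens, plus a reserve for the
    prompt and the answer, fit into a window of the given size."""
    return estimate_tokens(word_count) + reserve_tokens <= window_tokens


print(estimate_tokens(100_000))          # 133334
print(fits_in_window(100_000, 128_000))  # False
print(fits_in_window(100_000, 200_000))  # True
```

The reserve matters in practice: the system prompt, your question, and the model's answer all share the same window as the document.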

Context sizes across different models

ChatGPT: 128k-400k tokens depending on the plan.
Claude: up to 200k tokens on Pro, up to 1M on Enterprise/API.
Gemini: up to 1M tokens on Pro, experimentally up to 2M.

Why the model "forgets"

In a long conversation, the history eventually outgrows the window, and older messages get pushed out by newer ones. Some clients (for example, ChatGPT in the browser) automatically compress old history. Others simply cut off the oldest messages once the window runs out.
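The "cut off old messages" behavior can be illustrated with a minimal sketch. It assumes a simple strategy (keep the newest messages that fit a token budget) and a crude word-based token estimate; actual clients may use other strategies, and `trim_history` is a hypothetical name, not any product's API.

```python
import math


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 0.75 English words per token (an assumption)."""
    return math.ceil(len(text.split()) / 0.75)


def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent messages that fit into the budget.

    Walks the history from newest to oldest and stops at the first
    message that would overflow the budget, so everything older is dropped.
    """
    kept = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "My name is Ana and I work in logistics.",
    "What is a token?",
    "A token is a piece of a word.",
    "And what is my name?",
]

# With a tight budget the first message no longer fits,
# so the model never sees the name it is being asked about.
for message in trim_history(history, budget_tokens=25):
    print(message)
```

This is why a question about something said early in a long chat can get a confident but wrong answer: the relevant message is no longer in the window.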

What "memory" is

Don't confuse the context window with the assistant's "memory". Memory is a separate feature: the assistant saves facts about you in a dedicated store and carries them between chats. In practice these are just notes mixed back into the system prompt, and the feature has tight limits. It does not "remember everything".
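Here is a minimal sketch of the "notes mixed into the system prompt" idea. It is an illustration only: the function name, the note limit, and the wording of the injected text are all assumptions, not how any particular assistant implements memory.

```python
MAX_NOTES = 3  # hypothetical limit, chosen to show that memory is bounded


def build_system_prompt(base_prompt: str, notes: list[str], max_notes: int = MAX_NOTES) -> str:
    """Append the newest saved notes to the base system prompt.

    Older notes beyond the limit are simply left out, which is why
    "memory" does not mean "remembers everything".
    """
    recent = notes[-max_notes:] if max_notes > 0 else []
    if not recent:
        return base_prompt
    lines = "\n".join(f"- {note}" for note in recent)
    return f"{base_prompt}\n\nKnown facts about the user:\n{lines}"


saved_notes = [
    "Prefers short answers",
    "Works in logistics",
    "Writes in Python",
    "Lives in Lisbon",
]

print(build_system_prompt("You are a helpful assistant.", saved_notes))
```

Note that the injected notes take up tokens in the same context window as everything else.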

Practical exercise

What to do after this lesson

Open your favorite assistant. Ask what its maximum context size is. Then attach a long document and check whether the model actually answers with the end of the document in mind (ask it to quote the last paragraph).

Ready-to-use prompt

Template for this lesson

Copy and adapt to your context. Text in angle brackets should be replaced.

I'm working with a long document. Before answering my question, quote the last two sentences of the document. I need this as proof that you can see it all the way to the end, not just the first pages.

Question:
<...>

Common mistakes

What people get wrong

They attach a 500-page document to a small-context model and are surprised the answer is inaccurate.
They think "memory" = "remembers everything". In reality it's a narrow feature with limits.
They don't realize that non-English text is "more expensive" in tokens than English.

Pro tips

What works but no one documents

For long documents, use Claude or Gemini, not short-context models.
If the document still doesn't fit, walk through it in chunks first, then build a summary "map".
In long chats, periodically ask the model to write its own brief summary of the current context.
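The chunk-then-map tip can be sketched as follows. The splitting is done by word count for simplicity, and `placeholder_summarize` is a hypothetical stand-in for asking the model to summarize each chunk; in real use you would paste each chunk into the assistant and collect its summaries.

```python
def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def placeholder_summarize(chunk: str) -> str:
    """Hypothetical stand-in for a model call: returns the first 8 words."""
    return " ".join(chunk.split()[:8])


def build_summary_map(text: str, max_words: int, summarize=placeholder_summarize) -> list[str]:
    """Summarize each chunk separately and label it, producing a short 'map'."""
    chunks = split_into_chunks(text, max_words)
    return [f"Part {i}: {summarize(chunk)}" for i, chunk in enumerate(chunks, start=1)]


document = " ".join(f"w{i}" for i in range(25))  # toy 25-word document
for entry in build_summary_map(document, max_words=10):
    print(entry)
```

The resulting map is short enough to fit in one window, and it tells you which part to paste back in when you need details.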

When to use

Use a single large context window when working with long documents, legal contracts, code, or analytics: anywhere you need to keep the whole context in one window.

When not to use

When you have more material than fits in the window. In that case you need retrieval-augmented generation (RAG) or staged processing.
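The retrieval step in RAG can be shown with a toy example: instead of sending every passage to the model, pick only the few most relevant to the question. This sketch scores relevance by simple word overlap, which is a deliberate simplification; real RAG systems typically use embeddings and a vector index. All names here are hypothetical.

```python
def score(question: str, passage: str) -> int:
    """Toy relevance score: number of distinct words shared by both texts."""
    question_words = set(question.lower().split())
    passage_words = set(passage.lower().split())
    return len(question_words & passage_words)


def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages with the highest overlap score.

    sorted() is stable, so passages with equal scores keep their original order.
    """
    ranked = sorted(passages, key=lambda passage: score(question, passage), reverse=True)
    return ranked[:top_k]


passages = [
    "the contract ends on the last day of march",
    "payment is due within thirty days",
    "either party may terminate the contract with notice",
]

# Only the selected passages would be placed in the context window.
for passage in retrieve("when does the contract end", passages, top_k=2):
    print(passage)
```

Only the retrieved passages go into the prompt, so the total stays within the window no matter how large the full collection is.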

Quiz — 2 questions

1. What is measured in tokens?

2. If a model has a 200k-token context and you paste in a 500k-token document, what happens?
