Question 1

What is a context window?

Accepted Answer

A context window is the maximum number of tokens an AI model can process in a single API call. It includes everything: your system prompt, conversation history, documents you attach, and the model's response. When the total exceeds the limit, the API either rejects the request or silently truncates the oldest content.

Question 2

Why does my prompt fail when I add a long document?

Accepted Answer

When you add a document to your prompt, the request contains the document's tokens plus your instruction tokens. If the total exceeds the model's context window, the API returns a "context length exceeded" error. The fix is to use a model with a larger context window, summarize the document first, or split it into smaller chunks.
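As a rough illustration of the chunking option, here is a minimal sketch that splits a document on paragraph boundaries. The function name `split_into_chunks` and the 4-characters-per-token estimate are assumptions made for this example; a real pipeline would count tokens with the provider's tokenizer.

```python
def split_into_chunks(text: str, max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Split text on blank lines so each chunk stays within an estimated token budget."""
    max_chars = max_tokens * chars_per_token  # estimated budget, not an exact token count
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph larger than the budget is cut by character count.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent in its own request, with your instructions repeated alongside it.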

Question 3

How accurate is the token estimate?

Accepted Answer

This tool uses the standard 4-characters-per-token heuristic from OpenAI's documentation. It is accurate within ±15% for English prose. Code, JSON, and non-English text can have significantly different token counts. For critical production work, use your provider's official tokenizer (tiktoken for OpenAI, the token-counting endpoint of the Messages API for Anthropic).
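The heuristic itself is simple enough to sketch in a few lines. This is an illustration of the 4-characters-per-token rule and the ±15% band quoted above, not the tool's actual source; the function names are made up for the example.

```python
import math

CHARS_PER_TOKEN = 4  # heuristic for English prose; an approximation, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_range(text: str, tolerance_percent: int = 15) -> tuple[int, int]:
    """Low and high bounds around the estimate, using the quoted ±15% figure."""
    estimate = estimate_tokens(text)
    low = round(estimate * (100 - tolerance_percent) / 100)
    high = round(estimate * (100 + tolerance_percent) / 100)
    return low, high
```

For a 400-character English paragraph this gives an estimate of 100 tokens, with a plausible range of 85 to 115. Treat the range as a sanity check only; code, JSON, and non-English text can fall well outside it.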

Question 4

What happens when you exceed the context window?

Accepted Answer

Behavior depends on the provider: (1) Most APIs return a 400 error with a "context length exceeded" message. (2) Some older APIs silently truncate the beginning of the conversation: older messages are dropped to make room, so the model can lose earlier instructions or context. (3) Some providers offer automatic summarization or chunking as a paid feature. Always test near the limit before going to production.
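To avoid depending on provider behavior, you can trim the conversation yourself before sending it. The sketch below drops the oldest non-system messages until an estimated total fits. It assumes messages are dicts with `role` and `content` keys and that any system message comes first; `trim_oldest` and the 4-characters-per-token estimate are illustrative, not part of any provider's SDK.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate using the 4-characters-per-token heuristic."""
    return math.ceil(len(text) / 4)


def total_tokens(messages: list[dict]) -> int:
    """Sum of the estimated tokens across all message contents."""
    return sum(estimate_tokens(m["content"]) for m in messages)


def trim_oldest(messages: list[dict], max_input_tokens: int) -> list[dict]:
    """Drop the oldest non-system messages until the estimated total fits."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and total_tokens(system) + total_tokens(rest) > max_input_tokens:
        rest.pop(0)  # the oldest message goes first
    return system + rest
```

Doing this explicitly makes the truncation visible in your own logs, which is easier to debug than a model that has quietly lost its early context.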

Question 5

Which model has the largest context window?

Accepted Answer

As of 2026, Gemini 2.5 Pro and Gemini 3.1 Pro support 2M tokens (about 6,000 pages of text, using the page size in the reference table). GPT-4.1, Claude Sonnet 4.6, and Llama 4 Maverick support 1M tokens. Claude Haiku 4.5 supports 200K. For most tasks, 128K (GPT-4o) or 200K (Claude Sonnet) is more than sufficient.

Question 6

Does the context window include the model's response?

Accepted Answer

Yes. The context window limit covers both input tokens (your prompt) and output tokens (the model's response). If the model has a 128K context window and your prompt is 100K tokens, the model can generate at most ~28K tokens in response. Always leave headroom for the output; a common rule of thumb is to use no more than 80% of the window for input.
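The arithmetic above can be written down as a small budget check. This is a sketch under the stated 80% rule of thumb; the function names are hypothetical, and it ignores any separate per-request output cap a provider may impose.

```python
def max_output_tokens(context_window: int, input_tokens: int) -> int:
    """Tokens left for the response once the prompt is counted."""
    return max(context_window - input_tokens, 0)


def input_budget(context_window: int, input_percent: int = 80) -> int:
    """Input allowance under the 80% rule of thumb (integer math, rounded down)."""
    return context_window * input_percent // 100


def fits_with_headroom(context_window: int, input_tokens: int) -> bool:
    """True if the prompt stays within the recommended input share."""
    return input_tokens <= input_budget(context_window)
```

With a 128,000-token window and a 100,000-token prompt, `max_output_tokens` returns 28,000, and the prompt fits within the 80% budget of 102,400 tokens.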

Window	Tokens	Example models	≈ Pages
8K	8,192	GPT-3.5 Turbo, GPT-4 (8K)	~24
32K	32,768	GPT-4 (32K), Claude Instant	~98
128K	128,000	GPT-4o, GPT-5, Llama 3.3 70B	~384
200K	200,000	Claude Opus 4.8, Claude Sonnet 4.6, Haiku 4.5	~600
1M	1,000,000	GPT-4.1, Gemini 2.5 Flash, Llama 4 Maverick	~3,000
2M	2,000,000	Gemini 2.5 Pro, Gemini 3.1 Pro	~6,000
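
The page figures in the table appear to assume roughly 333 tokens per page (for example, 128,000 tokens and ~384 pages). That ratio is inferred from the rows above rather than stated by the tool, so treat this conversion as a sketch:

```python
def approx_pages(tokens: int) -> int:
    """Approximate page count, assuming about 333 tokens per page (3 pages per 1,000 tokens)."""
    return tokens * 3 // 1000


# Token counts taken from the reference table above.
for window_tokens in (8_192, 32_768, 128_000, 200_000, 1_000_000, 2_000_000):
    print(f"{window_tokens:>9,} tokens = ~{approx_pages(window_tokens):,} pages")
```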

Context Window Calculator

Select context window

Understanding Context Windows

What is a context window?

Why prompts fail with long documents

Context window reference (2026)

Frequently Asked Questions

Related Tools

Prompt Token Counter

Prompt Cost Calculator