Context Window Calculator

Paste any text and instantly see how much of your selected context window it fills. Warns at 70%, 85%, and 100%, before you hit the API limit.
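
The warning thresholds above can be sketched as a small function. This is only an illustration: the function name and the level labels are hypothetical and are not taken from the tool's own code.

```python
# Hypothetical helper; the labels are illustrative, not the tool's actual output.
def usage_level(tokens_used: int, context_limit: int) -> str:
    """Map context usage to a warning level at the 70%, 85%, and 100% marks."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    scaled = tokens_used * 100
    if scaled >= 100 * context_limit:
        return "limit reached"
    if scaled >= 85 * context_limit:
        return "critical"
    if scaled >= 70 * context_limit:
        return "warning"
    return "ok"


print(usage_level(64_000, 128_000))  # 50% of a 128K window
print(usage_level(96_000, 128_000))  # 75% of a 128K window
```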

Text never leaves your browser.

Understanding Context Windows

What is a context window?

Think of the context window as the model's working memory: the total amount of text it can "see" at once. Every token in your conversation counts toward this limit: system prompt, chat history, attached documents, and the model's own replies.

When you exceed this limit, the API either rejects your request with an error or (in older APIs) silently drops the oldest messages, which can cause the model to forget earlier context and give inconsistent answers.
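
To make the "silently drops the oldest messages" behavior concrete, here is a minimal sketch of oldest-first trimming. The message shape (text plus a precomputed token count) and the function name are assumptions for illustration; no specific provider is claimed to work exactly this way.

```python
from collections import deque

# Hypothetical message shape: (text, token_count) pairs, oldest first.
Message = tuple[str, int]


def trim_oldest(messages: list[Message], context_limit: int) -> list[Message]:
    """Drop messages from the front until the token total fits the limit."""
    kept = deque(messages)
    total = sum(tokens for _, tokens in kept)
    while kept and total > context_limit:
        _, dropped_tokens = kept.popleft()
        total -= dropped_tokens
    return list(kept)


history = [("system rules", 300), ("old question", 500), ("latest question", 400)]
print(trim_oldest(history, 1_000))
```

In this example the first entry to go is the one carrying the rules, which is exactly how earlier context gets forgotten.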

Why prompts fail with long documents

// Total tokens sent to the API
total = system_prompt_tokens
      + document_tokens
      + user_message_tokens
      + expected_output_tokens

// If total > context_limit → error

Output tokens count too, so always leave room for the model's response (a common rule of thumb: use at most 80% of the window for input).
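
The budget sum above translates directly into a runnable check. This is a minimal sketch: the class name is hypothetical, and the token counts in the example are made-up inputs rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class RequestBudget:
    """Token counts for each part of a request (hypothetical container)."""
    system_prompt_tokens: int
    document_tokens: int
    user_message_tokens: int
    expected_output_tokens: int

    def total(self) -> int:
        # Input and expected output share the same context window.
        return (
            self.system_prompt_tokens
            + self.document_tokens
            + self.user_message_tokens
            + self.expected_output_tokens
        )

    def fits(self, context_limit: int) -> bool:
        return self.total() <= context_limit


budget = RequestBudget(500, 120_000, 200, 8_000)
print(budget.total())        # 128700
print(budget.fits(128_000))  # False: 700 tokens over a 128K window
```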

Context window reference (2026)

Window   Tokens      Example models
8K       8,192       GPT-3.5 Turbo, GPT-4 (8K)
32K      32,768      GPT-4 (32K), Claude Instant
128K     128,000     GPT-4o, GPT-5, Llama 3.3 70B
200K     200,000     Claude Opus 4.8, Claude Sonnet 4.6, Haiku 4.5
1M       1,000,000   GPT-4.1, Gemini 2.5 Flash, Llama 4 Maverick
2M       2,000,000   Gemini 2.5 Pro, Gemini 3.1 Pro

Pages estimated at ~250 words/page · 1 token ≈ 0.75 words
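
The two conversion factors in the note above are enough to turn any window size into a rough page count. A small sketch, using only those stated estimates (the function name is illustrative):

```python
# Conversion factors quoted in the note above; both are rough estimates.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 250


def tokens_to_pages(tokens: int) -> float:
    """Estimate how many pages of text a given token count represents."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


for window in (8_192, 128_000, 200_000, 1_000_000, 2_000_000):
    print(f"{window:>9,} tokens is roughly {tokens_to_pages(window):,.0f} pages")
```

Under these assumptions a 128K window holds roughly 384 pages and a 2M window roughly 6,000.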

Frequently Asked Questions

What is a context window?
A context window is the maximum number of tokens an AI model can process in a single API call. It includes everything: your system prompt, conversation history, documents you attach, and the model's response. When the total exceeds the limit, the API either rejects the request or silently truncates the oldest content.
Why does my prompt fail when I add a long document?
When you add a document to your prompt, the request contains the document's tokens plus your instruction tokens. If the total exceeds the model's context window, the API returns a "context length exceeded" error. You can fix this in three ways: use a model with a larger context window, summarize the document first, or split it into smaller chunks.
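
Splitting into smaller chunks can be sketched as follows. This is a simple whitespace-based splitter that relies on the ~4 characters per token heuristic; the function name and parameters are hypothetical, and a production splitter would more likely count tokens with a real tokenizer and respect paragraph boundaries.

```python
def chunk_text(text: str, max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Split text on whitespace into chunks of at most max_tokens * chars_per_token characters.

    A single word longer than the limit is kept whole in its own chunk.
    """
    if max_tokens <= 0 or chars_per_token <= 0:
        raise ValueError("max_tokens and chars_per_token must be positive")
    max_chars = max_tokens * chars_per_token
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for word in text.split():
        # The extra 1 accounts for the joining space in a non-empty chunk.
        added = len(word) + (1 if current else 0)
        if current and current_len + added > max_chars:
            chunks.append(" ".join(current))
            current, current_len = [], 0
            added = len(word)
        current.append(word)
        current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks


document = "word " * 100
pieces = chunk_text(document, max_tokens=5)
print(len(pieces), "chunks, longest is", max(len(p) for p in pieces), "characters")
```
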
How accurate is the token estimate?
This tool uses the standard 4-characters-per-token heuristic from OpenAI's documentation. It is accurate within ±15% for English prose. Code, JSON, and non-English text can have significantly different token counts. For critical production work, use your provider's official tokenizer or token counter (tiktoken for OpenAI, the token-counting endpoint of the Messages API for Anthropic).
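
The heuristic itself fits in a few lines. The sketch below assumes the 4-characters-per-token ratio and the ±15% band quoted above for English prose; the function names are illustrative, and this is not a substitute for a real tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Estimate the token count from character length (heuristic only)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_range(text: str, tolerance_pct: int = 15) -> tuple[int, int]:
    """Return (low, high) bounds around the estimate, rounded outward."""
    estimate = estimate_tokens(text)
    low = estimate * (100 - tolerance_pct) // 100
    high = -(-estimate * (100 + tolerance_pct) // 100)  # integer ceiling division
    return low, high


sample = "a" * 400
print(estimate_tokens(sample))  # 100
print(estimate_range(sample))   # (85, 115)
```
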
What happens when you exceed the context window?
Behavior depends on the provider: (1) Most APIs return a 400 error with a "context length exceeded" message. (2) Some older APIs silently truncate the beginning of the conversation: older messages are dropped to make room, which can confuse the model. (3) Some providers offer automatic summarization or chunking as a paid feature. Always test near the limit before going to production.
Which model has the largest context window?
As of 2026, Gemini 2.5 Pro and Gemini 3.1 Pro support 2M tokens (about 6,000 pages of text at ~250 words per page). GPT-4.1, Claude Sonnet 4.6, and Llama 4 Maverick support 1M tokens. Claude Haiku 4.5 supports 200K. For most tasks, 128K (GPT-4o) or 200K (Claude Sonnet) is more than sufficient.
Does the context window include the model's response?
Yes. The context window limit covers both input tokens (your prompt) and output tokens (the model's response). If you use a 128K context window and your prompt is 100K tokens, the model can generate at most ~28K tokens in response. Always leave headroom for the output: a common rule of thumb is to use no more than 80% of the window for input.
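
The arithmetic in this answer can be checked with a short sketch. Both function names are hypothetical, and the 80% share is the rule of thumb quoted above rather than a provider requirement.

```python
def max_output_tokens(context_limit: int, prompt_tokens: int) -> int:
    """Tokens left for the response once the prompt is counted."""
    return max(0, context_limit - prompt_tokens)


def input_budget(context_limit: int, input_share_pct: int = 80) -> int:
    """Largest prompt size that keeps the remaining share free for output."""
    return context_limit * input_share_pct // 100


print(max_output_tokens(128_000, 100_000))  # 28000
print(input_budget(128_000))                # 102400
```
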