Prompt Cost Calculator

Estimate AI API costs for your prompts across OpenAI, Anthropic, Google, and more. Compare models, project daily and monthly spend, and find cost-saving opportunities.

~0 tokens, 0 chars
Your text is never sent to any server.

Model

GPT-4o (selected)

Input tokens

0

from your prompt

Output tokens

500

expected per request

Total tokens

500

per request

Input rate

$2.50

per 1M tokens

Context window usage

0.4%

500 of 128,000 token context window
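The context window figure above is a simple ratio. A minimal sketch, assuming the 128,000-token window shown for GPT-4o (the function name is illustrative, not part of the calculator):

```python
def context_usage_percent(total_tokens: int, context_window: int) -> float:
    """Return the share of the context window used by one request, in percent."""
    return 100 * total_tokens / context_window


# 500 total tokens against the 128,000-token window shown above.
usage = context_usage_percent(total_tokens=500, context_window=128_000)
print(f"{usage:.1f}%")  # 0.4%
```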

Cost Estimates

Per request

$0.00500

Per day

$0.5000

100 requests

Per month

$15.00

30 days

Per year

$182.50

365 days

Cost Breakdown

| Component | Rate | Tokens | Cost / req |
|---|---|---|---|
| Input tokens | $2.50 / 1M | 0 | $0.00 |
| Output tokens | $10.00 / 1M | 500 | $0.00500 |
| Total per request | | 500 tokens | $0.00500 |
| Daily (100 requests) | | | $0.5000 |
| Monthly (30 days) | | | $15.00 |
| Yearly (365 days) | | | $182.50 |
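The daily, monthly, and yearly rows follow from the per-request cost by plain multiplication. A rough sketch of that projection, assuming a constant request volume and the 30-day month and 365-day year used above (`project_spend` is a hypothetical helper name):

```python
def project_spend(cost_per_request: float, requests_per_day: int) -> dict[str, float]:
    """Scale a per-request cost to daily, monthly (30 days), and yearly (365 days) spend."""
    daily = cost_per_request * requests_per_day
    return {"day": daily, "month": daily * 30, "year": daily * 365}


# $0.00500 per request at 100 requests per day, as in the breakdown above.
spend = project_spend(cost_per_request=0.005, requests_per_day=100)
for period, usd in spend.items():
    print(f"per {period}: ${usd:.2f}")
```

Real traffic is rarely flat, so treat these figures as a baseline rather than a forecast.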

Cost-saving opportunities

  • Output tokens dominate your cost. Use max_tokens to cap output length when the task allows it.

All Model Pricing (reference)

25 models
| Provider | Model | Input / 1M | Output / 1M | Cost / req |
|---|---|---|---|---|
| OpenAI | GPT-5.5 | $15.00 | $60.00 | $0.0300 |
| OpenAI | GPT-5 | $10.00 | $40.00 | $0.0200 |
| OpenAI | GPT-4.1 | $2.00 | $8.00 | $0.00400 |
| OpenAI | GPT-4.1 mini | $0.40 | $1.60 | $0.000800 |
| OpenAI | o4 mini | $1.10 | $4.40 | $0.00220 |
| OpenAI | o3 | $2.00 | $8.00 | $0.00400 |
| OpenAI | GPT-4o | $2.50 | $10.00 | $0.00500 |
| OpenAI | GPT-4o mini | $0.15 | $0.60 | $0.000300 |
| Anthropic | Claude Opus 4.8 | $5.00 | $25.00 | $0.0125 |
| Anthropic | Claude Opus 4.7 | $5.00 | $25.00 | $0.0125 |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | $0.00750 |
| Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | $0.00250 |
| Google | Gemini 3.1 Pro | $2.00 | $12.00 | $0.00600 |
| Google | Gemini 3.5 Flash | $1.50 | $9.00 | $0.00450 |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | $0.00500 |
| Google | Gemini 2.5 Flash | $0.30 | $2.50 | $0.00125 |
| Google | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.000200 |
| Meta | Llama 4 Maverick | $0.20 | $0.80 | $0.000400 |
| Meta | Llama 4 Scout | $0.10 | $0.40 | $0.000200 |
| Mistral | Mistral Large 3 | $2.00 | $6.00 | $0.00300 |
| Mistral | Mistral Small 3.1 | $0.10 | $0.30 | $0.000150 |
| xAI | Grok 3 | $3.00 | $15.00 | $0.00750 |
| xAI | Grok 3 Mini | $0.30 | $0.50 | $0.000250 |
| DeepSeek | DeepSeek-V3 | $0.27 | $1.10 | $0.000550 |
| DeepSeek | DeepSeek-R2 | $0.55 | $2.19 | $0.00110 |

⚠ Estimates only. Prices are sourced from official provider documentation and may change without notice. Actual costs depend on usage tier, caching, batch discounts, region, and provider terms. Always verify pricing at your provider's official pricing page before production use.

How LLM API Cost is Calculated

The formula

// Cost per API request
cost = (inputTokens / 1_000_000) × inputPricePerMillion
     + (outputTokens / 1_000_000) × outputPricePerMillion

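Most providers price input and output tokens separately. Output tokens typically cost 3–6× as much as input tokens because each generated token needs its own sequential decoding step, while input tokens are processed in parallel. In the pricing table above, the ratio ranges from about 1.7× (Grok 3 Mini) to about 8× (Gemini 2.5 Pro and Gemini 2.5 Flash).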
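As a minimal sketch, the formula translates directly into a small function. The function and parameter names are illustrative, and the rates are the GPT-4o figures from the pricing table:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Return the USD cost of one request from token counts and per-1M-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_million
    output_cost = (output_tokens / 1_000_000) * output_price_per_million
    return input_cost + output_cost


# GPT-4o: $2.50 input and $10.00 output per 1M tokens; empty prompt, 500 output tokens.
cost = request_cost(input_tokens=0, output_tokens=500,
                    input_price_per_million=2.50,
                    output_price_per_million=10.00)
print(f"${cost:.5f}")  # $0.00500
```

This reproduces the per-request figure in the cost breakdown. It ignores caching, batch discounts, and tiered pricing.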

Why output tokens matter more

  1. Compute asymmetry: Output generation requires a sequential forward pass per token; it cannot be parallelized the way input processing can.
  2. KV cache: The keys and values for input tokens are computed once, in parallel, and then cached; each output token still needs its own decoding step that attends over the entire cache.
  3. Practical impact: A task asking for a 2,000-token report costs far more than one asking for a 50-token classification label (40× the output cost at the same rate).
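To put a number on the practical impact, the sketch below compares the output cost of the two tasks. It assumes GPT-4o's output rate from the pricing table; `output_cost` is a hypothetical helper:

```python
OUTPUT_PRICE_PER_MILLION = 10.00  # GPT-4o output rate from the pricing table (USD)


def output_cost(output_tokens: int) -> float:
    """Return the USD cost of the generated tokens alone."""
    return output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION


report = output_cost(2_000)  # 2,000-token report
label = output_cost(50)      # 50-token classification label
print(f"report: ${report:.4f}  label: ${label:.4f}  ratio: {report / label:.0f}x")
```

The ratio depends only on the token counts, so it holds for any model with a flat output rate.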

How to reduce LLM API costs

Right-size your model

Use Claude Haiku, GPT-4o mini, or Gemini Flash for classification, extraction, and summarization. Reserve frontier models for complex reasoning.
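A quick way to see what right-sizing is worth is to price the same workload on several models. The sketch below uses rates from the pricing table. The workload (800 input tokens, 20 output tokens, 10,000 requests per day) is a made-up classification example, not a measurement:

```python
# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Return the estimated monthly USD spend for a fixed per-request token profile."""
    input_price, output_price = PRICES[model]
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical workload: short classification prompts at high volume.
for model in PRICES:
    cost = monthly_cost(model, input_tokens=800, output_tokens=20, requests_per_day=10_000)
    print(f"{model:<18} ${cost:,.2f}")
```

Price is only half of the decision. The smaller model still has to meet your quality bar on the task.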

Enable prompt caching

Cache large fixed sections (system prompts, documents). Anthropic bills cache reads at 10% of the standard input price; OpenAI also discounts cached input, though the exact rate varies by model.
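The effect of caching on input cost can be estimated as follows. This is a simplified sketch: it applies the 10% cache-read rate quoted above, ignores any surcharge for writing to the cache, and uses a hypothetical 10,000-token system prompt at Claude Sonnet 4.6's $3.00/1M input rate:

```python
def input_cost_with_cache(cached_tokens: int, fresh_tokens: int,
                          input_price_per_million: float,
                          cache_read_multiplier: float = 0.10) -> float:
    """Return input cost in USD when some tokens are billed at the cache-read rate."""
    cached = cached_tokens / 1_000_000 * input_price_per_million * cache_read_multiplier
    fresh = fresh_tokens / 1_000_000 * input_price_per_million
    return cached + fresh


# Hypothetical request: 10,000-token fixed system prompt plus 200 new tokens.
without_cache = input_cost_with_cache(0, 10_200, 3.00)
with_cache = input_cost_with_cache(10_000, 200, 3.00)
savings = 1 - with_cache / without_cache
print(f"without: ${without_cache:.4f}  with: ${with_cache:.4f}  saved: {savings:.0%}")
```

The saving approaches 90% only when almost the entire prompt is cached and reused.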

Use the Batch API

For non-real-time tasks, the Batch API on OpenAI and Anthropic gives a 50% discount on both input and output tokens.
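If only part of a workload can tolerate batch latency, the blended cost can be estimated like this. The sketch assumes the 50% discount stated above; the 60% batch-eligible share is an illustrative figure:

```python
BATCH_DISCOUNT = 0.50  # discount on input and output tokens, as stated above


def blended_batch_cost(standard_cost: float, batch_fraction: float) -> float:
    """Return total cost when `batch_fraction` of the spend moves to the Batch API."""
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    batched = standard_cost * batch_fraction * (1 - BATCH_DISCOUNT)
    realtime = standard_cost * (1 - batch_fraction)
    return batched + realtime


# $15.00 per month at standard rates, with 60% of requests moved to batch.
monthly = blended_batch_cost(standard_cost=15.00, batch_fraction=0.6)
print(f"${monthly:.2f}")  # $10.50
```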

Cap output length

Set max_tokens to a sensible limit. Unconstrained output is a common source of unexpectedly high API bills.
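Because max_tokens bounds the output, it also bounds the cost of a single request. A minimal sketch of that upper bound, assuming GPT-4o rates from the pricing table and a hypothetical 1,000-token prompt:

```python
def worst_case_cost(input_tokens: int, max_tokens: int,
                    input_price_per_million: float,
                    output_price_per_million: float) -> float:
    """Return the maximum USD cost of one request if the model uses all of max_tokens."""
    return (input_tokens * input_price_per_million
            + max_tokens * output_price_per_million) / 1_000_000


# GPT-4o rates; compare a tight cap with a generous one for the same prompt.
tight = worst_case_cost(1_000, max_tokens=300, input_price_per_million=2.50,
                        output_price_per_million=10.00)
loose = worst_case_cost(1_000, max_tokens=4_096, input_price_per_million=2.50,
                        output_price_per_million=10.00)
print(f"cap 300: ${tight:.5f}  cap 4096: ${loose:.5f}")
```

Note that max_tokens truncates the response when the limit is reached; it does not make the model write more concisely, so pair the cap with a prompt that asks for short output.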

Deduplicate context

Avoid sending the same large context repeatedly. Structure conversations to minimize repeated tokens per turn.
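Repeated context adds up quickly in multi-turn conversations. The sketch below assumes a simplified chat in which every turn resends the full history and each turn adds a fixed number of tokens. Both are illustrative assumptions:

```python
def cumulative_input_tokens(tokens_added_per_turn: list[int]) -> int:
    """Return total input tokens billed when each turn resends the whole history."""
    total_billed = 0
    history = 0
    for added in tokens_added_per_turn:
        history += added         # history grows by this turn's new tokens
        total_billed += history  # the entire history is sent as input again
    return total_billed


turns = [500] * 10  # hypothetical: 10 turns, 500 new tokens each
resent = cumulative_input_tokens(turns)
unique = sum(turns)
print(f"billed: {resent:,} tokens vs {unique:,} unique tokens")
```

Billed input grows roughly with the square of the number of turns, which is why trimming or summarizing history pays off in long sessions.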

Track and alert

Set spending alerts in your provider's dashboard. Most providers offer usage APIs — monitor token consumption per request type.
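Alongside provider dashboards, a lightweight in-process tracker can attribute spend to request types. The following is a rough sketch, not a provider API: `SpendTracker` and its methods are hypothetical, and the token counts are assumed to come from the usage data returned with each response:

```python
from collections import defaultdict


class SpendTracker:
    """Accumulate estimated spend per request type and flag budget overruns."""

    def __init__(self, budget_usd: float) -> None:
        self.budget_usd = budget_usd
        self.spend_by_type: dict[str, float] = defaultdict(float)

    def record(self, request_type: str, input_tokens: int, output_tokens: int,
               input_price_per_million: float, output_price_per_million: float) -> bool:
        """Add one request's cost; return True once total spend exceeds the budget."""
        cost = (input_tokens * input_price_per_million
                + output_tokens * output_price_per_million) / 1_000_000
        self.spend_by_type[request_type] += cost
        return self.total() > self.budget_usd

    def total(self) -> float:
        """Return total estimated spend across all request types."""
        return sum(self.spend_by_type.values())


tracker = SpendTracker(budget_usd=0.01)
# GPT-4o rates from the pricing table: $2.50 input, $10.00 output per 1M tokens.
first_alert = tracker.record("classify", 400, 20, 2.50, 10.00)
second_alert = tracker.record("report", 1_000, 2_000, 2.50, 10.00)
print(dict(tracker.spend_by_type), first_alert, second_alert)
```

In this example the second request pushes the total past the budget, so `record` returns True and the caller can raise an alert.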


Frequently Asked Questions

How is LLM API cost calculated?

LLM API pricing is based on the number of tokens processed, billed separately for input (your prompt) and output (the model's response). The formula is: Cost = (input tokens / 1,000,000) × input price + (output tokens / 1,000,000) × output price. Most providers publish prices per 1M tokens.

Why are output tokens more expensive than input tokens?

Generating output tokens requires more computation than processing input tokens. The model must perform a forward pass for every output token it generates, whereas input tokens are processed in parallel. Output tokens typically cost 3–6× as much as input tokens, depending on the model.

How accurate are these cost estimates?

Estimates are based on official provider pricing documentation. However, actual costs can vary due to usage tier discounts, prompt caching (up to 90% off cached input tokens), batch API discounts (50% on OpenAI and Anthropic), regional pricing differences, and provider pricing changes. Always check your provider's current pricing page before production planning.

What is prompt caching and how does it save money?

Prompt caching lets you reuse previously processed portions of your prompt (like large system prompts or documents) across multiple API calls. Instead of paying the full input token price, you pay a reduced rate for cached reads: 10% of the standard rate on Anthropic, and a model-dependent discount on OpenAI. If your prompt has a large fixed section, caching can cut input costs by up to 90%.

How can I reduce my LLM API costs?

Key strategies: (1) Use a smaller model for simpler tasks; Haiku or GPT-4o mini often work as well as larger models for classification or extraction. (2) Cap output with max_tokens to prevent unnecessarily long responses. (3) Use prompt caching for large repeated context. (4) Use the Batch API for non-real-time workloads (50% discount). (5) Trim your system prompt, since every token matters at scale.

What is the cheapest model for high-volume workloads?

For high-volume production use, DeepSeek-V3 ($0.27/1M input), Gemini 2.5 Flash-Lite ($0.10/1M input), GPT-4o mini ($0.15/1M input), and Claude Haiku 4.5 ($1.00/1M input) offer the best price-to-capability ratios. By list price alone, the lowest input rate in the table above is $0.10/1M, shared by Gemini 2.5 Flash-Lite, Llama 4 Scout, and Mistral Small 3.1. Always benchmark on your specific task before committing to a model at scale.