The Claude API is priced per token: input and output tokens are billed separately, at rates that vary by model. If you're planning an enterprise deployment and haven't modelled your API costs before committing to an architecture, you're flying blind. We've seen companies build their entire data pipeline on Claude Opus only to discover the monthly bill makes the economics unworkable. And we've seen teams default to Haiku for everything, then wonder why quality is inconsistent.
This guide breaks down Claude API pricing across every model and access method, shows you how to calculate real costs for common enterprise workloads, and explains the five optimisation strategies that consistently reduce API spend by 60–80%. If you need help modelling costs for your specific use case, our Claude API integration team does this as part of every engagement.
Note on pricing: Anthropic updates Claude API pricing periodically. This guide reflects rates as of Q1 2026. Always verify current rates at console.anthropic.com or Anthropic's official pricing page before finalising cost models. Enterprise customers with high volumes should contact Anthropic's sales team for committed use discounts.
How Claude API Pricing Works
The Claude API uses a per-token pricing model. Every text interaction is measured in tokens: roughly 3–4 characters each, or about 0.75 words per token. A 1,000-word document is therefore roughly 1,300–1,400 tokens. Pricing is expressed as cost per million tokens (MTok), billed separately for input (everything you send to Claude) and output (everything Claude generates back).
Input tokens include: your system prompt, the full conversation history, any documents or data you inject, and tool definitions. Output tokens are the response Claude generates. In most applications, outputs are shorter than inputs, but output tokens cost more per token because generation is computationally more expensive than processing. The combined formula is: Total cost = (input_tokens × input_rate) + (output_tokens × output_rate).
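That formula can be wrapped in a small helper. The rates used in the example call are the Sonnet figures from the pricing table in the next section, purely as an illustration:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Dollar cost of one API call; rates are dollars per million tokens (MTok)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 2,000-token prompt with a 500-token reply at Sonnet rates ($3.00 in, $15.00 out):
cost = request_cost(2_000, 500, 3.00, 15.00)  # = $0.0135
```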
Current Pricing by Model
As of March 2026, Claude API is available through three models at the following published rates. Note that prompt caching and batch API apply separate pricing multipliers on top of these base rates.
| Model | Model String | Input (per MTok) | Output (per MTok) | Context |
|---|---|---|---|---|
| Claude Opus 4 | claude-opus-4-6 | $15.00 | $75.00 | 200K |
| Claude Sonnet 4 | claude-sonnet-4-6 | $3.00 | $15.00 | 200K |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | $0.80 | $4.00 | 200K |
The cost ratio between Opus and Haiku is 18.75:1 for both input tokens ($15.00 vs $0.80) and output tokens ($75.00 vs $4.00). This means that running a workload on Opus when Haiku could handle it costs nearly 19x more than necessary. For most enterprises processing millions of tokens per day, model selection is the single biggest cost variable in your Claude API budget.
Prompt Caching Pricing
Prompt caching changes the economics of applications that send the same large context repeatedly: system prompts, document corpora, tool definitions. When you mark tokens as cacheable and Claude has already processed that prefix, subsequent requests use cached tokens at a fraction of the full input rate.
| Model | Cache Write (per MTok) | Cache Read (per MTok) | Reduction vs Standard Input |
|---|---|---|---|
| Claude Opus 4 | $18.75 (1.25×) | $1.50 (0.10×) | 90% cheaper on reads |
| Claude Sonnet 4 | $3.75 (1.25×) | $0.30 (0.10×) | 90% cheaper on reads |
| Claude Haiku 4.5 | $1.00 (1.25×) | $0.08 (0.10×) | 90% cheaper on reads |
Cache writes cost 25% more than standard input (you pay a small premium to populate the cache), while cache reads cost 90% less. The break-even arithmetic is forgiving: the 0.25× write premium is repaid by the 0.9× saving on a single read, so caching pays for itself almost as soon as a cached prefix is reused, which happens immediately in any high-volume application. For applications with large stable contexts used across many requests, prompt caching is not optional: it's the architecture. Read the full prompt caching guide for implementation details and advanced patterns.
Worked example (Sonnet): 10,000 requests/day × 30,000-token context. Without caching: 300M input tokens/day × $3.00/MTok = $900/day. With caching (1 write + 9,999 reads): (30K × $3.75/MTok) + (30K × 9,999 × $0.30/MTok) ≈ $0.11 + $90.00 ≈ $90/day. Savings: ~$810/day, or roughly $296K/year, on Sonnet alone.
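The same arithmetic as a short, checkable script, using the Sonnet rates from the tables above:

```python
MTOK = 1_000_000
REQUESTS_PER_DAY = 10_000
CONTEXT_TOKENS = 30_000

# Without caching: every request pays the full $3.00/MTok on the whole context.
no_cache = REQUESTS_PER_DAY * CONTEXT_TOKENS * 3.00 / MTOK           # $900.00/day

# With caching: one cache write, then cache reads for the remaining requests.
cache_write = CONTEXT_TOKENS * 3.75 / MTOK                           # ~$0.11
cache_reads = (REQUESTS_PER_DAY - 1) * CONTEXT_TOKENS * 0.30 / MTOK  # ~$90.00
with_cache = cache_write + cache_reads

daily_savings = no_cache - with_cache                                # ~$810/day
annual_savings = daily_savings * 365                                 # ~$296K/year
```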
Batch API Pricing
The Batch API is Anthropic's asynchronous processing tier. Requests submitted to the Batch API are processed over a longer time window (up to 24 hours) in exchange for a 50% discount on all token costs. There is no quality difference: the same models, same capabilities, same output, just delivered asynchronously.
| Model | Batch Input (per MTok) | Batch Output (per MTok) | Discount |
|---|---|---|---|
| Claude Opus 4 | $7.50 | $37.50 | 50% off |
| Claude Sonnet 4 | $1.50 | $7.50 | 50% off |
| Claude Haiku 4.5 | $0.40 | $2.00 | 50% off |
Batch API is the right choice for: nightly classification pipelines, bulk document processing, model evaluation runs, report generation that doesn't require real-time output, and any workload where latency is measured in hours rather than milliseconds. If 40–60% of your Claude API workload is non-real-time, switching those workloads to batch should be the first item on your cost optimisation roadmap. Our streaming vs batching decision guide helps you categorise your workloads.
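As an illustration, here is the batch discount applied to a hypothetical nightly classification job. The volumes (1M documents, ~1,500 input and 50 output tokens each) are assumed figures; the rates are the Haiku figures from the tables above:

```python
MTOK = 1_000_000
docs, in_tok, out_tok = 1_000_000, 1_500, 50

# Standard Haiku rates ($0.80 in / $4.00 out) vs Batch API rates (50% off).
standard = (docs * in_tok * 0.80 + docs * out_tok * 4.00) / MTOK  # $1,400 per run
batched  = (docs * in_tok * 0.40 + docs * out_tok * 2.00) / MTOK  # $700 per run
```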
Real Cost Scenarios for Common Enterprise Workloads
Abstract pricing tables don't help you build a business case. Here are four real workload scenarios with calculated monthly costs, showing the difference between naive and optimised architectures.
Scenario 1: Customer Support Chatbot
Scenario 2: Legal Contract Analysis
Scenario 3: Nightly Data Classification
Scenario 4: Code Review Assistant
5 Cost Optimisation Strategies
Every enterprise Claude API deployment we've worked on has reduced its API bill by at least 50% after optimisation. These are the five strategies, ordered by typical impact.
Strategy 1: Model routing based on task complexity
Build a lightweight task classifier at the front of your pipeline. The classifier (which can itself be a Haiku call at negligible cost) categorises incoming requests as simple (extraction, classification, short generation) or complex (multi-step reasoning, synthesis, high-stakes analysis). Route simple tasks to Haiku, standard tasks to Sonnet, complex tasks to Opus. For most enterprise workloads, 60–70% of requests are simple enough for Haiku, 25–35% are Sonnet-appropriate, and only 5–10% genuinely need Opus. The cost reduction compounds with volume.
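The routing economics can be sketched by comparing a blended input rate to an all-Opus baseline. The mix below (65% Haiku, 30% Sonnet, 5% Opus) is an illustrative assumption taken from the mid-points above:

```python
# Input $/MTok per model, and an assumed (illustrative) routing mix.
RATES = {"haiku": 0.80, "sonnet": 3.00, "opus": 15.00}
MIX   = {"haiku": 0.65, "sonnet": 0.30, "opus": 0.05}

blended = sum(MIX[m] * RATES[m] for m in RATES)  # $2.17/MTok blended input rate
all_opus = RATES["opus"]                         # $15.00/MTok without routing
reduction = 1 - blended / all_opus               # ~86% cheaper than all-Opus
```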
Strategy 2: Aggressive prompt caching
Audit your prompt structure for cacheable prefixes. System prompts are always cacheable. Large document contexts that remain stable across requests are prime cache candidates. Tool definitions in agentic workflows are cacheable. If you have a 10,000-token system prompt that's sent 100,000 times per month, you're paying for 1 billion input tokens; prompt caching would reduce that to roughly 10 million cache-write tokens plus roughly 1 billion cache-read tokens billed at one-tenth the standard rate. On Sonnet, that's about $335/month instead of $3,000, an ~89% reduction in that prompt's cost.
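A sketch of that arithmetic at Sonnet rates. The 1,000 cache re-writes are an assumption standing in for cache-TTL expiries over the month:

```python
MTOK = 1_000_000
PROMPT_TOKENS = 10_000
SENDS_PER_MONTH = 100_000
REWRITES = 1_000  # assumed cache refreshes over the month

uncached = SENDS_PER_MONTH * PROMPT_TOKENS * 3.00 / MTOK            # $3,000/month
writes = REWRITES * PROMPT_TOKENS * 3.75 / MTOK                     # $37.50
reads = (SENDS_PER_MONTH - REWRITES) * PROMPT_TOKENS * 0.30 / MTOK  # $297.00
cached = writes + reads                                             # ~$334.50
reduction = 1 - cached / uncached                                   # ~89%
```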
Strategy 3: Batch API for async workloads
Identify every workload in your pipeline that doesn't require a real-time response: nightly reports, batch document processing, model evaluation, background enrichment, retroactive classification. All of these should use the Batch API for an automatic 50% discount. The engineering overhead is minimal: the Batch API uses the same request format as the synchronous API, just submitted as a batch, with results retrieved by polling.
Strategy 4: Output length control
Output tokens are billed at 5x the rate of input tokens on every Claude model ($15.00 vs $3.00 on Sonnet, for example), so every unnecessary output token is expensive at scale. Set an explicit max_tokens on every API call. Use system prompt instructions to enforce concise output formats. If you're generating structured data (JSON, tables, classifications), explicitly constrain the response format. Reducing average output token count by 30% is often achievable with prompt engineering alone, and at high volume that's a meaningful cost reduction.
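The payoff is easy to estimate. A sketch with assumed volumes (1M Sonnet requests/month averaging 500 output tokens):

```python
MTOK = 1_000_000
requests, avg_output_tokens = 1_000_000, 500

before = requests * avg_output_tokens * 15.00 / MTOK  # $7,500/month of output tokens
trimmed = avg_output_tokens * 7 // 10                 # 30% shorter -> 350 tokens avg
after = requests * trimmed * 15.00 / MTOK             # $5,250/month
monthly_savings = before - after                      # $2,250/month
```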
Strategy 5: Context window hygiene
Many applications accumulate conversation history without pruning, eventually sending thousands of tokens of old context that isn't relevant to the current query. Implement sliding window management: retain the last N turns plus a rolling summary of earlier turns. For document-heavy applications, use semantic chunking to send only the relevant passages rather than entire documents. Context window discipline is particularly impactful for applications with long, multi-turn conversations or multi-document analysis tasks.
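A minimal sketch of the sliding-window approach. Here `summarise` is a placeholder stub (a production version might use a cheap Haiku call, and would maintain a rolling summary rather than re-summarising on every request):

```python
def summarise(turns: list[dict]) -> str:
    """Placeholder summariser; in production this could be a cheap Haiku call."""
    return f"{len(turns)} earlier turns condensed"

def prune_history(turns: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the last `keep_last` turns verbatim; fold older turns into a summary."""
    if len(turns) <= keep_last:
        return turns
    older, recent = turns[:-keep_last], turns[-keep_last:]
    summary = {"role": "user",
               "content": f"Summary of earlier conversation: {summarise(older)}"}
    return [summary] + recent
```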
Want a custom cost model for your Claude API workload?
We'll analyse your use case, model the cost at different architectures, and give you a concrete number before you commit. Our Claude API integration service includes a cost architecture review as the first deliverable.
Get a Custom Cost Model →
Enterprise Pricing and Volume Discounts
Anthropic offers enterprise agreements for organisations with high and predictable API volumes. These typically involve committed use contracts with volume discounts, dedicated support, higher rate limits, and custom data processing agreements (DPAs) for regulated industries. If your monthly API spend is projected above $5,000 at standard rates, it's worth a conversation with Anthropic's enterprise sales team โ committed use discounts can materially change the economics.
For cloud-hosted access through Amazon Bedrock or Google Cloud Vertex AI, pricing is governed by those platforms' enterprise agreements and may interact with existing EDP (AWS Enterprise Discount Program) or CUD (GCP Committed Use Discount) commitments. Our Claude API enterprise guide covers the access channel decision in detail. If you're navigating an enterprise procurement for Claude API, talk to our team: we've been through this process with procurement teams at financial services, healthcare, and enterprise software companies.
Quick Token Cost Calculator
Use this formula to quickly estimate monthly API costs for a given workload:
```python
# Monthly Claude API cost estimate
def monthly_cost(avg_input_tokens, avg_output_tokens, requests_per_month,
                 input_rate_per_mtok, output_rate_per_mtok):
    return requests_per_month * (
        avg_input_tokens * input_rate_per_mtok
        + avg_output_tokens * output_rate_per_mtok
    ) / 1_000_000

# Example: 10,000 requests/month on claude-sonnet-4-6
# Avg input: 2,000 tokens | Avg output: 400 tokens
input_cost = 2000 * 10000 * (3.00 / 1_000_000)   # = $60.00/month
output_cost = 400 * 10000 * (15.00 / 1_000_000)  # = $60.00/month
total_cost = input_cost + output_cost            # = $120.00/month (before caching)

# With prompt caching (1,500-token stable system prompt):
# Cache write: 1 x 1,500 tokens x $3.75/MTok = negligible (~$0.006)
# Cache reads: 9,999 x 1,500 tokens x $0.30/MTok = ~$4.50/month
# Uncached, those 15M tokens would cost 15 MTok x $3.00 = $45.00/month,
# so caching saves ~$40.50/month on this workload.
```