
Claude API Pricing Explained: Models, Tokens & Cost Optimisation 2026

The Claude API is priced per token: input and output tokens are billed separately, at rates that vary by model. If you're planning an enterprise deployment and haven't modelled your API costs before committing to an architecture, you're flying blind. We've seen companies build their entire data pipeline on Claude Opus only to discover the monthly bill makes the economics unworkable. And we've seen teams default to Haiku for everything, then wonder why quality is inconsistent.

This guide breaks down Claude API pricing across every model and access method, shows you how to calculate real costs for common enterprise workloads, and explains the five optimisation strategies that consistently reduce API spend by 60-80%. If you need help modelling costs for your specific use case, our Claude API integration team does this as part of every engagement.

Note on pricing: Anthropic updates Claude API pricing periodically. This guide reflects rates as of Q1 2026. Always verify current rates at console.anthropic.com or Anthropic's official pricing page before finalising cost models. Enterprise customers with high volumes should contact Anthropic's sales team for committed use discounts.

How Claude API Pricing Works

The Claude API uses a per-token pricing model. Every text interaction is measured in tokens: roughly 3-4 characters each, or about 0.75 words per token. A 1,000-word document is therefore approximately 1,300 tokens. Pricing is expressed as cost per million tokens (MTok), billed separately for input (everything you send to Claude) and output (everything Claude generates back).

Input tokens include: your system prompt, the full conversation history, any documents or data you inject, and tool definitions. Output tokens are the response Claude generates. In most applications, output tokens are shorter than input tokens, but they cost more per token because generation is computationally more expensive than processing. The combined formula is: Total cost = (input_tokens × input_rate) + (output_tokens × output_rate).
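As a quick sketch, the formula translates directly into code. The rates below are the Sonnet figures from the pricing table in the next section; the helper function name is ours:

```python
# Per-request cost from the combined formula. Rates are in dollars per MTok;
# defaults are Claude Sonnet rates ($3.00 input / $15.00 output).
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 3.00, output_rate: float = 15.00) -> float:
    """Return the dollar cost of a single API call."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 2,000-token prompt with a 400-token response:
cost = request_cost(2_000, 400)   # $0.006 input + $0.006 output = $0.012
```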

Current Pricing by Model

As of March 2026, Claude API is available through three models at the following published rates. Note that prompt caching and batch API apply separate pricing multipliers on top of these base rates.

Model | Model String | Input (per MTok) | Output (per MTok) | Context
Claude Opus 4 | claude-opus-4-6 | $15.00 | $75.00 | 200K
Claude Sonnet 4 | claude-sonnet-4-6 | $3.00 | $15.00 | 200K
Claude Haiku 4.5 | claude-haiku-4-5-20251001 | $0.80 | $4.00 | 200K

The cost ratio between Opus and Haiku is roughly 19:1 for both input and output tokens ($15.00 vs $0.80, and $75.00 vs $4.00). This means that for any workload you run on Opus that Haiku could handle, you're spending nearly 19x more than necessary. For most enterprises processing millions of tokens per day, model choice is the single biggest cost variable in your Claude API budget.

Prompt Caching Pricing

Prompt caching changes the economics of applications that send the same large context repeatedly: system prompts, document corpora, tool definitions. When you mark tokens as cacheable and Claude has already processed that prefix, subsequent requests use cached tokens at a fraction of the full input rate.

Model | Cache Write (per MTok) | Cache Read (per MTok) | Reduction vs Standard Input
Claude Opus 4 | $18.75 (1.25x) | $1.50 (0.10x) | 90% cheaper on reads
Claude Sonnet 4 | $3.75 (1.25x) | $0.30 (0.10x) | 90% cheaper on reads
Claude Haiku 4.5 | $1.00 (1.25x) | $0.08 (0.10x) | 90% cheaper on reads

Cache writes cost 25% more than standard input (you pay a small premium to populate the cache). Cache reads cost 90% less than standard input. Break-even comes almost immediately: one write plus one read costs 1.35x the standard input rate, versus 2.0x for sending the prefix twice uncached, so a prefix reused even once is already cheaper cached. For applications with large stable contexts used across many requests, prompt caching is not optional; it's the architecture. Read the full prompt caching guide for implementation details and advanced patterns.

Real example: Document review application with a 30,000-token context

10,000 requests/day × 30,000-token context. Without caching: 300M input tokens/day × $3.00/MTok = $900/day. With caching (1 write + 9,999 reads): (30K × $3.75/MTok) + (30K × 9,999 × $0.30/MTok) ≈ $0.11 + $90.00 ≈ $90/day. Savings: ~$810/day or ~$296K/year on Sonnet alone.
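The arithmetic above can be checked with a short script (Sonnet rates, and the same assumption of one cache write serving the whole day):

```python
# Daily cost of a 30,000-token stable context at Sonnet rates,
# with and without prompt caching.
CONTEXT_TOKENS = 30_000       # tokens in the stable, cacheable prefix
REQUESTS_PER_DAY = 10_000
INPUT_RATE = 3.00             # $ per MTok, standard input
WRITE_RATE = 3.75             # $ per MTok, cache write (1.25x input)
READ_RATE = 0.30              # $ per MTok, cache read (0.10x input)

uncached = CONTEXT_TOKENS * REQUESTS_PER_DAY * INPUT_RATE / 1_000_000
# 1 write populates the cache; the remaining 9,999 requests read it.
cached = (CONTEXT_TOKENS * WRITE_RATE
          + CONTEXT_TOKENS * (REQUESTS_PER_DAY - 1) * READ_RATE) / 1_000_000

savings = uncached - cached   # uncached $900.00/day, cached ≈ $90.10/day
```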

Batch API Pricing

The Batch API is Anthropic's asynchronous processing tier. Requests submitted to the Batch API are processed over a longer time window (up to 24 hours) in exchange for a 50% discount on all token costs. There is no quality difference: the same models, same capabilities, same output, just asynchronous delivery.

Model | Batch Input (per MTok) | Batch Output (per MTok) | Discount
Claude Opus 4 | $7.50 | $37.50 | 50% off
Claude Sonnet 4 | $1.50 | $7.50 | 50% off
Claude Haiku 4.5 | $0.40 | $2.00 | 50% off

Batch API is the right choice for: nightly classification pipelines, bulk document processing, model evaluation runs, report generation that doesn't require real-time output, and any workload where latency is measured in hours rather than milliseconds. If 40-60% of your Claude API workload is non-real-time, switching those workloads to batch should be the first item on your cost optimisation roadmap. Our streaming vs batching decision guide helps you categorise your workloads.

Real Cost Scenarios for Common Enterprise Workloads

Abstract pricing tables don't help you build a business case. Here are four real workload scenarios with calculated monthly costs, showing the difference between naive and optimised architectures.

Scenario 1: Customer Support Chatbot

Volume: 50K conversations/mo
Avg input tokens: 800 (system + history)
Avg output tokens: 200
Model (naive): Sonnet
Naive cost/mo: ~$270
With Haiku + caching: ~$48
Savings: 82%

Scenario 2: Legal Contract Analysis

Volume: 2,000 contracts/mo
Avg input tokens: 40,000 (doc + prompt)
Avg output tokens: 1,500
Model (naive): Opus
Naive cost/mo: ~$1,425
With Sonnet + caching: ~$261
Savings: 82%

Scenario 3: Nightly Data Classification

Volume: 500K records/mo
Avg input tokens: 300
Avg output tokens: 20
Model (naive): Sonnet
Naive cost/mo: ~$600
With Haiku batch: ~$80
Savings: 87%

Scenario 4: Code Review Assistant

Volume: 10K PR reviews/mo
Avg input tokens: 4,000 (diff + context)
Avg output tokens: 600
Model (naive): Opus
Naive cost/mo: ~$1,050
With Sonnet + caching: ~$189
Savings: 82%

5 Cost Optimisation Strategies

Every enterprise Claude API deployment we've worked on has reduced its API bill by at least 50% after optimisation. These are the five strategies, ordered by typical impact.

Strategy 1: Model routing based on task complexity

Build a lightweight task classifier at the front of your pipeline. The classifier (which can itself be a Haiku call at negligible cost) categorises incoming requests as simple (extraction, classification, short generation) or complex (multi-step reasoning, synthesis, high-stakes analysis). Route simple tasks to Haiku, standard tasks to Sonnet, complex tasks to Opus. For most enterprise workloads, 60-70% of requests are simple enough for Haiku, 25-35% are Sonnet-appropriate, and only 5-10% genuinely need Opus. The cost reduction compounds with volume.
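A minimal routing sketch. The task categories and thresholds here are illustrative; in production the `task_type` label would come from the cheap classifier call described above:

```python
# Map a classified task type to the cheapest capable model string.
# Category sets are hypothetical examples, not a fixed taxonomy.
SIMPLE_TASKS = {"extract", "classify", "summarise", "translate"}
STANDARD_TASKS = {"analyse", "draft", "review"}

def route_model(task_type: str) -> str:
    """Return the model string for a classified request."""
    if task_type in SIMPLE_TASKS:
        return "claude-haiku-4-5-20251001"   # 60-70% of traffic
    if task_type in STANDARD_TASKS:
        return "claude-sonnet-4-6"           # 25-35% of traffic
    return "claude-opus-4-6"                 # the 5-10% that needs deep reasoning
```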

Strategy 2: Aggressive prompt caching

Audit your prompt structure for cacheable prefixes. System prompts are always cacheable. Large document contexts that remain stable across requests are prime cache candidates. Tool definitions in agentic workflows are cacheable. If you have a 10,000-token system prompt that's sent 100,000 times per month, you're paying for 1 billion input tokens when prompt caching would reduce that to roughly 10 million cache-write tokens plus 1 billion cache-read tokens billed at one-tenth the rate: roughly an 89% reduction in that prompt's cost.
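In the Messages API, a cacheable prefix is marked with a `cache_control` breakpoint on the last block of the stable content. A minimal sketch of the request body (the prompt text and token counts are illustrative):

```python
# Request body marking a large, stable system prompt as cacheable.
# Everything up to and including the block carrying cache_control is cached.
LONG_SYSTEM_PROMPT = "You are a contract-review assistant. ..."  # imagine ~10K tokens

request_body = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},   # cache this prefix
        }
    ],
    "messages": [
        {"role": "user", "content": "Review clause 4.2 for liability risk."}
    ],
}
```

Only the per-request user message is billed at the full input rate once the prefix is cached.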

Strategy 3: Batch API for async workloads

Identify every workload in your pipeline that doesn't require real-time response. Nightly reports, batch document processing, model evaluation, background enrichment, retroactive classification: all of these should use the Batch API for an automatic 50% discount. The engineering overhead is minimal: Batch API uses the same request format as the synchronous API, just submitted as a batch file with polling for results.
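A sketch of the batch shape: each item wraps ordinary Messages-API params with a `custom_id` so results can be matched back to source records. The record data and IDs are invented for illustration; the commented submission calls follow the Anthropic SDK's Message Batches interface:

```python
# Build Batch API request items from source records (illustrative data).
records = [
    {"id": "rec-001", "text": "Invoice overdue 30 days"},
    {"id": "rec-002", "text": "Password reset request"},
]

batch_requests = [
    {
        "custom_id": rec["id"],   # echoed back with each result
        "params": {               # same shape as a synchronous Messages request
            "model": "claude-haiku-4-5-20251001",
            "max_tokens": 16,
            "messages": [
                {"role": "user", "content": f"Classify this ticket: {rec['text']}"}
            ],
        },
    }
    for rec in records
]

# Submission and polling (requires the anthropic SDK and an API key):
#   client = anthropic.Anthropic()
#   batch = client.messages.batches.create(requests=batch_requests)
#   ... poll client.messages.batches.retrieve(batch.id) until processing ends
```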

Strategy 4: Output length control

Output tokens are billed at 5x the rate of input tokens on every Claude model. Every unnecessary output token is expensive at scale. Set explicit max_tokens on every API call. Use system prompt instructions to enforce concise output formats. If you're generating structured data (JSON, tables, classifications), explicitly constrain the response format. Reducing average output token count by 30% is often achievable with prompt engineering alone, and at high volume that's a meaningful cost reduction.
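To put a number on that 30%, here is a sketch at Sonnet output rates, with an illustrative workload of 100K requests/month averaging 600 output tokens:

```python
# Monthly output spend before and after trimming average output length by 30%.
OUTPUT_RATE = 15.00            # $ per MTok, Sonnet output
REQUESTS_PER_MONTH = 100_000   # illustrative volume

baseline = 600 * REQUESTS_PER_MONTH * OUTPUT_RATE / 1_000_000   # $900.00/month
trimmed = 420 * REQUESTS_PER_MONTH * OUTPUT_RATE / 1_000_000    # $630.00/month

# max_tokens caps the worst case; prompt instructions shift the average.
call_params = {"model": "claude-sonnet-4-6", "max_tokens": 512}
```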

Strategy 5: Context window hygiene

Many applications accumulate conversation history without pruning, eventually sending thousands of tokens of old context that isn't relevant to the current query. Implement sliding window management: retain the last N turns plus a rolling summary of earlier turns. For document-heavy applications, use semantic chunking to send only the relevant passages rather than entire documents. Context window discipline is particularly impactful for applications with long, multi-turn conversations or multi-document analysis tasks.
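A minimal sketch of the sliding-window pattern described above. The function and the summary-turn format are ours; in practice the summary string would come from a cheap summarisation call (e.g. Haiku) over the dropped turns:

```python
# Sliding-window conversation history: a rolling summary plus the last N turns.
def prune_history(turns: list[dict], keep_last: int, summary: str) -> list[dict]:
    """Bound context size by replacing old turns with one summary turn."""
    if len(turns) <= keep_last:
        return turns                       # nothing to prune yet
    recent = turns[-keep_last:]            # retain the last N turns verbatim
    summary_turn = {
        "role": "user",
        "content": f"Summary of earlier conversation: {summary}",
    }
    return [summary_turn] + recent
```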

Want a custom cost model for your Claude API workload?

We'll analyse your use case, model the cost at different architectures, and give you a concrete number before you commit. Our Claude API integration service includes a cost architecture review as the first deliverable.

Get a Custom Cost Model →

Enterprise Pricing and Volume Discounts

Anthropic offers enterprise agreements for organisations with high and predictable API volumes. These typically involve committed use contracts with volume discounts, dedicated support, higher rate limits, and custom data processing agreements (DPAs) for regulated industries. If your monthly API spend is projected above $5,000 at standard rates, it's worth a conversation with Anthropic's enterprise sales team โ€” committed use discounts can materially change the economics.

For cloud-hosted access through Amazon Bedrock or Google Cloud Vertex AI, pricing is governed by those platforms' enterprise agreements and may interact with existing EDP (AWS Enterprise Discount Program) or CUD (GCP Committed Use Discount) commitments. Our Claude API enterprise guide covers the access channel decision in detail. If you're navigating an enterprise procurement for Claude API, talk to our team: we've been through this process with procurement teams at financial services, healthcare, and enterprise software companies.

Quick Token Cost Calculator

Use this formula to quickly estimate monthly API costs for a given workload:

# Monthly Claude API cost estimate
monthly_cost = (
    (avg_input_tokens * requests_per_month * input_rate_per_token)
    + (avg_output_tokens * requests_per_month * output_rate_per_token)
)

# Example: 10,000 requests/month on claude-sonnet-4-6
# Avg input: 2,000 tokens | Avg output: 400 tokens
input_cost  = 2000 * 10_000 * (3.00 / 1_000_000)    # = $60.00/month
output_cost = 400 * 10_000 * (15.00 / 1_000_000)    # = $60.00/month
total_cost  = input_cost + output_cost              # = $120.00/month before caching

# With prompt caching (1,500-token stable system prompt):
# Cache write: 1 x 1,500 tokens x $3.75/MTok = negligible
# Cache reads: 9,999 x 1,500 tokens x $0.30/MTok ≈ $4.50/month
# Savings: ~$40.50/month (1,500 cached tokens billed at $0.30 instead of $3.00 per MTok)
Claude Implementation Team

Claude Certified Architects who have modelled and optimised API costs for 50+ enterprise deployments. We know where the budget goes. Learn about our team →