Claude Rate Limits by Plan: API Quotas

Claude API Rate Limits by Tier — March 2026

Anthropic enforces rate limits at the API tier level, not per-model. Limits are measured in three dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). Upgrading tiers increases all three.

API Tier	Qualification	RPM (claude-sonnet)	Input TPM	Output TPM	Daily Token Limit
Free Tier New API accounts, no spend	New account creation	5 RPM	25K ITPM	5K OTPM	300K / day
Build Tier 1 After first $5 spend	$5 total API spend	50 RPM	50K ITPM	10K OTPM	1M / day
Build Tier 2 30 days + $50 spend	30-day account + $50 spend	1,000 RPM	80K ITPM	16K OTPM	10M / day
Build Tier 3 90 days + $500 spend	90-day account + $500 spend	2,000 RPM	160K ITPM	32K OTPM	30M / day
Build Tier 4 After $5,000 spend	$5,000 cumulative API spend	4,000 RPM	400K ITPM	80K OTPM	300M / day
Scale / Enterprise Negotiated	Custom commercial agreement	Custom (10K+ RPM)	Custom	Custom	Effectively unlimited

Note: RPM figures are approximate and Anthropic adjusts limits based on account standing, model selection, and capacity. Always check your actual limits via the Anthropic Console or the API response headers (x-ratelimit-limit-requests).

Rate Limits by Model (Build Tier 2 Example)

Different models have different per-model rate limits within the same API tier. Haiku has higher RPM allowances than Sonnet, which is higher than Opus. This reflects compute cost and capacity allocation.

Model	RPM (Tier 2)	Input TPM	Output TPM	Max Context / Call	Max Output / Call
claude-haiku-4-5	1,000 RPM	800K ITPM	80K OTPM	200,000 tokens	8,192 tokens
claude-sonnet-4-6	1,000 RPM	80K ITPM	16K OTPM	200,000 tokens	8,192 tokens
claude-opus-4-6	500 RPM	40K ITPM	8K OTPM	200,000 tokens	8,192 tokens
claude-opus-4-6 Extended Thinking	200 RPM	40K ITPM	8K OTPM	200,000 tokens	64,000 tokens (incl. thinking)
Batch API (any model)	100 batches / day	Unconstrained by RPM	N/A	200,000 tokens / request	8,192 tokens

claude.ai Subscription Message Limits

Rate limits on claude.ai are measured in "messages" or "usage credits" rather than tokens. The limits reset every 8 hours on Pro and Max plans. Enterprise has no usage limits.

Plan	Daily Messages (approx.)	Reset Period	Opus 4.6 Access	Extended Thinking
Claude Free	~20-30 / day Varies by model & length	Daily reset	Limited	No
Claude Pro ($20/mo)	~5× Free Priority queue access	8 hours	Yes	Yes (limited)
Claude Max ($100/mo)	~20× Free Full Opus priority	8 hours	Yes (priority)	Yes
Claude Team ($30/user)	~5× Free per seat	8 hours	Yes	Yes
Claude Enterprise	Unlimited	N/A	Yes	Yes

Message counts are approximate. Anthropic does not publish exact message limits publicly — they depend on message length and model complexity. Longer messages with large file attachments consume more "credits" than short text queries.

6 Ways to Architect Around Rate Limits

Hitting rate limits in production is an architecture problem, not just a quota problem. These patterns eliminate bottlenecks without needing to upgrade tiers.

01

Request Queuing with Backpressure

Never send requests directly to the Claude API from your frontend or synchronous handlers. Use a queue (Redis, SQS, RabbitMQ) with a worker pool that respects RPM limits. When the queue is full, apply backpressure upstream. This absorbs traffic spikes without hitting rate limit errors.

02

Exponential Backoff on 429s

A 429 (Too Many Requests) response means you've hit a rate limit. Don't immediately retry — use exponential backoff with jitter. Start at 1 second, double each retry, add random 0-1 second jitter. Cap at 60 seconds. This prevents thundering herd re-requests from other clients.

03

Model Routing by Priority

Route high-priority, real-time requests to Sonnet. Route bulk, non-urgent tasks to Haiku (higher RPM budget). Route complex reasoning tasks to Opus but queue them aggressively. Each model has separate rate limit buckets — model routing is effectively tier expansion without tier upgrade. See our model selection guide.

04

Batch API for Non-Real-Time Workloads

The Batch API has no RPM limit — it runs asynchronously with 24-hour turnaround. Any workload that doesn't need immediate response (nightly reports, document processing, data enrichment) should use batch. This moves load off your real-time RPM quota entirely.

05

Response Caching for Repeated Queries

Identical or near-identical prompts produce similar outputs. Cache responses with a hash of the input prompt as the cache key. TTL of 1-24 hours depending on how time-sensitive the content is. For FAQ-type applications, 80%+ of queries may be serveable from cache — dramatically reducing live API calls.

06

Token Budget Enforcement

ITPM (input tokens per minute) is often the first limit hit, not RPM. Enforce a per-request token budget: cap system prompts at a maximum size, limit context retrieved from RAG, truncate conversation history beyond N turns. This lets you serve more requests within the same ITPM allocation. See our token management guide.

Claude Rate Limits by Plan: API Quotas, Messages & Token Limits

Claude API Rate Limits by Tier — March 2026

Rate Limits by Model (Build Tier 2 Example)

claude.ai Subscription Message Limits

6 Ways to Architect Around Rate Limits

Request Queuing with Backpressure

Exponential Backoff on 429s

Model Routing by Priority

Batch API for Non-Real-Time Workloads

Response Caching for Repeated Queries

Token Budget Enforcement

Reading Rate Limit Headers in Python

What Happens When You Hit a Rate Limit?

Building a High-Volume Claude Application?

Claude Rate Limits by Plan: API Quotas, Messages & Token Limits

Claude API Rate Limits by Tier — March 2026

Rate Limits by Model (Build Tier 2 Example)

claude.ai Subscription Message Limits

6 Ways to Architect Around Rate Limits

Request Queuing with Backpressure

Exponential Backoff on 429s

Model Routing by Priority

Batch API for Non-Real-Time Workloads

Response Caching for Repeated Queries

Token Budget Enforcement

Reading Rate Limit Headers in Python

What Happens When You Hit a Rate Limit?

Building a High-Volume Claude Application?

Get the Claude Enterprise Weekly

Further Reading

Claude Rate Limiting and Scaling

Claude API vs OpenAI API: Developer

Claude API Error Codes Reference

Claude API for Enterprise

Claude API Pricing Explained: Models