Key Takeaways
  • Claude 3.5 Haiku, Claude 3.5 Sonnet, and Claude 3 Opus are available on AWS Bedrock as managed inference endpoints
  • Bedrock eliminates the need to manage GPU infrastructure; AWS handles scaling and capacity
  • IAM roles, VPC endpoints, and AWS PrivateLink provide enterprise-grade isolation
  • Bedrock pricing is token-based and varies by model; Claude 3.5 Haiku costs roughly $0.80 per million input tokens
  • Bedrock Agents adds native tool use, memory, and knowledge base connections on top of Claude

AWS was the first hyperscaler to offer Anthropic's Claude models as a managed inference service, through Bedrock. For enterprises already running workloads in AWS, this is significant: you can call the Claude API from inside your VPC, using IAM credentials you already trust, without sending a single byte of data to Anthropic's own infrastructure. The security posture is fundamentally different from calling the Anthropic API directly, and for regulated industries (financial services, healthcare, government) that difference matters enormously.

This guide covers the complete technical path from "I have an AWS account" to "I have a production Claude deployment on Bedrock." We'll cover model access, IAM configuration, API patterns, Bedrock Agents, pricing, and the architectural decisions that separate proof-of-concepts from systems that can handle thousands of concurrent enterprise users.

If you're evaluating whether to run Claude on Bedrock vs. calling the Anthropic API directly, start with this rule: if your data governance, compliance team, or legal department requires data to stay within your AWS account boundary, Bedrock is the right answer. If speed to prototype and maximum model access are the priority, the direct API wins. Our Claude AI strategy consulting service can help you make that call.

What Is AWS Bedrock and Why Use It for Claude?

AWS Bedrock is Amazon's fully managed foundation model service. You don't provision GPUs, you don't manage inference infrastructure, and you don't negotiate with Anthropic separately; AWS handles the commercial relationship and billing. From your AWS console or SDK, Claude looks like any other AWS service: you call it via API, pay for what you use, and manage access through IAM like every other AWS resource.

For enterprise AWS shops, this is a material difference. Your existing AWS security controls (SCPs, permission boundaries, CloudTrail audit logging, GuardDuty threat detection, Macie for data classification) all apply to Bedrock calls automatically. You're not running a shadow IT project through a direct API key stored somewhere in a GitHub secret. Claude runs inside your account, audited alongside everything else.

Claude Models Available on Bedrock

As of early 2026, AWS Bedrock offers access to the following Anthropic models:

  • Claude 3.5 Haiku: fastest and most cost-efficient; ideal for high-volume classification, triage, and summarisation
  • Claude 3.5 Sonnet: best balance of speed and intelligence; recommended for most production workloads
  • Claude 3 Opus: maximum reasoning capability; use for complex analysis, legal document review, and code generation
  • Claude 3 Haiku / Sonnet: previous generation, still available and slightly cheaper

Model availability varies by AWS region. US East (N. Virginia) and US West (Oregon) have the broadest coverage. EU West (Ireland) and AP Southeast (Singapore) have Claude 3.5 Sonnet and Haiku. Always check the Bedrock console for current regional availability before architecting your solution.

Step 1: Enabling Claude Model Access in Bedrock

By default, Bedrock models are not enabled in your AWS account. You must explicitly request access. This is not instant: it requires a brief review process that typically takes 1-24 hours for Claude models. In enterprise accounts with AWS Enterprise Support, you can request expedited access through your TAM.

How to Request Model Access

  1. Open the AWS Console and navigate to Amazon Bedrock
  2. In the left sidebar, click Model access
  3. Click Manage model access
  4. Find Anthropic in the provider list and check all Claude models you need
  5. Review Anthropic's end-user license terms and accept
  6. Click Request access

You'll receive an email confirmation and the console will show "Access granted" once approved. Note that model access is per-region โ€” you must request access separately in each AWS region where you plan to deploy.

# Verify model access via CLI
aws bedrock list-foundation-models \
  --by-provider anthropic \
  --region us-east-1 \
  --query 'modelSummaries[*].{Name:modelId,Status:modelLifecycle.status}'
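The same check can be scripted. A minimal boto3 sketch (helper names are illustrative; the live call assumes credentials with bedrock:ListFoundationModels):

```python
def active_model_ids(model_summaries):
    """Filter list-foundation-models output down to ACTIVE Claude model IDs."""
    return [
        m["modelId"]
        for m in model_summaries
        if m.get("modelLifecycle", {}).get("status") == "ACTIVE"
    ]

def verify_access(region="us-east-1"):
    """Live check against the Bedrock control plane."""
    import boto3
    bedrock = boto3.client("bedrock", region_name=region)
    resp = bedrock.list_foundation_models(byProvider="anthropic")
    return active_model_ids(resp["modelSummaries"])
```

Run verify_access() in each region you plan to deploy to, since model access is granted per-region.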

Step 2: IAM Roles and Permissions

This is where most enterprise Bedrock deployments either get it right or spend weeks debugging access denied errors. Claude on Bedrock requires the correct IAM permissions on the principal (user, role, or service account) making the API call.

For production deployments, the correct pattern is IAM roles attached to compute โ€” not access keys stored as secrets. Your Lambda function, EC2 instance, ECS task, or SageMaker endpoint should have an execution role with Bedrock permissions. Never use long-lived access keys in application code.

Minimum Required Permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockInvokeAccess",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"
      ]
    }
  ]
}

Scope the Resource ARN to specific models rather than using a wildcard. This follows least-privilege principles and prevents your application role from accidentally calling more expensive models. For Bedrock Agents, you'll also need bedrock:InvokeAgent and associated S3 permissions for knowledge bases.

CloudTrail Logging

Enable CloudTrail data events for Bedrock in every production account. This creates an immutable audit log of every model invocation: who called it, from where, and what the request metadata looked like (not the content). For Claude AI governance and compliance programmes, this is table stakes.

# Enable CloudTrail for Bedrock data events
aws cloudtrail put-event-selectors \
  --trail-name your-trail-name \
  --advanced-event-selectors '[
    {
      "Name": "BedrockModelInvocation",
      "FieldSelectors": [
        {"Field": "eventCategory", "Equals": ["Data"]},
        {"Field": "resources.type", "Equals": ["AWS::Bedrock::Model"]}
      ]
    }
  ]'

Step 3: Making Your First Claude API Call on Bedrock

AWS Bedrock uses the invoke_model API, which wraps model-specific request bodies. Claude on Bedrock uses the Anthropic Messages API format โ€” the same JSON structure you'd use with the direct Anthropic API, but wrapped in the Bedrock SDK call.

Python Example (Boto3)

import boto3
import json

client = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

model_id = "anthropic.claude-3-5-sonnet-20241022-v2:0"

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "Summarise the key risks in this contract clause: ..."
        }
    ]
})

response = client.invoke_model(
    body=body,
    modelId=model_id,
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response.get('body').read())
print(response_body['content'][0]['text'])
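In production, wrap the call above with SDK-managed retries so transient ThrottlingException responses back off automatically rather than surfacing as errors. A sketch using botocore's retry configuration (helper names are illustrative):

```python
import json

def build_claude_body(prompt, max_tokens=1024):
    """Anthropic Messages API body in the shape Bedrock expects."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def make_client(region="us-east-1"):
    """bedrock-runtime client with adaptive client-side rate limiting."""
    import boto3
    from botocore.config import Config
    return boto3.client(
        "bedrock-runtime",
        region_name=region,
        config=Config(
            retries={"max_attempts": 8, "mode": "adaptive"},
            read_timeout=120,  # long generations can exceed the 60s default
        ),
    )
```

The adaptive mode throttles the client to stay under your account's rate limits, which matters once many workers share one region's quota.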

Streaming Responses

For real-time interfaces (chatbots, coding assistants, document drafting tools), use streaming. Bedrock supports server-sent event streaming via invoke_model_with_response_stream. This lets your UI render tokens as they arrive rather than waiting for the full response, which is critical for perceived performance at scale.

response = client.invoke_model_with_response_stream(
    body=body,
    modelId=model_id
)

stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            delta = json.loads(chunk.get('bytes').decode())
            if delta['type'] == 'content_block_delta':
                print(delta['delta']['text'], end='', flush=True)

Getting Bedrock Integration Right the First Time

Most Bedrock deployments stall at IAM configuration, VPC setup, or SDK version mismatches. Our Claude API integration service includes architecture review, IAM hardening, and production-ready code patterns. We've done this across 50+ enterprise deployments.

Book a Free Architecture Review →

Step 4: VPC Endpoints and AWS PrivateLink

By default, Bedrock API calls resolve to Bedrock's public regional endpoints: the traffic terminates on Amazon's infrastructure, but it reaches it via public IPs. For financial services, healthcare, and government workloads, you need API calls to remain within your AWS network boundary. VPC endpoints for Bedrock solve this.

AWS PrivateLink creates a private connection from your VPC to the Bedrock service endpoint without traversing the public internet. Traffic stays on Amazon's internal network. Combined with security group rules that restrict outbound traffic, you can guarantee that no Claude inference request ever leaves your VPC.

Creating a Bedrock VPC Endpoint

# Create VPC endpoint for Bedrock runtime
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxxx \
  --service-name com.amazonaws.us-east-1.bedrock-runtime \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-xxxxxxxx subnet-yyyyyyyy \
  --security-group-ids sg-zzzzzzzz \
  --private-dns-enabled

After creating the endpoint, your Bedrock SDK calls will automatically route through it โ€” no code changes required. The SDK uses the regional endpoint hostname, and private DNS resolution directs that to your VPC endpoint.

For enterprise security governance, also create the bedrock endpoint (not just bedrock-runtime) if your application manages model access, knowledge bases, or agents. The two endpoints serve different API surfaces.

Step 5: Bedrock Agents - Claude with Memory, Tools, and Knowledge

AWS Bedrock Agents is Amazon's native agentic AI framework built on top of Claude. It adds three capabilities that the raw invoke API doesn't provide: tool use (action groups), knowledge base retrieval (RAG), and conversation memory (session management). For enterprise use cases (an HR assistant that can look up policies, a code reviewer that can run tests, a support agent that can check order status), Bedrock Agents is the right abstraction.

Bedrock Agents Architecture

A Bedrock Agent consists of:

  • Foundation Model: always Claude (Sonnet is the default for most agent workloads)
  • Instructions: system prompt defining the agent's role, tone, and constraints
  • Action Groups: Lambda functions Claude can invoke as tools (via OpenAPI schema)
  • Knowledge Bases: S3-backed vector stores (OpenSearch Serverless) for RAG retrieval
  • Memory: optional session retention across conversations (not yet GA as of Q1 2026)

The agent orchestration loop is handled by AWS: Claude reads the user's input, decides which action group or knowledge base to call, executes it, reads the result, and responds. Your application code calls the agent endpoint โ€” not Claude directly. This is a material simplification for teams without deep LLM orchestration experience.

For teams needing more control (custom orchestration, multi-agent workflows, complex state management), consider building with the Claude Agent SDK directly on the Anthropic API. Bedrock Agents is AWS-opinionated; the SDK is architecture-agnostic.

import boto3

bedrock_agent = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

response = bedrock_agent.invoke_agent(
    agentId='your-agent-id',
    agentAliasId='TSTALIASID',
    sessionId='user-session-12345',
    inputText='What is our PTO policy for contractors?'
)

completion = ''
for event in response['completion']:
    if 'chunk' in event:
        completion += event['chunk']['bytes'].decode()

AWS Bedrock Pricing for Claude Models

Bedrock pricing for Claude is token-based, with input and output tokens metered separately. As of early 2026, indicative on-demand rates in US East (N. Virginia) are approximately:

Model               Input (per 1M tokens)   Output (per 1M tokens)
Claude 3.5 Haiku    $0.80                   $4.00
Claude 3.5 Sonnet   $3.00                   $15.00
Claude 3 Opus       $15.00                  $75.00

Bedrock's on-demand rates for Claude closely track direct Anthropic API pricing; what you pay for is the managed infrastructure, AWS data residency guarantees, and integration with the AWS billing ecosystem. For most enterprises, the compliance and procurement benefits make Bedrock the better commercial channel even at parity. Always verify current pricing in the AWS Bedrock console; rates change as AWS and Anthropic revise commercial terms.

For high-volume workloads, AWS also offers Provisioned Throughput for Bedrock: dedicated capacity that eliminates throttling at a fixed monthly cost. At sufficient scale (typically 10M+ tokens/day), Provisioned Throughput is cheaper than on-demand. Our Claude API integration team can model the break-even analysis for your specific usage pattern.
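As a rough sketch of what that modelling looks like, here is an on-demand cost estimator using the indicative rates from the table above (verify current rates in the console; the model keys and helper are illustrative):

```python
# Indicative on-demand rates from the table above, USD per million tokens.
RATES = {
    "claude-3-5-haiku":  {"input": 0.80,  "output": 4.00},
    "claude-3-5-sonnet": {"input": 3.00,  "output": 15.00},
    "claude-3-opus":     {"input": 15.00, "output": 75.00},
}

def monthly_cost(model, requests_per_day, in_tokens, out_tokens, days=30):
    """Estimated monthly on-demand spend for a steady workload."""
    r = RATES[model]
    per_request = (in_tokens * r["input"] + out_tokens * r["output"]) / 1_000_000
    return round(per_request * requests_per_day * days, 2)
```

For example, 1,000 requests/day at 2,000 input and 500 output tokens on Claude 3.5 Sonnet comes to $405/month; compare figures like this against a Provisioned Throughput quote for your volumes.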

Production Architecture Patterns

Getting Claude working on Bedrock in a development account takes an afternoon. Getting it to production, with the availability, security, observability, and cost controls your enterprise requires, takes deliberate architecture. These are the patterns we've deployed across financial services, legal, and healthcare clients.

Pattern 1: Lambda-Backed Inference Endpoint

For low-to-medium throughput use cases (under 100 requests per second), a Lambda function is the simplest production pattern. Lambda handles auto-scaling automatically, you pay only for invocations, and the IAM execution role model is cleanest here. Add API Gateway in front for rate limiting, API key management, and request validation.
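A minimal handler sketch for this pattern (names are illustrative; it assumes the execution role carries the bedrock:InvokeModel policy shown earlier and that API Gateway passes the prompt in a JSON request body):

```python
import json

MODEL_ID = "anthropic.claude-3-5-haiku-20241022-v1:0"

def build_request(prompt):
    """Messages API body for a short, bounded completion."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    })

def handler(event, context):
    """API Gateway proxy integration -> Bedrock -> JSON response."""
    import boto3
    client = boto3.client("bedrock-runtime")  # credentials come from the execution role
    prompt = json.loads(event["body"])["prompt"]
    resp = client.invoke_model(
        body=build_request(prompt),
        modelId=MODEL_ID,
        accept="application/json",
        contentType="application/json",
    )
    text = json.loads(resp["body"].read())["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"completion": text})}
```

Note the Lambda timeout must exceed your worst-case generation time; set it well above the default 3 seconds.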

Pattern 2: ECS/Fargate Application with Bedrock Calls

For high-throughput or latency-sensitive applications (a chatbot serving 10,000 concurrent users, a real-time document processing pipeline), run your application on ECS Fargate with Bedrock calls from the container. ECS task roles provide clean IAM, and the application manages connection pooling, caching, and retry logic explicitly.

Pattern 3: SageMaker Pipelines for Batch Processing

For asynchronous batch workloads (overnight contract analysis, weekly report generation, bulk document classification), SageMaker Pipelines with Bedrock calls is the right architecture. Bedrock also offers a native Batch Inference API for large file sets, with up to 50% cost savings versus on-demand invocation.

Observability Stack

Every production Bedrock deployment should instrument:

  • CloudWatch metrics for Bedrock: InvocationLatency, OutputTokenCount, InputTokenCount, InvocationThrottles
  • CloudTrail data events for audit (as configured above)
  • Application-level logging of prompts, model IDs, latencies, and error rates
  • Cost Explorer tags on Bedrock API calls for chargebacks by team/product
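As a sketch, the CloudWatch side of that list can be queried with boto3; Bedrock publishes invocation metrics under the AWS/Bedrock namespace with a ModelId dimension (helper names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def latency_query(model_id, hours=24):
    """Keyword arguments for cloudwatch.get_metric_statistics."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": "InvocationLatency",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 3600,                     # one datapoint per hour
        "Statistics": ["Average", "Maximum"],
    }

def fetch_latency(model_id, region="us-east-1"):
    """Live query; requires cloudwatch:GetMetricStatistics."""
    import boto3
    cw = boto3.client("cloudwatch", region_name=region)
    return cw.get_metric_statistics(**latency_query(model_id))["Datapoints"]
```

Swap MetricName for InputTokenCount, OutputTokenCount, or InvocationThrottles to cover the rest of the list.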

Without observability, Claude deployments on Bedrock become black boxes. You won't know which prompts are expensive, which models are being called, or where latency spikes originate. Our Claude enterprise implementation service includes a full observability framework from day one.

Cost Management

Bedrock costs scale linearly with token usage. Common traps that blow enterprise budgets:

  • Missing prompt caching: if you're sending a 50,000-token system prompt with every request, enable Claude prompt caching. It reduces costs by up to 90% on cacheable tokens.
  • Using Opus when Sonnet is sufficient: Opus is 5x more expensive than Sonnet. Profile which tasks actually need Opus-level reasoning and route others to Sonnet or Haiku.
  • Unthrottled APIs: implement rate limiting at the API Gateway layer to prevent runaway costs from bugs or abuse.
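Prompt caching is requested per content block in the Messages API body: mark the large, stable system prefix with cache_control. A hedged sketch of that shape; confirm caching support for your specific model version in the Bedrock documentation:

```python
import json

def cached_body(system_prompt, user_message, max_tokens=1024):
    """Messages body with the static system prefix marked cacheable."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,  # large, stable prefix worth caching
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    })
```

Only the stable prefix should carry cache_control; content that varies per request belongs after it, since caching matches on exact prefixes.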

Ready to Deploy Claude on Bedrock in Production?

We've architected and deployed Claude on AWS Bedrock for regulated enterprises across financial services, healthcare, and government. From IAM hardening to Bedrock Agents to cost optimisation, our Claude API integration service covers every layer.

Frequently Asked Questions

Does AWS have access to my prompts when using Bedrock?

AWS states that customer data processed through Bedrock is not used to train foundation models and is not shared with model providers (including Anthropic). Your prompts and completions are encrypted in transit and at rest. For additional assurance, enable VPC endpoints so traffic never leaves your network boundary. Review the Bedrock data privacy documentation and have legal review the AWS DPA if operating under GDPR, HIPAA, or FedRAMP requirements.

Is Claude on Bedrock the same model as Claude on the Anthropic API?

Yes: the model weights are identical. Bedrock is a deployment infrastructure layer; Anthropic trains and maintains the models. The model IDs may differ slightly (Bedrock uses versioned ARNs), but the capabilities, context windows, and safety training are the same as the Anthropic API equivalents.

What regions support Claude on Bedrock?

US East (N. Virginia), US West (Oregon), EU West (Ireland), EU Central (Frankfurt), AP Southeast (Singapore), and AP Northeast (Tokyo) all support Claude models, though specific model versions vary by region. For EU data residency requirements, Frankfurt is the recommended primary region. Always check the Bedrock console for current regional model availability before finalising your architecture.

How does Bedrock handle rate limits and throttling?

Bedrock on-demand inference has per-account, per-region rate limits (tokens per minute and requests per minute). These limits are set by AWS, vary by model, and can be increased via Service Quotas requests. For workloads requiring guaranteed throughput without throttling risk, Provisioned Throughput provides dedicated capacity at a fixed monthly commitment.

Can I use Claude extended thinking on Bedrock?

Extended thinking (Claude's deep reasoning mode) is available on Bedrock for supported Claude 3.5 model versions. It's invoked by adding a thinking parameter to the request body with a budget of tokens allocated to reasoning. See our Claude extended thinking guide for implementation details and use case guidance.
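A sketch of that request-body shape (field names follow the Anthropic Messages API and are assumed to carry over to supported Bedrock models; the budget values are illustrative):

```python
import json

def thinking_body(prompt, budget_tokens=4096, max_tokens=8192):
    """Request body with extended thinking enabled.

    max_tokens must exceed budget_tokens, since the thinking budget
    is drawn from the overall output allowance.
    """
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    })
```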

ClaudeImplementations Team

Claude Certified Architects with 50+ enterprise deployments across financial services, legal, healthcare, and manufacturing. About us →