What the Claude Agent SDK Actually Is
The Claude Agent SDK is Anthropic's framework for building production AI agents on top of the Claude API. It sits above the raw API — which handles individual request/response cycles — and provides the scaffolding for agentic workflows: persistent loops, tool execution, sub-agent orchestration, and context management.
If you've used the Claude API directly, you've already built the core of what the Agent SDK formalises. The SDK takes the patterns that every serious team invents independently — the run loop, the tool registry, the message thread management — and provides a standard, tested implementation with production-grade features baked in.
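Concretely, the loop every team ends up writing looks something like the sketch below, with a stubbed model call standing in for the API. All names here are illustrative, not SDK API:

```python
def run_agent(call_model, tools, task, max_iterations=10):
    """Minimal agent loop: call the model, execute any tool it requests,
    feed the result back, and stop when it produces a final text answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        reply = call_model(messages)  # model decides: final answer or tool call
        messages.append({"role": "assistant", "content": reply})
        if reply["type"] == "text":
            return reply["text"]  # done — final answer
        # Execute the requested tool and append its result to the thread
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("max_iterations exceeded")
```

The SDK's value is everything this sketch omits: retries, streaming, token accounting, iteration limits, and error handling around the tool dispatch.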
It's available in Python and TypeScript. Python is the dominant choice for backend agent systems; TypeScript for full-stack deployments where agents integrate with Node.js application infrastructure.
The Claude Agent SDK is part of the broader Anthropic Python SDK (the `anthropic` package). Install it with `pip install anthropic`; agent-specific classes live under the `anthropic.lib.agents` namespace. TypeScript users: `npm install @anthropic-ai/sdk`.
Core SDK Concepts
The Agent Class
The Agent class is the central abstraction. It wraps a Claude model with a system prompt, a tool set, and execution configuration. You instantiate an agent once and call it repeatedly with different tasks.
```python
from anthropic import Anthropic
from anthropic.lib.agents import Agent, tool

client = Anthropic()

# Define a tool using the @tool decorator
@tool
def search_documents(query: str, top_k: int = 5) -> list[dict]:
    """
    Search the internal document store for relevant documents.

    Args:
        query: Natural language search query
        top_k: Number of results to return (default 5, max 20)

    Returns:
        List of document objects with title, url, and excerpt
    """
    return document_store.search(query, limit=top_k)

@tool
def create_ticket(title: str, description: str, priority: str = "medium") -> dict:
    """
    Create a new support ticket in the ticketing system.

    Args:
        title: Short ticket title (max 120 chars)
        description: Full description of the issue
        priority: One of 'low', 'medium', 'high', 'critical'

    Returns:
        Created ticket with id, url, and status
    """
    return ticketing_system.create(title=title, body=description, priority=priority)

# Create the agent
support_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="""You are a support desk agent. Your job is to:
1. Search the documentation to find answers to user questions
2. Create support tickets for issues that require engineering attention
3. Always search before creating a ticket — most questions are already answered
Tone: professional, concise. Never speculate — only report what the docs say.""",
    tools=[search_documents, create_ticket],
    max_tokens=2048,
    max_iterations=15,
)
```

Running an Agent
The run() method executes the full agent loop — send task to Claude, handle tool calls, loop until completion or termination condition. It returns an AgentResult object with the final output, tool call history, and token usage.
```python
# Synchronous run
result = support_agent.run("A user is getting a 403 error when trying to access /api/reports")
print(result.output)              # Final text response
print(result.tool_calls)          # List of all tool calls made
print(result.usage.total_tokens)  # Token count for the run

# Async run (recommended for production)
import asyncio

async def handle_support_request(user_message: str):
    result = await support_agent.arun(user_message)
    return result.output

# Streaming (for real-time UI feedback)
async def stream_support_response(user_message: str):
    async for event in support_agent.astream(user_message):
        if event.type == "text_delta":
            yield event.delta
        elif event.type == "tool_use_start":
            yield f"\n[Searching: {event.tool_name}...]\n"
```

Sub-Agent Architecture
The most powerful feature of the Claude Agent SDK is sub-agents: the ability for one agent to spawn and coordinate other agents. This is the foundation of multi-agent systems and the basis of complex enterprise agent deployments.
In the SDK model, a parent (orchestrator) agent has access to a tool that creates and runs child (sub) agents. The orchestrator doesn't execute tasks directly — it delegates them. Sub-agents receive specific instructions, run their own tool loops, and return results to the orchestrator.
```python
from anthropic.lib.agents import Agent, tool, SubAgentRunner

# Define specialist sub-agents
research_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="You are a research specialist. Search for and summarise information accurately.",
    tools=[web_search, read_document, extract_data],
    max_iterations=10,
)

writer_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="You are a professional business writer. Turn research into clear, compelling reports.",
    tools=[format_document, check_grammar],
    max_iterations=8,
)

validator_agent = Agent(
    client=client,
    model="claude-opus-4-6",  # Opus for quality validation
    system="You are a quality validator. Check documents for accuracy, completeness and logical consistency.",
    tools=[verify_facts, check_sources],
    max_iterations=5,
)

# Create orchestrator with sub-agent tools
runner = SubAgentRunner(client=client)

@tool
def delegate_research(topic: str, depth: str = "comprehensive") -> str:
    """Run the research specialist agent on a given topic."""
    result = runner.run(research_agent, f"Research {depth}ly: {topic}")
    return result.output

@tool
def delegate_writing(research_notes: str, format: str = "executive brief") -> str:
    """Run the writing specialist to produce a document from research notes."""
    result = runner.run(writer_agent, f"Write a {format} based on:\n{research_notes}")
    return result.output

@tool
def delegate_validation(document: str) -> str:
    """Run the validator agent to quality-check a document."""
    result = runner.run(validator_agent, f"Validate this document:\n{document}")
    return result.output

orchestrator = Agent(
    client=client,
    model="claude-opus-4-6",
    system="""You are an orchestrator. For research tasks:
1. First delegate research to the research specialist
2. Then delegate writing to the writing specialist
3. Finally, delegate validation to the validator
4. Synthesise the validated output into a final deliverable.""",
    tools=[delegate_research, delegate_writing, delegate_validation],
    max_iterations=12,
)
```

Tool Design Best Practices
The quality of your agent's behaviour depends more on how you design tools than on how you write system prompts. The @tool decorator generates the JSON Schema that Claude uses to understand what each tool does and how to call it. Docstrings are not optional — they're the tool's interface contract.
| Principle | Good | Bad |
|---|---|---|
| Specificity | `query_invoices(vendor, date_range, status)` | `database_query(sql)` |
| Return types | Structured dict/list with consistent schema | Raw HTML or unformatted text |
| Error handling | Return `{"error": "message", "code": "NOT_FOUND"}` | Raise an exception (breaks the agent loop) |
| Idempotency | Read operations are always safe to retry | Write operations that can double-execute |
| Scope | One tool, one function | Mega-tools with 15 parameters and complex logic |
| Documentation | Args, returns, and edge cases in docstring | Single-line "does stuff" docstring |
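To make the docstring-as-contract point concrete, here is a rough approximation of the kind of schema a `@tool`-style decorator can derive from a function's signature and docstring. This is a hypothetical sketch of the idea, not the SDK's actual implementation:

```python
import inspect

# Map Python annotations to JSON Schema type names
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    """Derive a JSON-Schema-style tool description from a function's
    signature and docstring: parameters with defaults become optional,
    the docstring becomes the tool description the model reads."""
    sig = inspect.signature(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {"type": "object", "properties": props, "required": required},
    }
```

Seen this way, a "does stuff" docstring literally becomes an empty description field: the model gets a tool name and parameter types, and nothing else to reason with.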
Handling Tool Errors Gracefully
Tool errors are inevitable in production. External APIs go down, database queries time out, files don't exist. The wrong approach is letting exceptions bubble up and crash the agent loop. The right approach is catching errors in your tool implementation and returning structured error responses that Claude can reason about:
```python
@tool
def fetch_customer_data(customer_id: str) -> dict:
    """
    Fetch customer record from CRM by ID.
    Returns customer data or an error object if not found.
    """
    try:
        customer = crm.get_customer(customer_id)
        return {
            "success": True,
            "customer": {
                "id": customer.id,
                "name": customer.name,
                "email": customer.email,
                "tier": customer.tier,
                "account_since": customer.created_at.isoformat(),
            },
        }
    except CustomerNotFoundError:
        return {"success": False, "error": "customer_not_found", "customer_id": customer_id}
    except CRMTimeoutError:
        return {"success": False, "error": "crm_timeout", "retry_after": 30}
    except Exception as e:
        logger.error(f"Unexpected CRM error for {customer_id}: {e}")
        return {"success": False, "error": "unexpected_error"}
```

When Claude receives a structured error, it can decide whether to retry, try an alternative approach, or explain the issue to the user. When it receives an exception traceback, it typically gets confused and produces lower-quality responses.
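Structured errors also make retry policy mechanical. A small wrapper, hypothetical but assuming the `{"success": ..., "retry_after": ...}` convention above, can retry transient failures with exponential backoff before the agent ever sees them:

```python
import time

def with_retry(tool_fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Wrap a tool that returns structured errors. Retry only transient
    failures (those carrying retry_after) with exponential backoff;
    pass successes and permanent errors straight through."""
    def wrapped(**kwargs):
        for attempt in range(max_attempts):
            result = tool_fn(**kwargs)
            if result.get("success") or "retry_after" not in result:
                return result  # success, or a non-retryable error
            sleep(base_delay * 2 ** attempt)
        return result  # retries exhausted: return the last structured error
    return wrapped
```

Because the error is data rather than an exception, the retry decision lives in plain code and the agent only sees the final outcome.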
State Management and Session Continuity
By default, each agent.run() call starts with a fresh conversation. For agents that need to maintain state across multiple interactions — a project manager agent tracking ongoing work, a customer service agent remembering prior contact history — you need to manage state explicitly.
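Conceptually, a session is nothing more than the accumulated message list replayed into every call. A minimal illustration with a stubbed agent function (all names illustrative):

```python
class SimpleSession:
    """Accumulate conversation turns so each new run sees prior context."""
    def __init__(self):
        self.messages = []

    def run(self, agent_fn, user_message):
        self.messages.append({"role": "user", "content": user_message})
        reply = agent_fn(self.messages)  # the agent sees the full history
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Everything else in session management is bookkeeping around this replay: where the list is stored, how it is trimmed, and when it is persisted.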
The SDK provides a Session class for in-process continuity and hooks for external state persistence:
```python
from anthropic.lib.agents import Agent, Session

# In-memory session (single process, single conversation)
session = Session()

result1 = project_agent.run(
    "Start tracking a new project: Q2 marketing campaign",
    session=session,
)
result2 = project_agent.run(
    "Add a task: design new landing page, owner: Sarah, due April 15",
    session=session,  # Same session — agent remembers the project context
)

# Persistent session (survives process restarts)
from anthropic.lib.agents import PersistentSession

session = PersistentSession(
    session_id=f"project-{project_id}",
    storage=redis_storage,  # Your storage backend
)

# Session is automatically saved after each run
result = project_agent.run("What's the current status of the Q2 campaign?", session=session)
```

Integrating with MCP Servers
The Claude Agent SDK integrates natively with the Model Context Protocol. Rather than defining tools manually in Python, you can connect an agent to an MCP server and it will automatically discover and expose all the server's tools.
This is the preferred pattern for enterprise agents that need to access internal systems — Salesforce, Jira, SharePoint, internal APIs — because the MCP server handles authentication, access control, and tool schema generation. You connect the agent; the MCP handles the rest.
```python
import os

from anthropic.lib.agents import Agent, MCPServerConnection

# Connect agent to MCP servers
salesforce_mcp = MCPServerConnection(
    url="https://mcp.internal.company.com/salesforce",
    auth_token=os.environ["MCP_AUTH_TOKEN"],
)
jira_mcp = MCPServerConnection(
    url="https://mcp.internal.company.com/jira",
    auth_token=os.environ["MCP_AUTH_TOKEN"],
)

# Agent automatically gets all tools from both MCP servers
sales_ops_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="You are a sales operations agent. Use Salesforce for CRM data and Jira for project tracking.",
    mcp_servers=[salesforce_mcp, jira_mcp],  # Tools discovered automatically
)

# The agent can now call any tool exposed by either MCP server
result = sales_ops_agent.run(
    "Update the Acme Corp opportunity to Closed-Won and create a Jira epic for onboarding"
)
```

Production Configuration and Limits
Every production agent deployment needs these configuration parameters explicitly set — don't rely on defaults:
| Parameter | Default | Production Recommendation |
|---|---|---|
| `max_iterations` | 10 | Set per task type — document tasks: 20+, lookup tasks: 5–8 |
| `max_tokens` | 4096 | 2048 for tool-heavy agents; 8192 for writing tasks |
| `timeout` | 120s | 30–60s per iteration; separate overall task timeout |
| `retry_on_overload` | True | Keep true; add jitter via `retry_config` |
| `context_trim_strategy` | None | Set to `"progressive_summarise"` for long tasks |
| `log_level` | WARNING | INFO in production; DEBUG only for debugging |
Set per-session token budgets for high-volume agent deployments. The SDK supports a `token_budget` parameter that gracefully terminates an agent run when it exceeds the budget, returning whatever partial result has accumulated rather than raising an exception.
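The budget mechanism amounts to comparing cumulative usage against the limit on every iteration and stopping cleanly when it is spent. A sketch of the idea with illustrative names, not the SDK's implementation:

```python
class TokenBudget:
    """Track cumulative token usage and signal when the budget is spent."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def record(self, tokens):
        self.used += tokens

    @property
    def exhausted(self):
        return self.used >= self.limit

def run_with_budget(steps, budget):
    """Run iteration callables (each returning text and a token count)
    until done or the budget runs out, returning the partial output
    accumulated so far instead of raising."""
    output = []
    for step in steps:
        if budget.exhausted:
            break  # graceful stop: keep the partial result
        text, tokens = step()
        output.append(text)
        budget.record(tokens)
    return "".join(output)
```

The key design choice is that exhaustion is an expected outcome with a partial result, not an error path.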
Testing Claude Agent SDK Implementations
Agent testing requires a different approach from conventional unit testing. Individual tool calls are deterministic and easy to test. Agent behaviour — which tools the agent chooses to call, in which order, with which parameters — is non-deterministic and context-dependent.
Effective agent testing uses three layers. Tool unit tests verify each tool returns correct results for given inputs. Scenario tests run the full agent against representative task sets and evaluate output quality with a scoring rubric. Regression tests replay past conversations and check that behaviour hasn't drifted after prompt or model changes. The AI Agent Evaluation guide covers this in full.
The Agent SDK includes a MockAgent class for testing orchestrator logic without real API calls — the mock returns pre-scripted tool call sequences, letting you test your orchestrator's handling of specific tool response combinations.
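The idea behind such a mock can be reproduced in a few lines: replay pre-scripted outputs in order and record what was asked, so orchestrator logic can be asserted without real API calls. A hypothetical sketch, not the SDK's `MockAgent` API:

```python
class ScriptedAgent:
    """Test double for an agent: returns pre-scripted outputs in order
    and records every task it receives, so the orchestrator's wiring
    can be asserted deterministically."""
    def __init__(self, outputs):
        self._outputs = iter(outputs)
        self.tasks = []

    def run(self, task):
        self.tasks.append(task)
        return next(self._outputs)

def orchestrate(researcher, writer):
    """Toy orchestrator under test: research first, then write up the findings."""
    notes = researcher.run("research the topic")
    return writer.run(f"write a brief from: {notes}")
```

Tests against such a double verify the orchestrator's control flow (what was delegated, in what order, with what inputs) rather than the model's judgment, which belongs in scenario tests.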
Next Steps in the Agent Architecture Series
This guide covers the SDK fundamentals. For specific deployment patterns:
- Enterprise AI Agent Architecture: Design Patterns & Security — the full architecture reference
- Building Multi-Agent Systems with Claude — orchestration at scale
- Claude AI Agents for Customer Service — domain-specific implementation
- AI Agent Evaluation & Testing — quality measurement in production
- MCP Protocol Guide — connecting agents to enterprise systems
- AI Agent Development Services — work with our certified architects
Deploy Claude Agents in Your Enterprise
From SDK setup to production orchestration — our Claude Certified Architects build and deploy agent systems that integrate with your existing infrastructure and meet enterprise security requirements.