What the Claude Agent SDK Actually Is
The Claude Agent SDK is Anthropic's framework for building production AI agents on top of the Claude API. It sits above the raw API — which handles individual request/response cycles — and provides the scaffolding for agentic workflows: persistent loops, tool execution, sub-agent orchestration, and context management.
If you've used the Claude API directly, you've already built the core of what the Agent SDK formalises. The SDK takes the patterns that every serious team invents independently — the run loop, the tool registry, the message thread management — and provides a standard, tested implementation with production-grade features baked in.
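Concretely, the loop every team ends up writing looks something like the sketch below, with a stubbed model call standing in for the API. All names here are illustrative, not SDK API:

```python
def run_agent(call_model, tools, task, max_iterations=10):
    """Minimal agent loop: call the model, execute any tool it requests,
    feed the result back, and stop when it produces a final text answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        reply = call_model(messages)  # model decides: final answer or tool call
        messages.append({"role": "assistant", "content": reply})
        if reply["type"] == "text":
            return reply["text"]  # done — final answer
        # Execute the requested tool and append its result to the thread
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("max_iterations exceeded")
```

The SDK's value is everything this sketch omits: retries, streaming, token accounting, iteration limits, and error handling around the tool dispatch.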
It's available in Python and TypeScript. Python is the dominant choice for backend agent systems; TypeScript for full-stack deployments where agents integrate with Node.js application infrastructure.
The Claude Agent SDK is part of the broader Anthropic Python SDK (the `anthropic` package). Install it with `pip install anthropic`; agent-specific classes live under the `anthropic.lib.agents` namespace. TypeScript users: `npm install @anthropic-ai/sdk`.
Core SDK Concepts
The Agent Class
The Agent class is the central abstraction. It wraps a Claude model with a system prompt, a tool set, and execution configuration. You instantiate an agent once and call it repeatedly with different tasks.
```python
from anthropic import Anthropic
from anthropic.lib.agents import Agent, tool

client = Anthropic()

# Define a tool using the @tool decorator
@tool
def search_documents(query: str, top_k: int = 5) -> list[dict]:
    """
    Search the internal document store for relevant documents.

    Args:
        query: Natural language search query
        top_k: Number of results to return (default 5, max 20)

    Returns:
        List of document objects with title, url, and excerpt
    """
    return document_store.search(query, limit=top_k)

@tool
def create_ticket(title: str, description: str, priority: str = "medium") -> dict:
    """
    Create a new support ticket in the ticketing system.

    Args:
        title: Short ticket title (max 120 chars)
        description: Full description of the issue
        priority: One of 'low', 'medium', 'high', 'critical'

    Returns:
        Created ticket with id, url, and status
    """
    return ticketing_system.create(title=title, body=description, priority=priority)

# Create the agent
support_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="""You are a support desk agent. Your job is to:
1. Search the documentation to find answers to user questions
2. Create support tickets for issues that require engineering attention
3. Always search before creating a ticket — most questions are already answered
Tone: professional, concise. Never speculate — only report what the docs say.""",
    tools=[search_documents, create_ticket],
    max_tokens=2048,
    max_iterations=15,
)
```

Running an Agent
The run() method executes the full agent loop — send task to Claude, handle tool calls, loop until completion or termination condition. It returns an AgentResult object with the final output, tool call history, and token usage.
```python
# Synchronous run
result = support_agent.run("A user is getting a 403 error when trying to access /api/reports")
print(result.output)              # Final text response
print(result.tool_calls)          # List of all tool calls made
print(result.usage.total_tokens)  # Token count for the run

# Async run (recommended for production)
import asyncio

async def handle_support_request(user_message: str):
    result = await support_agent.arun(user_message)
    return result.output

# Streaming (for real-time UI feedback)
async def stream_support_response(user_message: str):
    async for event in support_agent.astream(user_message):
        if event.type == "text_delta":
            yield event.delta
        elif event.type == "tool_use_start":
            yield f"\n[Searching: {event.tool_name}...]\n"
```

Sub-Agent Architecture
The most powerful feature of the Claude Agent SDK is sub-agents: the ability for one agent to spawn and coordinate other agents. This is the foundation of multi-agent systems and the basis of complex enterprise agent deployments.
In the SDK model, a parent (orchestrator) agent has access to a tool that creates and runs child (sub) agents. The orchestrator doesn't execute tasks directly — it delegates them. Sub-agents receive specific instructions, run their own tool loops, and return results to the orchestrator.
```python
from anthropic.lib.agents import Agent, tool, SubAgentRunner

# Define specialist sub-agents
research_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="You are a research specialist. Search for and summarise information accurately.",
    tools=[web_search, read_document, extract_data],
    max_iterations=10,
)

writer_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="You are a professional business writer. Turn research into clear, compelling reports.",
    tools=[format_document, check_grammar],
    max_iterations=8,
)

validator_agent = Agent(
    client=client,
    model="claude-opus-4-6",  # Opus for quality validation
    system="You are a quality validator. Check documents for accuracy, completeness and logical consistency.",
    tools=[verify_facts, check_sources],
    max_iterations=5,
)

# Create orchestrator with sub-agent tools
runner = SubAgentRunner(client=client)

@tool
def delegate_research(topic: str, depth: str = "comprehensive") -> str:
    """Run the research specialist agent on a given topic."""
    result = runner.run(research_agent, f"Research {depth}ly: {topic}")
    return result.output

@tool
def delegate_writing(research_notes: str, format: str = "executive brief") -> str:
    """Run the writing specialist to produce a document from research notes."""
    result = runner.run(writer_agent, f"Write a {format} based on:\n{research_notes}")
    return result.output

@tool
def delegate_validation(document: str) -> str:
    """Run the validator agent to quality-check a document."""
    result = runner.run(validator_agent, f"Validate this document:\n{document}")
    return result.output

orchestrator = Agent(
    client=client,
    model="claude-opus-4-6",
    system="""You are an orchestrator. For research tasks:
1. First delegate research to the research specialist
2. Then delegate writing to the writing specialist
3. Finally, delegate validation to the validator
4. Synthesise the validated output into a final deliverable.""",
    tools=[delegate_research, delegate_writing, delegate_validation],
    max_iterations=12,
)
```

Tool Design Best Practices
The quality of your agent's behaviour depends more on how you design tools than on how you write system prompts. The @tool decorator generates the JSON Schema that Claude uses to understand what each tool does and how to call it. Docstrings are not optional — they're the tool's interface contract.
| Principle | Good | Bad |
|---|---|---|
| Specificity | `query_invoices(vendor, date_range, status)` | `database_query(sql)` |
| Return types | Structured dict/list with consistent schema | Raw HTML or unformatted text |
| Error handling | Return `{"error": "message", "code": "NOT_FOUND"}` | Raise an exception (breaks the agent loop) |
| Idempotency | Read operations are always safe to retry | Write operations that can double-execute |
| Scope | One tool, one function | Mega-tools with 15 parameters and complex logic |
| Documentation | Args, returns, and edge cases in docstring | Single-line "does stuff" docstring |
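To make the docstring-as-contract point concrete, here is a rough approximation of the kind of schema a `@tool`-style decorator can derive from a function's signature and docstring. This is a hypothetical sketch of the idea, not the SDK's actual implementation:

```python
import inspect

# Map Python annotations to JSON Schema type names
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    """Derive a JSON-Schema-style tool description from a function's
    signature and docstring: parameters with defaults become optional,
    the docstring becomes the tool description the model reads."""
    sig = inspect.signature(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {"type": "object", "properties": props, "required": required},
    }
```

Seen this way, a "does stuff" docstring literally becomes an empty description field: the model gets a tool name and parameter types, and nothing else to reason with.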
Handling Tool Errors Gracefully
Tool errors are inevitable in production. External APIs go down, database queries time out, files don't exist. The wrong approach is letting exceptions bubble up and crash the agent loop. The right approach is catching errors in your tool implementation and returning structured error responses that Claude can reason about:
```python
@tool
def fetch_customer_data(customer_id: str) -> dict:
    """
    Fetch customer record from CRM by ID.
    Returns customer data or an error object if not found.
    """
    try:
        customer = crm.get_customer(customer_id)
        return {
            "success": True,
            "customer": {
                "id": customer.id,
                "name": customer.name,
                "email": customer.email,
                "tier": customer.tier,
                "account_since": customer.created_at.isoformat(),
            },
        }
    except CustomerNotFoundError:
        return {"success": False, "error": "customer_not_found", "customer_id": customer_id}
    except CRMTimeoutError:
        return {"success": False, "error": "crm_timeout", "retry_after": 30}
    except Exception as e:
        logger.error(f"Unexpected CRM error for {customer_id}: {e}")
        return {"success": False, "error": "unexpected_error"}
```

When Claude receives a structured error, it can decide whether to retry, try an alternative approach, or explain the issue to the user. When it receives an exception traceback, it typically gets confused and produces lower-quality responses.
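Structured errors also make retry policy mechanical. A small wrapper, hypothetical but assuming the `{"success": ..., "retry_after": ...}` convention above, can retry transient failures with exponential backoff before the agent ever sees them:

```python
import time

def with_retry(tool_fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Wrap a tool that returns structured errors. Retry only transient
    failures (those carrying retry_after) with exponential backoff;
    pass successes and permanent errors straight through."""
    def wrapped(**kwargs):
        for attempt in range(max_attempts):
            result = tool_fn(**kwargs)
            if result.get("success") or "retry_after" not in result:
                return result  # success, or a non-retryable error
            sleep(base_delay * 2 ** attempt)
        return result  # retries exhausted: return the last structured error
    return wrapped
```

Because the error is data rather than an exception, the retry decision lives in plain code and the agent only sees the final outcome.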
State Management and Session Continuity
By default, each agent.run() call starts with a fresh conversation. For agents that need to maintain state across multiple interactions — a project manager agent tracking ongoing work, a customer service agent remembering prior contact history — you need to manage state explicitly.
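Conceptually, a session is nothing more than the accumulated message list replayed into every call. A minimal illustration with a stubbed agent function (all names illustrative):

```python
class SimpleSession:
    """Accumulate conversation turns so each new run sees prior context."""
    def __init__(self):
        self.messages = []

    def run(self, agent_fn, user_message):
        self.messages.append({"role": "user", "content": user_message})
        reply = agent_fn(self.messages)  # the agent sees the full history
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Everything else in session management is bookkeeping around this replay: where the list is stored, how it is trimmed, and when it is persisted.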
The SDK provides a Session class for in-process continuity and hooks for external state persistence:
```python
from anthropic.lib.agents import Agent, Session

# In-memory session (single process, single conversation)
session = Session()

result1 = project_agent.run(
    "Start tracking a new project: Q2 marketing campaign",
    session=session,
)
result2 = project_agent.run(
    "Add a task: design new landing page, owner: Sarah, due April 15",
    session=session,  # Same session — agent remembers the project context
)

# Persistent session (survives process restarts)
from anthropic.lib.agents import PersistentSession

session = PersistentSession(
    session_id=f"project-{project_id}",
    storage=redis_storage,  # Your storage backend
)

# Session is automatically saved after each run
result = project_agent.run("What's the current status of the Q2 campaign?", session=session)
```

Integrating with MCP Servers
The Claude Agent SDK integrates natively with the Model Context Protocol. Rather than defining tools manually in Python, you can connect an agent to an MCP server and it will automatically discover and expose all the server's tools.
This is the preferred pattern for enterprise agents that need to access internal systems — Salesforce, Jira, SharePoint, internal APIs — because the MCP server handles authentication, access control, and tool schema generation. You connect the agent; the MCP handles the rest.
```python
import os

from anthropic.lib.agents import Agent, MCPServerConnection

# Connect agent to MCP servers
salesforce_mcp = MCPServerConnection(
    url="https://mcp.internal.company.com/salesforce",
    auth_token=os.environ["MCP_AUTH_TOKEN"],
)
jira_mcp = MCPServerConnection(
    url="https://mcp.internal.company.com/jira",
    auth_token=os.environ["MCP_AUTH_TOKEN"],
)

# Agent automatically gets all tools from both MCP servers
sales_ops_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="You are a sales operations agent. Use Salesforce for CRM data and Jira for project tracking.",
    mcp_servers=[salesforce_mcp, jira_mcp],  # Tools discovered automatically
)

# The agent can now call any tool exposed by either MCP server
result = sales_ops_agent.run(
    "Update the Acme Corp opportunity to Closed-Won and create a Jira epic for onboarding"
)
```

Production Configuration and Limits
Every production agent deployment needs these configuration parameters explicitly set — don't rely on defaults:
| Parameter | Default | Production Recommendation |
|---|---|---|
| `max_iterations` | 10 | Set per task type — document tasks: 20+, lookup tasks: 5–8 |
| `max_tokens` | 4096 | 2048 for tool-heavy agents; 8192 for writing tasks |
| `timeout` | 120s | 30–60s per iteration; separate overall task timeout |
| `retry_on_overload` | True | Keep true; add jitter via `retry_config` |
| `context_trim_strategy` | None | Set to `"progressive_summarise"` for long tasks |
| `log_level` | WARNING | INFO in production; DEBUG only for debugging |
Set per-session token budgets for high-volume agent deployments. The SDK supports a `token_budget` parameter that gracefully terminates an agent run when it exceeds the budget, returning whatever partial result has accumulated rather than raising an exception.
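The budget mechanism amounts to comparing cumulative usage against the limit on every iteration and stopping cleanly when it is spent. A sketch of the idea with illustrative names, not the SDK's implementation:

```python
class TokenBudget:
    """Track cumulative token usage and signal when the budget is spent."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def record(self, tokens):
        self.used += tokens

    @property
    def exhausted(self):
        return self.used >= self.limit

def run_with_budget(steps, budget):
    """Run iteration callables (each returning text and a token count)
    until done or the budget runs out, returning the partial output
    accumulated so far instead of raising."""
    output = []
    for step in steps:
        if budget.exhausted:
            break  # graceful stop: keep the partial result
        text, tokens = step()
        output.append(text)
        budget.record(tokens)
    return "".join(output)
```

The key design choice is that exhaustion is an expected outcome with a partial result, not an error path.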
Testing Claude Agent SDK Implementations
Agent testing requires a different approach from conventional unit testing. Individual tool calls are deterministic and easy to test. Agent behaviour — which tools the agent chooses to call, in which order, with which parameters — is non-deterministic and context-dependent.
Effective agent testing uses three layers. Tool unit tests verify each tool returns correct results for given inputs. Scenario tests run the full agent against representative task sets and evaluate output quality with a scoring rubric. Regression tests replay past conversations and check that behaviour hasn't drifted after prompt or model changes. The AI Agent Evaluation guide covers this in full.
The Agent SDK includes a MockAgent class for testing orchestrator logic without real API calls — the mock returns pre-scripted tool call sequences, letting you test your orchestrator's handling of specific tool response combinations.
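The idea behind such a mock can be reproduced in a few lines: replay pre-scripted outputs in order and record what was asked, so orchestrator logic can be asserted without real API calls. A hypothetical sketch, not the SDK's `MockAgent` API:

```python
class ScriptedAgent:
    """Test double for an agent: returns pre-scripted outputs in order
    and records every task it receives, so the orchestrator's wiring
    can be asserted deterministically."""
    def __init__(self, outputs):
        self._outputs = iter(outputs)
        self.tasks = []

    def run(self, task):
        self.tasks.append(task)
        return next(self._outputs)

def orchestrate(researcher, writer):
    """Toy orchestrator under test: research first, then write up the findings."""
    notes = researcher.run("research the topic")
    return writer.run(f"write a brief from: {notes}")
```

Tests against such a double verify the orchestrator's control flow (what was delegated, in what order, with what inputs) rather than the model's judgment, which belongs in scenario tests.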
Next Steps in the Agent Architecture Series
This guide covers the SDK fundamentals. For specific deployment patterns:
- Enterprise AI Agent Architecture: Design Patterns & Security — the full architecture reference
- Building Multi-Agent Systems with Claude — orchestration at scale
- Claude AI Agents for Customer Service — domain-specific implementation
- AI Agent Evaluation & Testing — quality measurement in production
- MCP Protocol Guide — connecting agents to enterprise systems
- AI Agent Development Services — work with our certified architects
Deploy Claude Agents in Your Enterprise
From SDK setup to production orchestration — our Claude Certified Architects build and deploy agent systems that integrate with your existing infrastructure and meet enterprise security requirements.