How to Use Claude Extended Thinking for Complex Problem Solving

Key Takeaways

Extended Thinking is a reasoning mode available on Claude Opus 4.6 — Claude "thinks out loud" before producing its final response
It materially improves performance on multi-step logic, complex analysis, and tasks requiring the consideration of many interdependent variables
The budget_tokens parameter caps how much thinking Claude does; start at 5,000–10,000 tokens for most use cases
Extended Thinking adds cost and latency — use it selectively on high-value tasks, not on every API call
You can access the thinking content in the response — useful for building systems that show reasoning chains to users or auditors

What Claude Extended Thinking Actually Does

Claude Extended Thinking is not a prompt technique. It is a first-class API feature that activates a separate reasoning mode in Claude Opus 4.6. When you enable it, Claude works through the problem in an internal scratchpad — weighing alternatives, checking its own logic, reconsidering assumptions — before producing a final response. You can optionally receive this thinking content in the API response alongside the final answer.

The practical effect is that Claude Extended Thinking shifts Claude's performance profile on difficult reasoning tasks. Standard Claude Opus excels at a very wide range of tasks. Extended Thinking further improves performance specifically on tasks that benefit from systematic deliberation: multi-step mathematical reasoning, complex logical analysis, tasks that require considering many interdependent factors simultaneously, and problems where the "obvious" first answer is often wrong.

It is not a universal improvement. On simple tasks — summarisation, classification, extraction — Extended Thinking adds cost and latency without improving output quality. The skill of using it effectively is knowing which problems belong in which category. Our Claude Extended Thinking product guide covers the full capability in depth. This tutorial focuses on the practical implementation and use case selection.

When to Use Extended Thinking (and When Not To)

The decision to use Extended Thinking should be made at the system design level, not per-call. Design your application so Extended Thinking is always on for tasks that benefit from it, and always off for tasks that do not. Trying to decide at runtime adds complexity without benefit.
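
One way to keep this decision at the design level is a static routing table keyed by task type. The sketch below is illustrative only: the task names, the `THINKING_POLICY` table, and the `build_request` helper are assumptions made for this example and are not part of the Anthropic SDK. Model selection is left out to keep it short.

```python
from typing import Optional

# Hypothetical task types mapped to a thinking budget (None = Extended Thinking off).
THINKING_POLICY: dict[str, Optional[int]] = {
    "summarisation": None,
    "classification": None,
    "risk_analysis": 16000,
    "architecture_review": 12000,
}


def build_request(task_type: str, prompt: str, response_tokens: int = 4000) -> dict:
    """Return keyword arguments for client.messages.create for a known task type."""
    if task_type not in THINKING_POLICY:
        # Fail loudly rather than guessing at runtime.
        raise KeyError(f"No thinking policy defined for task type: {task_type}")

    budget = THINKING_POLICY[task_type]
    request = {
        "model": "claude-opus-4-6",
        "max_tokens": response_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if budget is not None:
        request["thinking"] = {"type": "enabled", "budget_tokens": budget}
        # max_tokens must exceed the thinking budget, so leave room for the answer.
        request["max_tokens"] = budget + response_tokens
    return request
```

Because the policy lives in one table, changing which tasks use Extended Thinking is a configuration change rather than a code change, for example `client.messages.create(**build_request("risk_analysis", prompt))`.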

✓ Good Fit for Extended Thinking

Complex financial modelling with many variables and constraints
Legal analysis requiring consideration of multiple precedents and implications
Architecture and design decisions with significant trade-offs
Mathematical problem-solving and proof verification
Debugging complex systems where the root cause is non-obvious
Risk analysis with interdependent scenarios
Strategic planning documents requiring systematic evaluation
Competitive analysis with many factors to weigh

✗ Poor Fit for Extended Thinking

Document summarisation (the answer is in the text)
Simple classification tasks (high cost, no quality gain)
Creative writing and copywriting
Data extraction and formatting
FAQ responses from a knowledge base
Translation tasks
High-volume, low-stakes batch processing
Real-time chat interfaces requiring sub-2-second responses

API Configuration: The Extended Thinking Parameters

Extended Thinking is activated by adding a thinking block to your API request. The critical parameter is budget_tokens, which controls the maximum number of tokens Claude can use for its internal reasoning. Thinking tokens count towards max_tokens, which caps the total output: reasoning plus final response.

For that reason, budget_tokens must be less than max_tokens, and the difference is the room left for the final answer. Claude will not always use the full budget; it uses as much thinking as the problem requires. Setting a generous budget does not guarantee it will be used; it just sets the ceiling. Start conservative and increase based on observed quality.

Python · Basic Extended Thinking Configuration
import anthropic

client = anthropic.Anthropic()

def call_with_extended_thinking(
    prompt: str,
    thinking_budget: int = 8000,
    max_output_tokens: int = 16000,
    include_thinking_in_response: bool = False
) -> dict:
    """
    Call Claude with Extended Thinking enabled.

    Args:
        prompt: The user prompt
        thinking_budget: Max tokens for Claude's internal reasoning (5,000-100,000)
        max_output_tokens: Cap on total output, thinking plus final response (must be > thinking_budget)
        include_thinking_in_response: Whether to return the thinking content

    Returns:
        dict with 'response' (final answer) and optionally 'thinking' (reasoning chain)
    """

    # Validate: max_output_tokens must exceed thinking_budget
    if max_output_tokens <= thinking_budget:
        max_output_tokens = thinking_budget * 2
        print(f"Adjusted max_output_tokens to {max_output_tokens}")

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=max_output_tokens,
        thinking={
            "type": "enabled",
            "budget_tokens": thinking_budget
        },
        messages=[{
            "role": "user",
            "content": prompt
        }]
    )

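    result = {
        "response": None,
        "thinking": None,
        # Rough proxy: word count of the returned thinking text, not a true token count
        "thinking_tokens_used": 0,
        # Billed output tokens reported by the API (thinking and final text combined)
        "response_tokens_used": response.usage.output_tokens
    }

    # A response can contain several thinking and text blocks; collect them all
    thinking_parts = []
    text_parts = []
    for block in response.content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)

    thinking_text = "\n".join(thinking_parts)
    result["thinking_tokens_used"] = len(thinking_text.split())
    if include_thinking_in_response and thinking_parts:
        result["thinking"] = thinking_text
    if text_parts:
        result["response"] = "".join(text_parts)

    return result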

# Example: financial scenario analysis
analysis = call_with_extended_thinking(
    prompt="""Analyse the following acquisition scenario:

Company A (acquirer): £500M revenue, 15% EBITDA margin, 2.5x net debt/EBITDA
Company B (target): £120M revenue, 8% EBITDA margin, high growth (25% YoY), £0 debt

Proposed price: £300M (2.5x revenue)

Assess:
1. The strategic rationale and risks
2. Whether the valuation is defensible
3. Key integration challenges
4. Your recommendation with reasoning""",
    thinking_budget=12000,
    include_thinking_in_response=False
)

print(analysis["response"])
      

Calibrating the Thinking Budget

The right thinking budget depends on problem complexity. Too low, and Claude cuts its reasoning short before reaching the best answer. Too high, and you pay for thinking tokens that produce no quality improvement. The table below provides starting points based on problem type.

Problem Type	Recommended Budget	Max Tokens	Typical Latency
Simple multi-step reasoning	5,000	10,000	8–15 seconds
Moderate complexity analysis	8,000–12,000	20,000	15–35 seconds
Complex trade-off evaluation	16,000–24,000	40,000	35–90 seconds
Deep mathematical/logical problems	32,000–64,000	80,000	90–240 seconds
Maximum depth reasoning	100,000	200,000	240+ seconds

To calibrate empirically: run the same problem with budgets of 5,000, 10,000, and 20,000 tokens. Compare output quality and check thinking_tokens_used in the helper's result alongside usage.output_tokens in the API response. If Claude consistently uses close to the full budget, increase it. If it uses less than 50% of the budget, the problem does not need more thinking; decrease the budget to tighten the cost and latency ceiling.
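
The calibration rule above can be sketched as a small helper. This is a minimal illustration, not a recommended policy: the `suggest_budget` name, the 90% "near the full budget" threshold, the doubling step, and the 25% headroom are all assumptions chosen for the example. The 1,024 floor reflects the API's minimum thinking budget.

```python
def suggest_budget(
    budget: int,
    observed_usage: list[int],
    high_water: float = 0.9,   # assumed meaning of "near the full budget"
    low_water: float = 0.5,    # the 50% threshold from the text
) -> int:
    """Suggest a new thinking budget from observed per-call thinking usage."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    if not observed_usage:
        return budget  # nothing observed yet, so keep the current setting

    avg_ratio = sum(observed_usage) / (len(observed_usage) * budget)

    if avg_ratio >= high_water:
        # Consistently hitting the ceiling: raise it and re-measure.
        return budget * 2
    if avg_ratio < low_water:
        # Well under half the budget used: shrink towards the observed peak,
        # keeping some headroom and never going below the API minimum.
        return min(budget, max(1024, int(max(observed_usage) * 1.25)))
    return budget
```

Feed it the usage figures you collect per problem type, then repeat the comparison of output quality at the suggested budget before adopting it.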

Streaming Extended Thinking Responses

Extended Thinking responses take longer — sometimes 30–90 seconds for complex problems. Without streaming, users face a blank screen for that duration. Streaming delivers thinking and response content incrementally, allowing you to show a progress indicator or stream the final response as it is generated.

Python · Streaming Extended Thinking with Progress Indicator
import time

def stream_extended_thinking(
    prompt: str,
    thinking_budget: int = 10000,
    show_thinking_progress: bool = True
) -> str:
    """Stream a Claude Extended Thinking response with progress feedback."""

    full_response = ""
    thinking_started = False
    response_started = False

    with client.messages.stream(
        model="claude-opus-4-6",
        max_tokens=thinking_budget * 2,
        thinking={
            "type": "enabled",
            "budget_tokens": thinking_budget
        },
        messages=[{"role": "user", "content": prompt}]
    ) as stream:

        for event in stream:
            # Handle different event types
            if hasattr(event, 'type'):

                if event.type == 'content_block_start':
                    if hasattr(event, 'content_block'):
                        if event.content_block.type == 'thinking':
                            if show_thinking_progress:
                                print("⟳ Claude is thinking...", end='', flush=True)
                            thinking_started = True
                        elif event.content_block.type == 'text':
                            if thinking_started and show_thinking_progress:
                                print(" ✓")  # Close the thinking indicator
                            print("\n📝 Response:\n")
                            response_started = True

                elif event.type == 'content_block_delta':
                    if hasattr(event, 'delta'):
                        if event.delta.type == 'thinking_delta':
                            if show_thinking_progress:
                                print(".", end='', flush=True)
                        elif event.delta.type == 'text_delta':
                            print(event.delta.text, end='', flush=True)
                            full_response += event.delta.text

    print()  # Final newline
    return full_response

# Usage
result = stream_extended_thinking(
    prompt="Design the optimal database schema for a multi-tenant SaaS platform "
           "supporting 10,000 enterprise customers, each with 100-10,000 users, "
           "requiring row-level security, audit trails, and sub-100ms query times. "
           "Consider PostgreSQL, partitioning strategies, and indexing.",
    thinking_budget=15000,
    show_thinking_progress=True
)
      

Enterprise Use Cases: Extended Thinking in Production

Financial Risk Analysis

One of the highest-ROI applications of Claude Extended Thinking is financial risk analysis — scenarios with many interdependent variables where a shallow answer is worse than no answer. A private equity firm uses it to generate initial investment memos: given a data room of 40–60 documents, Claude thinks through the opportunity across seven risk dimensions before producing a structured memo. Extended Thinking ensures the analysis considers second-order effects and contradictory signals that a simple extraction pass would miss.

See our Claude for private equity guide for the full architecture pattern including how to chunk data rooms into Claude's context window and structure the memo output.

Architecture and Code Design

Code architecture decisions benefit from Extended Thinking when the design space is large and the trade-offs are interdependent. Asking "design a microservices architecture for this system" with Extended Thinking produces architectures that Claude has actively stress-tested: it considers failure modes, data consistency challenges, and scaling bottlenecks during the thinking phase rather than in the final response.

Teams using Claude Code for enterprise development can ask for deeper reasoning on architecture-level decisions. CLAUDE.md holds project instructions rather than API parameters, so use it to note which task types warrant that deeper reasoning.

Legal and Regulatory Analysis

Legal analysis requiring the synthesis of multiple regulations, precedents, and fact patterns is a natural fit. A compliance team uses Extended Thinking to analyse whether a proposed product feature creates regulatory exposure across five jurisdictions simultaneously — a task that would otherwise require five separate legal review requests. The thinking chain provides the audit trail showing which regulations were considered and why each conclusion was reached.

Accessing and Using the Thinking Content

Claude's thinking content — the internal reasoning chain — is available in the API response as a separate content block. Most applications do not need to display it to end users. But there are specific patterns where surfacing the thinking chain creates genuine value.

Audit and compliance applications can store the thinking chain as a record of how a conclusion was reached — useful when decisions need to be explainable to regulators or senior stakeholders. Agent systems can use the thinking chain to debug why an agent made a particular decision. Research applications can surface the thinking to domain experts who want to validate Claude's analytical approach, not just its conclusions.

Python · Accessing and Structuring Thinking Content
import hashlib
import time

def analyse_with_audit_trail(
    problem: str,
    thinking_budget: int = 16000
) -> dict:
    """
    Run Extended Thinking analysis and capture the reasoning chain
    for audit and compliance purposes.
    """

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=thinking_budget * 2,
        thinking={
            "type": "enabled",
            "budget_tokens": thinking_budget
        },
        messages=[{"role": "user", "content": problem}]
    )

    # A response can contain several thinking and text blocks; keep them all
    thinking_parts = []
    answer_parts = []

    for block in response.content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            answer_parts.append(block.text)

    thinking_content = "\n".join(thinking_parts)
    final_answer = "".join(answer_parts)

    # Structure for storage and audit
    audit_record = {
        "timestamp": time.time(),
        "model": "claude-opus-4-6",
        # SHA-256 is stable across processes; the built-in hash() is salted per run
        "problem_hash": hashlib.sha256(problem.encode("utf-8")).hexdigest(),
        # usage.output_tokens covers thinking and final answer combined
        "output_tokens": response.usage.output_tokens,
        "thinking_summary": _extract_reasoning_steps(thinking_content),
        "final_answer": final_answer,
        "thinking_full": thinking_content  # Store full chain for audit
    }

    return audit_record

def _extract_reasoning_steps(thinking_text: str) -> list[str]:
    """Extract key reasoning steps from thinking content for summary."""
    # Simple heuristic: sentences starting with "First", "Then", "Therefore",
    # "However", "Because", "This means", etc.
    import re
    step_patterns = [
        r"(?:First|Second|Third|Finally|Therefore|However|Because|This means)[^.]+\.",
        r"The key (?:issue|question|consideration|problem)[^.]+\.",
        r"(?:I need to consider|I should|It's important)[^.]+\."
    ]

    steps = []
    for pattern in step_patterns:
        matches = re.findall(pattern, thinking_text, re.IGNORECASE)
        steps.extend(matches[:3])  # Cap at 3 per pattern type

    return steps[:10]  # Return top 10 reasoning steps
      

Cost and Latency Optimisation

Extended Thinking tokens are charged at the same rate as output tokens, but thinking tokens can easily dwarf response tokens in volume. A task that uses 10,000 thinking tokens and produces a 1,000-token response costs as much as an 11,000-token standard response. For high-value, low-frequency use cases this is fine. For high-volume pipelines, cost management matters.
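
To make the arithmetic concrete, here is a minimal sketch of the output-side cost of one call. The `estimate_call_cost` helper is hypothetical, and the price used is a placeholder, not a real rate: substitute the published output price for your model. Input-token cost is ignored.

```python
def estimate_call_cost(
    thinking_tokens: int,
    response_tokens: int,
    output_price_per_million: float,
) -> float:
    """Output-side cost of one call; thinking is billed at the output-token rate."""
    billed_output = thinking_tokens + response_tokens
    return billed_output * output_price_per_million / 1_000_000


# Placeholder price per million output tokens. NOT a real rate.
PLACEHOLDER_PRICE = 10.0

with_thinking = estimate_call_cost(10_000, 1_000, PLACEHOLDER_PRICE)
without_thinking = estimate_call_cost(0, 1_000, PLACEHOLDER_PRICE)

print(f"With thinking:    {with_thinking:.4f}")
print(f"Without thinking: {without_thinking:.4f}")
print(f"Ratio:            {with_thinking / without_thinking:.0f}x")
```

Whatever the actual price, the ratio is what matters: the same 1,000-token answer costs about eleven times as much when 10,000 thinking tokens precede it.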

Cost Management Patterns

Tier your problems: Route easy-to-classify problems to Sonnet (no Extended Thinking), reserve Opus + Extended Thinking for genuinely complex cases
Right-size budgets: Do not set a 100,000-token budget for a problem that needs 8,000 tokens. Unused budget is not billed, but an oversized ceiling lets Claude think for longer than the problem warrants, which can slow responses
Async processing: Move Extended Thinking calls off the synchronous request path for non-interactive use cases — process overnight, deliver results in the morning
Cache system prompts: If you have a large system prompt (legal context, financial data), cache it with prompt caching — thinking tokens are not cached, but input tokens are
Monitor thinking_tokens_used: Track actual token usage per problem type and adjust budgets accordingly — most teams over-budget by 30–50% initially
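
As an illustration of the caching pattern above, the sketch below builds a request whose large system prompt is marked for prompt caching via a `cache_control` entry on the system block. The `build_cached_request` helper and its default token figures are assumptions for this example; it only builds the keyword arguments and does not call the API.

```python
def build_cached_request(
    system_context: str,
    prompt: str,
    thinking_budget: int = 8000,
    response_tokens: int = 4000,
) -> dict:
    """Build messages.create kwargs with a cacheable system prompt and thinking on."""
    return {
        "model": "claude-opus-4-6",
        # Leave room for the final answer on top of the thinking budget.
        "max_tokens": thinking_budget + response_tokens,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
        "system": [
            {
                "type": "text",
                "text": system_context,
                # Marks this block as a cache breakpoint for later calls.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": prompt}],
    }
```

A call would then look like `client.messages.create(**build_cached_request(legal_context, question))`, where `legal_context` and `question` stand in for your own data. Keep the system text identical between calls, since any change to it produces a cache miss.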

Integration Patterns for Enterprise Systems

Extended Thinking works well as the reasoning core of larger agentic systems. The pattern is: use standard Claude calls for fast, routine steps; invoke Extended Thinking only at decision points where deep analysis changes the downstream path.

In a due diligence agent, for example, document extraction and summarisation use standard Sonnet calls. When the agent reaches the "synthesise findings and produce investment recommendation" step, it switches to Opus with Extended Thinking. This hybrid approach delivers 80% of the cost savings of avoiding Extended Thinking everywhere while preserving the quality improvement at the steps that matter most.
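
A minimal sketch of this hybrid shape follows. The `run_due_diligence` function and its prompts are hypothetical; the two model calls are passed in as plain callables so the routing logic stays visible and can be tested without network access.

```python
from typing import Callable


def run_due_diligence(
    documents: list[str],
    fast_call: Callable[[str], str],   # e.g. a standard Sonnet request
    deep_call: Callable[[str], str],   # e.g. Opus with Extended Thinking
) -> str:
    """Summarise each document cheaply, then synthesise once with deep reasoning."""
    # Routine step: one fast call per document, no Extended Thinking.
    summaries = [
        fast_call(f"Summarise this document:\n\n{doc}") for doc in documents
    ]

    # Decision point: a single deep call over the collected summaries.
    findings = "\n\n".join(
        f"Document {i + 1}: {summary}" for i, summary in enumerate(summaries)
    )
    return deep_call(
        "Synthesise these findings and produce an investment recommendation:\n\n"
        + findings
    )
```

In production, `deep_call` could wrap the `call_with_extended_thinking` helper from earlier in this tutorial and return its "response" field, while `fast_call` wraps a request without the thinking parameter.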

For the full architecture of multi-step agents using Extended Thinking as a reasoning gate, see our enterprise AI agent architecture guide. For teams integrating Extended Thinking into Claude Code workflows, the sub-agents guide covers how to spawn an Extended Thinking sub-agent for complex architectural decisions while keeping the main agent loop fast.

If you are building production systems that use Extended Thinking and want help with architecture, budget calibration, and quality evaluation frameworks, our Claude API integration service includes Extended Thinking optimisation as a standard deliverable. Book a call with our Claude Certified Architects to discuss your specific use case.

Deploying Extended Thinking in Production?

Our Claude Certified Architects help enterprises identify the right use cases, calibrate thinking budgets, and integrate Extended Thinking into production AI workflows.

ClaudeImplementation Team

Claude Certified Architects specialising in enterprise AI deployment. We have shipped Claude integrations — including Extended Thinking systems — across financial services, legal, healthcare, and technology companies.
