Key Takeaways

  • Extended Thinking is a reasoning mode available on Claude Opus 4.6 in which Claude "thinks out loud" before producing its final response
  • It materially improves performance on multi-step logic, complex analysis, and tasks requiring the consideration of many interdependent variables
  • The budget_tokens parameter controls how much thinking Claude does; start at 5,000–10,000 tokens for most use cases
  • Extended Thinking adds cost and latency, so use it selectively on high-value tasks, not on every API call
  • You can access the thinking content in the response, which is useful for building systems that show reasoning chains to users or auditors

What Claude Extended Thinking Actually Does

Claude Extended Thinking is not a prompt technique. It is a first-class API feature that activates a separate reasoning mode in Claude Opus 4.6. When you enable it, Claude works through the problem in an internal scratchpad (weighing alternatives, checking its own logic, reconsidering assumptions) before producing a final response. You can optionally receive this thinking content in the API response alongside the final answer.

The practical effect is that Claude Extended Thinking shifts Claude's performance profile on difficult reasoning tasks. Standard Claude Opus excels at a very wide range of tasks. Extended Thinking further improves performance specifically on tasks that benefit from systematic deliberation: multi-step mathematical reasoning, complex logical analysis, tasks that require considering many interdependent factors simultaneously, and problems where the "obvious" first answer is often wrong.

It is not a universal improvement. On simple tasks such as summarisation, classification, and extraction, Extended Thinking adds cost and latency without improving output quality. The skill of using it effectively is knowing which problems belong in which category. Our Claude Extended Thinking product guide covers the full capability in depth. This tutorial focuses on the practical implementation and use case selection.

When to Use Extended Thinking (and When Not To)

The decision to use Extended Thinking should be made at the system design level, not per-call. Design your application so Extended Thinking is always on for tasks that benefit from it, and always off for tasks that do not. Trying to decide at runtime adds complexity without benefit.

✓ Good Fit for Extended Thinking

  • Complex financial modelling with many variables and constraints
  • Legal analysis requiring consideration of multiple precedents and implications
  • Architecture and design decisions with significant trade-offs
  • Mathematical problem-solving and proof verification
  • Debugging complex systems where the root cause is non-obvious
  • Risk analysis with interdependent scenarios
  • Strategic planning documents requiring systematic evaluation
  • Competitive analysis with many factors to weigh

✗ Poor Fit for Extended Thinking

  • Document summarisation (the answer is in the text)
  • Simple classification tasks (high cost, no quality gain)
  • Creative writing and copywriting
  • Data extraction and formatting
  • FAQ responses from a knowledge base
  • Translation tasks
  • High-volume, low-stakes batch processing
  • Real-time chat interfaces requiring sub-2-second responses
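One way to make this a design-time decision is a static lookup from task type to request settings, so that no call site decides for itself. The sketch below is illustrative only: the `TaskProfile` class, the task-type names, and the budgets are assumptions chosen to mirror the two lists above, not part of the Anthropic SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskProfile:
    """Design-time settings for one task type (hypothetical structure)."""
    use_thinking: bool
    thinking_budget: Optional[int] = None  # only used when use_thinking is True
    max_tokens: int = 4000


# Hypothetical task types drawn from the good-fit and poor-fit lists.
TASK_PROFILES = {
    "financial_modelling": TaskProfile(True, thinking_budget=12000, max_tokens=24000),
    "legal_analysis": TaskProfile(True, thinking_budget=16000, max_tokens=32000),
    "root_cause_debugging": TaskProfile(True, thinking_budget=8000, max_tokens=16000),
    "summarisation": TaskProfile(False),
    "classification": TaskProfile(False, max_tokens=500),
    "translation": TaskProfile(False),
}


def build_request_kwargs(task_type: str, prompt: str,
                         model: str = "claude-opus-4-6") -> dict:
    """Return keyword arguments for client.messages.create() for a known task type."""
    profile = TASK_PROFILES[task_type]  # a KeyError on unknown types is deliberate
    kwargs = {
        "model": model,
        "max_tokens": profile.max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if profile.use_thinking:
        kwargs["thinking"] = {
            "type": "enabled",
            "budget_tokens": profile.thinking_budget,
        }
    return kwargs
```

Because the table is the only place where the on/off choice lives, adding a new task type forces an explicit decision about whether it earns a thinking budget.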

API Configuration: The Extended Thinking Parameters

Extended Thinking is activated by adding a thinking object to your API request. The critical parameter is budget_tokens, which controls the maximum number of tokens Claude can use for its internal reasoning. It is set separately from max_tokens, which caps the total output: thinking tokens plus the final response.

Importantly, budget_tokens must be less than max_tokens, so leave headroom above the budget for the answer itself. Claude will not always use the full budget; it uses as much thinking as the problem requires. Setting a generous budget does not guarantee it will be used; it just sets the ceiling. Start conservative and increase based on observed quality.

Python · Basic Extended Thinking Configuration

    import anthropic

    client = anthropic.Anthropic()


    def call_with_extended_thinking(
        prompt: str,
        thinking_budget: int = 8000,
        max_output_tokens: int = 16000,
        include_thinking_in_response: bool = False,
    ) -> dict:
        """
        Call Claude with Extended Thinking enabled.

        Args:
            prompt: The user prompt
            thinking_budget: Max tokens for Claude's internal reasoning
                (typically 5,000-100,000; the API minimum is 1,024)
            max_output_tokens: Cap on thinking plus final response combined
                (must be > thinking_budget)
            include_thinking_in_response: Whether to return the thinking content

        Returns:
            dict with 'response' (final answer), optionally 'thinking'
            (reasoning chain), and usage figures
        """
        # Validate: max_output_tokens must exceed thinking_budget
        if max_output_tokens <= thinking_budget:
            max_output_tokens = thinking_budget * 2
            print(f"Adjusted max_output_tokens to {max_output_tokens}")

        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=max_output_tokens,
            thinking={"type": "enabled", "budget_tokens": thinking_budget},
            messages=[{"role": "user", "content": prompt}],
        )

        result = {
            "response": None,
            "thinking": None,
            "thinking_tokens_used": 0,
            "output_tokens_used": 0,
        }

        for block in response.content:
            if block.type == "thinking":
                # Rough proxy only: this is a word count, not a token count,
                # and the returned text may be shorter than the billed thinking.
                result["thinking_tokens_used"] = len(block.thinking.split())
                if include_thinking_in_response:
                    result["thinking"] = block.thinking
            elif block.type == "text":
                result["response"] = block.text

        # usage.output_tokens counts thinking and response tokens together
        result["output_tokens_used"] = response.usage.output_tokens
        return result


    # Example: financial scenario analysis
    analysis = call_with_extended_thinking(
        prompt="""Analyse the following acquisition scenario:

    Company A (acquirer): £500M revenue, 15% EBITDA margin, 2.5x net debt/EBITDA
    Company B (target): £120M revenue, 8% EBITDA margin, high growth (25% YoY), £0 debt
    Proposed price: £300M (2.5x revenue)

    Assess:
    1. The strategic rationale and risks
    2. Whether the valuation is defensible
    3. Key integration challenges
    4. Your recommendation with reasoning""",
        thinking_budget=12000,
        include_thinking_in_response=False,
    )
    print(analysis["response"])

Calibrating the Thinking Budget

The right thinking budget depends on problem complexity. Too low, and Claude cuts its reasoning short before reaching the best answer. Too high, and you pay for thinking tokens that produce no quality improvement. The table below provides starting points based on problem type.

Problem Type                        | Recommended Budget (tokens) | Max Tokens | Typical Latency
Simple multi-step reasoning         | 5,000                       | 10,000     | 8–15 seconds
Moderate complexity analysis        | 8,000–12,000                | 20,000     | 15–35 seconds
Complex trade-off evaluation        | 16,000–24,000               | 40,000     | 35–90 seconds
Deep mathematical/logical problems  | 32,000–64,000               | 80,000     | 90–240 seconds
Maximum depth reasoning             | 100,000                     | 200,000    | 240+ seconds

To calibrate empirically, run the same problem with budgets of 5,000, 10,000, and 20,000 tokens. Compare output quality and check the thinking_tokens_used value returned by call_with_extended_thinking (a rough estimate computed by the helper, not an API field). If Claude is consistently using near the full budget, increase it. If it is using less than 50% of the budget, the problem does not need more thinking, so you can lower the budget to cap worst-case cost and latency.
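The calibration rule can be written down as a small sweep. This is a minimal sketch under stated assumptions: `measure` is a hypothetical caller-supplied function that runs the problem at a given budget and returns the thinking tokens used, the 0.9 cut-off for "near the full budget" is an assumed threshold, and `fake_measure` is a stand-in so the logic can be seen without any API calls.

```python
from typing import Callable, Dict, Sequence


def classify_budget_usage(used: int, budget: int) -> str:
    """Rule of thumb: near the ceiling -> increase, under half -> decrease."""
    ratio = used / budget
    if ratio >= 0.9:  # assumed threshold for "near the full budget"
        return "increase"
    if ratio < 0.5:
        return "decrease"
    return "keep"


def sweep_budgets(
    measure: Callable[[int], int],
    budgets: Sequence[int] = (5000, 10000, 20000),
) -> Dict[int, str]:
    """Run the same problem once per budget and label each result."""
    return {
        budget: classify_budget_usage(measure(budget), budget)
        for budget in budgets
    }


# Stand-in for real calls: a problem that needs about 7,000 thinking tokens.
def fake_measure(budget: int) -> int:
    return min(budget, 7000)


print(sweep_budgets(fake_measure))
# {5000: 'increase', 10000: 'keep', 20000: 'decrease'}
```

In practice a single run per budget is noisy, so it is worth repeating each budget a few times and comparing output quality alongside the usage ratio.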

Streaming Extended Thinking Responses

Extended Thinking responses take longer, sometimes 30–90 seconds for complex problems. Without streaming, users face a blank screen for that duration. Streaming delivers thinking and response content incrementally, allowing you to show a progress indicator or stream the final response as it is generated.

Python · Streaming Extended Thinking with Progress Indicator

    import time  # used by the audit-trail example later in this tutorial

    # Reuses `client` from the basic configuration example.


    def stream_extended_thinking(
        prompt: str,
        thinking_budget: int = 10000,
        show_thinking_progress: bool = True,
    ) -> str:
        """Stream a Claude Extended Thinking response with progress feedback."""
        full_response = ""
        thinking_started = False

        with client.messages.stream(
            model="claude-opus-4-6",
            max_tokens=thinking_budget * 2,
            thinking={"type": "enabled", "budget_tokens": thinking_budget},
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for event in stream:
                # Handle different event types
                event_type = getattr(event, "type", None)

                if event_type == "content_block_start":
                    block_type = event.content_block.type
                    if block_type == "thinking":
                        if show_thinking_progress:
                            print("⟳ Claude is thinking...", end="", flush=True)
                        thinking_started = True
                    elif block_type == "text":
                        if thinking_started and show_thinking_progress:
                            print(" ✓")  # Close the thinking indicator
                        print("\n📝 Response:\n")

                elif event_type == "content_block_delta":
                    if event.delta.type == "thinking_delta":
                        if show_thinking_progress:
                            print(".", end="", flush=True)
                    elif event.delta.type == "text_delta":
                        print(event.delta.text, end="", flush=True)
                        full_response += event.delta.text

        print()  # Final newline
        return full_response


    # Usage
    result = stream_extended_thinking(
        prompt="Design the optimal database schema for a multi-tenant SaaS platform "
               "supporting 10,000 enterprise customers, each with 100-10,000 users, "
               "requiring row-level security, audit trails, and sub-100ms query times. "
               "Consider PostgreSQL, partitioning strategies, and indexing.",
        thinking_budget=15000,
        show_thinking_progress=True,
    )

Enterprise Use Cases: Extended Thinking in Production

Financial Risk Analysis

One of the highest-ROI applications of Claude Extended Thinking is financial risk analysis: scenarios with many interdependent variables where a shallow answer is worse than no answer. A private equity firm uses it to generate initial investment memos: given a data room of 40–60 documents, Claude thinks through the opportunity across seven risk dimensions before producing a structured memo. Extended Thinking ensures the analysis considers second-order effects and contradictory signals that a simple extraction pass would miss.

See our Claude for private equity guide for the full architecture pattern including how to chunk data rooms into Claude's context window and structure the memo output.

Architecture and Code Design

Code architecture decisions benefit from Extended Thinking when the design space is large and the trade-offs are interdependent. Asking "design a microservices architecture for this system" with Extended Thinking produces architectures that Claude has actively stress-tested: it considers failure modes, data consistency challenges, and scaling bottlenecks during the thinking phase rather than in the final response.

Teams using Claude Code for enterprise development can ask for Extended Thinking on architecture-level decisions, for example by recording in CLAUDE.md which task types warrant deeper reasoning. Note that CLAUDE.md carries project instructions rather than API parameters.

Legal and Compliance Analysis

Legal analysis requiring the synthesis of multiple regulations, precedents, and fact patterns is a natural fit. A compliance team uses Extended Thinking to analyse whether a proposed product feature creates regulatory exposure across five jurisdictions simultaneously, a task that would otherwise require five separate legal review requests. The thinking chain provides the audit trail showing which regulations were considered and why each conclusion was reached.

Accessing and Using the Thinking Content

Claude's thinking content (the internal reasoning chain) is available in the API response as a separate content block. Most applications do not need to display it to end users. But there are specific patterns where surfacing the thinking chain creates genuine value.

Audit and compliance applications can store the thinking chain as a record of how a conclusion was reached, which is useful when decisions need to be explainable to regulators or senior stakeholders. Agent systems can use the thinking chain to debug why an agent made a particular decision. Research applications can surface the thinking to domain experts who want to validate Claude's analytical approach, not just its conclusions.

Python · Accessing and Structuring Thinking Content

    import hashlib
    import re
    import time

    # Reuses `client` from the basic configuration example.


    def analyse_with_audit_trail(problem: str, thinking_budget: int = 16000) -> dict:
        """
        Run Extended Thinking analysis and capture the reasoning chain
        for audit and compliance purposes.
        """
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=thinking_budget * 2,
            thinking={"type": "enabled", "budget_tokens": thinking_budget},
            messages=[{"role": "user", "content": problem}],
        )

        thinking_content = ""
        final_answer = ""
        for block in response.content:
            if block.type == "thinking":
                thinking_content = block.thinking
            elif block.type == "text":
                final_answer = block.text

        # Structure for storage and audit
        audit_record = {
            "timestamp": time.time(),
            "model": "claude-opus-4-6",
            # Stable digest: the built-in hash() of a str differs between processes
            "problem_hash": hashlib.sha256(problem.encode("utf-8")).hexdigest(),
            # usage.output_tokens counts thinking and response tokens together
            "output_tokens": response.usage.output_tokens,
            "thinking_summary": _extract_reasoning_steps(thinking_content),
            "final_answer": final_answer,
            "thinking_full": thinking_content,  # Store full chain for audit
        }
        return audit_record


    def _extract_reasoning_steps(thinking_text: str) -> list[str]:
        """Extract key reasoning steps from thinking content for summary."""
        # Simple heuristic: sentences starting with "First", "Then", "Therefore",
        # "However", "Because", "This means", etc.
        step_patterns = [
            r"(?:First|Second|Third|Finally|Therefore|However|Because|This means)[^.]+\.",
            r"The key (?:issue|question|consideration|problem)[^.]+\.",
            r"(?:I need to consider|I should|It's important)[^.]+\.",
        ]
        steps = []
        for pattern in step_patterns:
            matches = re.findall(pattern, thinking_text, re.IGNORECASE)
            steps.extend(matches[:3])  # Cap at 3 per pattern type
        return steps[:10]  # Return at most 10 reasoning steps

Cost and Latency Optimisation

Extended Thinking tokens are charged at the same rate as output tokens, but thinking tokens can easily dwarf response tokens in volume. A task with a 10,000-token thinking budget and a 1,000-token response costs as much as an 11,000-token standard response. For high-value, low-frequency use cases this is fine. For high-volume pipelines, cost management matters.
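The arithmetic can be made explicit with a small estimator. This is a sketch, not a pricing reference: the function name is hypothetical, the rate is a placeholder rather than a real price, and input-token costs are left out.

```python
def estimate_output_cost(
    thinking_tokens: int,
    response_tokens: int,
    usd_per_million_output_tokens: float,
) -> float:
    """Thinking and response tokens are both billed at the output rate."""
    billed_tokens = thinking_tokens + response_tokens
    return billed_tokens * usd_per_million_output_tokens / 1_000_000


# Placeholder rate for illustration only; check current pricing for your model.
PLACEHOLDER_RATE = 10.0

with_thinking = estimate_output_cost(10_000, 1_000, PLACEHOLDER_RATE)
without_thinking = estimate_output_cost(0, 1_000, PLACEHOLDER_RATE)

print(f"with thinking:    ${with_thinking:.2f}")
print(f"without thinking: ${without_thinking:.2f}")
print(f"multiplier:       {with_thinking / without_thinking:.0f}x")
```

Whatever the actual rate, the ratio is what matters: a response that uses the full 10,000-token budget for a 1,000-token answer costs about eleven times the same answer without thinking.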

Cost Management Patterns

  • Tier your problems: Route easy-to-classify problems to Sonnet (no Extended Thinking), reserve Opus + Extended Thinking for genuinely complex cases
  • Set the smallest budget that works: Do not set a 100,000-token budget for a problem that needs 8,000 tokens; unused budget does not cost money, but over-budgeting can slow responses
  • Async processing: Move Extended Thinking calls off the synchronous request path for non-interactive use cases, for example by processing overnight and delivering results in the morning
  • Cache system prompts: If you have a large system prompt (legal context, financial data), cache it with prompt caching; thinking tokens are not cached, but input tokens are
  • Monitor thinking_tokens_used: Track actual token usage per problem type and adjust budgets accordingly; most teams over-budget by 30–50% initially
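As an illustration of the system-prompt caching pattern, the sketch below builds request arguments with a cache breakpoint on the system prompt. The helper name and the `response_headroom` parameter are assumptions made for this example; the `cache_control` marker on a system text block is the mechanism prompt caching uses. The function only builds a dictionary, so it can be inspected without making an API call.

```python
def build_cached_request(
    system_context: str,
    question: str,
    thinking_budget: int = 8000,
    response_headroom: int = 4000,  # hypothetical: room left for the final answer
    model: str = "claude-opus-4-6",
) -> dict:
    """Build client.messages.create() kwargs with a cacheable system prompt."""
    return {
        "model": model,
        # max_tokens covers thinking plus the final response
        "max_tokens": thinking_budget + response_headroom,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
        "system": [
            {
                "type": "text",
                "text": system_context,
                # Marks the prompt prefix ending at this block as cacheable
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }


request = build_cached_request("Large legal context goes here.", "Is clause 4 enforceable?")
print(request["max_tokens"])  # 12000
```

The result would be passed as `client.messages.create(**request)`. Keeping the large, stable context in the system prompt and the changing question in the user message is what lets repeated calls reuse the cached prefix.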

Integration Patterns for Enterprise Systems

Extended Thinking works well as the reasoning core of larger agentic systems. The pattern is: use standard Claude calls for fast, routine steps; invoke Extended Thinking only at decision points where deep analysis changes the downstream path.

In a due diligence agent, for example, document extraction and summarisation use standard Sonnet calls. When the agent reaches the "synthesise findings and produce investment recommendation" step, it switches to Opus with Extended Thinking. This hybrid approach delivers 80% of the cost savings of avoiding Extended Thinking everywhere while preserving the quality improvement at the steps that matter most.
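The hybrid pattern can be sketched as a pipeline in which each step is flagged at design time as routine or as a decision point. Everything here is hypothetical scaffolding: the `Step` class, the step names, and the stub callables stand in for real wrappers around a fast model call and an Opus call with Extended Thinking, and passing earlier results into later prompts is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    prompt: str
    needs_deep_reasoning: bool = False  # decided at design time, not at runtime


def run_pipeline(
    steps: List[Step],
    fast_call: Callable[[str], str],
    deep_call: Callable[[str], str],
) -> Dict[str, str]:
    """Run each step with the fast call unless it is flagged as a decision point."""
    results = {}
    for step in steps:
        call = deep_call if step.needs_deep_reasoning else fast_call
        results[step.name] = call(step.prompt)
    return results


steps = [
    Step("extract", "Extract key figures from the documents."),
    Step("summarise", "Summarise each document."),
    Step(
        "recommend",
        "Synthesise findings and produce an investment recommendation.",
        needs_deep_reasoning=True,
    ),
]

# Stubs make the routing visible without calling any API.
routed = run_pipeline(
    steps,
    fast_call=lambda prompt: "fast",
    deep_call=lambda prompt: "deep",
)
print(routed)  # {'extract': 'fast', 'summarise': 'fast', 'recommend': 'deep'}
```

Only the final step pays for thinking tokens, which is where the cost saving in this pattern comes from.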

For the full architecture of multi-step agents using Extended Thinking as a reasoning gate, see our enterprise AI agent architecture guide. For teams integrating Extended Thinking into Claude Code workflows, the sub-agents guide covers how to spawn an Extended Thinking sub-agent for complex architectural decisions while keeping the main agent loop fast.

If you are building production systems that use Extended Thinking and want help with architecture, budget calibration, and quality evaluation frameworks, our Claude API integration service includes Extended Thinking optimisation as a standard deliverable. Book a call with our Claude Certified Architects to discuss your specific use case.

ClaudeImplementation Team

Claude Certified Architects specialising in enterprise AI deployment. We have shipped Claude integrations, including Extended Thinking systems, across financial services, legal, healthcare, and technology companies.
