The Document Processing Problem

Every enterprise drowns in documents. Invoices arrive as PDFs. Contracts as Word files. Compliance forms as images scanned from paper. Someone has to read them, extract data, classify them, and route them to the right process. It's high-volume, low-skill, and expensive.

Claude agents solve this. They can classify a document in seconds, extract data with structured confidence scores, validate it against business rules, and flag anything unusual for human review. A single agent can handle thousands of documents daily, with an accuracy rate that exceeds that of most human processors.

Document Processing Stack

This article assumes you understand Claude's vision API for image and PDF analysis. Document agents combine vision with tool orchestration. See our AI agent development services for production deployments that handle complex workflows.

Document Ingestion & Conversion

The first step is getting documents into a format Claude can process. For PDFs and images, Claude can read them directly via the vision API. For Word documents (.docx), you convert to text or PDF first.

python
import base64
import io
import json

import pypdf
from docx import Document
from anthropic import Anthropic

client = Anthropic()

def load_document(file_path: str) -> dict:
    """
    Load a document and prepare it for Claude vision API.
    Returns a dict with document type, content, and metadata.
    """
    if file_path.endswith('.pdf'):
        # Extract text and get first page as image
        reader = pypdf.PdfReader(file_path)
        # extract_text() can return None for image-only pages
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        page_count = len(reader.pages)

        # Convert first page to image for vision
        import pdf2image  # requires the poppler system package
        images = pdf2image.convert_from_path(file_path, first_page=1, last_page=1)
        image_bytes = io.BytesIO()
        images[0].save(image_bytes, format='PNG')
        image_base64 = base64.standard_b64encode(image_bytes.getvalue()).decode()

        return {
            'type': 'pdf',
            'text': text,
            'image_base64': image_base64,
            'page_count': page_count,
            'file_name': file_path.split('/')[-1]
        }

    elif file_path.endswith('.docx'):
        # Extract text from Word document
        doc = Document(file_path)
        text = "\n".join([para.text for para in doc.paragraphs])

        return {
            'type': 'docx',
            'text': text,
            'page_count': 1,
            'file_name': file_path.split('/')[-1]
        }

    elif file_path.endswith(('.jpg', '.jpeg', '.png')):
        # Load image directly
        with open(file_path, 'rb') as f:
            image_base64 = base64.standard_b64encode(f.read()).decode()

        return {
            'type': 'image',
            'image_base64': image_base64,
            'page_count': 1,
            'file_name': file_path.split('/')[-1]
        }

    else:
        raise ValueError(f"Unsupported file type: {file_path}")


def classify_document(doc: dict) -> dict:
    """
    Classify a document type using Claude vision.
    Returns document type and confidence score.
    """
    prompt = """Classify this document into one of these categories:
- INVOICE: Payment request for goods/services
- CONTRACT: Legal agreement between parties
- PO: Purchase order
- COMPLIANCE: Regulatory/policy document
- EXPENSE_REPORT: Employee expense submission
- OTHER: Doesn't fit above categories

Respond with ONLY JSON:
{
  "document_type": "CATEGORY",
  "confidence": 0.95,
  "reasoning": "brief reason"
}"""

    image_block = {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": doc['image_base64']
        }
    } if 'image_base64' in doc else None

    messages = []
    if image_block:
        messages.append({
            "role": "user",
            "content": [image_block, {"type": "text", "text": prompt}]
        })
    else:
        messages.append({
            "role": "user",
            "content": f"{prompt}\n\nDocument text:\n{doc.get('text', 'No text available')[:2000]}"
        })

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=messages
    )

    return json.loads(response.content[0].text)

Data Extraction from Documents

Once classified, extract structured data using the agent. For an invoice, extract: vendor name, invoice number, date, amount, line items. For a contract, extract: parties, effective date, term, key obligations.

The key is confidence scoring. Prompt Claude to return not just each extracted value but a self-reported confidence score (0-1). These scores are not calibrated probabilities, but they work well as routing signals: low-confidence extractions go to a human review queue, and high-confidence ones are processed automatically.
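A minimal sketch of that threshold routing (the 0.85 cutoff and the route names are illustrative, not part of any API):

```python
REVIEW_THRESHOLD = 0.85  # illustrative; tune per document type

def route_extraction(extracted: dict) -> str:
    """Route a document based on its overall extraction confidence."""
    confidence = extracted.get("extraction_confidence", {}).get("overall", 0.0)
    if confidence >= REVIEW_THRESHOLD:
        return "auto_process"   # high confidence: straight through
    return "human_review"       # low confidence or missing score: queue it

print(route_extraction({"extraction_confidence": {"overall": 0.72}}))  # human_review
```

Note that a missing confidence block routes to review by default, which keeps the pipeline conservative when the model's JSON is incomplete.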

| Document Type | Key Fields | Typical Accuracy | Human Review % | Processing Time |
|---|---|---|---|---|
| Invoice | Vendor, amount, date, PO ref | 98% | 2-5% | 2-3 sec |
| Receipt | Merchant, total, date, items | 96% | 4-8% | 1-2 sec |
| Contract | Parties, dates, obligations | 92% | 8-15% | 5-10 sec |
| Compliance form | Attestations, signatures, dates | 94% | 6-10% | 3-5 sec |
| Handwritten form | Signature, date, amounts | 85% | 15-25% | 2-3 sec |

Building a Document Processing Agent

Combine classification and extraction into a single agent with multiple tools. The agent routes documents based on type and extracts relevant data.

python
from anthropic.lib.agents import Agent, tool
from typing import Any

@tool
def extract_invoice_data(invoice_text: str, invoice_image_base64: str) -> dict:
    """Extract structured data from an invoice."""
    prompt = """Extract invoice data. Return ONLY valid JSON:
{
  "vendor_name": "...",
  "invoice_number": "...",
  "invoice_date": "YYYY-MM-DD",
  "due_date": "YYYY-MM-DD",
  "total_amount": 1000.00,
  "currency": "USD",
  "line_items": [
    {"description": "...", "quantity": 1, "unit_price": 100.00, "amount": 100.00}
  ],
  "extraction_confidence": {
    "vendor_name": 0.95,
    "total_amount": 0.98,
    "overall": 0.96
  }
}"""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/png", "data": invoice_image_base64}
                },
                {"type": "text", "text": prompt}
            ]
        }]
    )

    return json.loads(response.content[0].text)

@tool
def extract_contract_data(contract_text: str) -> dict:
    """Extract key terms from a contract."""
    prompt = """Extract contract data. Return ONLY valid JSON:
{
  "contract_type": "SERVICE_AGREEMENT|NDA|PURCHASE|OTHER",
  "parties": ["Party A", "Party B"],
  "effective_date": "YYYY-MM-DD",
  "term_months": 12,
  "auto_renewal": true,
  "termination_notice_days": 30,
  "key_obligations": ["obligation 1", "obligation 2"],
  "payment_terms": "NET_30",
  "liability_cap": 100000,
  "extraction_confidence": {
    "parties": 0.98,
    "effective_date": 0.85,
    "overall": 0.90
  }
}"""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"{prompt}\n\nContract text:\n{contract_text[:5000]}"
        }]
    )

    return json.loads(response.content[0].text)

@tool
def validate_extraction(
    extracted_data: dict,
    document_type: str,
    business_rules: dict
) -> dict:
    """Validate extracted data against business rules."""
    validation_results = {
        "passed": True,
        "errors": [],
        "warnings": [],
        "requires_review": False
    }

    # Check confidence scores
    overall_conf = extracted_data.get('extraction_confidence', {}).get('overall', 0)
    if overall_conf < 0.85:
        validation_results['requires_review'] = True
        validation_results['warnings'].append(f"Low confidence: {overall_conf:.0%}")

    # Document-specific validation
    if document_type == 'invoice':
        amount = extracted_data.get('total_amount', 0)
        if amount > business_rules.get('max_invoice_amount', 100000):
            validation_results['errors'].append(f"Invoice exceeds max amount: ${amount}")
            validation_results['passed'] = False

        if not extracted_data.get('vendor_name'):
            validation_results['errors'].append("Missing vendor name")
            validation_results['passed'] = False

    elif document_type == 'contract':
        term = extracted_data.get('term_months', 0)
        if term > business_rules.get('max_contract_term_months', 60):
            validation_results['warnings'].append(f"Long term: {term} months")
            validation_results['requires_review'] = True

    return validation_results

# Build the document processing agent
doc_processor_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="""You are a document processing agent. For each document:
1. Classify the document type using vision
2. Extract relevant structured data
3. Validate against business rules
4. Flag low-confidence fields for human review

Always be conservative — if you're unsure, flag for review.""",
    tools=[extract_invoice_data, extract_contract_data, validate_extraction],
    max_tokens=2048,
    max_iterations=8
)

# Process a document
def process_document(file_path: str) -> dict:
    doc = load_document(file_path)
    classification = classify_document(doc)

    result = doc_processor_agent.run(
        f"Process this {classification['document_type']} document: {doc['file_name']}"
    )

    return {
        "file": doc['file_name'],
        "classification": classification,
        "extraction_result": result.output,
        "token_usage": result.usage.total_tokens
    }

Multi-Page Documents & Large Files

Real documents often span dozens of pages. Claude can process multi-page PDFs, but you should be strategic about which pages you send. For a 50-page contract, send the first and last page (signatures), plus a summary of key pages, rather than the entire text.

Implement page-level classification: scan all pages quickly, identify which pages contain relevant data, and extract only from those pages. This reduces token usage and extraction time.
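The first pass doesn't even need a model call: a keyword pre-filter over per-page text can shortlist pages, with Claude classification reserved for the survivors. This sketch assumes page text has already been extracted (e.g. with pypdf); the keyword list and fallback behavior are illustrative:

```python
def select_relevant_pages(page_texts: list[str], keywords: list[str],
                          max_pages: int = 5) -> list[int]:
    """Keyword pre-filter: keep only page indices whose text mentions
    a keyword of interest. If nothing matches (e.g. a scanned PDF with
    no text layer), fall back to the first and last pages, which
    typically hold the cover terms and signatures."""
    hits = [i for i, text in enumerate(page_texts)
            if any(kw.lower() in text.lower() for kw in keywords)]
    if not hits:
        hits = [0, len(page_texts) - 1]
    return hits[:max_pages]
```

Only the selected pages are then rendered to images and sent through the vision-based extraction step, which bounds token usage regardless of document length.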

Confidence Scoring & Human Review Queues

Route low-confidence extractions to a human review queue. Create a UI where reviewers can quickly validate or correct the agent's work, then feed those corrections back into the extraction pipeline.

The Active Learning Loop

Every time a human corrects an extraction, store that example. Periodically fold these corrections into your extraction prompts as few-shot examples (Claude itself is not retrained). Over time, agent accuracy improves and human review rates drop.
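The storage half of that loop can be as simple as an append-only JSONL file; the field names here are illustrative, not a fixed schema:

```python
import datetime
import json
from pathlib import Path

def record_correction(store: Path, doc_id: str, field: str,
                      model_value, human_value) -> None:
    """Append one human correction to a JSONL store, for later use
    as few-shot examples when refining extraction prompts."""
    entry = {
        "doc_id": doc_id,
        "field": field,
        "model_value": model_value,
        "human_value": human_value,
        "corrected_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with store.open("a") as f:
        f.write(json.dumps(entry) + "\n")
```

A periodic job can then pull the most frequently corrected fields from this store and add worked examples for them to the relevant extraction prompt.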

Real-World Use Cases

Invoice Processing at Scale

A B2B company receives 5,000 vendor invoices monthly. An agent processes each in 2 seconds, extracting vendor name, amount, PO reference. 95% require no human review. The remaining 5% (250 invoices) go to accounts payable for 30-second verification. Processing time drops from 2 weeks to 2 hours.

Contract Management & Compliance

A legal team uploads contracts to a SharePoint folder. An agent immediately classifies each one (service agreement, NDA, purchase order), extracts key dates and obligations, and flags renewal deadlines. Critical clauses are extracted and compared against company templates.

Expense Report Processing

Employees submit photos of receipts. An agent extracts merchant, amount, date, and category. Expenses are matched against team budgets. Anomalies (such as an expense five times the category norm) are flagged for manager review. Approved expenses flow directly to accounting.

Integration with SharePoint & Google Drive

Use MCP (Model Context Protocol) to connect agents to document repositories. Set up a trigger: when a file lands in a SharePoint folder, invoke the document processing agent. Results are written back to a processed folder with extracted data in metadata.

This creates a fully automated pipeline: document arrives → agent processes → results stored → humans review edge cases → feedback improves agent.
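As a minimal local stand-in for that trigger (a real deployment would use SharePoint webhooks or an MCP server rather than polling), one pass over an inbox folder might look like this; `handle` would be the `process_document` function from earlier:

```python
from pathlib import Path

def drain_inbox(inbox: Path, processed: Path, handle) -> list:
    """One polling pass: process every file waiting in the inbox,
    move it to the processed folder, and return the handler results."""
    processed.mkdir(parents=True, exist_ok=True)
    results = []
    for path in sorted(inbox.iterdir()):
        if path.is_file():
            results.append(handle(str(path)))     # e.g. process_document
            path.rename(processed / path.name)    # mark as done by moving
    return results
```

In the webhook variant the same `handle`/move logic runs once per notification instead of in a polling loop, and the extracted data would be written back as file metadata rather than returned.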

Scaling Document Agents

For enterprise deployments, implement queueing and batch processing. If 1,000 documents arrive simultaneously, queue them and process in batches of 10, balancing API rate limits with speed.

Use API integration best practices to handle retries, timeouts, and failures. Document processing is I/O-heavy, so design for resilience.
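A sketch of that batching pattern with bounded concurrency and jittered exponential backoff, using only the standard library (`process` would be a synchronous worker such as `process_document`; the limits are illustrative):

```python
import asyncio
import random

async def process_with_retry(process, doc_path: str,
                             max_attempts: int = 3) -> dict:
    """Run one document through a synchronous worker, retrying
    transient failures with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await asyncio.to_thread(process, doc_path)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # backoff ~1s, ~2s, ~4s ... plus jitter to spread retries
            await asyncio.sleep(2 ** attempt + random.random())

async def process_batch(process, doc_paths: list[str],
                        concurrency: int = 10) -> list[dict]:
    """Process a queue of documents with at most `concurrency`
    in flight at once, respecting API rate limits."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(path):
        async with semaphore:
            return await process_with_retry(process, path)

    return await asyncio.gather(*(guarded(p) for p in doc_paths))
```

So if 1,000 documents arrive at once, `asyncio.run(process_batch(process_document, paths))` drains the queue ten at a time, and a document only fails the batch after exhausting its retries.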

Vision API at the Core

Document agents live or die by vision quality. See our Claude vision API guide for optimization techniques: image preprocessing, resolution, color depth, and how to structure prompts for maximum accuracy.
