The Document Processing Problem
Every enterprise drowns in documents. Invoices arrive as PDFs. Contracts as Word files. Compliance forms as images scanned from paper. Someone has to read them, extract data, classify them, and route them to the right process. It's high-volume, low-skill, and expensive.
Claude agents solve this. They can classify a document in seconds, extract data with structured confidence scores, validate it against business rules, and flag anything unusual for human review. A single agent can handle thousands of documents daily, with accuracy that exceeds that of most human processors.
This article assumes you understand Claude's vision API for image and PDF analysis. Document agents combine vision with tool orchestration. See our AI agent development services for production deployments that handle complex workflows.
Document Ingestion & Conversion
The first step is getting documents into a format Claude can process. For PDFs and images, Claude can read them directly via the vision API. For Word documents (.docx), you convert to text or PDF first.
```python
import base64
import io
import json

import pdf2image
import pypdf
from anthropic import Anthropic
from docx import Document

client = Anthropic()


def load_document(file_path: str) -> dict:
    """
    Load a document and prepare it for the Claude vision API.
    Returns a dict with document type, content, and metadata.
    """
    file_name = file_path.split('/')[-1]

    if file_path.endswith('.pdf'):
        # Extract text from every page
        reader = pypdf.PdfReader(file_path)
        text = "\n".join(page.extract_text() for page in reader.pages)
        page_count = len(reader.pages)

        # Render the first page as a PNG for the vision API
        images = pdf2image.convert_from_path(file_path, first_page=1, last_page=1)
        buffer = io.BytesIO()
        images[0].save(buffer, format='PNG')
        image_base64 = base64.standard_b64encode(buffer.getvalue()).decode()

        return {
            'type': 'pdf',
            'text': text,
            'image_base64': image_base64,
            'media_type': 'image/png',
            'page_count': page_count,
            'file_name': file_name
        }

    elif file_path.endswith('.docx'):
        # Extract text from a Word document (no image; text-only classification)
        doc = Document(file_path)
        text = "\n".join(para.text for para in doc.paragraphs)
        return {
            'type': 'docx',
            'text': text,
            'page_count': 1,
            'file_name': file_name
        }

    elif file_path.endswith(('.jpg', '.jpeg', '.png')):
        # Load the image bytes directly, recording the correct media type
        with open(file_path, 'rb') as f:
            image_base64 = base64.standard_b64encode(f.read()).decode()
        return {
            'type': 'image',
            'image_base64': image_base64,
            'media_type': 'image/jpeg' if file_path.endswith(('.jpg', '.jpeg')) else 'image/png',
            'page_count': 1,
            'file_name': file_name
        }

    else:
        raise ValueError(f"Unsupported file type: {file_path}")
```
```python
def classify_document(doc: dict) -> dict:
    """
    Classify a document type using Claude vision.
    Returns document type and confidence score.
    """
    prompt = """Classify this document into one of these categories:
- INVOICE: Payment request for goods/services
- CONTRACT: Legal agreement between parties
- PO: Purchase order
- COMPLIANCE: Regulatory/policy document
- EXPENSE_REPORT: Employee expense submission
- OTHER: Doesn't fit above categories

Respond with ONLY JSON:
{
  "document_type": "CATEGORY",
  "confidence": 0.95,
  "reasoning": "brief reason"
}"""

    # Use the page image when we have one; fall back to text-only for .docx
    image_block = {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": doc.get('media_type', 'image/png'),
            "data": doc['image_base64']
        }
    } if 'image_base64' in doc else None

    messages = []
    if image_block:
        messages.append({
            "role": "user",
            "content": [image_block, {"type": "text", "text": prompt}]
        })
    else:
        messages.append({
            "role": "user",
            "content": f"{prompt}\n\nDocument text:\n{doc.get('text', 'No text available')[:2000]}"
        })

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=messages
    )
    return json.loads(response.content[0].text)
```

Data Extraction from Documents
Once classified, extract structured data using the agent. For an invoice, extract: vendor name, invoice number, date, amount, line items. For a contract, extract: parties, effective date, term, key obligations.
The key is confidence scoring. Claude returns not just the extracted value, but a confidence score (0-1). Low-confidence extractions go to a human review queue. High-confidence ones are processed automatically.
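The routing decision itself is only a few lines. A minimal sketch (the `route_extraction` helper and the 0.85 threshold are illustrative, not part of any SDK; tune thresholds per document type against your own review data):

```python
def route_extraction(extraction: dict, threshold: float = 0.85) -> str:
    """Route to 'auto' processing or the human 'review' queue by confidence."""
    confidence = extraction.get('extraction_confidence', {}).get('overall', 0.0)
    return 'auto' if confidence >= threshold else 'review'

# High-confidence invoice: processed automatically
print(route_extraction({'extraction_confidence': {'overall': 0.96}}))  # auto
# Ambiguous handwritten form: queued for a human
print(route_extraction({'extraction_confidence': {'overall': 0.70}}))  # review
```

Note that a missing confidence score routes to review by default, which keeps the failure mode conservative.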
| Document Type | Key Fields | Typical Accuracy | Human Review % | Processing Time |
|---|---|---|---|---|
| Invoice | Vendor, amount, date, PO ref | 98% | 2-5% | 2-3 sec |
| Receipt | Merchant, total, date, items | 96% | 4-8% | 1-2 sec |
| Contract | Parties, dates, obligations | 92% | 8-15% | 5-10 sec |
| Compliance form | Attestations, signatures, dates | 94% | 6-10% | 3-5 sec |
| Handwritten form | Signature, date, amounts | 85% | 15-25% | 2-3 sec |
Building a Document Processing Agent
Combine classification and extraction into a single agent with multiple tools. The agent routes documents based on type and extracts relevant data.
```python
from anthropic.lib.agents import Agent, tool


@tool
def extract_invoice_data(invoice_text: str, invoice_image_base64: str) -> dict:
    """Extract structured data from an invoice."""
    prompt = """Extract invoice data. Return ONLY valid JSON:
{
  "vendor_name": "...",
  "invoice_number": "...",
  "invoice_date": "YYYY-MM-DD",
  "due_date": "YYYY-MM-DD",
  "total_amount": 1000.00,
  "currency": "USD",
  "line_items": [
    {"description": "...", "quantity": 1, "unit_price": 100.00, "amount": 100.00}
  ],
  "extraction_confidence": {
    "vendor_name": 0.95,
    "total_amount": 0.98,
    "overall": 0.96
  }
}"""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/png", "data": invoice_image_base64}
                },
                {"type": "text", "text": prompt}
            ]
        }]
    )
    return json.loads(response.content[0].text)
```
```python
@tool
def extract_contract_data(contract_text: str) -> dict:
    """Extract key terms from a contract."""
    prompt = """Extract contract data. Return ONLY valid JSON:
{
  "contract_type": "SERVICE_AGREEMENT|NDA|PURCHASE|OTHER",
  "parties": ["Party A", "Party B"],
  "effective_date": "YYYY-MM-DD",
  "term_months": 12,
  "auto_renewal": true,
  "termination_notice_days": 30,
  "key_obligations": ["obligation 1", "obligation 2"],
  "payment_terms": "NET_30",
  "liability_cap": 100000,
  "extraction_confidence": {
    "parties": 0.98,
    "effective_date": 0.85,
    "overall": 0.90
  }
}"""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"{prompt}\n\nContract text:\n{contract_text[:5000]}"
        }]
    )
    return json.loads(response.content[0].text)
```
```python
@tool
def validate_extraction(
    extracted_data: dict,
    document_type: str,
    business_rules: dict
) -> dict:
    """Validate extracted data against business rules."""
    validation_results = {
        "passed": True,
        "errors": [],
        "warnings": [],
        "requires_review": False
    }

    # Check confidence scores
    overall_conf = extracted_data.get('extraction_confidence', {}).get('overall', 0)
    if overall_conf < 0.85:
        validation_results['requires_review'] = True
        validation_results['warnings'].append(f"Low confidence: {overall_conf:.0%}")

    # Document-specific validation
    if document_type == 'invoice':
        amount = extracted_data.get('total_amount', 0)
        if amount > business_rules.get('max_invoice_amount', 100000):
            validation_results['errors'].append(f"Invoice exceeds max amount: ${amount}")
            validation_results['passed'] = False
        if not extracted_data.get('vendor_name'):
            validation_results['errors'].append("Missing vendor name")
            validation_results['passed'] = False
    elif document_type == 'contract':
        term = extracted_data.get('term_months', 0)
        if term > business_rules.get('max_contract_term_months', 60):
            validation_results['warnings'].append(f"Long term: {term} months")
            validation_results['requires_review'] = True

    return validation_results
```
```python
# Build the document processing agent
doc_processor_agent = Agent(
    client=client,
    model="claude-sonnet-4-6",
    system="""You are a document processing agent. For each document:
1. Classify the document type using vision
2. Extract relevant structured data
3. Validate against business rules
4. Flag low-confidence fields for human review

Always be conservative — if you're unsure, flag for review.""",
    tools=[extract_invoice_data, extract_contract_data, validate_extraction],
    max_tokens=2048,
    max_iterations=8
)


# Process a document end to end
def process_document(file_path: str) -> dict:
    doc = load_document(file_path)
    classification = classify_document(doc)
    result = doc_processor_agent.run(
        f"Process this {classification['document_type']} document: {doc['file_name']}"
    )
    return {
        "file": doc['file_name'],
        "classification": classification,
        "extraction_result": result.output,
        "token_usage": result.usage.total_tokens
    }
```

Multi-Page Documents & Large Files
Real documents often span dozens of pages. Claude can process multi-page PDFs, but you should be strategic about which pages you send. For a 50-page contract, send the first and last page (signatures), plus a summary of key pages, rather than the entire text.
Implement page-level classification: scan all pages quickly, identify which pages contain relevant data, and extract only from those pages. This reduces token usage and extraction time.
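The page-selection step can be sketched in a few lines. Here `select_pages` is a hypothetical helper, and `relevant_pages` would come from a cheap per-page classification pass (for example, running the classifier over each page image):

```python
def select_pages(page_count: int, relevant_pages: list[int]) -> list[int]:
    """Choose pages to send for extraction: the first and last pages
    (headers, signatures) plus any pages a quick scan flagged as relevant."""
    pages = {1, page_count} | set(relevant_pages)
    return sorted(p for p in pages if 1 <= p <= page_count)

# A 50-page contract where the scan flagged pages 3 and 12
print(select_pages(50, [3, 12]))  # [1, 3, 12, 50]
```

Sending four pages instead of fifty cuts token usage by an order of magnitude while keeping the pages that matter for extraction.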
Confidence Scoring & Human Review Queues
Route low-confidence extractions to a human review queue. Build a UI where reviewers can quickly validate or correct the agent's work, then feed those corrections back into your extraction prompts.
Every time a human corrects an extraction, store that example. Periodically fold the most instructive corrections into your extraction prompts as few-shot examples. Over time, agent accuracy improves and the human review rate drops.
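One lightweight way to capture that feedback is an append-only JSONL log; the `record_correction` helper and its schema are illustrative, not a standard:

```python
import json
from datetime import datetime, timezone

def record_correction(path: str, file_name: str, field: str,
                      extracted, corrected) -> None:
    """Append one human correction as a JSONL record; these examples can
    later be folded into extraction prompts as few-shot corrections."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "file": file_name,
        "field": field,
        "extracted": extracted,
        "corrected": corrected,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

A reviewer UI would call this once per corrected field, e.g. `record_correction("corrections.jsonl", "inv-001.pdf", "total_amount", 1000.0, 1050.0)`.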
Real-World Use Cases
Invoice Processing at Scale
A B2B company receives 5,000 vendor invoices monthly. An agent processes each in 2 seconds, extracting vendor name, amount, PO reference. 95% require no human review. The remaining 5% (250 invoices) go to accounts payable for 30-second verification. Processing time drops from 2 weeks to 2 hours.
Contract Management & Compliance
A legal team uploads contracts to a SharePoint folder. An agent immediately classifies each one (service agreement, NDA, purchase order), extracts key dates and obligations, and flags renewal deadlines. Critical clauses are extracted and compared against company templates.
Expense Report Processing
Employees submit photos of receipts. An agent extracts merchant, amount, date, and category. Expenses are matched against team budgets, and anomalies (e.g. an expense at 5x the category norm) are flagged for manager review. Approved expenses flow directly to accounting.
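The anomaly check itself can be as simple as comparing against the category's typical spend; the `is_anomalous` helper and the 5x factor (mirroring the example above) are illustrative:

```python
def is_anomalous(amount: float, category_typical: float, factor: float = 5.0) -> bool:
    """Flag an expense that exceeds `factor` times the category's typical amount."""
    return category_typical > 0 and amount > factor * category_typical

print(is_anomalous(400.0, 50.0))  # True: 8x the typical lunch expense
print(is_anomalous(60.0, 50.0))   # False: within the normal range
```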
Integration with SharePoint & Google Drive
Use MCP (Model Context Protocol) to connect agents to document repositories. Set up a trigger: when a file lands in a SharePoint folder, invoke the document processing agent. Results are written back to a processed folder with extracted data in metadata.
This creates a fully automated pipeline: document arrives → agent processes → results stored → humans review edge cases → feedback improves agent.
Scaling Document Agents
For enterprise deployments, implement queueing and batch processing. If 1,000 documents arrive simultaneously, queue them and process in batches of 10, balancing API rate limits with speed.
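The batching step is straightforward to sketch. A production version would also track per-minute token budgets, but a fixed batch size (the 10 above) is a reasonable starting point; `batches` is an illustrative helper:

```python
from typing import Iterator

def batches(items: list, size: int = 10) -> Iterator[list]:
    """Yield successive fixed-size batches so concurrent API calls
    stay within rate limits."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

queue = [f"doc_{n}.pdf" for n in range(25)]
print([len(b) for b in batches(queue)])  # [10, 10, 5]
```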
Use API integration best practices to handle retries, timeouts, and failures. Document processing is IO-heavy — design for resilience.
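For retries, exponential backoff is the standard pattern. A minimal sketch (`with_retries` is a hypothetical helper, not an SDK function; in practice, catch the SDK's rate-limit and timeout errors specifically rather than bare `Exception`):

```python
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 1.0):
    """Call fn(), retrying transient failures with exponential backoff
    (1s, 2s, 4s, ...). Re-raises after the final attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Wrap each API call at the call site, e.g. `with_retries(lambda: client.messages.create(...))`.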
Document agents live or die by vision quality. See our Claude vision API guide for optimization techniques: image preprocessing, resolution, color depth, and how to structure prompts for maximum accuracy.