The post-mortem problem isn't effort — every SRE team knows post-mortems matter. The problem is timing. A major incident ends at 3am. By the time anyone has capacity to write the post-mortem, it's a week later, the Slack threads have scrolled into history, and the precise decision points that defined the incident response are reconstructed from memory. Claude Cowork for incident post-mortems solves the timing problem by making the documentation fast enough to complete the same day as incident resolution.
This is a sub-article in the Claude Cowork for DevOps and platform engineers series. This article goes deep on the specific workflow, prompts, and integration setup for incident post-mortems. For the full picture — runbooks, infrastructure docs, SRE automations — see the main guide. The companion articles cover runbook generation, infrastructure documentation, and 8 Cowork automations for DevOps teams.
Why Post-Mortems Fail (and What Cowork Changes)
Post-mortems fail for two reasons: they take too long to produce, and they're written too long after the events they describe. The person who managed the incident is exhausted after resolution. The documentation task lands on them anyway, 10 days later, when the cognitive burden of reconstructing an accurate timeline is enormous.
The artefact-based approach is what makes Cowork post-mortems accurate. You're not asking Cowork to remember what happened — you're feeding it the actual timeline data from PagerDuty, the actual conversation from the Slack incident channel, and the actual monitoring screenshots. It reads the evidence and structures it. This is categorically different from asking AI to generate a post-mortem from a brief description.
The 4-Step Cowork Post-Mortem Workflow
This is the Cowork Post-Mortem Workflow — a named, repeatable process your team can run the same way every time.
Collect the artefacts (5 minutes)
Immediately after incident resolution, assign one person to collect: the PagerDuty alert timeline export, the Slack incident channel HTML export (or a copy-paste of the key messages), any monitoring screenshots or graph exports taken during the incident, and the relevant service runbook. If your CI/CD system logged a deployment near the incident start time, include that too. Drop all of these into a Cowork canvas.
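If PagerDuty is your alerting system, the timeline export in this step can be scripted against the PagerDuty REST API. A minimal sketch, assuming an API token with incident read access; the incident ID and token in any real run are placeholders you supply:

```python
"""Pull a PagerDuty incident's log entries and format a plain-text timeline."""
import json
import urllib.request

API = "https://api.pagerduty.com"

def fetch_log_entries(incident_id: str, token: str) -> list[dict]:
    """GET /incidents/{id}/log_entries — the raw event timeline in UTC."""
    req = urllib.request.Request(
        f"{API}/incidents/{incident_id}/log_entries?time_zone=UTC",
        headers={
            "Authorization": f"Token token={token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["log_entries"]

def format_timeline(entries: list[dict]) -> str:
    """One 'timestamp  event-type  summary' line per entry, oldest first."""
    rows = sorted(entries, key=lambda e: e["created_at"])
    return "\n".join(
        f'{e["created_at"]}  {e["type"]:<24}  {e.get("summary", "")}'
        for e in rows
    )
```

Write the output of `format_timeline` to a text file and drop it into the canvas alongside the other artefacts.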
Run the generation prompt (15 minutes including review)
Use the post-mortem generation prompt below. Cowork reads all the artefacts and produces a structured first draft. The draft includes a timeline with UTC timestamps derived from the actual alert data, a root cause analysis following the 5-Whys framework, and action items formatted with owner, priority, and proposed due date.
Fill in the human context (10 minutes)
Cowork flags every section where it inferred rather than directly read information. These flagged sections are the ones that need your review. There are typically three to five, clustering around the exact escalation decision points, team communication that happened in a video call rather than Slack, and the precise customer impact assessment. Fill these in from memory while they're fresh.
Publish and create the Jira tickets
Use the Confluence connector to publish the post-mortem directly to your team space. Then use the Jira connector to create action item tickets — Cowork can format the action items as Jira issues with the title, description, assignee, and suggested due date already populated. The action items are live in your backlog before the post-mortem meeting happens.
Post-Mortem Prompt Templates
Produce a complete incident post-mortem for the incident that ended at [TIME] [DATE].

Artefacts I'm providing:
- PagerDuty alert timeline [attached]
- Slack incident channel export [attached]
- Service runbook for [SERVICE NAME] [attached]
- Monitoring screenshots [attached if available]

Post-mortem format: Google SRE (adapt if our runbook specifies a different format)

Required sections:
1. SUMMARY — 2-3 sentences: what failed, how long, what was the impact
2. TIMELINE — Chronological with UTC timestamps from the actual alert data. Include: first alert, escalation events, key diagnostic decisions, resolution. Mark inferred timestamps with [EST].
3. ROOT CAUSE — 5-Whys analysis. Start with the immediate cause and drill down. Don't stop at "human error" — that's always a symptom.
4. CONTRIBUTING FACTORS — System, process, or tooling factors that made the incident worse or harder to resolve
5. IMPACT — Duration, affected services, affected users/customers (use actual numbers from the Slack channel if mentioned)
6. WHAT WENT WELL — At least 3 things (honest, specific — not boilerplate)
7. WHAT WENT POORLY — At least 3 things (honest, specific)
8. ACTION ITEMS — Each formatted as: [TITLE] | Owner: [name or role] | Priority: P1/P2/P3 | Due: [relative date, e.g., "within 2 weeks"] | Type: Monitoring/Process/Code/Runbook

For any section where you inferred rather than directly read information from the artefacts, add [VERIFY] so I can review it. Do not write "human error" as a root cause without explaining the system conditions that made the error easy to make and hard to catch.
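The pipe-delimited action item format in section 8 is deliberately machine-parseable. A minimal sketch of turning those lines into structured records before pushing them anywhere; the field names in the returned dict are my own choice, not a Cowork output contract:

```python
"""Parse '[TITLE] | Owner: x | Priority: P1 | Due: ... | Type: ...' lines."""

def parse_action_item(line: str) -> dict:
    parts = [p.strip() for p in line.split("|")]
    item = {"title": parts[0].strip("[] ")}  # first segment is the title
    for part in parts[1:]:
        key, _, value = part.partition(":")  # 'Owner: jane' -> ('Owner', 'jane')
        item[key.strip().lower()] = value.strip()
    return item
```

A record like this is then trivial to map onto a Jira issue, a spreadsheet row, or a tracking dashboard.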
Produce a shortened post-mortem for the [P2/P3] incident on [SERVICE] at [TIME].

Artefacts: [paste Slack thread or PagerDuty alert details]

Shortened format:
1. What failed (1 sentence)
2. Timeline (bullet points, key moments only)
3. Root cause (1-2 sentences)
4. Action items (max 3, most important only)
5. Resolution confirmed by: [who confirmed service recovery]

This doesn't need to be exhaustive — it needs to exist and be accurate.
I'm attaching our last 10 post-mortems. Analyse them for:
1. Action item completion patterns: which categories of action items (monitoring, code, runbook, process) get completed vs abandoned?
2. Root cause patterns: what recurring system issues appear across multiple post-mortems?
3. Time-to-publish: based on the dates in the documents, how long after incidents do post-mortems get published?
4. Template gaps: are there questions the current template doesn't ask that the incidents suggest we should be asking?

Produce a report with specific recommendations for improving our post-mortem process.
Integration Setup: PagerDuty, Confluence, and Jira
The Cowork post-mortem workflow gets significantly faster when the data flows automatically. Here's how to configure the integrations:
PagerDuty Integration
Connect Cowork to PagerDuty via the MCP connector. Configure it to export incident timelines — alert trigger, acknowledgement, escalation, and resolution events — in a format Cowork can read directly. When an incident resolves, Cowork can pull the full timeline automatically without any manual export step.
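For reference, MCP servers are declared client-side in a JSON configuration. A hedged sketch only: the package name below is a placeholder, not a real published connector, and the exact configuration surface depends on which vetted PagerDuty MCP server your organisation deploys. The API token is read from an environment variable rather than hard-coded:

```json
{
  "mcpServers": {
    "pagerduty": {
      "command": "npx",
      "args": ["-y", "@your-org/pagerduty-mcp-server"],
      "env": { "PAGERDUTY_API_TOKEN": "${PAGERDUTY_API_TOKEN}" }
    }
  }
}
```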
Confluence Integration
Configure the Confluence connector with write access to your SRE team space. When Cowork finalises a post-mortem, it publishes it to the correct space with the right page template applied. This eliminates the copy-paste step and ensures every post-mortem uses a consistent format in Confluence.
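The connector handles publishing for you, but it helps to know what it is doing underneath. A minimal sketch of the payload for creating a page via the Confluence Cloud REST API (POST /wiki/rest/api/content); the space key and title here are placeholders:

```python
"""Build a Confluence Cloud page-creation payload (v1 content API)."""

def confluence_page_payload(title: str, html_body: str, space_key: str) -> dict:
    return {
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {
            "storage": {  # 'storage' is Confluence's XHTML-based page format
                "value": html_body,
                "representation": "storage",
            }
        },
    }
```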
Jira Integration
The Jira connector can create issues from post-mortem action items. Configure it with the correct project, issue type (usually Task or Bug depending on your workflow), and default labels for post-mortem action items. Cowork formats the action items as Jira issues — title, description, component, assignee — and creates them in bulk after the post-mortem is approved.
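For teams scripting this themselves, a sketch of the bulk-create payload for the Jira Cloud REST API (POST /rest/api/3/issue/bulk). The project key and label are placeholders, and the action item dicts are assumed to use lowercase field names (title, priority, due, type). Note that v3 of the API requires descriptions in Atlassian Document Format, not plain text:

```python
"""Build a Jira Cloud bulk issue-creation payload from parsed action items."""

def adf_paragraph(text: str) -> dict:
    """Wrap plain text in minimal Atlassian Document Format (ADF)."""
    return {
        "type": "doc",
        "version": 1,
        "content": [{"type": "paragraph",
                     "content": [{"type": "text", "text": text}]}],
    }

def bulk_issue_payload(action_items: list[dict], project_key: str) -> dict:
    return {
        "issueUpdates": [
            {
                "fields": {
                    "project": {"key": project_key},
                    "issuetype": {"name": "Task"},
                    "summary": item["title"],
                    "description": adf_paragraph(
                        f'Priority {item.get("priority", "P3")}, '
                        f'due {item.get("due", "TBD")} '
                        f'({item.get("type", "Process")} action from post-mortem).'
                    ),
                    "labels": ["post-mortem-action"],
                }
            }
            for item in action_items
        ]
    }
```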
For organisations using non-standard tooling (ServiceNow for incident management, Notion instead of Confluence, Linear instead of Jira), the MCP server development service can build custom connectors that connect Cowork to your specific stack.
Making Post-Mortems Actually Change Things
Writing better post-mortems faster is valuable on its own. But the real return is when post-mortems inform future runbooks, change management processes, and monitoring coverage. Claude Cowork enables a feedback loop that manual documentation can't sustain:
- After a post-mortem, Cowork can compare the action items against your existing runbook and identify which gaps the incident revealed. It then updates the runbook draft with the missing diagnostic steps.
- The deployment risk assessment automation reads post-mortem history to identify patterns — "this type of database migration has caused 3 incidents in the last 12 months" — and surfaces those risks at deployment review time.
- Quarterly, the runbook gap analysis reads all post-mortems and identifies services where post-mortem data exists but runbook coverage doesn't — feeding the documentation backlog automatically.
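The pattern-mining described above can also be approximated with a simple tally before handing the documents to Cowork. A sketch, assuming your post-mortems use the pipe-delimited action item format from the generation prompt:

```python
"""Tally action-item categories across post-mortem documents to surface
which types recur (relies on the 'Type: ...' field in action item lines)."""
import re
from collections import Counter

TYPE_FIELD = re.compile(r"\bType:\s*(Monitoring|Process|Code|Runbook)", re.IGNORECASE)

def action_type_counts(documents: list[str]) -> Counter:
    counts = Counter()
    for doc in documents:
        for match in TYPE_FIELD.finditer(doc):
            counts[match.group(1).capitalize()] += 1  # normalise casing
    return counts
```

A category that dominates the tally quarter after quarter (say, Monitoring) is itself a systemic finding worth an action item.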
The compound effect: When post-mortems are published fast and acted on, MTTR drops. Teams with mature post-mortem cultures and maintained runbooks resolve P1 incidents 40–60% faster than those without. The time you spend writing post-mortems pays back in time saved during future incidents.
Frequently Asked Questions
What artefacts does Cowork need to produce an accurate post-mortem?
The minimum viable set is: a PagerDuty alert timeline export (or equivalent from your alerting system) and the Slack incident channel export. With just these two, Cowork can produce an accurate timeline and basic root cause analysis. The output improves significantly when you also include the service runbook (so Cowork understands the expected behaviour), monitoring screenshots (for visual evidence of the anomaly), and any deployment records from around the incident start time.
How does Cowork handle the 5-Whys analysis when it doesn't have full context?
Cowork derives the 5-Whys from what it can read in the artefacts. Where it can trace the causal chain from the data, it does. Where it needs to infer — particularly for system design decisions or process factors — it flags these sections with [VERIFY] so the incident reviewer can fill in the context. The first pass is always from evidence; the inference fills gaps. The quality of the 5-Whys improves when you include previous post-mortems as context — Cowork can identify systemic patterns across incidents.
Can Cowork produce post-mortems in our custom template format?
Yes. If your organisation uses a specific template (Atlassian's format, a custom template from your incident management system, or a regulatory-specific format for industries like financial services), save the template as a Cowork skill instruction. Cowork will follow it. You can also mix template sections — for example, using the Google SRE timeline format but a custom action item format that maps to your Jira workflow.
What about post-mortems for incidents that involve third-party services?
Post-mortems for incidents involving AWS, Azure, or third-party APIs are a specific case. Cowork handles them well when you include the vendor's status page incident report alongside your own artefacts. It can then structure the post-mortem to clearly separate what the vendor did, what you did in response, and what your systems did to mitigate — which is the right structure for this type of incident. Action items for third-party incidents typically focus on resilience improvements (circuit breakers, fallbacks, monitoring) rather than root cause fixes.
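To make the "resilience improvement" category concrete, here is a minimal circuit breaker sketch. It is illustrative only, not a Cowork output; the class name and thresholds are my own:

```python
"""Minimal circuit breaker: stop calling a failing third-party dependency
after repeated errors, serve a fallback, and retry after a cooldown."""
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker opened

    def call(self, fn, fallback):
        # While open and still cooling down, skip the dependency entirely.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0  # success resets the failure count
        return result
```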
How do we handle post-mortems for sensitive security incidents?
Security incident post-mortems need to be handled within your data classification framework. Claude Cowork running within Claude Enterprise means no data leaves your enterprise tenant. For post-mortems involving PII, customer data exposure, or security breaches, work with your security team to define what can be included in the Cowork canvas versus what needs to be documented through a separate, more restricted channel. Our security and governance service covers the framework for this.
Post-Mortems That Get Written Are the Only Kind That Help
We deploy Claude Cowork for platform engineering teams and configure the post-mortem workflow from day one. Same-day post-mortems. Jira action items created automatically. Documentation that actually reflects what happened.