/install afrexai-post-mortem
Post-Mortem & Incident Review Framework
Run structured post-mortems that actually prevent repeat failures. Blameless analysis, root cause identification, and action tracking.
When to Use
- After any production incident, outage, or service degradation
- After a missed deadline, failed launch, or lost deal
- After any event costing >$5K or >4 hours of team time
- Quarterly review of recurring incident patterns
Post-Mortem Template
1. Incident Summary (Complete Within 24 Hours)
Incident ID: [AUTO-GENERATED]
Date/Time: [Start] → [End] (Duration: X hours)
Severity: SEV-1 (revenue impact) | SEV-2 (customer impact) | SEV-3 (internal impact)
Impact: [Users affected] | [Revenue lost] | [SLA breached Y/N]
Detection: How was it found? (Monitoring / Customer report / Internal discovery)
Detection Delay: Time from incident start → first alert
2. Timeline (Minute-by-Minute for SEV-1, 15-min blocks for SEV-2/3)
HH:MM - Event description
HH:MM - First alert triggered
HH:MM - Team notified
HH:MM - Investigation started
HH:MM - Root cause identified
HH:MM - Fix deployed
HH:MM - Confirmed resolved
3. Root Cause Analysis — 5 Whys
Why 1: [Direct cause]
Why 2: [Why did that happen?]
Why 3: [Why did THAT happen?]
Why 4: [Systemic cause]
Why 5: [Organizational/cultural root]
4. Contributing Factors
Score each factor 0-3 (0=not a factor, 3=primary contributor):
| Factor | Score | Notes |
|---|---|---|
| Missing/inadequate monitoring | ||
| Insufficient testing | ||
| Documentation gaps | ||
| Process not followed | ||
| Knowledge concentration (bus factor) | ||
| Capacity/scaling limits | ||
| Third-party dependency | ||
| Communication breakdown | ||
| Change management failure | ||
| Technical debt |
5. What Went Well
List 3-5 things that worked during the response:
- Fast detection? Good runbooks? Strong communication? Quick escalation?
6. Action Items
Every action MUST have an owner and deadline:
| # | Action | Owner | Deadline | Priority | Status |
|---|---|---|---|---|---|
| 1 | P0/P1/P2 | Open |
Priority definitions:
- P0: Must complete before next business day
- P1: Must complete within 1 week
- P2: Must complete within 1 sprint/month
7. Recurrence Prevention
- Monitoring added/improved for this failure mode
- Runbook created/updated
- Test coverage added
- Architecture change needed? (If yes, create RFC)
- Training needed for team?
Blameless Post-Mortem Rules
- Focus on systems, not individuals
- "What happened" not "who did it"
- Assume everyone acted with best intentions and available information
- The goal is learning, not punishment
- If you find yourself writing someone's name next to a mistake, rewrite it as a process gap
Incident Cost Calculator
Direct costs:
Revenue lost during downtime: $___
SLA credits issued: $___
Emergency vendor/contractor costs: $___
Indirect costs:
Engineering hours × loaded rate: ___ hrs × $___/hr = $___
Customer churn risk (affected users × churn probability × LTV): $___
Brand/reputation (estimate): $___
Total incident cost: $___
Cost per minute of downtime: $___
Quarterly Incident Review
Every quarter, analyze patterns across all post-mortems:
- Top 3 root cause categories — Where should you invest in prevention?
- Mean time to detect (MTTD) — Is monitoring improving?
- Mean time to resolve (MTTR) — Is response getting faster?
- Action item completion rate — Are you actually fixing things?
- Repeat incidents — Same root cause twice = systemic failure
- Cost trend — Total incident cost per quarter (should decrease)
Industry-Specific Post-Mortem Considerations
| Industry | Key Focus | Regulatory Requirement |
|---|---|---|
| Fintech | Transaction integrity, audit trail | SOX, PCI-DSS incident reporting |
| Healthcare | PHI exposure, patient safety | HIPAA breach notification (60 days) |
| SaaS | SLA compliance, data integrity | SOC 2 incident management |
| E-commerce | Order integrity, payment processing | PCI-DSS, consumer protection |
| Manufacturing | Safety incidents, production loss | OSHA reporting requirements |
Go Deeper
Your post-mortems reveal where AI agents should be deployed first — the repetitive failures, the manual monitoring gaps, the processes that break under load.
- Find your highest-cost gaps: AI Revenue Leak Calculator
- Industry-specific deployment playbooks: AfrexAI Context Packs — $47
- Pick 3: $97 | All 10: $197 | Everything: $247
- Deploy your first agent: Agent Setup Wizard
Built by AfrexAI — turning incident patterns into automation opportunities.
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install afrexai-post-mortem - 安装完成后,直接呼叫该 Skill 的名称或使用
/afrexai-post-mortem触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Post-Mortem & Incident Review 是什么?
Guide structured, blameless post-mortems with root cause analysis, action tracking, and prevention steps to reduce repeat production incidents and outages. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 541 次。
如何安装 Post-Mortem & Incident Review?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install afrexai-post-mortem」即可一键安装,无需额外配置。
Post-Mortem & Incident Review 是免费的吗?
是的,Post-Mortem & Incident Review 完全免费(开源免费),可自由下载、安装和使用。
Post-Mortem & Incident Review 支持哪些平台?
Post-Mortem & Incident Review 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Post-Mortem & Incident Review?
由 1kalin(@1kalin)开发并维护,当前版本 v1.0.0。