Incident Response Runbook
/install incident-response-runbook
Generate, maintain, and execute incident response runbooks for production systems. Use when setting up incident workflows, responding to outages, or documenting post-incident learnings.
Usage
Generate Runbook
Create an incident response runbook for [service/system].
Infrastructure: [cloud provider, key services].
Common failure modes: [list known issues].
During Incident
Incident: [description]. Severity: [1-4].
Current symptoms: [what's happening].
Help me triage and respond.
Post-Incident
Generate a post-incident review for: [incident summary].
Timeline: [key events with timestamps].
Resolution: [what fixed it].
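The post-incident prompt expects a timeline of key events with timestamps. A minimal sketch of preparing that input, assuming events are kept as (ISO 8601 timestamp, description) pairs; the helper names `render_timeline` and `duration_minutes` are hypothetical and not part of the skill:

```python
from datetime import datetime


def render_timeline(events: list[tuple[str, str]]) -> str:
    """Sort (ISO timestamp, description) pairs and render them as a Markdown list."""
    ordered = sorted(events, key=lambda event: datetime.fromisoformat(event[0]))
    return "\n".join(f"- {timestamp}: {description}" for timestamp, description in ordered)


def duration_minutes(events: list[tuple[str, str]]) -> float:
    """Return the minutes elapsed between the earliest and the latest event."""
    times = [datetime.fromisoformat(timestamp) for timestamp, _ in events]
    return (max(times) - min(times)).total_seconds() / 60


if __name__ == "__main__":
    # Illustrative events only; not taken from a real incident.
    sample = [
        ("2024-01-01T12:30:00", "Rollback deployed"),
        ("2024-01-01T12:00:00", "Alert fired"),
    ]
    print(render_timeline(sample))
    print(duration_minutes(sample))
```

The rendered list can be pasted into the `Timeline:` field of the prompt, which keeps events in chronological order even when they were recorded out of sequence.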
Runbook Structure
Generated runbooks follow this template:
# [Service] Incident Response Runbook
## Quick Reference
- **On-call:** [rotation link]
- **Dashboards:** [monitoring links]
- **Escalation:** [contact chain]
## Severity Levels
- **SEV1**: Complete outage, revenue impact → respond in 5 min
- **SEV2**: Degraded service, user-facing → respond in 15 min
- **SEV3**: Internal impact, no users affected → respond in 1 hour
- **SEV4**: Cosmetic or minor, no urgency → next business day
## Triage Steps
1. Confirm the issue (check dashboards, reproduce)
2. Assess blast radius (which users/services affected)
3. Assign severity level
4. Start incident channel/thread
5. Communicate to stakeholders
## Failure Modes
### [Failure Mode 1: e.g., Database Connection Pool Exhaustion]
**Symptoms:** [what you'll see]
**Diagnosis:** [commands to run, logs to check]
**Mitigation:** [immediate steps to restore service]
**Root Fix:** [permanent solution]
### [Failure Mode 2: e.g., Memory Leak in Worker Process]
...
## Rollback Procedures
[Service-specific rollback steps]
## Communication Templates
[Internal + external status page templates]
## Post-Incident Review Template
[Blameless review structure]
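The response targets in the template's Severity Levels section can be turned into concrete deadlines. A minimal sketch, assuming the targets listed in the template and treating "next business day" as a flat 24 hours for simplicity; `RESPONSE_TARGETS` and `response_deadline` are hypothetical names, not part of the skill:

```python
from datetime import datetime, timedelta

# Response targets copied from the Severity Levels section of the template.
# SEV4 is "next business day" in the template; 24 hours is a simplifying assumption.
RESPONSE_TARGETS = {
    1: timedelta(minutes=5),
    2: timedelta(minutes=15),
    3: timedelta(hours=1),
    4: timedelta(hours=24),
}


def response_deadline(severity: int, detected_at: datetime) -> datetime:
    """Return the time by which a responder should have acknowledged the incident."""
    if severity not in RESPONSE_TARGETS:
        raise ValueError(f"unknown severity: {severity}")
    return detected_at + RESPONSE_TARGETS[severity]


if __name__ == "__main__":
    detected = datetime(2024, 1, 1, 12, 0)
    print(response_deadline(2, detected))
```

Keeping the targets in one mapping means a team that changes its response times only has to update a single place.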
Scripts
scripts/generate_runbook.py
Generate a runbook skeleton from service metadata:
python3 scripts/generate_runbook.py --service api-gateway \
--provider aws --region us-east-1 \
--monitors datadog,pagerduty \
--output runbook-api-gateway.md
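The source of `scripts/generate_runbook.py` is not reproduced here. As a rough illustration only, a skeleton generator that accepts the same flags could look like the sketch below. The function names, the metadata line, and the `[TODO]` placeholders are assumptions, not the script's actual implementation or output format.

```python
import argparse

# Section headings taken from the runbook template.
SECTIONS = [
    "Quick Reference",
    "Severity Levels",
    "Triage Steps",
    "Failure Modes",
    "Rollback Procedures",
    "Communication Templates",
    "Post-Incident Review Template",
]


def build_skeleton(service: str, provider: str, region: str, monitors: list[str]) -> str:
    """Render an empty runbook with one placeholder per template section."""
    lines = [f"# {service} Incident Response Runbook", ""]
    # Hypothetical metadata line; the real script's output is not documented here.
    lines.append(f"Infrastructure: {provider} ({region}). Monitors: {', '.join(monitors)}.")
    for section in SECTIONS:
        lines += ["", f"## {section}", "[TODO]"]
    return "\n".join(lines) + "\n"


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Generate a runbook skeleton (sketch).")
    parser.add_argument("--service", required=True)
    parser.add_argument("--provider", required=True)
    parser.add_argument("--region", required=True)
    parser.add_argument("--monitors", default="", help="comma-separated list")
    parser.add_argument("--output", required=True)
    return parser.parse_args(argv)


def main(argv=None) -> None:
    """Write the skeleton to the path given by --output."""
    args = parse_args(argv)
    monitors = [name for name in args.monitors.split(",") if name]
    text = build_skeleton(args.service, args.provider, args.region, monitors)
    with open(args.output, "w", encoding="utf-8") as handle:
        handle.write(text)


if __name__ == "__main__":
    # Demo with the values from the example command; a real CLI would call main() here.
    print(build_skeleton("api-gateway", "aws", "us-east-1", ["datadog", "pagerduty"]))
```

Separating `build_skeleton` from argument parsing keeps the rendering logic testable without touching the filesystem.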
AI Enhancement
When used as an agent skill, the incident responder:
- Guides triage in real-time with diagnostic commands specific to the stack
- Correlates symptoms with known failure modes from the runbook
- Drafts status page updates and internal communications
- Generates post-incident reviews with timeline, root cause analysis, and action items
- Learns from past incidents to improve future runbooks
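How the agent correlates symptoms with known failure modes is not specified. One simple way to picture it is keyword overlap between the reported symptoms and each failure mode's **Symptoms** text. The sketch below assumes that approach; the function names and the sample failure-mode descriptions are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_failure_modes(symptoms: str, failure_modes: dict[str, str]) -> list[tuple[str, int]]:
    """Rank failure modes by how many words they share with the reported symptoms."""
    reported = tokenize(symptoms)
    scores = [
        (name, len(reported & tokenize(description)))
        for name, description in failure_modes.items()
    ]
    # Drop failure modes with no overlap, then sort the best match first.
    matches = [(name, score) for name, score in scores if score > 0]
    return sorted(matches, key=lambda item: item[1], reverse=True)


# Hypothetical entries modeled on the template's example failure modes.
KNOWN_FAILURE_MODES = {
    "Database Connection Pool Exhaustion": "timeouts acquiring database connection, pool at max size",
    "Memory Leak in Worker Process": "worker memory grows steadily, OOM kills, restarts",
}

if __name__ == "__main__":
    report = "API requests failing with database connection timeouts"
    print(rank_failure_modes(report, KNOWN_FAILURE_MODES))
```

Plain word overlap is crude (it ignores synonyms and word order), but it shows the idea of scoring each documented failure mode against what responders are observing.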
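Installation
- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install incident-response-runbook
- After installation, call the Skill by name or trigger it with /incident-response-runbook
- Provide the inputs described in the Skill's parameter notes to get structured output.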
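What is Incident Response Runbook?
Incident Response Runbook creates, maintains, and executes detailed incident response runbooks that guide triage, communication, and post-incident reviews for production outages. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 52 times to date.
How do I install Incident Response Runbook?
Run the command "/install incident-response-runbook" in the OpenClaw or Claude Code chat box for a one-step install. No additional configuration is required.
Is Incident Response Runbook free?
Yes. Incident Response Runbook is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does Incident Response Runbook support?
Incident Response Runbook is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Incident Response Runbook?
It is developed and maintained by charlie-morrison (@charlie-morrison). The current version is v1.0.0.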