scytheshan-pixel

Incident Fupan (事故复盘) — Structured Root Cause Analysis

Author: scytheshan-pixel · v1.0.0
cross-platform · ⚠ suspicious
Total downloads: 407 · Favorites: 0 · Current installs: 4 · Versions: 1
Install in OpenClaw
/install incident-fupan
Description
事故复盘 / Incident Fupan — structured root cause analysis for production failures, outages, bugs, and near-misses. Use when: (1) 事故复盘 or incident review is need...
Usage (SKILL.md)

Incident Fupan (事故复盘)

Structured root cause analysis and prevention protocol for production incidents.

Language Rule

Report language matches the user's language. Chinese request → Chinese report. English → English.

When to Trigger

  • Production failure, outage, or unexpected loss
  • Agent mistake with real consequences (wrong deployment, fabricated output acted upon)
  • Near-miss that could have caused damage
  • Periodic review of past incidents to extract patterns

Postmortem Flow

Step 1: Gather Evidence

Before writing anything, collect raw evidence. Do NOT rely on memory or assumptions.

Required evidence (use tools to retrieve):

  • Logs: exec("grep -i 'error\|fatal\|exception' {logfile} | tail -50")
  • Git state: exec("git log --oneline -10"), exec("git diff HEAD~1 --stat")
  • Service state: exec("systemctl status {service}"), exec("ps aux | grep {process}")
  • Data files: read any CSVs, configs, or state files involved

Rule: Every factual claim in the postmortem must cite a source. Format: [Source: {filepath}:{line} or {command output}]

If evidence is unavailable (logs rotated, service restarted), explicitly mark: [Evidence unavailable: {reason}]
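The evidence rule above can be sketched in Python as a small collector that attaches a citation to every command output. This is a minimal sketch, not part of the skill: the `Evidence` class and `collect` function are hypothetical names, and the sketch assumes commands are passed as argument lists so that no shell is involved.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    """One piece of raw evidence plus the citation the postmortem will use."""
    command: list[str]
    output: str
    citation: str


def collect(command: list[str], timeout: float = 30.0) -> Evidence:
    """Run a read-only command without a shell and wrap its output as evidence."""
    label = " ".join(command)
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout, check=False
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Follow the skill's rule: state explicitly when evidence is missing.
        reason = f"{exc.__class__.__name__} running `{label}`"
        return Evidence(command, "", f"[Evidence unavailable: {reason}]")
    if result.returncode != 0 and not result.stdout:
        reason = f"`{label}` exited with {result.returncode}"
        return Evidence(command, "", f"[Evidence unavailable: {reason}]")
    return Evidence(command, result.stdout, f"[Source: `{label}` output]")
```

For example, `collect(["git", "log", "--oneline", "-10"])` would return the recent commits together with a ready-made `[Source: ...]` tag. Note that `grep` exits with status 1 when nothing matches, so under this sketch an empty match is reported as unavailable evidence rather than as an empty result.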

Step 2: Build Timeline

Construct a minute-by-minute (or hour-by-hour) timeline from evidence:

HH:MM UTC — {what happened} [Source: {evidence}]
HH:MM UTC — {what happened} [Source: {evidence}]
...
HH:MM UTC — Incident resolved / mitigated

Include: first symptom, detection, escalation, diagnosis, fix, verification.

Mark the detection gap: time between first symptom and human awareness. This is often the real problem.
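The detection gap can be computed directly from the timeline entries. The sketch below is illustrative only: the `Event` class and its `kind` tags (`"first_symptom"`, `"detection"`) are assumptions introduced here, not fields defined by the skill.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    when: datetime      # timezone-aware timestamp taken from the evidence
    what: str           # what happened
    source: str         # citation, e.g. a file path and line or a command
    kind: str = "other" # hypothetical tag: "first_symptom", "detection", ...


def detection_gap(events: list[Event]) -> timedelta:
    """Time between the first symptom and the first human detection.

    Raises ValueError if either kind of event is missing from the timeline.
    """
    first_symptom = min(e.when for e in events if e.kind == "first_symptom")
    detection = min(e.when for e in events if e.kind == "detection")
    return detection - first_symptom
```

A long gap here usually points at missing monitoring or alerting rather than at the fault itself, which is why it is worth reporting as its own number.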

Step 3: 5 Whys Root Cause

Drill down from symptom to root cause:

Why 1: {symptom happened} → Because {direct cause}
Why 2: {direct cause} → Because {deeper cause}
Why 3: {deeper cause} → Because {systemic issue}
Why 4: {systemic issue} → Because {process/design gap}
Why 5: {process/design gap} → Because {root cause}

Stop when you reach something you can change. If you reach "the model hallucinated" — that's not actionable. Go deeper: why was the output trusted without verification? Why was there no checkpoint?
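As a rough illustration, the chain can be rendered from a list of causes, with a simple check that the final cause is not a dead end. Both function names and the `NON_ACTIONABLE_HINTS` phrases are hypothetical; a keyword check like this is only a prompt to dig deeper, not a substitute for judgment.

```python
# Illustrative phrases that suggest the chain stopped too early.
NON_ACTIONABLE_HINTS = ("hallucinated", "human error", "bad luck")


def render_whys(symptom: str, causes: list[str]) -> list[str]:
    """Format a cause chain in the 'Why N: X → Because Y' layout."""
    lines = []
    current = symptom
    for i, cause in enumerate(causes, start=1):
        lines.append(f"Why {i}: {current} → Because {cause}")
        current = cause  # each cause becomes the next question
    return lines


def root_is_actionable(causes: list[str]) -> bool:
    """Return False if the last cause matches a known dead-end phrase."""
    root = causes[-1].lower()
    return not any(hint in root for hint in NON_ACTIONABLE_HINTS)
```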

Step 4: Write Report

Save to: ~/incidents/INC{NNN}_{TOPIC}_{YYYYMMDD}.md

Create ~/incidents/ if it doesn't exist.
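One way to build the report path is sketched below. The skill only gives the pattern `INC{NNN}_{TOPIC}_{YYYYMMDD}.md`; the three-digit zero padding, the sequential numbering, and the upper-cased topic slug are assumptions made for this example, and `next_report_path` is a hypothetical helper.

```python
import re
from datetime import date
from pathlib import Path


def next_report_path(topic: str, day: date, root: Path) -> Path:
    """Return the next free report path under root, creating root if needed."""
    root.mkdir(parents=True, exist_ok=True)
    numbers = []
    for path in root.glob("INC*_*.md"):
        match = re.match(r"INC(\d+)_", path.name)
        if match:
            numbers.append(int(match.group(1)))
    next_id = max(numbers, default=0) + 1
    # Collapse anything that is not a letter or digit into underscores.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", topic).strip("_").upper()
    return root / f"INC{next_id:03d}_{slug}_{day:%Y%m%d}.md"
```

Typical usage would be `next_report_path("db outage", date.today(), Path.home() / "incidents")`.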

See references/report-template.md for the full template with all required sections.

Report must include all 8 sections:

  1. Header (ID, severity, date, duration, status)
  2. Executive Summary (3 sentences max)
  3. Timeline with sources
  4. Impact Assessment (quantified: time lost, money lost, trust lost)
  5. 5 Whys Root Cause
  6. Fix (what was done immediately)
  7. Prevention (new rules, checks, or automation to prevent recurrence)
  8. Action Items (P0/P1/P2 with owners and deadlines)
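A draft can be checked for the eight sections before it is saved. This sketch assumes the report uses Markdown headings whose text matches the names in the list above; the actual template in references/report-template.md may use different headings, so `REQUIRED_SECTIONS` and `missing_sections` are illustrative only.

```python
REQUIRED_SECTIONS = (
    "Header",
    "Executive Summary",
    "Timeline",
    "Impact Assessment",
    "5 Whys Root Cause",
    "Fix",
    "Prevention",
    "Action Items",
)


def missing_sections(report_text: str) -> list[str]:
    """Return required section names that have no matching '#' heading."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in report_text.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
```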

Step 5: Extract Defensive Rules

The most valuable output of a postmortem is new rules that prevent recurrence.

Pattern: Incident → Rule → Enforcement mechanism

Examples from real incidents:

  • Unauthorized deployment → "All deploys require explicit human approval" → Gate in CI/CD
  • Fabricated data report → "All data claims must cite source file + row count" → L1 in Hallucination Guard
  • Session bloat causing errors → "Compact at 80% context usage" → Automated monitoring

See references/patterns.md for a library of incident-to-rule patterns.

Step 6: Reply and Store

  1. Save report file to ~/incidents/
  2. Reply to user with the full report
  3. Store key lessons to long-term memory
  4. If applicable: update AGENTS.md, TOOLS.md, or relevant skill with new rules

Severity Levels

Level | Criteria | Response Time
SEV1 | Money lost, data corrupted, security breach | Immediate
SEV2 | Service down, wrong actions taken from bad data | Within 1 hour
SEV3 | Degraded performance, near-miss, wasted time >2h | Within 24 hours
SEV4 | Minor issue, caught before impact | Next convenient time
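The severity criteria can be read as a first-match rule from SEV1 down to SEV4. The sketch below encodes that reading; the `classify` function and its flag names are hypothetical, and the first-match ordering is an assumption about how overlapping criteria should be resolved.

```python
from enum import Enum


class Severity(Enum):
    # Each value is the response time for that level.
    SEV1 = "Immediate"
    SEV2 = "Within 1 hour"
    SEV3 = "Within 24 hours"
    SEV4 = "Next convenient time"


def classify(
    *,
    money_lost: bool = False,
    data_corrupted: bool = False,
    security_breach: bool = False,
    service_down: bool = False,
    acted_on_bad_data: bool = False,
    degraded: bool = False,
    near_miss: bool = False,
    hours_wasted: float = 0.0,
) -> Severity:
    """Pick the most severe level whose criteria are met."""
    if money_lost or data_corrupted or security_breach:
        return Severity.SEV1
    if service_down or acted_on_bad_data:
        return Severity.SEV2
    if degraded or near_miss or hours_wasted > 2:
        return Severity.SEV3
    return Severity.SEV4
```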

Integration

  • Hallucination Guard: Incidents caused by agent fabrication → add to L1/L3 detection rules
  • War Room: After postmortem, use War Room to evaluate proposed prevention strategy
  • Feedback loop: postmortem → new rules → enforcement → fewer incidents

References

Safe Usage Recommendations
This skill appears to do what it says (structured postmortems) but includes persistent and cross-skill actions that need human controls. Before installing or enabling it:

  1. Confirm where 'long-term memory' is stored and whether that storage is visible to third parties; disable automatic memory writes or require explicit approval.
  2. Require human review before the agent writes the final report to permanent storage or publishes it.
  3. Require human approval before the agent updates AGENTS.md, TOOLS.md, or any other skill/config; treat those as change requests, not automatic edits.
  4. Limit file access: only allow the agent to read explicitly agreed log/config file paths; avoid giving blanket filesystem access or root privileges.
  5. Redact or review sensitive data (credentials, PII) before storing or sharing; add a step to mask secrets.
  6. Run the skill in a sandboxed or least-privilege environment when possible and audit the created ~/incidents files.

If you need higher assurance, ask the skill author to remove automatic memory writes and to require a confirm step before any file writes or skill/document edits.
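The secret-masking step could look something like the following. It is a minimal sketch, not a complete redaction tool: `SECRET_PATTERNS` and `redact` are hypothetical, and the single pattern only catches `key=value` or `key: value` pairs for a few common key names.

```python
import re

# Illustrative pattern: a sensitive key name, a separator, then the value.
SECRET_PATTERNS = [
    re.compile(
        r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b(\s*[=:]\s*)(\S+)"
    ),
]


def redact(text: str) -> str:
    """Replace the value part of any matched key/value pair with a marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(
            lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text
        )
    return text
```

Running evidence through a filter like this before it reaches the report or long-term memory reduces exposure, but a human review is still needed for PII and for secrets that do not follow a key/value shape.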
Functional Analysis
Type: OpenClaw Skill · Name: incident-fupan · Version: 1.0.0
The skill is classified as suspicious due to multiple critical vulnerabilities. The `SKILL.md` file contains instructions for the agent to execute shell commands (`grep`, `git log`, `git diff`, `systemctl status`, `ps aux`) using `exec()`. If the arguments to these commands (`{logfile}`, `{service}`, `{process}`) are derived from untrusted user input without sanitization, this creates a direct shell injection vulnerability (RCE risk). More critically, the `SKILL.md` explicitly instructs the agent to `update AGENTS.md, TOOLS.md, or relevant skill with new rules`. This is a significant prompt injection vulnerability, as an attacker could craft an incident report that tricks the agent into generating and then applying malicious 'rules' to its own configuration or other skills, potentially leading to persistence, altered behavior, or unauthorized actions within the agent's environment.
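One possible mitigation for the injection risk described above is to validate each placeholder against a strict allowlist and pass the command as an argument list instead of a shell string. The sketch below assumes this approach; `SAFE_NAME` and `systemctl_status_argv` are hypothetical names, and the allowed character set is an assumption that may need adjusting for real unit names.

```python
import re

# Allow only characters commonly seen in systemd unit names.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.@-]+")


def systemctl_status_argv(service: str) -> list[str]:
    """Build an argv for `systemctl status`, rejecting suspicious names."""
    if not SAFE_NAME.fullmatch(service) or service.startswith("-"):
        # A leading '-' would be parsed as an option rather than a unit.
        raise ValueError(f"refusing unsafe service name: {service!r}")
    return ["systemctl", "status", "--no-pager", service]
```

Because the result is a list, it can be handed to `subprocess.run` without `shell=True`, so characters such as `;` or `|` in user input are never interpreted by a shell.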
Capability Assessment
Purpose & Capability
The name/description (incident postmortem) align with the instructions: collecting logs, git state, service/process status, data files, building a timeline, running 5 Whys, producing a formatted report, and proposing defensive rules. Commands suggested (grep, git, systemctl, ps, file reads) are appropriate for root-cause analysis.
Instruction Scope
SKILL.md gives explicit runtime commands to collect evidence (grep logs, git log/diff, systemctl/ps, read CSVs/configs) and mandates that every factual claim cite a source. That scope is appropriate, but it requires the agent to read arbitrary files and run shell commands — which is expected for an incident review but should be limited to explicitly approved paths and kept read-only. The skill also instructs the agent to 'store key lessons to long-term memory' and to 'update AGENTS.md, TOOLS.md, or relevant skill with new rules' — these steps expand the agent's scope beyond a single report and introduce change/write operations that should be gated by human review.
Install Mechanism
Instruction-only skill with no install spec and no code files; nothing will be written to disk by an installer. This is the lowest install risk.
Credentials
No environment variables, credentials, or config paths are declared or required. However, the instructions require reading potentially sensitive artifacts (logs, configs, data files) and writing a report to ~/incidents plus saving lessons to long-term memory. Those actions are proportionate to a postmortem, but they can expose secrets or PII if not handled carefully — the skill does not provide redaction guidance or limits on what to persist to memory.
Persistence & Privilege
Although always:false, the skill directs the agent to (1) save files under ~/incidents, (2) store lessons to long-term memory, and (3) update AGENTS.md/TOOLS.md or 'relevant skill' with new rules. Writing to long-term memory and modifying other skill/docs represents non-trivial persistence and cross-scope modification. These write/update operations should require explicit human authorization and auditing; otherwise they increase the blast radius if misused.
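How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install incident-fupan
  3. Once installed, invoke the Skill by name or trigger it with /incident-fupan.
  4. Provide the required input as described in the Skill's parameter notes to get structured output.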
Version History
v1.0.0
Initial release: Renamed from incident-postmortem for better trigger recognition. Triggers on: 复盘, fupan, postmortem, incident review, 事故分析. 6-step flow with 5 Whys, pattern library of 6 incident classes, report template. Part of the IRIS Skill Trilogy (War Room + Hallucination Guard + Incident Fupan).
Metadata
Slug incident-fupan
Version 1.0.0
License
Total installs 4
Current installs 4
Versions 1
FAQ

What is Incident Fupan (事故复盘)?

Incident Fupan (事故复盘) is described as "structured root cause analysis for production failures, outages, bugs, and near-misses. Use when: (1) 事故复盘 or incident review is need..." It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 407 total downloads to date.

How do I install Incident Fupan (事故复盘)?

Run the command /install incident-fupan in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is Incident Fupan (事故复盘) free?

Yes. Incident Fupan (事故复盘) is completely free (free and open source) and can be downloaded, installed, and used freely.

Which platforms does Incident Fupan (事故复盘) support?

Incident Fupan (事故复盘) is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Incident Fupan (事故复盘)?

It is developed and maintained by scytheshan-pixel (@scytheshan-pixel); the current version is v1.0.0.
