COE Root Cause
Use when the user asks for a COE, Correction of Error, postmortem, root-cause analysis, "why did this recur", "what was missed", or "do not let this happen again".
The job is to explain the mechanism that allowed the failure, fix the mechanism where possible, and prove the same class of failure is harder to repeat.
Library Fit
Use this skill for a formal post-failure Correction of Error: a recurring failure, false success, missed work, data loss, brittle automation, or user-visible miss that needs a written record with impact, timeline, root cause, corrective actions, and verification.
Adjacent skills keep their narrower jobs:
- Debugging or investigation skills handle active bugs before the failure mechanism is understood.
- Review skills handle pre-landing diff or PR risk.
- Retrospective skills summarize engineering trends over a time window.
- Skill-creation skills turn a proven workflow or corrective action into a durable skill, script, test, or guardrail.
Rules
- Classify the failure before rerunning or changing anything.
- Do not stop at symptoms like "timeout", "model failed", "tool failed", or "human error".
- Preserve concrete evidence: logs, command output, diffs, tests, screenshots, report paths, source references, or exact user-visible behavior.
- Redact secrets, tokens, personally identifying information, customer data, and private workspace details. Prefer source references or short excerpts over raw dumps, especially in public artifacts.
- Ask before public, destructive, expensive, or externally visible actions.
- Keep private workspace, customer, or user details out of public artifacts unless the user explicitly approves disclosure.
- If the user asked only for a report or analysis, propose corrective actions instead of applying code or workflow changes.
- Every corrective action needs verification evidence. If it cannot be verified, rewrite it.
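Part of the redaction rule can be mechanized before evidence goes into a report. The following is a minimal Python sketch, assuming secrets appear as `name=value` or `name: value` pairs and that email addresses are the only personal data being scrubbed. The patterns and the `redact` helper are hypothetical and will miss other formats, so a manual review is still needed.

```python
import re

# Hypothetical patterns for this sketch; extend them for the secrets your logs actually contain.
SECRET_ASSIGNMENT = re.compile(
    r"(?i)\b(api[_-]?key|token|secret|password)(\s*[=:]\s*)(\S+)"
)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Mask secret values and email addresses in a log excerpt."""
    # Keep the key name and separator so the excerpt stays readable as evidence.
    text = SECRET_ASSIGNMENT.sub(r"\1\2[REDACTED]", text)
    return EMAIL.sub("[REDACTED_EMAIL]", text)


print(redact("token=abc123 sent to ops@example.com"))
# prints: token=[REDACTED] sent to [REDACTED_EMAIL]
```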
Failure Classification
Classify the failure before changing anything. Name the primary failure mode:
- Required work failed visibly: command, job, test, or pipeline failed and the required work did not complete.
- Required work silently skipped or falsely succeeded: the system reported done while required work was missing.
- Required work completed incompletely or incorrectly: an artifact exists but is partial, stale, under-extracted, or wrong enough to matter.
- User-visible response missed the expectation: the answer omitted a request, misrouted the work, or gave inaccurate status.
- Optional diagnostic failed only: a non-required search, probe, or log lookup failed while required work is independently verified.
Then identify evidence-backed contributing conditions:
- timeout, rate limit, or transient provider failure
- missing file, schema drift, or dependency drift
- model configuration, policy, or routing mismatch
- source availability or extractor failure
- brittle command, parser, query, or ad hoc script
- unclear ownership, interface, or skill instruction
- absent verification, closeout, or blocked-state gate
- other or unknown, with the evidence still missing
If an optional diagnostic failure hides whether required work happened, reclassify it as false success, incomplete work, or visible failure. Do not let "optional" obscure the primary task.
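The reclassification rule above can be sketched as a small function. This is an illustration only: the `FailureMode` enum mirrors the five modes listed in this section, while `resolve_primary_mode` and its parameters are hypothetical names introduced for the example.

```python
from enum import Enum


class FailureMode(Enum):
    VISIBLE_FAILURE = "required work failed visibly"
    FALSE_SUCCESS = "required work silently skipped or falsely succeeded"
    INCOMPLETE_OR_INCORRECT = "required work completed incompletely or incorrectly"
    MISSED_EXPECTATION = "user-visible response missed the expectation"
    OPTIONAL_DIAGNOSTIC_ONLY = "optional diagnostic failed only"


# Modes an unverified "optional" failure may be reclassified to.
RECLASSIFY_TARGETS = {
    FailureMode.VISIBLE_FAILURE,
    FailureMode.FALSE_SUCCESS,
    FailureMode.INCOMPLETE_OR_INCORRECT,
}


def resolve_primary_mode(
    candidate: FailureMode,
    required_work_verified: bool,
    fallback: FailureMode,
) -> FailureMode:
    """Return the primary failure mode, refusing 'optional' when required work is unverified."""
    if candidate is not FailureMode.OPTIONAL_DIAGNOSTIC_ONLY:
        return candidate
    if required_work_verified:
        # Required work is independently verified, so "optional" is accurate.
        return candidate
    if fallback not in RECLASSIFY_TARGETS:
        raise ValueError("fallback must be visible failure, false success, or incomplete work")
    return fallback
```

For example, a failed log lookup with no independent proof that the job ran resolves to the fallback: `resolve_primary_mode(FailureMode.OPTIONAL_DIAGNOSTIC_ONLY, False, FailureMode.FALSE_SUCCESS)` returns `FailureMode.FALSE_SUCCESS`.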
Evidence Packet
Collect the smallest packet that explains the failure:
- user request or expectation
- promised behavior
- actual behavior
- first bad observable result
- affected scope
- relevant logs, reports, code paths, and tests
- existing guardrail that should have caught it
State uncertainty plainly. Do not bury the answer in unrelated logs.
Analysis Loop
- Build a short timeline with timestamps or ordered events.
- Run at least 5 Whys.
- Continue past 5 if the answer is still a symptom, vague human explanation, or unverifiable guess.
- Separate proximate cause from root cause.
- Name the missing guardrail, unclear interface, unsafe default, or unchecked assumption that let the issue recur or become user-visible.
Bad root causes:
- "the agent forgot"
- "the model made a mistake"
- "we should be more careful"
- "the command failed"
- "the user did not specify enough"
Good root causes identify a durable fix: a test, validator, workflow gate, ownership boundary, safer default, clearer skill instruction, or explicit blocked-state receipt.
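A crude lint can catch root cause statements that repeat the bad patterns listed above. The sketch below is a hypothetical helper, not part of the skill: it only does case-insensitive substring matching, so an empty result does not prove the statement names a mechanism.

```python
from typing import List

# Phrases drawn from the "bad root causes" examples; the list is illustrative, not exhaustive.
SYMPTOM_PHRASES = (
    "forgot",
    "made a mistake",
    "be more careful",
    "command failed",
    "did not specify enough",
)


def symptom_phrases_in(statement: str) -> List[str]:
    """Return the symptom or blame phrases found in a root cause statement."""
    lowered = statement.lower()
    return [phrase for phrase in SYMPTOM_PHRASES if phrase in lowered]


print(symptom_phrases_in("The agent forgot to run the tests"))
# prints: ['forgot']
```

A statement such as "No gate checked that the export file was non-empty before reporting success" returns an empty list, because it names a missing guardrail instead of a symptom.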
Corrective Actions
For each action, include:
- owner or owning surface
- exact change
- status: done, planned, blocked, or rejected
- verification evidence
- expected future detection signal
Prefer class-level safeguards over one-off cleanup.
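The required fields can be checked mechanically before a report is accepted. The following is a minimal sketch, assuming each action is recorded as a simple record; the `CorrectiveAction` class and `action_gaps` function are hypothetical names, and the status values are the four listed above.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_STATUSES = ("done", "planned", "blocked", "rejected")


@dataclass
class CorrectiveAction:
    owner: str             # owner or owning surface
    change: str            # exact change
    status: str            # one of ALLOWED_STATUSES
    verification: str      # verification evidence
    detection_signal: str  # expected future detection signal


def action_gaps(action: CorrectiveAction) -> List[str]:
    """List what is missing from an action; an empty list means it is complete."""
    gaps = []
    for name in ("owner", "change", "verification", "detection_signal"):
        if not getattr(action, name).strip():
            gaps.append(f"missing {name}")
    if action.status not in ALLOWED_STATUSES:
        gaps.append(f"unknown status: {action.status!r}")
    return gaps
```

An action with an empty `verification` field yields `["missing verification"]`, which signals that the action should be rewritten until it can be verified.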
Verification Gate
Before saying the COE is complete, run the smallest credible verification:
- targeted regression test
- static validation for generated docs or frontmatter
- dry run against the failed case
- closeout checklist mapping each user request to evidence
- local AI/code review for nontrivial diffs
If a gate cannot run, say why and what evidence substitutes for it.
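The closeout checklist gate can be sketched as a mapping from each user request to the evidence that covers it. The function name and data shapes below are assumptions made for illustration.

```python
from typing import Dict, List


def requests_without_evidence(
    requests: List[str],
    evidence: Dict[str, List[str]],
) -> List[str]:
    """Return the user requests that have no non-empty evidence entry."""
    return [
        request
        for request in requests
        if not any(item.strip() for item in evidence.get(request, []))
    ]


requests = ["explain the recurrence", "add a regression test"]
evidence = {"explain the recurrence": ["timeline and 5 Whys in report"]}
print(requests_without_evidence(requests, evidence))
# prints: ['add a regression test']
```

A non-empty result means the COE is not ready to close: each listed request either needs evidence or an explicit note on why the gate could not run.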
Report Template
# COE: <failure name>
Date: <date>
Status: done | planned | blocked
Severity: low | medium | high
## Summary
One short paragraph: what failed, why it mattered, and what changed.
## Impact
- Who or what was affected
- What was wrong or missing
- What was not affected
## Timeline
- <time/order>: <event>
## Failure Classification
Failure mode: <primary failure mode from Failure Classification and why>
Contributing conditions: <supported conditions, or unknown with missing evidence>
## Evidence
- <source or command>: <what it proves>
## Root Cause
### 5+ Whys
1. Why? ...
### Root Cause Statement
<mechanism, not blame>
## Corrective Actions
| Action | Status | Verification |
| ------ | -------------------- | ------------ |
| ... | done/planned/blocked | ... |
## Verification
- <gate>: <result>
## Residual Risk
<what could still fail and how it will be noticed>
Closeout
Lead with the root cause and verified fix. Keep the user-facing summary short. If anything remains open, say exactly what evidence is still missing.
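- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install coe-root-cause
- Once installed, invoke the Skill by name or trigger it with /coe-root-cause.
- Provide the inputs described in the Skill's parameter notes to get structured output.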
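What is COE Root Cause?
Run a Correction of Error root-cause analysis for recurring failures, false success, missed work, data loss, and brittle automation. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 38 times to date.
How do I install COE Root Cause?
Run the command "/install coe-root-cause" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.
Is COE Root Cause free?
Yes. COE Root Cause is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does COE Root Cause support?
COE Root Cause is cross-platform and works in any environment where OpenClaw / Claude Code is deployed.
Who develops COE Root Cause?
It is developed and maintained by ghitafilali (@ghitafilali). The current version is v1.0.0.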