/install bug-pattern-diagnosis-en
\r \r
Bug Pattern Diagnosis (Symptom-Based Triage & Experience Accumulation)\r
\r
Skill Purpose\r
\r This skill works like a doctor diagnosing a patient:\r \r
- Collect symptoms — Ask the user to describe the bug (error messages, reproduction rate, environment, log patterns, etc.)\r
- Recall past cases — Look up similar bugs in the
experience/folder as reference memory\r - Diagnose independently — Investigate using the current project's context. Past cases are only sources of inspiration and hypotheses, never reused as-is\r
- Accumulate new experience — After successfully identifying a new bug, save it as a new
BUGxx.mdfor future reference\r \r Core value: Let past troubleshooting experience inspire the direction of new investigations — but never replace independent thinking. Similar symptoms do not guarantee the same root cause. Logs that look identical may hide completely different failures.\r \r
Iron Rule: Experience Is Reference, Not the Answer\r
\r
This is the single most important principle of this skill.\r \r The
BUGxx.mdfiles underexperience/are an old doctor's case notebook, not a prescription template. When examining a new patient:\r \r
- ✅ What you MAY do: Use the notebook to learn "what conditions are worth suspecting given these symptoms", "what methods have worked before", "what traps are easy to fall into"\r
- ❌ What you MUST NOT do: Copy the root cause, copy the fix, or copy code changes just because the symptoms look similar\r \r
Why Direct Reuse Is Forbidden\r
\r
- Symptoms can match, root causes may not: "NPE + intermittent + multi-replica" may be a missing header in one case, a Redis connection pool flake in another, or a GC stop-the-world race in yet another — or multiple overlapping problems.\r
- Projects differ wildly: The same code pattern behaves differently across versions, configs, and dependencies.\r
- AI misdiagnosis is expensive: Copy-pasting a case misleads the user into changing code based on a wrong premise, masking the real bug.\r
- The value of experience is in prompting thought, not giving answers: Good doctors read case files to broaden their thinking, not to copy-paste treatment plans.\r \r
Correct Usage Posture\r
\r
| Situation | Wrong approach | Right approach |\r
|---|---|---|\r
| Symptoms match closely | "This is BUG01, modify invokeRemoteDeviceOpt and add x-token-payload" | "Your description reminds me of BUG01 — one of its signatures was asymmetric logs across replicas. Before proceeding, can you verify: do the failing requests actually produce 'stack trace on one replica, one-liner on another'?" |\r
| Partial feature match | "Not exactly, but the BUG01 fix should still work" | "There's a diagnostic trick from BUG01 that might apply here — send 100 requests in a row and see if the success rate is ≈ 1/N. Let's validate that first." |\r
| Case contains specific code | Paste BUG01's fix and ask user to apply | "BUG01's fix idea was propagating headers, but for your project you need to first confirm: (1) What headers does your receiver actually depend on? (2) Is the serialization identical to the gateway's? We'll need these answers before writing any code." |\r
\r
Case Library Structure\r
\r
bug-pattern-diagnosis-en/\r
├── SKILL.md ← This file (purpose, flow, matching rules)\r
└── experience/ ← Case library\r
├── BUG01.md ← One document per case\r
├── BUG02.md\r
└── ...\r
```\r
\r
Each `BUGxx.md` follows a fixed structure for fast lookup:\r
1. **Case summary** (one paragraph)\r
2. **Symptom / signature checklist** (like "positive findings" in a medical case — used for matching)\r
3. **Detailed explanation** (pathology / root-cause chain)\r
4. **Diagnostic methodology** (investigation flow / key techniques)\r
5. **Remediation plan** (fix + safety net + hardening)\r
6. **Prevention checklist** (to avoid recurrence)\r
7. **Playbook for similar intermittent bugs** (step-by-step for analogous cases)\r
\r
## When to Activate This Skill\r
\r
Prefer this skill when the user describes issues like:\r
\r
- "This endpoint **sometimes** errors and sometimes works"\r
- "Reproduces in production but not locally/in testing"\r
- "The error in the logs looks weird / doesn't match the code / the stack points nowhere useful"\r
- "Multiple pods / instances / replicas / machines behave inconsistently"\r
- "I clearly sent xxx, but the server says xxx is missing"\r
- "Occasional timeouts / occasional 500s / occasional permission denied"\r
- Anything framed as "**intersection / intermittent / non-deterministic**"\r
\r
## Core Flow\r
\r
### Step 1: Read the Case Library Index\r
\r
Read the **symptom / signature checklist** section from **all** `BUG*.md` files under `experience/` (the first 30-50 lines of each file typically suffice). Do not read full files upfront.\r
\r
### Step 2: Structure the Symptoms\r
\r
Extract **key features** from the user's description, covering at least these dimensions:\r
\r
| Dimension | Examples |\r
|---|---|\r
| Error signal | NPE / 500 / 403 / timeout / data corruption / deadlock |\r
| Reproduction rate | 100% / 50% / sporadic / specific conditions |\r
| Environment delta | Doesn't repro locally? repros in test? in prod? |\r
| Multi-instance traits | Single replica / multiple? how many? |\r
| Log distribution | Concentrated on one instance / scattered / one has stack one doesn't |\r
| Triggering conditions | Specific user / specific params / specific time window |\r
| Recent changes | New release? config change? scale up/down? dependency upgrade? |\r
\r
### Step 3: Recall Cases (Not Match Conclusions)\r
\r
Compare the structured features against each case library entry for **similarity**, but **never** treat the result as a conclusion. No matter how high the similarity, a case is only a **"prior experience worth consulting"**, not a **"confirmed answer"**.\r
\r
Similarity handling:\r
\r
- **High overlap** (3+ key features match) → Treat the case as a **"priority suspicion direction"**, but guide the user to **independently verify** whether those key features actually hold\r
- **Partial overlap** (1-2 features match) → Treat the case as a **"possible source of ideas"**; mention that one or two of its diagnostic tricks may apply\r
- **No match** (0 features) → Go to the general methodology and investigate from scratch\r
\r
### Step 4: Offer Investigation Suggestions (Not Diagnostic Conclusions)\r
\r
**Do not** assert "this is BUGxx". Use the tone of "past cases had similar traits, here are **possible directions and ways to verify**".\r
\r
Recommended response structure:\r
\r
- 🧭 **Restate the structured symptoms** (confirm you understood correctly)\r
- 💭 **Experience reference**: Mention 1-2 relevant cases briefly ("this reminds me of BUGxx which had a similar trait X") — but **do not paste root cause or fix code**\r
- 🔬 **Independent verification steps**: List 2-3 quick checks the user can run to validate hypotheses (success rate over 100 requests / cross-replica log comparison / header packet capture, etc.)\r
- 🎯 **Project-specific investigation path**: Based on the user's project structure and code style, give **customized** next steps — not the generic steps from the case\r
- ⚠️ **State uncertainty clearly**: Say plainly "this is just a hypothesis based on similar symptoms; the real root cause needs your verification"\r
\r
### Step 4 — Negative Example (DO NOT OUTPUT)\r
\r
```\r
❌ "Based on your symptoms, this is BUG01 (multi-replica intermittent NPE).\r
The root cause is that invokeRemoteDeviceOpt drops x-token-payload.\r
Apply this fix: [pastes BUG01's code directly]"\r
```\r
\r
### Step 4 — Positive Example\r
\r
```\r
✅ "A few things in your description stand out:\r
1) Intermittent failure with success rate near 50%\r
2) The error looks token-related, but the client clearly sent a token\r
\r
This reminds me of a past case (BUG01) where similar symptoms were\r
caused by 'multi-replica internal calls dropping an identity header'.\r
Your project isn't necessarily the same thing, so **let's validate\r
a couple of hypotheses first**:\r
\r
Q1: How many replicas does your service have? Does the success rate\r
approximate 1/N?\r
Q2: Can you tail logs from all replicas simultaneously and see if\r
the failing request leaves traces on multiple replicas with\r
asymmetric detail levels?\r
\r
Run those two checks, share the results, and we'll pick a path."\r
```\r
\r
### Step 5: Accumulate a New Case (Optional)\r
\r
Proactively **ask the user** whether to capture this investigation as a new case when:\r
\r
- No case in the library matched, but the bug was successfully identified\r
- An existing case partially matched, but there's a significantly new variant\r
- A new diagnostic technique emerged during investigation\r
\r
With user consent, create `BUGxx.md` (numbering = current max + 1) following the **case document template** below.\r
\r
## General Methodology (When No Case Matches)\r
\r
Investigate unknown bugs in this order:\r
\r
### 1. Quantify the Symptom\r
\r
- Success rate? Fire 100 consecutive requests and tally\r
- Reproduction conditions? Can a minimal case reproduce reliably?\r
- Time-of-failure distribution? Specific time window / user / machine?\r
\r
### 2. Inspect Deployment Topology\r
\r
- How many replicas? Does success rate ≈ `1/N` or `(N-1)/N`?\r
- Can a single replica reproduce? If not → strong signal of inter-replica inconsistency\r
- Any canaries / gray releases? Is there a mix of old and new versions?\r
\r
### 3. Cross-Instance Log Comparison\r
\r
- By traceId / time window, pull logs from **all relevant instances** and view side-by-side\r
- Look for cases where "the same request leaves dislocated evidence across instances"\r
- Watch for **asymmetric verbosity** (one instance has full stack, another has a one-liner)\r
\r
### 4. Source vs Deployed Binary Parity\r
\r
- `javap -c -p` the deployed jar's key classes and diff against local source\r
- Verify image tag, git commit hash match expectations\r
- Check ConfigMaps / env vars are identical across all replicas\r
\r
### 5. Protocol Boundary Audit\r
\r
If you suspect context propagation issues:\r
\r
- Draw the full lifecycle of "identity / traceId / MDC / ThreadLocal"\r
- Mark every "cross-thread / cross-process / cross-instance" boundary\r
- For each boundary, does an explicit pack → unpack mechanism exist?\r
- Use tcpdump / print `getHeaderNames()` at the receiver to verify what headers actually arrive\r
\r
### 6. Compare Against Sibling Code\r
\r
- Does the project contain **similar code** that does **not** have this bug?\r
- Diff the buggy code vs the healthy code — the delta is often the culprit\r
\r
## Case Document Template (for new BUG*.md files)\r
\r
```markdown\r
# BUGxx: \x3COne-line title emphasizing the most distinctive symptom>\r
\r
## Case Summary\r
\r
\x3COne paragraph, under 200 words: phenomenon, reproduction conditions, root-cause type, blast radius>\r
\r
## Symptom / Signature Checklist (for matching)\r
\r
> If N+ of these hold simultaneously, high probability of this case\r
\r
- [ ] Feature 1 (concrete, verifiable)\r
- [ ] Feature 2\r
- [ ] Feature 3\r
- [ ] ...\r
\r
### Key Log Fingerprints\r
\r
\x3CPaste typical error log snippets so future investigations can grep-match them>\r
\r
### Exclusion Criteria (Negative Signals)\r
\r
- If \x3Cxxx> appears, this is NOT this case\r
\r
## Detailed Explanation / Root Cause Chain\r
\r
\x3CThe pathology. Use diagrams / tables / code references to show data flow, control flow, state transitions>\r
\r
## Diagnostic Methodology\r
\r
### Techniques Used\r
\r
- \x3CTechnique 1, e.g. "cross-replica log comparison">\r
- \x3CTechnique 2, e.g. "javap bytecode diff">\r
- ...\r
\r
### Diagnostic Steps (in order)\r
\r
1. ...\r
2. ...\r
3. ...\r
\r
## Remediation Plan\r
\r
### Root-Cause Fix\r
\r
\x3CThe core fix, with code reference, impact assessment>\r
\r
### Safety Net / Defense in Depth\r
\r
\x3CSecondary defenses so a missed fix still doesn't crash>\r
\r
### Hardening / Cleanup\r
\r
\x3CLong-term improvements, coding standards, sibling-code sweeps>\r
\r
## Prevention Checklist\r
\r
During development:\r
- [ ] ...\r
\r
During deployment:\r
- [ ] ...\r
\r
During review:\r
- [ ] ...\r
\r
## Playbook for Similar Intermittent Bugs\r
\r
When symptoms look similar, walk through:\r
\r
1. Quantify reproduction (10 min)\r
2. Check topology (5 min)\r
3. Cross-replica log diff (30 min)\r
4. ...\r
\r
## References\r
\r
- Relevant file paths\r
- Relevant PRs / commits\r
- Relevant wiki\r
```\r
\r
## Output Style Constraints\r
\r
- **Lead with direction, not with the conclusion**: Tell the user what's worth suspecting and why, instead of asserting "this is bug X"\r
- **Never speak in 100% certain terms**: Similar symptoms ≠ same root cause. Use phrases like "looks like / worth suspecting / could be / past cases suggest"\r
- **Recommendations must be executable**: Provide concrete commands or quick checks, not abstract theory\r
- **Make the user verify before coding**: Better to ask 2 "please confirm..." questions than to let them edit code based on a guess\r
\r
## Hard Rules (Forbidden Actions)\r
\r
- ❌ **Never paste the fix code from a case directly**: The code in `BUGxx.md` is "how that project fixed it", not necessarily right for the current project\r
- ❌ **Never assert "this is BUGxx"**: At most say "this reminds me of BUGxx" / "has similar traits to BUGxx"\r
- ❌ **Never skip independent verification**: Even if 5 of 5 features match, run at least 1-2 quick checks before discussing fixes\r
- ❌ **Never copy-paste the case's prevention checklist or playbook wholesale**: These are **thinking material**; tailor and rewrite each time based on the current context\r
\r
## Case Accumulation Rules\r
\r
- Don't create a case just to create one — only capture truly novel value (new symptom combinations, new diagnostic techniques)\r
- Don't use generic bug taxonomy terms (OOM, deadlock) — use **symptom descriptions** to organize cases ("same request leaves dislocated evidence across replicas" matches better than "state consistency bug")\r
- Index the library by **phenomenon**, not by "tech stack" or "vulnerability class"\r
- End every case with **applicability boundaries** and **counter-examples** (what situations are NOT this case), to sharpen future identification\r
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install bug-pattern-diagnosis-en - 安装完成后,直接呼叫该 Skill 的名称或使用
/bug-pattern-diagnosis-en触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
bug-pattern-diagnosis-en 是什么?
Diagnoses bugs by matching user-reported symptoms against a curated library of past bug cases under experience/, using prior cases as memory references (neve... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 87 次。
如何安装 bug-pattern-diagnosis-en?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install bug-pattern-diagnosis-en」即可一键安装,无需额外配置。
bug-pattern-diagnosis-en 是免费的吗?
是的,bug-pattern-diagnosis-en 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
bug-pattern-diagnosis-en 支持哪些平台?
bug-pattern-diagnosis-en 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 bug-pattern-diagnosis-en?
由 hgvgfgvh(@hgvgfgvh)开发并维护,当前版本 v1.0.0。