Description

Systematic root-cause debugging with verification. Use for errors, stack traces, broken tests, flaky tests, regressions, or anything not working as expected....

Usage Instructions (SKILL.md)

Debugging

Name: ia-debugging
Author: iliaal

The Iron Law

Never propose a fix without first identifying the root cause. "Quick fix now, investigate later" is forbidden -- it creates harder bugs. This applies ESPECIALLY under time pressure, when "just one quick fix" seems obvious, or when multiple fixes have already failed. Those are the moments this process matters most.

Trivially obvious bugs are their own root cause -- state the cause and fix directly. A bug is trivially obvious only when the cause is in the error message (e.g., ModuleNotFoundError: No module named 'foo', or a typo in a string literal). If the error shows where something fails but not why (e.g., TypeError: Cannot read property 'id' of undefined), it is not trivially obvious -- investigate why the value is undefined.

Root Cause Analysis

Root cause identification is the core deliverable of debugging -- not the fix itself.

Trace backward: Start at the symptom, walk the call chain in reverse to find where behavior diverges from expectation
Differential analysis: Compare working vs broken state across dimensions (code version, data, environment, timing, configuration)
Regression hunting: Use git bisect to pinpoint the exact commit that introduced the issue
Evidence-based: Document root cause with file:line references, log output, and concrete reproduction proof. Root cause = the earliest point where behavior diverged from expectation, stated with evidence at least two levels deep (not just "it failed here" but "it failed here because X was null, and X was null because Y never set it")
Competing hypotheses: When the cause is ambiguous, generate multiple hypotheses and rank by evidence strength (see Escalation section below)

Environment Diagnostics

Capture environment state by running bash collect-diagnostics.sh (the bundled script). Use the output during differential analysis or attach it to bug reports. See specialized-patterns.md for details.

Process

0. Read the error. Read the full error message, stack trace, and line numbers before doing anything. Error messages frequently contain the exact fix. Don't skim -- read the entire output.

1. Reproduce -- make the bug consistent. If intermittent, run N times under stress or simulate poor conditions (slow network, low memory) until it triggers reliably.
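One way to make "run N times" concrete is to measure how often the operation fails. The following is a minimal Python sketch, not part of the skill itself; `failure_rate` and `make_flaky` are hypothetical names introduced only for illustration.

```python
import random
from typing import Callable


def failure_rate(operation: Callable[[], object], runs: int = 100) -> float:
    """Run `operation` `runs` times and return the fraction of runs that raised."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        try:
            operation()
        except Exception:
            failures += 1
    return failures / runs


def make_flaky(seed: int, fail_probability: float) -> Callable[[], None]:
    """Build a stand-in for an intermittent bug (hypothetical, seeded so it repeats)."""
    rng = random.Random(seed)

    def flaky() -> None:
        if rng.random() < fail_probability:
            raise RuntimeError("intermittent failure")

    return flaky
```

A rate that stays well above zero across repeated runs suggests the reproduction is reliable enough to move on to the next step. A rate near zero suggests the triggering conditions still need tightening.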

2. Form initial hypotheses -- before investigating broadly, form 2-3 hypotheses based on the reproduction. What are the most likely causes given the symptoms? This focuses the investigation on plausible paths rather than searching aimlessly.

3. Reduce -- strip the reproduction to the minimal failing case. Remove unrelated code, data, and configuration until removing one more piece makes the bug disappear. That remaining piece is the trigger.
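The reduction loop described above can be sketched as a greedy search that drops one piece at a time while the failure still reproduces. This is an illustrative simplification, not a full delta-debugging implementation; `reduce_failing_input` and the `still_fails` callback are hypothetical names.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reduce_failing_input(
    items: Sequence[T], still_fails: Callable[[List[T]], bool]
) -> List[T]:
    """Greedily drop items one at a time while the failure still reproduces."""
    current = list(items)
    if not still_fails(current):
        raise ValueError("the starting input does not reproduce the failure")
    changed = True
    while changed:
        changed = False
        for index in range(len(current)):
            candidate = current[:index] + current[index + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the smaller input
    return current
```

The result is minimal in the sense the step describes: removing any single remaining item makes the failure disappear.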

4. Investigate -- trace backward through the call chain from the symptom. Compare working vs broken state using a differential table (environment, version, data, timing -- what changed?).
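A differential table can be as simple as listing every dimension whose value differs between the working and broken states. The sketch below assumes both states were captured as flat dictionaries; the function name and the sample keys are hypothetical.

```python
from typing import Any, Dict, List, Tuple

MISSING = "<missing>"


def differential_table(
    working: Dict[str, Any], broken: Dict[str, Any]
) -> List[Tuple[str, Any, Any]]:
    """Return (dimension, working_value, broken_value) for every dimension that differs."""
    rows = []
    for key in sorted(set(working) | set(broken)):
        left = working.get(key, MISSING)
        right = broken.get(key, MISSING)
        if left != right:
            rows.append((key, left, right))
    return rows


# Hypothetical captured states, for illustration only.
working_state = {"version": "1.4.2", "runtime": "20", "tz": "UTC"}
broken_state = {"version": "1.4.3", "runtime": "20", "region": "eu"}
table = differential_table(working_state, broken_state)
```

Dimensions that match are omitted, so the table contains only the candidates worth investigating.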

Multi-component systems (CI -> build -> deploy, API -> service -> DB): before proposing fixes, instrument each component boundary:

Log what data enters the component
Log what data exits the component
Verify environment/config propagation across the boundary

Run once to gather evidence showing WHERE it breaks, then investigate that specific component. Use console.error() (not logger, which may be suppressed in tests). Log BEFORE the dangerous operation, not after it fails. Include context: cwd, env vars, new Error().stack.
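The text above names console.error() and new Error().stack, which are JavaScript. As a rough Python analogue (a swapped-in equivalent, not the skill's own code), a decorator can write to stderr what enters and exits a boundary. `trace_boundary` and `load_user` are hypothetical names.

```python
import functools
import os
import sys
import traceback
from typing import Any, Callable


def trace_boundary(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Log what enters and exits a component boundary to stderr."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Log BEFORE the risky call so the evidence survives a crash.
            print(
                f"[{name}] enter args={args!r} kwargs={kwargs!r} cwd={os.getcwd()}",
                file=sys.stderr,
            )
            # Include a short call stack to show who crossed the boundary.
            print("".join(traceback.format_stack(limit=5)), file=sys.stderr)
            result = func(*args, **kwargs)
            print(f"[{name}] exit result={result!r}", file=sys.stderr)
            return result

        return wrapper

    return decorator


@trace_boundary("service->db")
def load_user(user_id: int) -> dict:
    """Hypothetical component used only to demonstrate the decorator."""
    return {"id": user_id}
```

If the "enter" line appears without a matching "exit" line, the failure is inside that component. Remove the decorator once the investigation is done.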

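Pre-existing failure proof: Before claiming a test failure is "not related to our changes," prove it. Run git stash && [test command] on clean state to confirm the failure exists on the base branch. Note that git stash only sets aside uncommitted changes; if your changes are already committed, check out the base branch and run the test there instead. Pre-existing without receipts is a lazy claim.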

Before external searches (web, docs, forums): strip hostnames, IPs, file paths, SQL fragments, and customer data from the query. Raw stack traces leak privacy and return noise.
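Part of this scrubbing can be automated with regular expressions. The sketch below covers only file paths, IPv4 addresses, and hostnames; the TLD list is an illustrative assumption, and SQL fragments and customer data still need manual review. `sanitize_query` is a hypothetical helper.

```python
import re

# Paths are replaced first so a file name like "db.py" is never mistaken for a host.
_PATTERNS = [
    (re.compile(r"(?:[A-Za-z]:)?(?:[\\/][\w.\-]+){2,}"), "<path>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ip>"),
    # Assumed TLD list for illustration; extend it for your own environment.
    (
        re.compile(r"\b(?:[a-z0-9-]+\.)+(?:com|net|org|io|internal|local)\b", re.IGNORECASE),
        "<host>",
    ),
]


def sanitize_query(text: str) -> str:
    """Replace paths, IPv4 addresses, and hostnames with neutral placeholders."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Treat the output as a first pass and read it before pasting it into a search box.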

5. Hypothesize and test -- one change at a time. If a hypothesis is wrong, fully revert before testing the next. Use git bisect to find regressions efficiently. Scope lock: after forming a hypothesis, identify the narrowest affected directory or file set. Do not edit code outside that scope during the debug session. If the fix requires changes elsewhere, update the hypothesis first.
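To show why bisecting is efficient, here is a small Python model of the search that git bisect performs: a binary search for the first bad commit. It assumes history is linear and that every commit after the first bad one is also bad; `first_bad_commit` and `is_bad` are hypothetical names, and real use would rely on git bisect itself.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit in a list ordered oldest to newest.

    Requires commits[0] to be good and commits[-1] to be bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    low, high = 0, len(commits) - 1
    if is_bad(commits[low]) or not is_bad(commits[high]):
        raise ValueError("expected a good first commit and a bad last commit")
    # Invariant: commits[low] is good and commits[high] is bad.
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid
        else:
            low = mid
    return commits[high]
```

Each test halves the remaining range, so the number of test runs grows with the logarithm of the commit count.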

6. Fix and verify -- create a failing test FIRST, then fix. Run the test. Confirm the original reproduction case passes. No completion claims without fresh verification evidence (see ia-verification-before-completion).

Debug Report

Emit the following structured report after every resolved bug. For non-trivial production bugs, also write a full Postmortem (see below).

SYMPTOM:    [What was observed]
ROOT CAUSE: [Why it happened -- file:line with evidence]
FIX:        [What changed]
EVIDENCE:   [Verification output proving the fix]
REGRESSION: [Test added to prevent recurrence]
RELATED:    [Prior bugs in same area, known issues, architectural notes]
STATUS:     DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT (definitions in `ia-verification-before-completion`)
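If reports are generated programmatically, a small data class can enforce that all seven fields are present and that the status is one of the four listed values. This is a sketch under that assumption; `DebugReport` is a hypothetical helper, not something the skill ships.

```python
from dataclasses import dataclass

STATUSES = ("DONE", "DONE_WITH_CONCERNS", "BLOCKED", "NEEDS_CONTEXT")


@dataclass
class DebugReport:
    symptom: str
    root_cause: str
    fix: str
    evidence: str
    regression: str
    related: str
    status: str

    def render(self) -> str:
        """Return the report in the seven-line layout shown above."""
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        rows = [
            ("SYMPTOM", self.symptom),
            ("ROOT CAUSE", self.root_cause),
            ("FIX", self.fix),
            ("EVIDENCE", self.evidence),
            ("REGRESSION", self.regression),
            ("RELATED", self.related),
            ("STATUS", self.status),
        ]
        for label, value in rows:
            if not value.strip():
                raise ValueError(f"{label} must not be empty")
        # Pad each label to 12 columns so the values line up.
        return "\n".join(f"{label}:".ljust(12) + value for label, value in rows)
```

Rejecting empty fields makes it harder to skip the evidence or the regression test when closing out a bug.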

Three-Fix Threshold

After 3 failed fix attempts, STOP. An attempt = one complete hypothesis-test cycle (form hypothesis, make minimal change, verify). The problem is likely architectural, not a surface bug. Escalate to the user before attempting further fixes. Step back and question assumptions about how the system works. Read the actual code path end-to-end instead of spot-checking.
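The threshold is easy to lose track of mid-session, so it can help to count attempts explicitly. A minimal sketch, assuming each failed hypothesis-test cycle is recorded by hand; `FixAttemptLog` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import List

MAX_FAILED_ATTEMPTS = 3


@dataclass
class FixAttemptLog:
    failed_hypotheses: List[str] = field(default_factory=list)

    def record_failure(self, hypothesis: str) -> bool:
        """Record one failed hypothesis-test cycle.

        Returns True when the threshold is reached and work should stop
        and be escalated instead of trying another fix.
        """
        self.failed_hypotheses.append(hypothesis)
        return len(self.failed_hypotheses) >= MAX_FAILED_ATTEMPTS
```

The recorded hypotheses double as the "what was ruled out" list to hand over when escalating.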

Architectural problem indicators -- signals the bug is structural, not a surface fix:

Each fix reveals new shared state or coupling you didn't expect
Fixes require massive refactoring to implement correctly
Each fix creates new symptoms elsewhere in the system

No root cause found: If investigation is exhausted without a clear root cause, say so explicitly. Document what was checked, what was ruled out, and what instrumentation to add for next occurrence. An honest "unknown" with good diagnostics beats a fabricated cause.

Escalation: Competing Hypotheses

When the cause is unclear across multiple components, use Analysis of Competing Hypotheses (ACH). Generate hypotheses across failure categories, collect evidence FOR and AGAINST each, rank by confidence, and investigate the strongest first.

See competing-hypotheses.md for the full methodology: six failure categories, evidence strength scale, confidence scoring, and anti-patterns.

Intermittent Issues

For race conditions, deadlocks, resource exhaustion, and timing-dependent bugs, see specialized-patterns.md. Key signals: shared mutable state, check-then-act, circular lock acquisition, connection pool exhaustion under load.

Defense-in-Depth Validation

After fixing, validate at every layer -- not just where the bug appeared. See defense-in-depth.md for the four-layer pattern (entry, business logic, environment, instrumentation) with examples.

Bug Triage

When multiple bugs exist, prioritize by:

Severity (data loss > crash > wrong output > cosmetic) separately from Priority (blocking release > customer-facing > internal)
Reproducibility: always > sometimes > once. "Sometimes" bugs need instrumentation before fixing.
Quick wins: if a fix is < 5 minutes and unblocks others, do it first
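The criteria above can be turned into a sort key. The source does not say how to combine them, so the ordering below (quick wins first, then priority, then severity, then reproducibility) is an assumption made for illustration. `Bug` and `triage` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Lower rank sorts first, following the orderings given in the list above.
SEVERITY = {"data loss": 0, "crash": 1, "wrong output": 2, "cosmetic": 3}
PRIORITY = {"blocking release": 0, "customer-facing": 1, "internal": 2}
REPRODUCIBILITY = {"always": 0, "sometimes": 1, "once": 2}


@dataclass
class Bug:
    title: str
    severity: str
    priority: str
    reproducibility: str
    fix_minutes: int = 60
    unblocks_others: bool = False


def triage(bugs: List[Bug]) -> List[Bug]:
    """Sort bugs so the one to work on first comes first (assumed tie-break order)."""

    def key(bug: Bug):
        quick_win = bug.fix_minutes < 5 and bug.unblocks_others
        return (
            not quick_win,  # False sorts before True, so quick wins lead
            PRIORITY[bug.priority],
            SEVERITY[bug.severity],
            REPRODUCIBILITY[bug.reproducibility],
        )

    return sorted(bugs, key=key)
```

Severity and priority stay separate fields, as the list recommends, and are combined only at sort time.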

Common Patterns

Async ordering -- missing await, unhandled promise rejection, callback firing before setup completes. The temporal gap between setup and callback is where bugs hide.
Stale state -- cached values, stale closures, outdated config, old build artifacts. When behavior contradicts the code you're reading, verify you're running what you think you're running.
Recurring fix site -- if git log shows 3+ prior fixes in the same file, the file needs redesign, not another patch. Escalate as architectural smell.
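The missing-await case in the first pattern can be demonstrated with Python's asyncio, used here as a stand-in for whichever async runtime the code under test uses. All function names are hypothetical.

```python
import asyncio


async def load_config() -> dict:
    await asyncio.sleep(0)  # stands in for real I/O
    return {"retries": 3}


async def broken() -> bool:
    config = load_config()  # BUG: missing await, so this is a coroutine object
    is_dict = isinstance(config, dict)
    config.close()  # close the un-awaited coroutine to avoid a runtime warning
    return is_dict


async def fixed() -> bool:
    config = await load_config()  # the value is available only after awaiting
    return isinstance(config, dict)
```

In the broken version no error is raised at the call site. The failure surfaces later, when something tries to use the coroutine as if it were the dictionary, which is why tracing back to the missing await matters.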

Root Cause Tracing

When a bug manifests deep in the call stack, resist fixing where the error appears. Trace backward through the call chain to find the original trigger, then fix at the source. See root-cause-tracing.md for the full technique with stack instrumentation patterns and test pollution detection.

Pattern Comparison

When the cause isn't obvious, find working similar code in the codebase and compare it structurally with the broken path. Read the working reference implementation completely -- don't skim. List every difference between working and broken, however small. Don't assume any difference can't matter. The bug is in one of them.

Anti-Patterns and Red Flags

When you catch yourself doing or thinking these things, stop and return to Step 1 (Reproduce):

What You're Doing / Thinking	What It Really Means
Shotgun debugging / "I see the problem, let me fix it" / "It's probably X"	Reasoning is not evidence. Form a hypothesis, make one change, test, revert if wrong. Trace the actual execution path.
Ignoring intermittent failures ("works on my machine")	Instrument and reproduce under load. Isolation success doesn't explain integration failure.
"I'll clean up the debugging later"	Remove diagnostic code now or it ships to production.
"This failure is pre-existing, not related to our changes"	Prove it: run the test suite on the base branch. No receipts = no claim.
"The test is wrong, not the code"	Verify before dismissing. Read the test's intent. If the test is genuinely wrong, fix it with a clear rationale, not a silent update.
"Reference too long, I'll adapt the pattern"	Partial understanding guarantees bugs. Read the working example completely and apply it exactly.

See specialized-patterns.md for anti-pattern signals and specialized debugging patterns.

Verify

Root cause identified with file:line evidence (not just "it failed here")
Regression test exists and fails without the fix, passes with it
Debug Report emitted with all seven fields (SYMPTOM, ROOT CAUSE, FIX, EVIDENCE, REGRESSION, RELATED, STATUS)
No diagnostic instrumentation left in code (git diff shows no leftover logging)

Integration

This skill is referenced by:

/ia-work -- during task execution for bug investigation
ia-writing-tests -- creating failing tests to reproduce bugs
ia-verification-before-completion -- before claiming a bug is fixed
ia-bug-reproduction-validator agent -- follows Root Cause Analysis methodology
ia-infrastructure-engineer agent -- follows Postmortem template for production incidents
ia-reproduce-bug command -- automated bug reproduction workflow

Postmortem

For non-trivial production bugs, write a lightweight postmortem (timeline, root cause, impact, fix, prevention). See specialized-patterns.md for the template.

Safe Usage Recommendations

This skill is about structured debugging and is otherwise coherent, but be careful with the diagnostics output before you share it. If you run the bundled script (bash collect-diagnostics.sh):

- Inspect the generated report locally first (bash collect-diagnostics.sh diag.md).
- Redact or remove any lines containing absolute paths (PWD), the User value, git remote URLs, and any repository URLs or hostnames before attaching or sending the file.
- Check the "Project Files Detected" list for .env or other sensitive files and do not share their contents. Consider running the script in a sanitized environment or with sensitive files temporarily moved/ignored.
- If you want tighter safety, edit the script to omit PWD/whoami/git remote lines or add automatic redaction (mask hostnames, strip user names) before producing output.

If you cannot or will not redact these fields, avoid running the script or only provide narrow excerpts that you have manually sanitized.

Functional Analysis

Type: OpenClaw Skill
Name: compound-eng-debugging
Version: 3.0.4

The skill bundle provides a professional and systematic framework for software debugging, emphasizing root cause analysis and evidence-based verification. It includes a diagnostic script (`scripts/collect-diagnostics.sh`) designed to gather environment metadata such as OS version, runtime environments, and Git status, while explicitly limiting environment variable collection to a non-sensitive subset. The instructions in `SKILL.md` further promote security best practices by directing the agent to redact sensitive information like hostnames and customer data before performing external searches.

Capability Assessment

✓ Purpose & Capability

Name, description, references, and the included collect-diagnostics.sh script align with a debugging/root-cause-analysis discipline. The files and instructions are proportional to the stated intent (reproduce, gather environment, trace, report).

⚠ Instruction Scope

SKILL.md explicitly instructs running bash collect-diagnostics.sh and attaching output to bug reports. The script prints machine-specific data (PWD, whoami), git remote URL, and other environment/system state. Asking the agent/user to attach that output risks leaking sensitive/proprietary info; the skill advises redaction for external searches but does not require or automate redaction of the diagnostic output before sharing.

✓ Install Mechanism

No install spec or external downloads; the skill is instruction-only with a small local bash script. There is no network fetch or archive extraction performed by the skill bundle itself.

⚠ Credentials

The skill declares no required credentials or env vars and only reads a small 'safe subset' of environment variables in the script. However, the script also emits PWD, current user, and git remote URL (which can reveal private repo endpoints or organization details). Those outputs are sensitive relative to simply debugging advice and should be redacted or made optional.

✓ Persistence & Privilege

The skill does not request persistent presence (always:false), does not modify other skills or system configurations, and contains no install-time actions.

Version History

v3.0.4

v3.0.3

v3.0.2

v3.0.1

v3.0.0

v2.56.1

v2.56.0

v2.55.1

v2.55.0

v2.53.2

v2.53.0

Metadata

Slug compound-eng-debugging

Version 3.0.4

License MIT-0

Total installs 0

Current installs 0

Number of versions 11

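FAQ

What is ia-debugging?

Systematic root-cause debugging with verification. Use for errors, stack traces, broken tests, flaky tests, regressions, or anything not working as expected.... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 318 cumulative downloads to date.

How do I install ia-debugging?

Run the command "/install compound-eng-debugging" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is ia-debugging free?

Yes. ia-debugging is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does ia-debugging support?

ia-debugging is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed ia-debugging?

It is developed and maintained by Ilia Alshanetsky (@iliaal). The current version is v3.0.4.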
