Debug Methodology

Author: abczsl520 · GitHub · v1.2.0
cross-platform ✓ Security check passed
Total downloads: 1026 · Favorites: 0 · Current installs: 5 · Versions: 4
Install in OpenClaw
/install debug-methodology
Description
Systematic debugging and problem-solving methodology. Activate when encountering unexpected errors, service failures, regression bugs, deployment issues, or...
Usage (SKILL.md)

Debug Methodology

Systematic approach to debugging and problem-solving. Distilled from real production incidents and industry best practices.

⚠️ The Root Cause Imperative

Every fix MUST target the root cause. Workarounds are forbidden unless explicitly approved.

Before proposing ANY solution, pass the Root Cause Gate:

┌─────────────────────────────────────────────┐
│            ROOT CAUSE GATE                  │
│                                             │
│  1. What is the ACTUAL problem?             │
│  2. WHY does it happen? (not just WHAT)     │
│  3. Does my fix eliminate the WHY?          │
│     YES → proceed                           │
│     NO  → this is a workaround → STOP       │
│                                             │
│  Workaround test:                           │
│  "With my fix applied, is the WHY gone?"    │
│     NO  → workaround (fix the cause instead)│
│     YES → genuine fix ✅                    │
└─────────────────────────────────────────────┘

The 5 Whys — Mandatory for Non-Obvious Problems

Problem: API returns 524 timeout
  Why? → Cloudflare cuts connections >100s
  Why? → The API call takes >100s
  Why? → Using non-streaming request, server holds connection silent
  Why? → Code uses regular fetch, not streaming
  Fix: → Use streaming (server sends data continuously, Cloudflare won't cut)

  ❌ WRONG: Switch to faster model (workaround — avoids the timeout instead of fixing it)
  ✅ RIGHT: Use streaming API (root cause — Cloudflare needs ongoing data)
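
To see why streaming addresses the last "Why" while a faster model only dodges it, here is a toy model in JavaScript. It assumes, as a simplification, that the proxy drops a connection once it has seen no bytes for more than 100 seconds. The function name `survivesProxy` and the timings are illustrative and are not Cloudflare's actual logic.

```javascript
// Toy model (an assumption, not Cloudflare's real implementation):
// the proxy drops a connection when no bytes arrive for longer than idleLimitSec.
function survivesProxy(byteTimesSec, idleLimitSec = 100) {
  let last = 0; // the request is sent at t = 0
  for (const t of byteTimesSec) {
    if (t - last > idleLimitSec) return false; // silent for too long → 524
    last = t;
  }
  return true;
}

// Non-streaming: the server stays silent until the full answer is ready at t = 120 s.
const nonStreaming = [120];
// Streaming: the same 120 s of work, but a chunk goes out every 10 s.
const streaming = Array.from({ length: 12 }, (_, i) => (i + 1) * 10);

console.log(survivesProxy(nonStreaming)); // false
console.log(survivesProxy(streaming));    // true
```

In this model the total duration is the same in both cases; only the silence between bytes differs, which is the cause the streaming fix removes.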

Common Workaround Traps

Problem                | Workaround (❌)                | Root Cause Fix (✅)
-----------------------+-------------------------------+-----------------------------------
API timeout            | Switch to faster model        | Use streaming / fix the slow query
Data precision loss    | Search by name instead of ID  | Fix BigInt parsing
Search returns nothing | Try different search strategy | Fix the search implementation
Dependency conflict    | Downgrade / pin version       | Use correct environment (venv)
Feature doesn't work   | Remove the feature            | Debug why it fails

Self-check question: "Am I solving the problem, or avoiding it?"
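
The "Data precision loss" row can be made concrete with a minimal sketch. It assumes the lost precision comes from `JSON.parse` reading a large integer ID as a JavaScript Number. The helper `parseWithBigIntIds` and the field name `id` are hypothetical.

```javascript
// The bug: integers above Number.MAX_SAFE_INTEGER (2^53 - 1) are silently rounded.
const raw = '{"id": 9007199254740993, "name": "alice"}';
console.log(JSON.parse(raw).id); // 9007199254740992 (last digit changed)

// Sketch of a root-cause fix: quote the digits of "id" fields before parsing,
// then convert them to BigInt in a reviver so no Number conversion ever happens.
function parseWithBigIntIds(jsonText) {
  const quoted = jsonText.replace(/"id"\s*:\s*(\d+)/g, '"id":"$1"');
  return JSON.parse(quoted, (key, value) =>
    key === 'id' && typeof value === 'string' && /^\d+$/.test(value)
      ? BigInt(value)
      : value
  );
}

console.log(parseWithBigIntIds(raw).id); // 9007199254740993n
```

Note that this sketch also converts IDs that were already digit-only strings, so a real parser would need to match the actual payload format.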

Phase 1: STOP — Assess Before Acting

Before ANY fix attempt:

□ What is the EXACT symptom? (error message, behavior, screenshot)
□ When did it last work? What changed since then?
□ How is the service running? (process, env, startup command)

For running services:

ps -p <PID> -o command=        # How was it started?
ls .venv/ venv/ env/           # Virtual environment?
which python3 && python3 --version
which node && node --version

NEVER restart a service without first recording its original startup command.
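
One way to make that rule hard to skip is to record the command in a file before touching the service. The sketch below wraps the same `ps` invocation in Node.js. The function names and the log file name `startup-commands.log` are assumptions for illustration.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');

// Runs `ps -p <pid> -o command=` and returns the trimmed startup command.
// `run` is injectable so the function can be exercised without a real process.
function readStartupCommand(pid, run = execFileSync) {
  return run('ps', ['-p', String(pid), '-o', 'command='], { encoding: 'utf8' }).trim();
}

// Appends a timestamped record so the original command survives the restart.
function recordStartupCommand(pid, logPath = 'startup-commands.log', run = execFileSync) {
  const command = readStartupCommand(pid, run);
  if (!command) throw new Error(`No startup command found for PID ${pid}`);
  fs.appendFileSync(logPath, `${new Date().toISOString()} pid=${pid} ${command}\n`);
  return command;
}

// Usage (12345 is a placeholder PID):
// recordStartupCommand(12345);
```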

Phase 2: Hypothesize — Form ONE Theory

Priority order:

  1. Did I change something? → diff/revert first
  2. Did the environment change? → versions, deps, configs
  3. Did external inputs change? → API responses, data formats
  4. Genuine new bug? → only after ruling out 1-3

Phase 3: Test — One Change at a Time

Change X → Test → Works? → Done
                → Fails? → REVERT X → new hypothesis

Do NOT stack changes.

Phase 4: Patch-Chain Detection

2 fix attempts failed → STOP. Revert ALL. Back to Phase 1.

You are likely:

  • Fixing symptoms of a wrong fix
  • In the wrong environment entirely
  • Misunderstanding the architecture
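
Phases 3 and 4 together describe a small loop: apply one change to a known baseline, test, discard it on failure, and stop after two failures. A minimal sketch follows, where a "change" is modeled as a config patch and `passes` stands in for whatever test reproduces the problem. All names are hypothetical.

```javascript
const MAX_FAILED_ATTEMPTS = 2; // Phase 4: two failed fixes → stop

function debugLoop(baseline, hypotheses, passes) {
  let failed = 0;
  for (const { name, change } of hypotheses) {
    // Each candidate is built from the untouched baseline, so changes never stack.
    const candidate = { ...baseline, ...change };
    if (passes(candidate)) return { status: 'fixed', by: name, config: candidate };
    failed += 1; // the candidate is discarded, which is the revert
    if (failed >= MAX_FAILED_ATTEMPTS) {
      return { status: 'stop-and-reassess', config: baseline }; // back to Phase 1
    }
  }
  return { status: 'no-more-hypotheses', config: baseline };
}

const baseline = { timeoutSec: 30 };
const hypotheses = [
  { name: 'raise timeout', change: { timeoutSec: 60 } },
  { name: 'enable streaming', change: { streaming: true } },
];
console.log(debugLoop(baseline, hypotheses, (cfg) => cfg.streaming === true).status); // fixed
```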

Phase 5: Post-Fix Verification

After any fix, verify:

□ Does it solve the ORIGINAL problem? (not just silence the error)
□ Did I introduce new issues? (regression check)
□ Would removing my fix bring the bug back? (confirms causality)
□ Is the fix in the right layer? (not patching symptoms upstream)
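
The first three checks can be automated when the bug has a reproducible test. The sketch below assumes you can run the reproduction against both the old and the new implementation. `verifyFix` and the ID-parsing example are hypothetical.

```javascript
// `reproduce(impl)` returns true when the ORIGINAL symptom shows up with `impl`.
// `regressionChecks` are predicates that must stay true with the fix applied.
function verifyFix({ buggy, fixed, reproduce, regressionChecks = [] }) {
  return {
    solvesOriginal: !reproduce(fixed), // the symptom is gone, not just silenced
    noRegressions: regressionChecks.every((check) => check(fixed)),
    causal: reproduce(buggy), // without the fix, the bug comes back
  };
}

// Hypothetical example: IDs above 2^53 lost precision when parsed with Number().
const buggyParseId = (text) => String(Number(text));
const fixedParseId = (text) => String(BigInt(text));

const report = verifyFix({
  buggy: buggyParseId,
  fixed: fixedParseId,
  reproduce: (parseId) => parseId('9007199254740993') !== '9007199254740993',
  regressionChecks: [(parseId) => parseId('42') === '42'],
});
console.log(report); // { solvesOriginal: true, noRegressions: true, causal: true }
```

The fourth check (right layer) remains a judgment call and is not covered here.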

Anti-Patterns

🚨 Workaround Addiction (NEW — Most Common!)

Bypassing the problem instead of fixing it. "It's slower but works" / "Use a different approach". → Ask: "Am I solving or avoiding?" If avoiding → find the real fix. → Workarounds are ONLY acceptable when: (1) explicitly approved by user, (2) clearly labeled as temporary, (3) a TODO is created for the real fix.

🚨 Drunk Man Anti-Pattern

Randomly changing things until the problem disappears. → Each change needs a hypothesis.

🚨 Streetlight Anti-Pattern

Looking where comfortable, not where the problem is. → "Is this where the bug IS, or where I KNOW HOW TO LOOK?"

🚨 Cargo Cult Fix

Copying a fix without understanding why it works. → Understand the mechanism first.

🚨 Ignoring the User

User says "it broke after you changed X" → immediately diff X. → User observations are the most valuable data.

Environment Checklist

□ Runtime: system or venv/nvm?
□ Dependencies: match expected versions?
□ Config: .env, config.json — recent changes?
□ Process manager: PM2/systemd — restart method?
□ Logs: tail -f before reproducing
□ Backup: snapshot before any change
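
Part of this checklist can be gathered in one step. The following sketch collects the runtime and config facts that are visible from inside Node.js. It assumes a Python virtual environment is detectable through the `VIRTUAL_ENV` variable that venv activation sets. The function name and the list of file names checked are illustrative.

```javascript
const fs = require('fs');
const path = require('path');

function environmentReport(projectDir = process.cwd(), env = process.env) {
  const exists = (name) => fs.existsSync(path.join(projectDir, name));
  return {
    nodeVersion: process.version,       // which Node is actually running
    nodeBinary: process.execPath,       // system install or a version-manager path?
    pythonVenvActive: Boolean(env.VIRTUAL_ENV),
    venvDirs: ['.venv', 'venv', 'env'].filter(exists),
    configFiles: ['.env', 'config.json'].filter(exists),
  };
}

console.log(environmentReport());
```

Process manager, logs, and backups still need to be checked by hand.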

Deployment Safety (Hardened SCP Flow)

Iron Rule: NEVER edit files directly on the server. NEVER overwrite server files without backup.

Standard deployment (every time, no exceptions):

1. PULL    scp -r server:/opt/apps/<project>/ ./local-<project>/
           (pull the files you need + related files)

2. EDIT    Make changes locally
           (complex multi-line → write full file, never sed)

3. VERIFY  for f in *.js; do node -c "$f" || break; done   # syntax check (node -c checks one file per call)
           node -e "require('./file')"     # module load check
           (STOP if verification fails — do not proceed)

4. BACKUP  ssh server "cp file file.bak.$(date +%s)"

5. PUSH    scp ./local-file server:/opt/apps/<project>/file

6. RESTART pm2 restart <app>
           (use SAME method as original — check ps/pm2 show first)

7. HEALTH  curl -s http://localhost:<port>/health
           pm2 logs <app> --lines 5 --nostream
           (if unhealthy → revert backup immediately)
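
Steps 3 to 7 are mechanical, so they can be scripted with a hard stop on the first failure. The sketch below is one possible shape, not the skill's own tooling. It assumes that pm2 and the health endpoint live on the server (so steps 6 and 7 go through ssh) and that paths contain no spaces. Every option value shown is a placeholder.

```javascript
const { execSync } = require('child_process');

// Steps 3-7 of the flow above as shell commands. Steps 1-2 (PULL, EDIT) stay manual.
function deploySteps({ server, remoteFile, localFile, app, port }) {
  const stamp = Math.floor(Date.now() / 1000); // same role as $(date +%s)
  return [
    { name: 'VERIFY', cmd: `node -c ${localFile}` },
    { name: 'BACKUP', cmd: `ssh ${server} "cp ${remoteFile} ${remoteFile}.bak.${stamp}"` },
    { name: 'PUSH', cmd: `scp ${localFile} ${server}:${remoteFile}` },
    { name: 'RESTART', cmd: `ssh ${server} "pm2 restart ${app}"` },
    // -f makes curl exit non-zero on HTTP error statuses
    { name: 'HEALTH', cmd: `ssh ${server} "curl -sf http://localhost:${port}/health"` },
  ];
}

// Runs the steps in order and stops at the first failure.
// `run` is injectable, which allows a dry run that only prints the commands.
function runSteps(steps, run = (cmd) => execSync(cmd, { stdio: 'inherit' })) {
  for (const step of steps) {
    try {
      run(step.cmd);
    } catch (err) {
      return { ok: false, failedAt: step.name };
    }
  }
  return { ok: true, failedAt: null };
}

// Dry run: print the commands instead of executing them.
const steps = deploySteps({
  server: 'server', remoteFile: '/opt/apps/demo/app.js', localFile: './app.js', app: 'demo', port: 3000,
});
console.log(runSteps(steps, (cmd) => console.log(cmd)));
```

If `failedAt` is `HEALTH`, the backup from step 4 still has to be restored. The sketch deliberately leaves that to the operator.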

Pull Scope Rules

Changing 1 file    → pull that file + its imports/importers
Changing routes    → also pull server.js (check mount points)
Changing frontend  → also pull index.html (check script tags)
Changing config    → also pull code that reads the config
Unsure what to pull → pull the whole project directory
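
For the first rule, the imports of a file can be listed mechanically. This is a rough regex-based sketch that finds relative `require`/`import` specifiers in JavaScript source. It does not find importers (that requires scanning the other files) and it is not a real parser. `localImports` is a hypothetical helper.

```javascript
// Returns the relative module specifiers ('./x', '../y') referenced by a source file.
function localImports(source) {
  const pattern = /(?:require\(\s*|from\s+|import\s+)['"](\.{1,2}\/[^'"]+)['"]/g;
  return [...new Set([...source.matchAll(pattern)].map((match) => match[1]))];
}

const example = [
  "const users = require('./routes/user');",
  "import db from '../lib/db.js';",
  "const express = require('express');", // package import, not a file to pull
].join('\n');

console.log(localImports(example)); // [ './routes/user', '../lib/db.js' ]
```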

What NOT to Do

❌ sed -i for multi-line code on server
❌ Skip node -c after editing .js
❌ pm2 restart before syntax verification
❌ Tell user to refresh before health check passes
❌ Push without backup
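
Because `node --check` (the long form of `node -c`) treats only its first positional argument as the script to check, a multi-file check needs one call per file. A minimal sketch follows; `filesWithSyntaxErrors` is an illustrative name.

```javascript
const { spawnSync } = require('child_process');

// Runs `node --check` once per file and returns the files that fail.
function filesWithSyntaxErrors(files) {
  return files.filter((file) => {
    const result = spawnSync(process.execPath, ['--check', file], { encoding: 'utf8' });
    return result.status !== 0; // non-zero exit: syntax error or unreadable file
  });
}

// Usage sketch (file names are placeholders):
// const failing = filesWithSyntaxErrors(['server.js', 'routes/user.js']);
// if (failing.length > 0) console.error(`Do NOT restart. Fix first: ${failing.join(', ')}`);
```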

🚨 Server Code Modification Rules

Every code change on a server MUST be syntax-verified before restart/reload.

After editing .js files:
  □ node -c <file>                          # Syntax check
  □ node -e "require('./<file>')"           # Module load check (for route files)
  □ FAIL → DO NOT restart. DO NOT tell user to refresh. Fix first.

After editing .html files:
  □ Check critical tag closure (div/script/style)
  □ grep -o '<div' file | wc -l && grep -o '</div' file | wc -l   # Counts must match (grep -c would count lines, not tags)

Complex multi-line changes:
  □ Write complete file locally → scp upload
  □ NEVER use sed for multi-line code insertion (newlines get swallowed)
  □ If sed is unavoidable → verify with node -c immediately after

Restart sequence:
  □ node -c passes for every edited .js file → pm2 restart <app>
  □ Check pm2 logs --lines 5 for startup errors
  □ curl health endpoint to confirm service is up

Why: sed -i multi-line insertion silently corrupts JS (newlines become single line), causing syntax errors that break the entire page with no visible error to the user.
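
The tag-closure check for .html files can also be done by counting occurrences rather than lines. The sketch below is a rough heuristic: it ignores comments, strings inside scripts, and void elements. `tagBalance` is a hypothetical helper.

```javascript
// Counts every opening and closing occurrence of a tag, even several per line.
function tagBalance(html, tag) {
  const open = (html.match(new RegExp(`<${tag}(?=[\\s>/])`, 'gi')) || []).length;
  const close = (html.match(new RegExp(`</${tag}\\s*>`, 'gi')) || []).length;
  return { open, close, balanced: open === close };
}

console.log(tagBalance('<div><div class="a"></div>', 'div'));
// { open: 2, close: 1, balanced: false }
```

An unbalanced result for div, script, or style is a reason to stop before telling anyone to refresh.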

Decision Tree

Problem appears
  ├─ I just edited something? → DIFF → REVERT if suspect
  ├─ Service won't start? → CHECK startup command + env
  ├─ New error after fix? → STOP (patch chain!) → Revert → Phase 1
  ├─ User reports regression? → DIFF before/after
  ├─ Tempted to work around? → ROOT CAUSE GATE → fix the real issue
  └─ Intermittent? → CHECK logs + external deps + timing
Security Usage Advice
This skill is conceptually coherent and appears to be a safe, instruction-only methodology. Things to consider before enabling it: (1) SKILL.md expects the agent (or operator) to run shell/admin commands (ps, scp, cp, pm2 restart, etc.); ensure you understand whether your agent is allowed to execute such commands automatically — if you want to avoid accidental destructive actions, require manual approval or disable autonomous invocation. (2) The deployment steps include file transfer and restarts: follow the skill's own advice about backups and verification before pushing changes. (3) The SKILL.md is truncated in the provided snippet—review the full file for any additional commands or endpoints not shown. If you want minimal risk, use this as a human-facing checklist rather than granting the agent permission to execute the suggested commands automatically.
Functional Analysis
Type: OpenClaw Skill
Name: debug-methodology
Version: 1.2.0
The 'debug-methodology' skill bundle provides a systematic framework for AI agents to perform root-cause analysis and safe deployments. It includes defensive practices such as mandatory syntax verification (node -c), automated backups before file modification, and environment checks using standard system commands (ps, which, ls). The instructions in SKILL.md are designed to improve the reliability and security of the agent's actions rather than exploit them, and no indicators of data exfiltration or malicious intent were found.
Capability Assessment
Purpose & Capability
Name/description (systematic debugging) match the content: the SKILL.md and README present a step-by-step debugging and deployment checklist. There are no unrelated env vars, binaries, or opaque installs requested.
Instruction Scope
The instructions ask the agent/operator to inspect local process state, environment, venvs, logs, and to use standard admin commands (ps, ls, which, scp, cp, pm2 restart). Those actions are appropriate and expected for a debugging/deployment methodology. Nothing in SKILL.md instructs the agent to collect or transmit unrelated secrets or to phone-home to unexpected endpoints.
Install Mechanism
No install spec and no code files — instruction-only. This minimizes disk-/network-based install risk.
Credentials
The skill requires no environment variables, credentials, or config paths. The runtime instructions reference local system state and standard tools only, which is proportionate to the debugging purpose.
Persistence & Privilege
always is false and the skill is user-invocable. It does not request permanent presence or attempt to modify other skills or system-wide agent configs.
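How to Use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install debug-methodology
  3. After installation, invoke the Skill by name or trigger it with /debug-methodology
  4. Provide the inputs described in the Skill's parameter documentation to get structured output
Version History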
v1.2.0
Added AI Dev Quality Suite cross-references and install command
v1.1.0
Hardened SCP deployment flow: 7-step mandatory process, pull scope rules, explicit what-not-to-do list
v1.0.1
Strengthen deployment safety: never edit directly on server, explicit 7-step flow
v1.0.0
v3: Root Cause Gate, 5 Whys, workaround detection, server code syntax verification, 5 anti-patterns, deployment safety
Metadata
Slug debug-methodology
Version 1.2.0
License
Total installs 5
Current installs 5
Versions 4
FAQ

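What is Debug Methodology?

Systematic debugging and problem-solving methodology. Activate when encountering unexpected errors, service failures, regression bugs, deployment issues, or... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1026 total downloads so far.

How do I install Debug Methodology?

Run the command "/install debug-methodology" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Debug Methodology free?

Yes. Debug Methodology is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Debug Methodology support?

Debug Methodology is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Debug Methodology?

It is developed and maintained by abczsl520 (@abczsl520); the current version is v1.2.0.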
