Description

Systematic debugging and problem-solving methodology. Activate when encountering unexpected errors, service failures, regression bugs, deployment issues, or...

Usage (SKILL.md)

Debug Methodology

Name: Debug Methodology
Author: abczsl520

Systematic approach to debugging and problem-solving. Distilled from real production incidents and industry best practices.

⚠️ The Root Cause Imperative

Every fix MUST target the root cause. Workarounds are forbidden unless explicitly approved.

Before proposing ANY solution, pass the Root Cause Gate:

┌─────────────────────────────────────────────┐
│            ROOT CAUSE GATE                  │
│                                             │
│  1. What is the ACTUAL problem?             │
│  2. WHY does it happen? (not just WHAT)     │
│  3. Does my fix eliminate the WHY?          │
│     YES → proceed                           │
│     NO  → this is a workaround → STOP       │
│                                             │
│  Workaround test:                           │
│  "With my fix in place, does the cause      │
│   still exist?"                             │
│     YES → workaround (fix the cause instead)│
│     NO  → genuine fix ✅                    │
└─────────────────────────────────────────────┘

The 5 Whys — Mandatory for Non-Obvious Problems

Problem: API returns 524 timeout
  Why? → Cloudflare cuts connections >100s
  Why? → The API call takes >100s
  Why? → Using non-streaming request, server holds connection silent
  Why? → Code uses regular fetch, not streaming
  Fix: → Use streaming (server sends data continuously, Cloudflare won't cut)

  ❌ WRONG: Switch to faster model (workaround — avoids the timeout instead of fixing it)
  ✅ RIGHT: Use streaming API (root cause — Cloudflare needs ongoing data)
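The middle of this chain ("the API call takes >100s", "server holds connection silent") can be confirmed by measurement before committing to the fix. The sketch below assumes `curl` is available and that the request is sent to the origin directly, bypassing the proxy; the function names and the URL are hypothetical.

```shell
# measure_timing URL
# Prints "<seconds to first byte> <total seconds>" for one request.
# time_starttransfer and time_total are standard curl --write-out variables.
measure_timing() {
  curl -s -o /dev/null -w '%{time_starttransfer} %{time_total}\n' "$1"
}

# exceeds_limit SECONDS LIMIT
# Succeeds when SECONDS is greater than LIMIT (decimal values allowed).
exceeds_limit() {
  awk -v secs="$1" -v limit="$2" 'BEGIN { exit !(secs > limit) }'
}

# Usage (placeholder URL):
#   set -- $(measure_timing "https://origin.example.com/slow-endpoint")
#   if exceeds_limit "$1" 100; then echo "silent for longer than the proxy allows"; fi
```

If time-to-first-byte is well under the limit but total time is not, the server is already sending data early and the silent-connection hypothesis needs revisiting.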

Common Workaround Traps

Problem	Workaround (❌)	Root Cause Fix (✅)
API timeout	Switch to faster model	Use streaming / fix the slow query
Data precision loss	Search by name instead of ID	Fix BigInt parsing
Search returns nothing	Try different search strategy	Fix the search implementation
Dependency conflict	Downgrade / pin version	Use correct environment (venv)
Feature doesn't work	Remove the feature	Debug why it fails

Self-check question: "Am I solving the problem, or avoiding it?"

Phase 1: STOP — Assess Before Acting

Before ANY fix attempt:

□ What is the EXACT symptom? (error message, behavior, screenshot)
□ When did it last work? What changed since then?
□ How is the service running? (process, env, startup command)

For running services:

ps -p <PID> -o command=        # How was it started?
ls .venv/ venv/ env/           # Virtual environment?
which python3 && python3 --version
which node && node --version

NEVER restart a service without first recording its original startup command.
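A minimal sketch of recording the startup command to a file before touching the service. `record_startup` is a hypothetical helper name; the `/proc` fallback is an assumption that only applies on Linux.

```shell
# record_startup PID OUTFILE
# Writes the full command line of PID to OUTFILE; fails if nothing could be read.
record_startup() {
  {
    ps -p "$1" -o command= 2>/dev/null ||
      tr '\0' ' ' < "/proc/$1/cmdline"   # Linux-only fallback when ps lacks -p
  } > "$2" 2>/dev/null
  [ -s "$2" ]   # an empty file means the PID was not found
}

# Usage (placeholder PID and path):
#   record_startup 12345 ./startup-command.txt || echo "could not record; do not restart"
```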

Phase 2: Hypothesize — Form ONE Theory

Priority order:

Did I change something? → diff/revert first
Did the environment change? → versions, deps, configs
Did external inputs change? → API responses, data formats
Genuine new bug? → only after ruling out 1-3
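For the first question ("Did I change something?"), a quick diff against the most recent backup answers it directly. This sketch assumes backups are named `file.bak.<epoch>`, as in the deployment flow's backup step; the helper names are hypothetical.

```shell
# latest_backup FILE
# Prints the newest FILE.bak.<epoch> path. Glob results are sorted, and
# epoch suffixes of equal length sort chronologically.
latest_backup() {
  last=
  for b in "$1".bak.*; do
    [ -e "$b" ] && last=$b
  done
  [ -n "$last" ] && printf '%s\n' "$last"
}

# diff_against_backup FILE
# Exit status 0 = unchanged since backup, 1 = changed, 2 = no backup found.
diff_against_backup() {
  bak=$(latest_backup "$1") || return 2
  diff -u "$bak" "$1"
}
```

A status of 1 means the file was edited after the last backup, which makes reverting it the cheapest first test.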

Phase 3: Test — One Change at a Time

Change X → Test → Works? → Done
                → Fails? → REVERT X → new hypothesis

Do NOT stack changes.
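The change, test, revert loop can be sketched as a small helper that restores the original file whenever the test fails. `try_change` and its argument order are illustrative, not part of the original skill.

```shell
# try_change FILE CANDIDATE TEST_CMD [ARGS...]
# Copies CANDIDATE over FILE, runs the test command, and restores FILE if it fails.
# Returns 0 = change kept, 1 = test failed and FILE was reverted, 2 = copy error.
try_change() {
  target=$1; candidate=$2; shift 2
  backup="$target.orig.$$"
  cp -p "$target" "$backup" || return 2
  cp "$candidate" "$target" || { mv "$backup" "$target"; return 2; }
  if "$@"; then
    rm -f "$backup"        # change confirmed; keep it
    return 0
  fi
  mv "$backup" "$target"   # test failed: revert before forming a new hypothesis
  return 1
}

# Usage (placeholder names):
#   try_change config.json config.candidate.json ./run-smoke-test.sh
```

Because the revert happens inside the helper, a failed attempt never leaves a half-applied change behind for the next hypothesis to build on.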

Phase 4: Patch-Chain Detection

2 fix attempts failed → STOP. Revert ALL. Back to Phase 1.

You are likely:

Fixing symptoms of a wrong fix
In the wrong environment entirely
Misunderstanding the architecture

Phase 5: Post-Fix Verification

After any fix, verify:

□ Does it solve the ORIGINAL problem? (not just silence the error)
□ Did I introduce new issues? (regression check)
□ Would removing my fix bring the bug back? (confirms causality)
□ Is the fix in the right layer? (not patching symptoms upstream)

Anti-Patterns

🚨 Workaround Addiction (NEW — Most Common!)

Bypassing the problem instead of fixing it. "It's slower but works" / "Use a different approach".
→ Ask: "Am I solving or avoiding?" If avoiding → find the real fix.
→ Workarounds are ONLY acceptable when: (1) explicitly approved by user, (2) clearly labeled as temporary, (3) a TODO is created for the real fix.

🚨 Drunk Man Anti-Pattern

Randomly changing things until the problem disappears. → Each change needs a hypothesis.

🚨 Streetlight Anti-Pattern

Looking where comfortable, not where the problem is. → "Is this where the bug IS, or where I KNOW HOW TO LOOK?"

🚨 Cargo Cult Fix

Copying a fix without understanding why it works. → Understand the mechanism first.

🚨 Ignoring the User

User says "it broke after you changed X" → immediately diff X. → User observations are the most valuable data.

Environment Checklist

□ Runtime: system or venv/nvm?
□ Dependencies: match expected versions?
□ Config: .env, config.json — recent changes?
□ Process manager: PM2/systemd — restart method?
□ Logs: tail -f before reproducing
□ Backup: snapshot before any change
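Part of this checklist can be captured in one step before any change is made. The sketch below records runtime versions and copies config files into a snapshot directory; `snapshot_env` and the directory layout are assumptions for illustration.

```shell
# snapshot_env OUTDIR [CONFIG_FILE...]
# Records which python3/node are on PATH (with versions) and copies the given
# config files into OUTDIR so later changes can be compared against them.
snapshot_env() {
  out=$1; shift
  mkdir -p "$out" || return 1
  {
    date -u '+%Y-%m-%dT%H:%M:%SZ'
    for tool in python3 node; do
      if command -v "$tool" >/dev/null 2>&1; then
        printf '%s: %s (%s)\n' "$tool" "$("$tool" --version 2>&1)" "$(command -v "$tool")"
      else
        printf '%s: not found\n' "$tool"
      fi
    done
  } > "$out/runtime.txt"
  for cfg in "$@"; do
    [ -f "$cfg" ] && cp -p "$cfg" "$out/"
  done
  return 0
}

# Usage (placeholder paths):
#   snapshot_env "./snapshot-$(date +%s)" .env config.json
```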

Deployment Safety (Hardened SCP Flow)

Iron Rule: NEVER edit files directly on the server. NEVER overwrite server files without backup.

Standard deployment (every time, no exceptions):

1. PULL    scp -r server:/opt/apps/<project>/ ./local-<project>/
           (pull the files you need + related files)

2. EDIT    Make changes locally
           (complex multi-line → write full file, never sed)

3. VERIFY  for f in *.js; do node -c "$f" || break; done   # syntax check (node -c takes one file per call)
           node -e "require('./file')"     # module load check
           (STOP if verification fails — do not proceed)

4. BACKUP  ssh server "cp file file.bak.$(date +%s)"

5. PUSH    scp ./local-file server:/opt/apps/<project>/file

6. RESTART pm2 restart <app>
           (use SAME method as original — check ps/pm2 show first)

7. HEALTH  curl -s http://localhost:<port>/health
           pm2 logs <app> --lines 5 --nostream
           (if unhealthy → revert backup immediately)
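Steps 3 through 7 can be wrapped in one function so that none of them gets skipped under pressure. This is a sketch under stated assumptions: the service is managed by pm2, the health check is run on the server over ssh, and every argument value is supplied by the caller. `run` and `deploy_file` are hypothetical names; `DRY_RUN=1` prints the commands instead of executing them.

```shell
# run CMD [ARGS...]: execute the command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# deploy_file SERVER REMOTE_DIR FILE APP PORT
deploy_file() {
  server=$1; remote_dir=$2; file=$3; app=$4; port=$5
  remote="$remote_dir/$file"
  bak="$remote.bak.$(date +%s)"

  run node -c "$file"                      || return 1   # 3. VERIFY
  run ssh "$server" "cp '$remote' '$bak'"  || return 1   # 4. BACKUP
  run scp "$file" "$server:$remote"        || return 1   # 5. PUSH
  run ssh "$server" "pm2 restart '$app'"   || return 1   # 6. RESTART
  # 7. HEALTH: curl -f turns an HTTP error status into a non-zero exit code.
  if ! run ssh "$server" "curl -fsS 'http://localhost:$port/health'"; then
    # Unhealthy: put the backup back and restart again.
    run ssh "$server" "cp '$bak' '$remote' && pm2 restart '$app'"
    return 1
  fi
}

# Usage (placeholder values):
#   DRY_RUN=1; deploy_file myserver /opt/apps/myapp server.js myapp 3000
```

A service that needs a few seconds to boot may fail an immediate health check, so a short wait or retry before step 7 is a reasonable addition.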

Pull Scope Rules

Changing 1 file    → pull that file + its imports/importers
Changing routes    → also pull server.js (check mount points)
Changing frontend  → also pull index.html (check script tags)
Changing config    → also pull code that reads the config
Unsure what to pull → pull the whole project directory
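To apply the first rule, the importers of a file have to be found first. A rough sketch using `grep`, assuming CommonJS `require()` or ES `import ... from` statements with relative paths; `find_importers` is a hypothetical helper, and the module name is used as a regex fragment, so names containing regex metacharacters need escaping.

```shell
# find_importers MODULE DIR
# Lists files under DIR that require() or import a path ending in MODULE or MODULE.js.
find_importers() {
  grep -rlE "(require\(|from )['\"]([^'\"]*/)?$1(\.js)?['\"]" "$2"
}

# Usage (placeholder names):
#   find_importers routes ./local-project
```

This is a text match, not a module resolver, so dynamic requires and path aliases will be missed; when in doubt, the last rule (pull the whole directory) still applies.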

What NOT to Do

❌ sed -i for multi-line code on server
❌ Skip node -c after editing .js
❌ pm2 restart before syntax verification
❌ Tell user to refresh before health check passes
❌ Push without backup

🚨 Server Code Modification Rules

Every code change on a server MUST be syntax-verified before restart/reload.

After editing .js files:
  □ node -c <file>                          # Syntax check
  □ node -e "require('./<file>')"           # Module load check (for route files)
  □ FAIL → DO NOT restart. DO NOT tell user to refresh. Fix first.

After editing .html files:
  □ Check critical tag closure (div/script/style)
  □ grep -o '<div' file | wc -l && grep -o '</div' file | wc -l   # Counts must match (grep -c would count lines, not tags)

Complex multi-line changes:
  □ Write complete file locally → scp upload
  □ NEVER use sed for multi-line code insertion (newlines get swallowed)
  □ If sed is unavoidable → verify with node -c immediately after

Restart sequence:
  □ node -c passes for every .js file → pm2 restart <app>
  □ Check pm2 logs --lines 5 for startup errors
  □ curl health endpoint to confirm service is up

Why: sed -i multi-line insertion silently corrupts JS (newlines become single line), causing syntax errors that break the entire page with no visible error to the user.
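The verification rules above can be turned into a gate that runs before any restart. A minimal sketch with hypothetical helper names: it checks each JavaScript file separately, since `node -c` treats only its first file argument as the script to check, and counts tag occurrences rather than matching lines.

```shell
# check_js_files FILE...
# Runs `node -c` on each file and stops at the first failure.
check_js_files() {
  for js in "$@"; do
    node -c "$js" || { printf 'syntax check failed: %s\n' "$js" >&2; return 1; }
  done
}

# count_tag PREFIX FILE
# Counts occurrences of a literal prefix such as '<div' or '</div'.
# Note: '<div' also matches longer tag names that start with "div".
count_tag() {
  grep -oF -e "$1" "$2" | wc -l | tr -d ' '
}

# tags_balanced NAME FILE, e.g. tags_balanced div index.html
tags_balanced() {
  [ "$(count_tag "<$1" "$2")" -eq "$(count_tag "</$1" "$2")" ]
}

# Usage (placeholder names):
#   check_js_files *.js && tags_balanced div index.html && pm2 restart myapp
```

Equal tag counts do not prove the HTML is well formed, but a mismatch is a cheap and reliable signal that something was cut off.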

Decision Tree

Problem appears
  ├─ I just edited something? → DIFF → REVERT if suspect
  ├─ Service won't start? → CHECK startup command + env
  ├─ New error after fix? → STOP (patch chain!) → Revert → Phase 1
  ├─ User reports regression? → DIFF before/after
  ├─ Tempted to work around? → ROOT CAUSE GATE → fix the real issue
  └─ Intermittent? → CHECK logs + external deps + timing

Safe Usage Recommendations

This skill is conceptually coherent and appears to be a safe, instruction-only methodology. Things to consider before enabling it: (1) SKILL.md expects the agent (or operator) to run shell/admin commands (ps, scp, cp, pm2 restart, etc.); ensure you understand whether your agent is allowed to execute such commands automatically — if you want to avoid accidental destructive actions, require manual approval or disable autonomous invocation. (2) The deployment steps include file transfer and restarts: follow the skill's own advice about backups and verification before pushing changes. (3) The SKILL.md is truncated in the provided snippet—review the full file for any additional commands or endpoints not shown. If you want minimal risk, use this as a human-facing checklist rather than granting the agent permission to execute the suggested commands automatically.

Functional Analysis

Type: OpenClaw Skill
Name: debug-methodology
Version: 1.2.0

The 'debug-methodology' skill bundle provides a systematic framework for AI agents to perform root-cause analysis and safe deployments. It includes defensive practices such as mandatory syntax verification (node -c), automated backups before file modification, and environment checks using standard system commands (ps, which, ls). The instructions in SKILL.md are designed to improve the reliability and security of the agent's actions rather than exploit them, and no indicators of data exfiltration or malicious intent were found.

Capability Assessment

✓ Purpose & Capability

Name/description (systematic debugging) match the content: the SKILL.md and README present a step-by-step debugging and deployment checklist. There are no unrelated env vars, binaries, or opaque installs requested.

✓ Instruction Scope

The instructions ask the agent/operator to inspect local process state, environment, venvs, logs, and to use standard admin commands (ps, ls, which, scp, cp, pm2 restart). Those actions are appropriate and expected for a debugging/deployment methodology. Nothing in SKILL.md instructs the agent to collect or transmit unrelated secrets or to phone-home to unexpected endpoints.

✓ Install Mechanism

No install spec and no code files — instruction-only. This minimizes disk-/network-based install risk.

✓ Credentials

The skill requires no environment variables, credentials, or config paths. The runtime instructions reference local system state and standard tools only, which is proportionate to the debugging purpose.

✓ Persistence & Privilege

always is false and the skill is user-invocable. It does not request permanent presence or attempt to modify other skills or system-wide agent configs.

Version History

v1.2.0

Added AI Dev Quality Suite cross-references and install command

v1.1.0

Hardened SCP deployment flow: 7-step mandatory process, pull scope rules, explicit what-not-to-do list

v1.0.1

Strengthen deployment safety: never edit directly on server, explicit 7-step flow

v1.0.0

v3: Root Cause Gate, 5 Whys, workaround detection, server code syntax verification, 5 anti-patterns, deployment safety

Metadata

Slug debug-methodology

Version 1.2.0

License (none listed)

Total installs 5

Current installs 5

Number of versions 4

FAQ