gstack Diagnose

Name: gstack Diagnose
Author: ilmych

Author: ilmych · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install gstack-openclaw-diagnose

Description

Structured diagnosis for hard bugs and performance regressions. Builds a deterministic feedback loop FIRST, then reproduces, hypothesises (3-5 ranked), instr...

Usage (SKILL.md)

Diagnose

A discipline for hard bugs. Skip phases only when explicitly justified.

Core insight: If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause. Everything else is mechanical. If you don't have one, no amount of staring at code will save you.

Phase 1 — Build a feedback loop

This is the skill. Spend disproportionate effort here.

Construction strategies — try roughly in this order

Failing test at whatever seam reaches the bug — unit, integration, e2e.
Curl / HTTP script against a running dev server.
CLI invocation with a fixture input, diffing stdout against a known-good snapshot.
Headless browser script (Playwright / Puppeteer) — drives the UI, asserts on DOM/console/network.
Replay a captured trace. Save a real request/payload/event log to disk; replay through the code path in isolation.
Throwaway harness. Spin up a minimal subset of the system (one service, mocked deps) exercising the bug path with a single function call.
Property / fuzz loop. "Sometimes wrong output" → run 1000 random inputs and look for the failure mode.
Bisection harness. Bug appeared between two known states → automate "boot at state X, check, repeat" so you can git bisect run it.
Differential loop. Same input through old-version vs new-version (or two configs), diff outputs.
HITL script. Last resort. If a human must click, drive them with a structured bash script so the loop is still reproducible. Captured output feeds back to you.
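As one illustration of the strategies above, the property/fuzz loop and the differential loop can be combined: feed seeded random inputs through an old and a new version and stop at the first disagreement. This is a minimal sketch, not part of the skill itself; `old_impl` and `new_impl` are hypothetical stand-ins for the two versions under comparison, and the input generator is an assumption you would replace with one that fits your bug.

```python
import random
from typing import Callable, List, Optional


def old_impl(xs: List[int]) -> List[int]:
    """Hypothetical known-good version: sorted unique values."""
    return sorted(set(xs))


def new_impl(xs: List[int]) -> List[int]:
    """Hypothetical regressed version: only removes adjacent repeats."""
    out: List[int] = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return sorted(out)


def first_mismatch(
    old: Callable[[List[int]], List[int]],
    new: Callable[[List[int]], List[int]],
    runs: int = 1000,
    seed: int = 0,
) -> Optional[List[int]]:
    """Return the first random input where old and new disagree, else None."""
    rng = random.Random(seed)  # private, seeded RNG keeps the loop deterministic
    for _ in range(runs):
        xs = [rng.randint(0, 5) for _ in range(rng.randint(0, 6))]
        if old(xs) != new(xs):
            return xs
    return None


if __name__ == "__main__":
    bad_input = first_mismatch(old_impl, new_impl)
    print("first failing input:", bad_input)
```

Because the seed is fixed, the same failing input comes back on every run, which gives you a stable repro to minimise.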

Iterate on the loop itself

Treat the loop as a product:

Faster? Cache setup, skip unrelated init, narrow scope.
Sharper signal? Assert on the specific symptom, not "didn't crash."
More deterministic? Pin time, seed RNG, isolate filesystem, freeze network.

A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
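The "more deterministic" checklist can be sketched as a small context manager that seeds the RNG and isolates the filesystem, with time pinned by passing a fixed clock into the code under test. This is only a sketch under assumptions: `deterministic_env`, `FIXED_NOW`, and `run_check` are hypothetical names, and it assumes the code under test accepts a clock callable rather than reading the system time directly.

```python
import os
import random
import tempfile
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Callable, Iterator

# Hypothetical pinned timestamp handed to code that accepts a clock.
FIXED_NOW = datetime(2024, 1, 1, tzinfo=timezone.utc)


@contextmanager
def deterministic_env(seed: int = 1234) -> Iterator[str]:
    """Seed the global RNG and work in a fresh temp directory; restore both on exit."""
    rng_state = random.getstate()
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        random.seed(seed)
        os.chdir(scratch)
        try:
            yield scratch
        finally:
            # Leave the scratch directory before it is deleted.
            os.chdir(old_cwd)
            random.setstate(rng_state)


def run_check(check: Callable[[Callable[[], datetime]], bool], seed: int = 1234) -> bool:
    """Run one pass/fail check with a pinned clock inside the deterministic environment."""
    with deterministic_env(seed):
        return check(lambda: FIXED_NOW)
```

Freezing the network is left out here; how to do that depends on the project's HTTP layer.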

Non-deterministic bugs

Goal: raise reproduction rate. Loop 100x, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake is debuggable; 1% is not.
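To know whether a change raised the reproduction rate, it helps to measure it. A minimal sketch, assuming each attempt can be wrapped in a function that returns True when the bug appears; `simulated_attempt` is a hypothetical stand-in for one run of the real loop.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def reproduction_rate(
    attempt: Callable[[int], bool], runs: int = 100, workers: int = 8
) -> float:
    """Run attempt(i) for i in range(runs) in parallel; return the fraction that hit the bug."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(attempt, range(runs)))
    return hits / runs


def simulated_attempt(i: int) -> bool:
    """Hypothetical flaky run: whether it 'fails' depends only on the attempt index."""
    return random.Random(i).random() < 0.2


if __name__ == "__main__":
    print(f"reproduction rate: {reproduction_rate(simulated_attempt):.0%}")
```

Re-measure after each change to the loop (added stress, narrower timing window, injected sleep) and keep the changes that push the rate up.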

When you genuinely cannot build a loop

Stop and say so. List what you tried. Ask the user for: (a) access to the reproducing environment, (b) a captured artifact (HAR file, log dump, core dump, screen recording), or (c) permission to add temporary production instrumentation. Do NOT proceed to hypothesise without a loop.

Phase 2 — Reproduce

Run the loop. Watch the bug appear. Confirm:

The failure matches what the user described — not a nearby different failure.
Reproducible across multiple runs (or high enough rate for non-deterministic bugs).
Exact symptom captured (error message, wrong output, timing) for later verification.

Phase 3 — Hypothesise

Generate 3-5 ranked hypotheses before testing any. Single-hypothesis generation anchors on the first plausible idea.

Each hypothesis must be falsifiable:

"If <X> is the cause, then <changing Y> will make it disappear / <changing Z> will make it worse."

If you can't state the prediction, the hypothesis is a vibe — discard or sharpen it.

Show the ranked list to the user before testing. They often re-rank instantly with domain knowledge. Don't block — proceed with your ranking if AFK.
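One way to keep the list honest is to record each hypothesis together with its prediction and drop any that has none. The structure below is an illustrative sketch; the `Hypothesis` fields and the `likelihood` score are assumptions, not something the skill prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    cause: str
    prediction: str    # "if <cause>, then changing Y makes it disappear"
    likelihood: float  # rough prior in [0, 1]; a judgment call, not a measurement


def ranked(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Drop hypotheses with no stated prediction, then sort most likely first."""
    testable = [h for h in hypotheses if h.prediction.strip()]
    return sorted(testable, key=lambda h: h.likelihood, reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates for an imagined bug.
    candidates = [
        Hypothesis("stale cache entry", "disabling the cache removes the failure", 0.5),
        Hypothesis("something with threads", "", 0.9),  # no prediction: a vibe
        Hypothesis("timezone mismatch", "forcing UTC removes the failure", 0.3),
    ]
    for rank, h in enumerate(ranked(candidates), start=1):
        print(rank, h.cause, "->", h.prediction)
```

A dropped hypothesis is not necessarily wrong; it goes back on the list once it can be sharpened into a prediction.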

Phase 4 — Instrument

Each probe maps to a specific prediction from Phase 3. One variable at a time.

Tool preference:

Debugger / REPL if available. One breakpoint beats ten logs.
Targeted logs at boundaries that distinguish hypotheses.
Never "log everything and grep."

Tag every debug log with a unique prefix: [DEBUG-a4f2]. Cleanup = single grep. Untagged logs survive; tagged logs die.
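The tagging convention can be sketched as two helpers: one that mints a tag and one that lists every line still carrying it. The function names and the `*.py` default are assumptions; a plain `grep -rn` on the tag does the same job.

```python
import secrets
from pathlib import Path
from typing import List, Tuple


def new_debug_tag() -> str:
    """Return a tag such as [DEBUG-a4f2] with a random 4-hex-digit suffix."""
    return f"[DEBUG-{secrets.token_hex(2)}]"


def find_tagged_lines(
    root: Path, tag: str, pattern: str = "*.py"
) -> List[Tuple[Path, int, str]]:
    """List (file, line number, text) for every line under root that contains the tag."""
    hits: List[Tuple[Path, int, str]] = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if tag in line:
                hits.append((path, lineno, line.strip()))
    return hits
```

An empty result from `find_tagged_lines` is the cleanup check: no tagged instrumentation is left in the tree.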

Performance bugs: logs are usually the wrong tool. Establish a baseline measurement (timing harness, profiler, query plan), then bisect. Measure first, fix second.
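A baseline timing harness can be as small as the sketch below. It takes the median of several runs to dampen noise and compares a candidate against the baseline; the 1.25x tolerance is an arbitrary assumption, and a profiler or query plan is the better tool once you know roughly where the time goes.

```python
import statistics
import time
from typing import Callable, List


def median_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Call fn() `repeats` times and return the median wall-clock duration in seconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def regressed(baseline_s: float, candidate_s: float, tolerance: float = 1.25) -> bool:
    """True when the candidate is more than `tolerance` times slower than the baseline."""
    return candidate_s > baseline_s * tolerance
```

Recording the baseline number before touching any code also gives the pass/fail signal for the Phase 1 loop and for a later `git bisect run`.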

Phase 5 — Fix + regression test

Write the regression test before the fix — but only if there's a correct seam.

A correct seam exercises the real bug pattern as it occurs at the call site. If the only seam is too shallow, a regression test there gives false confidence.

If no correct seam exists, that itself is the finding — note it.

Turn the minimised repro into a failing test at the seam.
Watch it fail.
Apply the fix.
Watch it pass.
Re-run the Phase 1 loop against the original scenario.
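The fail-then-pass sequence above might look like this for an imagined off-by-one bug. Everything here is hypothetical: `chunk_buggy` and `chunk_fixed` stand in for the code before and after the fix, and `regression_check` is the minimised repro expressed as a pass/fail check.

```python
from typing import Callable, List

Chunker = Callable[[List[int], int], List[List[int]]]


def chunk_buggy(items: List[int], size: int) -> List[List[int]]:
    """Hypothetical pre-fix code: silently drops a trailing partial chunk."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def chunk_fixed(items: List[int], size: int) -> List[List[int]]:
    """Hypothetical post-fix code: keeps the trailing partial chunk."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def regression_check(chunk: Chunker) -> bool:
    """Minimised repro: 5 items in chunks of 2 must keep the last item."""
    return chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]


if __name__ == "__main__":
    print("before fix passes:", regression_check(chunk_buggy))
    print("after fix passes:", regression_check(chunk_fixed))
```

Seeing the check fail against the unfixed code first is what shows the test exercises the bug rather than something nearby.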

Phase 6 — Cleanup + post-mortem

Before declaring done:

Original repro no longer reproduces (re-run Phase 1 loop)
Regression test passes (or absence of seam is documented)
All [DEBUG-...] instrumentation removed (grep the prefix)
Throwaway harnesses deleted
Root cause stated in the commit/PR message

Then ask: what would have prevented this bug? If the answer involves architectural change, note it for the user — don't bundle it into this fix.

Completion Status

DONE — root cause found, fix applied, regression test written, all tests pass
DONE_WITH_CONCERNS — fixed but cannot fully verify (intermittent, needs staging)
BLOCKED — root cause unclear after investigation, escalated

Safe usage advice

This skill appears safe to install if you want a disciplined debugging workflow. It may prompt your agent to create tests, run commands, add temporary instrumentation, or build throwaway harnesses while diagnosing a bug, so review proposed changes and make sure any production instrumentation or environment access is explicitly approved.

Capability assessment

✓ Purpose & Capability

The stated purpose is structured diagnosis for hard bugs and performance regressions, and the artifact content consistently provides a debugging workflow: build a deterministic feedback loop, reproduce, hypothesize, instrument, fix, test, and clean up.

✓ Instruction Scope

The instructions are scoped to user-directed debugging work. They may involve tests, CLI runs, browser scripts, temporary harnesses, and instrumentation, but these are coherent with diagnosis and include cleanup and permission language for production instrumentation.

✓ Install Mechanism

The package contains only SKILL.md as markdown; metadata reports no executable scripts, no declared dependencies, and clean dependency/static scans.

✓ Credentials

The skill can lead an agent to inspect and modify a user's project during debugging, but that authority is expected for the purpose and is bounded by reproduction, verification, and cleanup steps.

✓ Persistence & Privilege

No artifact evidence shows background workers, persistence, credential access, privilege escalation, exfiltration, or automatic execution outside the user's debugging task.

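How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install gstack-openclaw-diagnose
Once installed, invoke the Skill by name or trigger it with /gstack-openclaw-diagnose
Provide the required input according to the Skill's parameter description to get structured output.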

Version history

v1.0.0

Initial release: structured diagnosis for hard bugs with 10-strategy feedback loop taxonomy, 6-phase methodology (loop → reproduce → hypothesise → instrument → fix → cleanup)

Metadata

Slug gstack-openclaw-diagnose

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Version count 1

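FAQ

What is gstack Diagnose?

Structured diagnosis for hard bugs and performance regressions. Builds a deterministic feedback loop FIRST, then reproduces, hypothesises (3-5 ranked), instr... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 46 total downloads so far.

How do I install gstack Diagnose?

Run the command "/install gstack-openclaw-diagnose" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.

Is gstack Diagnose free?

Yes. gstack Diagnose is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does gstack Diagnose support?

gstack Diagnose is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops gstack Diagnose?

It is developed and maintained by ilmych (@ilmych); the current version is v1.0.0.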