anderskev

Gen Test Plan

Author: Kevin Anderson · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 119
Favorites: 0
Current installs: 1
Versions: 2
Install in OpenClaw
/install gen-test-plan
Description
Analyze repo, detect stack, trace changes to user-facing entry points, generate E2E YAML test plan
Usage (SKILL.md)

Generate Test Plan

Analyze the repository's tech stack, branch changes vs default, and generate an executable YAML test plan focused on user-facing impact.

This is an E2E test plan — not an automated test wrapper. The generated plan will be executed by an autonomous agent acting exactly as a human QA tester would: launching real binaries, hitting real endpoints, interacting with real databases, and verifying real observable behavior.

Critical Rule: No Automated Test Duplication

NEVER generate test steps that re-run the project's existing automated test suite. This means:

  • No cargo test, pytest, npm test, go test, mix test, or equivalent commands as test steps
  • No wrapping unit/integration test modules in a test case
  • No "run the tests and check they pass" — that's CI's job, not QA's

If you find yourself writing a test step that invokes the project's test runner, stop and rethink. Ask: "What would a human tester do to verify this feature works?" The answer is never "run the unit tests."

What E2E test steps look like:

  • Build the binary and run it with real arguments, check stdout/stderr/exit code
  • Start a server and hit it with curl
  • Run a CLI command that writes to a real database, then query the database to verify
  • Launch the TUI and verify it renders (via screenshot or process lifecycle)
  • Chain multiple commands that exercise a full user workflow end-to-end

Hard gates

Complete these in order. Do not advance to the next gate until its Pass condition is met (each pass should leave retrievable evidence: pasted command output, a written list, or the generated file on disk). Scheduling: Gate 1 before Step 2; Gate 2 before Step 5; Gate 3 before Step 7; Gates 4–5 during Step 8 (after the Step 7 summary).

  1. Diff and base pinned (after Step 1). Resolve the base branch from --base when provided, otherwise use the repo default (main or master per Step 1). Compare HEAD to $(git merge-base HEAD origin/<base_branch>) (or equivalent if the remote ref differs). Pass: You record current_branch, base_branch, the merge-base SHA or range used, and changed_files from git diff --name-only <merge-base>..HEAD (an empty list is allowed if you paste or quote that output and state “no file changes vs base”).

  2. Trace complete (after Step 4). Pass: Every affected entry point you will test has a Core functionality vs Configuration/admin classification, and the Step 4 requirement holds: at least one test targets a core entry point or you document why that is impossible and flag manual review.

  3. Plan file valid (after Step 6, before Step 7). Pass: docs/testing/test-plan.yaml exists and the following command exits 0. It parses the YAML and asserts that all four top-level keys are present; do not substitute a single grep -E with alternations, which would pass on any one match:

    python3 -c "import sys, yaml; d = yaml.safe_load(open('docs/testing/test-plan.yaml')) or {}; missing = [k for k in ('version', 'metadata', 'setup', 'tests') if k not in d]; sys.exit('Missing keys: ' + ', '.join(missing) if missing else 0)"
    
  4. No automated-test duplication (Step 8). Pass: Every run: step and every services: command: is scanned for project test runners (cargo test, pytest, npm test, go test, mix test, jest, vitest, mocha, etc.); zero invocations. If any appear, remove or replace them with real E2E actions and re-run Gate 3.

  5. Behavioral coverage (Step 8). Pass: Re-read metadata.changes_summary and recent commit messages; at least one test’s context/steps exercises the primary user-visible behavior they describe. If they describe a capability (e.g., a new provider) but no step invokes it, add that test or fail verification.
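
The Gate 4 scan can be sketched in Python as follows. This is a minimal illustration, not part of the skill: it assumes the plan has already been parsed into a dict (for example with yaml.safe_load, as in Gate 3), and the function name find_test_runner_steps and the example plan are hypothetical. The runner list is the one named in Gate 4.

```python
import re

# Runner names taken from Gate 4; extend the list for other ecosystems.
TEST_RUNNER_PATTERN = re.compile(
    r"\b(cargo test|pytest|npm test|go test|mix test|jest|vitest|mocha)\b"
)


def find_test_runner_steps(plan: dict) -> list[str]:
    """Return every run:/command: string that mentions a known test runner."""
    commands = []
    # setup.services[].command
    for service in (plan.get("setup") or {}).get("services") or []:
        commands.append(str(service.get("command", "")))
    # tests[].steps[].run
    for test in plan.get("tests") or []:
        for step in test.get("steps") or []:
            if isinstance(step, dict) and "run" in step:
                commands.append(str(step["run"]))
    return [c for c in commands if TEST_RUNNER_PATTERN.search(c)]


# Hypothetical plan fragment, used only for illustration.
example_plan = {
    "setup": {"services": [{"command": "cargo run --bin server"}]},
    "tests": [
        {"id": "TC-01", "steps": [{"run": "./target/debug/app --version"}]},
        {"id": "TC-02", "steps": [{"run": "cargo test --workspace"}]},
    ],
}
print(find_test_runner_steps(example_plan))  # ['cargo test --workspace']
```

The match is a coarse substring heuristic, so a step such as pip install pytest would also be flagged; treat hits as candidates to review rather than as proof of a violation.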

Arguments

  • --base <branch>: Base branch to diff against (default: main, falling back to master as in Step 1)
  • Path: Target directory (default: current working directory)

Step 1: Gather Repository Context

# Get current branch
git rev-parse --abbrev-ref HEAD

# Resolve base branch: use --base if supplied, otherwise default (main → master)
BASE_BRANCH="${BASE_BRANCH:-$(git rev-parse --verify origin/main >/dev/null 2>&1 && echo main || echo master)}"
MERGE_BASE="$(git merge-base HEAD "origin/${BASE_BRANCH}")"

# Get changed files vs base
git diff --name-only "${MERGE_BASE}"..HEAD

# Get commit messages for context
git log --oneline "${MERGE_BASE}"..HEAD

Capture:

  • current_branch: Branch name
  • base_branch: Base branch to compare against (--base if supplied, otherwise the default branch)
  • changed_files: List of modified files
  • commit_messages: What the PR is about
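
For agents that prefer to collect these values programmatically, the shell commands above could be wrapped roughly as follows. This is a sketch under assumptions: the helper names (git, pick_base_branch, gather_context) are hypothetical, and it presumes a checkout with an origin remote, as the shell version does.

```python
import subprocess


def git(*args: str) -> str:
    """Run a git command in the current directory and return trimmed stdout."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout.strip()


def remote_ref_exists(ref: str) -> bool:
    """True when `git rev-parse --verify` can resolve the ref."""
    result = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", ref], capture_output=True
    )
    return result.returncode == 0


def pick_base_branch(explicit, has_origin_main: bool) -> str:
    """--base wins; otherwise prefer main and fall back to master."""
    if explicit:
        return explicit
    return "main" if has_origin_main else "master"


def gather_context(explicit_base=None) -> dict:
    """Collect the Step 1 values plus the merge-base SHA needed for Gate 1."""
    base = pick_base_branch(explicit_base, remote_ref_exists("origin/main"))
    merge_base = git("merge-base", "HEAD", f"origin/{base}")
    diff_range = f"{merge_base}..HEAD"
    return {
        "current_branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        "base_branch": base,
        "merge_base": merge_base,
        "changed_files": git("diff", "--name-only", diff_range).splitlines(),
        "commit_messages": git("log", "--oneline", diff_range).splitlines(),
    }
```

Calling gather_context() inside a repository returns a dict that can be pasted as Gate 1 evidence; an empty changed_files list corresponds to the "no file changes vs base" case.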

Step 2: Detect Tech Stack

See references/stack-discovery.md for stack detection commands, entrypoint discovery, port discovery, and trace rules.

Step 3: Discover User-Facing Entry Points

A "user-facing entry point" is anything a human interacts with: CLI subcommands, HTTP endpoints, UI routes, TUI screens, gRPC services, database migrations, or configuration files that affect runtime behavior.

CLI Applications (Rust/clap, Python/argparse/click, Go/cobra)

# Rust (clap) — look for Subcommand derives and command enums
grep -rn "Subcommand\|#\[command\]" --include="*.rs" | head -20

# Python (click/typer/argparse)
grep -rn "@click.command\|@app.command\|add_parser\|add_subparser" --include="*.py" | head -20

# Go (cobra)
grep -rn "cobra.Command\|AddCommand" --include="*.go" | head -20

Build a map of:

  • CLI subcommands: command name + description + file:line
  • Required arguments and flags per subcommand
  • Environment variables the binary reads (grep for env, std::env::var, os.Getenv, os.environ)

HTTP/API Services

Python (FastAPI/Flask):

grep -rn "@app\.\(get\|post\|put\|delete\|patch\)" --include="*.py" | head -20
grep -rn "@router\.\(get\|post\|put\|delete\|patch\)" --include="*.py" | head -20

Node.js (Express/Fastify):

grep -rn "app\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20
grep -rn "router\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20

Rust (axum/actix/rocket):

grep -rn "Router::new\|\.route(\|#\[get\]\|#\[post\]\|HttpServer" --include="*.rs" | head -20

Go (net/http, gin, chi):

grep -rn "http.HandleFunc\|r.GET\|r.POST\|router.Get\|router.Post" --include="*.go" | head -20

Elixir (Phoenix):

grep -rn "get \"/\|post \"/\|pipe_through\|live \"/\|scope \"/\"" --include="*.ex" | head -20

Browser UI Routes

grep -rn "createBrowserRouter\|<Route\|path=" --include="*.tsx" --include="*.jsx" | head -20

Database and Migrations

# SQL migrations
ls migrations/ db/migrate/ priv/repo/migrations/ 2>/dev/null
# Schema files
ls schema.sql schema.prisma 2>/dev/null

Build a consolidated map of:

  • CLI subcommands: name + args + file:line
  • API endpoints: method + path + file:line
  • UI routes: path + component + file:line
  • Database migrations: filename + what they create/alter
  • Configuration: env vars and config files that affect behavior
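
As an illustration of turning grep-style hits into the "method + path + file:line" shape listed above, here is a small Python sketch for FastAPI/Flask-style decorators. The regular expression mirrors the Python grep pattern in this step; the function name find_routes, the file name api/main.py, and the sample source are hypothetical.

```python
import re

# Matches decorators such as @app.get("/users") or @router.post('/items').
ROUTE_DECORATOR = re.compile(
    r"""@(?:app|router)\.(get|post|put|delete|patch)\(\s*["']([^"']+)["']"""
)


def find_routes(filename: str, source: str) -> list[dict]:
    """Return method, path, and file:line for each route decorator in source."""
    routes = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ROUTE_DECORATOR.search(line)
        if match:
            routes.append({
                "method": match.group(1).upper(),
                "path": match.group(2),
                "location": f"{filename}:{lineno}",
            })
    return routes


# Hypothetical source file contents, used only for illustration.
sample = '''from fastapi import FastAPI
app = FastAPI()

@app.get("/health")
def health():
    return {"ok": True}

@app.post("/chat")
def chat():
    return {}
'''
for route in find_routes("api/main.py", sample):
    print(route)
```

Like the grep commands, this only sees routes declared with literal string paths on the decorator line; routes registered dynamically still need manual tracing.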

Step 4: Trace Changes to Entry Points

For each changed file, determine if it affects user-facing functionality:

  1. Direct entry point change — File contains route definitions
  2. Import chain analysis — Find what imports the changed file and trace up to entry points
  3. Architecture-aware tracing — Read the project's CLAUDE.md, README, or architecture docs to understand data flow and module relationships, rather than relying solely on grep
  4. Document the trace path in test context

Import Chain Analysis by Ecosystem

# Rust — use/mod/crate references and workspace deps
grep -rn "use.*<crate>\|mod <module>" --include="*.rs"
grep -rn "<crate-name>" --include="Cargo.toml"

# Python — from/import
grep -rn "from.*<module>\|import.*<module>" --include="*.py"

# TypeScript/JavaScript — import/require
grep -rn "from.*<module>\|require.*<module>" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx"

# Elixir — alias/import/use
grep -rn "alias.*<Module>\|import.*<Module>\|use.*<Module>" --include="*.ex" --include="*.exs"

# Go — package references
grep -rn "<package>\." --include="*.go"

If the ecosystem is not covered above, or grep results are inconclusive, read the project's CLAUDE.md, README, or architecture docs to understand the module graph and trace the data flow from changed files to user-facing entry points.
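
Once the grep results have been collected into a "who imports whom" map, walking from a changed file up to entry points is a breadth-first search over the reverse-import graph. The sketch below assumes that map has been built by hand or from the commands above; the function name trace_to_entry_points and the module paths are hypothetical.

```python
from collections import deque


def trace_to_entry_points(changed_file, imported_by, entry_points):
    """Breadth-first search up the reverse-import graph.

    imported_by maps a file to the files that import it. Returns
    {entry_point: [changed_file, ..., entry_point]} for each reachable one.
    """
    traces = {}
    queue = deque([[changed_file]])
    seen = {changed_file}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in entry_points:
            traces[current] = path
        for importer in imported_by.get(current, []):
            if importer not in seen:
                seen.add(importer)
                queue.append(path + [importer])
    return traces


# Hypothetical module graph: one provider module feeds two routes.
imported_by = {
    "providers/claude.py": ["services/chat.py", "services/settings.py"],
    "services/chat.py": ["routes/chat.py"],
    "services/settings.py": ["routes/admin.py"],
}
entry_points = {"routes/chat.py", "routes/admin.py"}

traces = trace_to_entry_points("providers/claude.py", imported_by, entry_points)
for path in traces.values():
    print(" -> ".join(path))
```

Each printed path is the kind of trace that Step 4 asks to be recorded in a test's context field. In this example the same changed file reaches both a functional route and an admin route, so both need to be classified.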

Classify Affected Entry Points

After identifying all affected entry points, classify each one:

| Category | Description | Examples | Priority |
|----------|-------------|----------|----------|
| Core functionality | Entry points where the feature does its actual work for the end user | Chat endpoint, API action, data processing pipeline, generation flow | High (test first) |
| Configuration/admin | Entry points where the feature is set up, toggled, or configured | Settings page, admin dashboard, preference toggles, dropdown selections | Lower (test after core) |

Classification rules:

  • Ask: "If a user wanted to use this feature (not configure it), which entry point would they interact with?" — that's core functionality
  • A settings page that adds a new dropdown option is configuration; the endpoint that actually uses that option is core functionality
  • The same changed file (e.g., a new provider module) may affect both a settings page and a functional endpoint — both must be traced

Requirement: At least one test must target a core functionality entry point before generating configuration/admin tests. If no core functionality entry point can be identified, explicitly document why and flag this for manual review.

Output: For each affected entry point, document:

  • Which changed files affect it
  • The import/dependency chain
  • Classification: Core functionality or Configuration/admin
  • Why this entry point needs testing
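
The ordering requirement can be expressed as a small check. This sketch assumes each traced entry point is held as a dict with a classification key set to "core" or "config"; those key names, the function order_for_testing, and the example entries are hypothetical and not part of the plan schema.

```python
CORE = "core"
CONFIG = "config"


def order_for_testing(entry_points):
    """Put core entry points first and report whether manual review is needed."""
    # sorted() is stable, so entries keep their relative order within each class.
    ordered = sorted(
        entry_points, key=lambda e: 0 if e["classification"] == CORE else 1
    )
    needs_manual_review = not any(
        e["classification"] == CORE for e in entry_points
    )
    return ordered, needs_manual_review


# Hypothetical traced entry points, used only for illustration.
affected = [
    {"name": "GET /settings/providers", "classification": CONFIG},
    {"name": "POST /chat", "classification": CORE},
]
ordered, needs_manual_review = order_for_testing(affected)
print([e["name"] for e in ordered], needs_manual_review)
```

When needs_manual_review comes back True, the requirement above applies: document why no core entry point could be identified and flag the plan for manual review.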

Step 5: Generate Test Cases

See references/test-case-generation.md for the detailed API/browser templates, prioritization rules, and test-case guidelines.

Step 6: Write YAML Test Plan

Create the test plan file:

mkdir -p docs/testing

Write to docs/testing/test-plan.yaml:

version: 1
metadata:
  branch: <current_branch>
  base: <base_branch>
  generated: <ISO timestamp>
  changes_summary: |
    <Summary of what this PR changes based on commit messages and diff>

setup:
  stack:
    - type: <rust|node|python|go|elixir|docker>
      package_manager: <cargo|pnpm|npm|yarn|uv|poetry|mix|none>
  prerequisites:
    # Services or infrastructure the tests need running
    - name: <e.g., PostgreSQL>
      check: <command to verify it's available, e.g., "pg_isready -h localhost">
  build:
    # Commands to build the project artifacts (binaries, assets, etc.)
    - <build command, e.g., "cargo build --workspace">
  services:
    # Long-running processes to start before tests (servers, watchers, etc.)
    # Omit if the project is a CLI tool or library with no server component
    - command: <start command>
      health_check:
        url: http://localhost:<port>/health
        timeout: 30
  env:
    # Environment variables needed by tests (use ${VAR} for secrets)
    DATABASE_URL: "${DATABASE_URL}"

tests:
  # CLI test example — run the built binary with real arguments:
  - id: TC-01
    name: <CLI test name>
    context: |
      <Why this test exists, which changes affect it>
    steps:
      - run: <command that a human would type in their terminal>
      - run: <follow-up command to verify the effect>
    expected: |
      <Expected behavior: exit code, stdout content, side effects>

  # API test example:
  - id: TC-02
    name: <API test name>
    context: |
      <Why this test exists, which changes affect it>
    steps:
      - action: curl
        method: GET
        url: http://localhost:<port>/<path>
    expected: |
      <Expected behavior in natural language>

  # Database verification example:
  - id: TC-03
    name: <Database test name>
    context: |
      <Why this test exists, which changes affect it>
    steps:
      - run: <command that writes to the database>
      - run: psql "${DATABASE_URL}" -c "SELECT ... FROM ... WHERE ..."
    expected: |
      <Expected rows, schema state, or migration effect>

  # Browser test example (always use agent-browser CLI commands):
  - id: TC-04
    name: <UI test name>
    context: |
      <Why this test exists, which changes affect it>
    steps:
      - run: agent-browser open http://localhost:<port>/<path>
      - run: agent-browser snapshot -i
      - run: agent-browser click @<ref>
      - run: agent-browser snapshot -i
      - run: agent-browser screenshot evidence/tc-04.png
    expected: |
      <Expected behavior in natural language>
    evidence:
      screenshot: evidence/tc-04.png

Step 7: Report Summary

After generating the test plan:

## Test Plan Generated

**File:** `docs/testing/test-plan.yaml`
**Branch:** <current_branch> → <base_branch>

### Detected Stack

| Component | Type | Port |
|-----------|------|------|
| <component> | <type> | <port> |

### Tests Generated

| ID | Name | Type | Affected By |
|----|------|------|-------------|
| TC-01 | <name> | curl/browser | <files> |

### Entry Point Coverage

- **Covered:** <N> entry points with tests
- **Unchanged:** <M> entry points not affected by this PR

### Next Steps

1. Review the generated test plan at `docs/testing/test-plan.yaml`
2. Adjust test values and expectations as needed
3. Run tests with:

/beagle-testing:run-test-plan

Step 8: Verification

Confirm Hard gates 1–5 are satisfied with evidence (see Hard gates above) before treating the plan as complete. Then run:

# Verify file was created
ls -la docs/testing/test-plan.yaml

# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" && echo "Valid YAML"

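# Check required fields (same check as Gate 3: all four top-level keys must be present; a single grep -E with alternations would pass on any one match)
python3 -c "import sys, yaml; d = yaml.safe_load(open('docs/testing/test-plan.yaml')) or {}; missing = [k for k in ('version', 'metadata', 'setup', 'tests') if k not in d]; sys.exit('Missing keys: ' + ', '.join(missing) if missing else 0)"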

Verification Checklist:

  • Test plan file created at docs/testing/test-plan.yaml
  • YAML is syntactically valid
  • At least one test case generated
  • Setup commands match detected stack
  • Each test has id, name, steps, and expected fields
  • No automated test duplication: Grep every run: and command: step in the plan for test runner invocations (cargo test, pytest, npm test, go test, mix test, jest, vitest, mocha, etc.). If ANY step invokes the project's test runner, the plan fails verification. Remove those steps and replace them with real E2E actions.
  • Behavioral coverage: At least one test exercises the primary behavioral change described in changes_summary. Re-read the changes_summary and commit messages — if they describe a capability (e.g., "adds Claude Code as a new LLM provider") but no test invokes that capability (e.g., sends a message through the provider), the plan fails verification. Add the missing core functionality test before completing.
  • No config-only plans: If all tests target configuration/admin entry points and zero tests target core functionality entry points, the plan is incomplete. Go back to Step 4, identify the core functionality entry points, and add tests for them.
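
Two of the checklist items ("at least one test case" and "each test has id, name, steps, and expected fields") are mechanical and could be checked with a sketch like the one below. It assumes the plan has been loaded into a dict (for example with yaml.safe_load, as in the commands above); the function name check_plan_structure and the example plan are hypothetical.

```python
REQUIRED_TEST_FIELDS = ("id", "name", "steps", "expected")


def check_plan_structure(plan: dict) -> list[str]:
    """Return human-readable problems; an empty list means these checks pass."""
    problems = []
    tests = plan.get("tests") or []
    if not tests:
        problems.append("no test cases generated")
    for index, test in enumerate(tests):
        label = test.get("id") or f"tests[{index}]"
        for field in REQUIRED_TEST_FIELDS:
            # A missing key and an empty value are both treated as missing.
            if not test.get(field):
                problems.append(f"{label}: missing {field}")
    return problems


# Hypothetical parsed plan: TC-02 has no expected field.
example = {
    "tests": [
        {"id": "TC-01", "name": "CLI version", "steps": [{"run": "./app --version"}],
         "expected": "Prints the version and exits 0"},
        {"id": "TC-02", "name": "Health endpoint",
         "steps": [{"action": "curl", "method": "GET", "url": "http://localhost:8000/health"}]},
    ]
}
print(check_plan_structure(example))  # ['TC-02: missing expected']
```

The behavioral-coverage and config-only items still require reading the plan against changes_summary; a structural check like this does not replace them.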

Rules

  • E2E only — every test step must exercise the real built artifact (binary, server, UI) as a human would. Never wrap automated test suites.
  • Always create docs/testing/ directory if it doesn't exist
  • Generate at least one test per affected entry point
  • Include context explaining why each test matters (trace from changes)
  • Use natural language for the expected field (the agent will interpret it)
  • CLI projects: Test steps should invoke the actual binary with real arguments and verify stdout, stderr, exit codes, and side effects (files created, database rows written, processes spawned)
  • Server projects: Start the server in setup, test via curl/agent-browser
  • Library-only projects with no binary or server: If the change is purely internal library code with no user-facing entry point (no CLI, no server, no UI), state this explicitly and generate tests that exercise the library through its public API via a small driver script — not by running the test suite
  • Default to conservative port detection (8000 for API, 5173/3000 for frontend)
  • Browser automation steps MUST use agent-browser CLI commands (e.g., agent-browser open, agent-browser snapshot -i, agent-browser click @ref) — never use abstract action syntax
  • Always run agent-browser snapshot -i before interacting with elements and after navigation/DOM changes
  • Use agent-browser screenshot <path> to capture evidence for browser tests
  • Use ${ENV_VAR} syntax for secrets, never hardcode credentials
  • If no user-facing changes detected, explain why and suggest manual verification
Security recommendations
This skill's functionality (producing an E2E test-plan YAML) is reasonable, but the SKILL.md expects many local CLI tools, git remote access, database access, and environment variables while the metadata declares none. Before installing or running it: (1) run it in a safe/sandboxed environment (no production DBs or secrets), (2) require the author to list exact required binaries and environment variables, (3) confirm whether the agent will access remote git origin (network access) and whether evidence outputs might contain secrets, and (4) prefer a scope-limited dry-run mode that only analyzes files without executing commands that contact externals. If you must use it, test on a fork/CI copy with sanitized environment variables and no access to production credentials.
Functional analysis
Type: OpenClaw Skill · Name: gen-test-plan · Version: 1.0.1
The 'gen-test-plan' skill bundle is a legitimate tool designed to analyze a repository's tech stack and generate end-to-end (E2E) test plans. It uses standard shell utilities (git, grep, ls) to discover entry points and trace code changes, and it includes rigorous validation steps (Gates 1–5) to ensure the generated YAML is well-formed and focuses on real user-facing behavior rather than simply wrapping existing automated tests. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the logic is transparent and strictly aligned with its stated purpose of QA automation.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
The name/description (generate an E2E test plan from a repo diff) aligns with the SKILL.md's actions (scan the repo, discover stack, trace entry points, and emit YAML). However the SKILL.md expects the agent to run many local tooling commands (git, grep, python3, psql, docker-compose, agent-browser, build tools, etc.) even though the skill metadata lists no required binaries or install steps. That mismatch (unlisted runtime dependencies) is unexpected and should be corrected.
Instruction Scope
The instructions explicitly tell the agent to run a broad set of commands that read the repo, query git remotes (git merge-base origin/<base>), grep source files, and produce/paste command output as evidence. The doc also includes examples that reference environment variables and database access (e.g., psql "${DATABASE_URL}") and use of agent-browser. Those are within the high-level purpose but broaden the operational scope (network/git remote access, reading local files, possible access to DBs and credentials). The SKILL.md also instructs to capture and paste command outputs verbatim, which can expose secrets if present in output.
Install Mechanism
This is an instruction-only skill with no install spec and no code files, which is the lowest install risk. Nothing is downloaded or written to disk by the skill bundle itself.
Credentials
The skill declares no required environment variables or primary credential, yet the instructions and examples reference environment variables and services (DATABASE_URL, ANTHROPIC_API_KEY, docker-compose, psql, etc.) and assume those will be available. That is a proportionality mismatch: generating a plan may legitimately need to read repository files and infer ports, but it should explicitly declare any env vars or credentials it will read or recommend. As written, the skill may prompt the agent to use undisclosed secrets or system-state.
Persistence & Privilege
The skill does not request persistent presence (always: false) and has disable-model-invocation: true (it cannot be autonomously invoked by the model), which reduces risk. It does not modify other skills or system-wide settings according to the provided metadata.
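How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install gen-test-plan
  3. Once installed, invoke the skill by name or trigger it with /gen-test-plan
  4. Provide the inputs described in the skill's arguments to get structured output
Version history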
v1.0.1
- Introduced a new "Hard gates" section outlining mandatory checkpoints that must be passed in order, covering branch diffing, entry point trace, YAML plan validity, prohibition of automated test duplication, and behavioral coverage.
- Detailed the pass conditions and verification steps for each hard gate, ensuring stricter E2E plan quality and reproducibility.
- Clarified diff and base branch resolution logic in Step 1 and updated all related example commands for consistency.
- Specified where in the workflow each hard gate occurs and the expected evidence to retain after passing each.
- Tightened requirements for YAML output: plan file must have specific keys and parse without error.
- Added explicit procedural safeguards against generating non-E2E or duplicate test-runner steps.
v1.0.0
Initial release of gen-test-plan, a skill for generating end-to-end YAML test plans by analyzing code and repository changes, focused on real user-facing behavior.
- Analyzes repo changes and diff vs base branch to determine impacted features.
- Detects tech stack, user-facing entrypoints (CLIs, APIs, UI routes, migrations), and traces how code changes affect those entrypoints.
- Strictly avoids steps that duplicate automated/unit tests; all E2E steps mimic human QA actions.
- Outputs actionable test cases prioritized by core user functionality and configuration/admin impact.
- Produces YAML-formatted test plans for execution by autonomous QA agents.
Metadata
Slug: gen-test-plan
Version: 1.0.1
License: MIT-0
Total installs: 1
Current installs: 1
Historical versions: 2
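FAQ

What is Gen Test Plan?

Analyze repo, detect stack, trace changes to user-facing entry points, generate E2E YAML test plan. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 119 total downloads so far.

How do I install Gen Test Plan?

Run the command "/install gen-test-plan" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Gen Test Plan free?

Yes. Gen Test Plan is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Gen Test Plan support?

Gen Test Plan is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Gen Test Plan?

It is developed and maintained by Kevin Anderson (@anderskev); the current version is v1.0.1.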
