Description

Incremental backend API + frontend browser testing with persistent memory. Monitors every commit, enriches insufficient messages, and runs targeted tests sco...

README (SKILL.md)

Argus — Automated Testing Skill

Name: argus
Author: tiansyao

Hundred-eyed. Never sleeps. Every fixed bug becomes a permanent eye.

Command Routing

Parse the user's invocation and jump to the correct phase:

Command	Action
`/argus init`	→ Phase 1: Bootstrap
`/argus`	→ Phase 3 → 4 → 5 → 6 → 7 (full run)
`/argus test --backend`	→ Phase 5 only
`/argus test --frontend`	→ Phase 6 only
`/argus test --diff`	→ Phase 5 + 6, scoped to current branch diff
`/argus catalog`	→ Phase 3 only (update catalog, no tests)
`/argus report`	→ Phase 7 only (show last report)
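As a rough illustration, the routing table can be read as a lookup from invocation to phase list, guarded by the initialization check this section specifies. The names `ROUTES` and `route_command` are hypothetical and not part of Argus; the phase numbers are copied from the table above.

```python
from pathlib import Path

# Phase lists copied from the Command Routing table; the names are illustrative.
ROUTES = {
    "/argus init": [1],
    "/argus": [3, 4, 5, 6, 7],
    "/argus test --backend": [5],
    "/argus test --frontend": [6],
    "/argus test --diff": [5, 6],
    "/argus catalog": [3],
    "/argus report": [7],
}

NOT_INITIALIZED = "Argus has not been initialized. Run /argus init first."


def route_command(invocation: str, project_root: str = ".") -> list[int]:
    """Return the phases to run, or raise if Argus is not initialized."""
    command = " ".join(invocation.split())  # collapse repeated whitespace
    if command not in ROUTES:
        raise ValueError(f"Unknown Argus command: {command!r}")
    catalog = Path(project_root) / ".argus" / "catalog.md"
    if command != "/argus init" and not catalog.exists():
        raise RuntimeError(NOT_INITIALIZED)
    return ROUTES[command]
```

Raising an exception for the uninitialized case is a design choice of this sketch; the skill itself only needs to print the message and stop.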

If no .argus/catalog.md exists and command is not init, say:

"Argus has not been initialized. Run /argus init first."

File Layout

.argus/
  catalog.md          # test knowledge base — source of truth
  baseline.json       # health score history
  reports/
    YYYY-MM-DD.md     # per-run reports
  commit-hook.sh      # installed into .git/hooks/post-commit

tests/
  backend/
    conftest.py
    test_{module}.py
  frontend/
    test_{flow}.py

catalog.md Format

First line is always the scan cursor:

last_scanned_commit: {SHA}

Each test entry:

## {test_function_name}
- Type: backend | frontend
- Source: fix commit {SHA} — {description} | generated (routes scan) | manual | adversarial
- Protection: locked | regenerable | deprecated
- Covers: {endpoint or file list}
- File: tests/{path}::{function_name}
- Status: pending | generated | active ✅ | failing ❌ | deprecated
- Last run: {YYYY-MM-DD} {passed|failed}

Protection rules (never violate):

Protection	Source	Auto-delete	Auto-modify
`locked`	fix commit / manual	❌ Never	❌ Never
`regenerable`	generated / adversarial	✅ Yes	✅ Yes
`deprecated`	endpoint removed	Confirm with user	—
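A minimal sketch of reading this format back, assuming the file follows the layout above exactly (a cursor line, `## ` headings, and `- Key: value` bullets). `CatalogEntry` and `parse_catalog` are hypothetical helper names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    fields: dict = field(default_factory=dict)

    @property
    def locked(self) -> bool:
        # Locked entries must never be auto-deleted or auto-modified.
        return self.fields.get("Protection") == "locked"


def parse_catalog(text: str):
    """Return (last_scanned_commit, entries) parsed from catalog.md text."""
    cursor = None
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("last_scanned_commit:"):
            cursor = line.split(":", 1)[1].strip() or None
        elif line.startswith("## "):
            entries.append(CatalogEntry(name=line[3:].strip()))
        elif line.startswith("- ") and entries and ":" in line:
            # Split on the first colon only, so "File: a.py::test_x" stays intact.
            key, value = line[2:].split(":", 1)
            entries[-1].fields[key.strip()] = value.strip()
    return cursor, entries
```

Any code that rewrites the catalog could check `entry.locked` first to honor the protection rules.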

Phase 1 — Bootstrap (`/argus init`)

Step 1: Scan routes for endpoints

Read all files matching backend/app/routes/*.py and backend/app/routers/*.py (and equivalent paths). For each file extract:

HTTP method + path (from @router.get(...), @router.post(...), etc.)
Auth requirement (look for Depends(get_current_user) etc.)
Key business logic (rate limits, SSE, file operations)

Do NOT use OpenAPI spec. Source code is ground truth.
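The extraction in this step could be approximated with a text scan like the one below. It is a heuristic sketch, not how Argus is implemented: `extract_endpoints` is a hypothetical name, only `@router.<method>("...")` decorators with a literal path are recognized, and router prefixes are ignored.

```python
import re

# Matches decorators such as @router.get("/items/{id}") or @router.post('/login').
ROUTE_RE = re.compile(
    r"@router\.(get|post|put|patch|delete)\(\s*[\"']([^\"']+)[\"']"
)


def extract_endpoints(source: str):
    """Return (METHOD, path, requires_auth) tuples found in one route file."""
    matches = list(ROUTE_RE.finditer(source))
    endpoints = []
    for i, match in enumerate(matches):
        # Look for the auth dependency between this decorator and the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(source)
        handler_text = source[match.end():end]
        requires_auth = "Depends(get_current_user)" in handler_text
        endpoints.append((match.group(1).upper(), match.group(2), requires_auth))
    return endpoints
```

An AST-based scan would be more robust against unusual formatting; the regex version is shown only because it is short.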

Step 2: Mine git history for bugs

git log --oneline --all | head -100

Filter commits whose message contains: fix, bug, 修复, 修正, hotfix, patch.

For each matched commit:

git show {SHA} --stat --format="%s%n%b"

Extract: changed files, affected endpoints, what broke.
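The keyword filter could look roughly like this. The helper names are hypothetical, and matching is by plain substring, so a subject such as "prefix handling" would also match "fix"; treat the result as a candidate list to review rather than a definitive one.

```python
import subprocess

# Keywords taken from the step above; matched case-insensitively as substrings.
FIX_KEYWORDS = ("fix", "bug", "修复", "修正", "hotfix", "patch")


def is_fix_commit(message: str) -> bool:
    lowered = message.lower()
    return any(keyword in lowered for keyword in FIX_KEYWORDS)


def parse_oneline_log(log_output: str):
    """Return (sha, subject) pairs for fix-like commits in `git log --oneline` output."""
    found = []
    for line in log_output.splitlines():
        sha, _, subject = line.partition(" ")
        if sha and is_fix_commit(subject):
            found.append((sha, subject))
    return found


def recent_fix_commits(limit: int = 100):
    """Run git and filter the most recent `limit` commits across all refs."""
    result = subprocess.run(
        ["git", "log", "--oneline", "--all", f"-{limit}"],
        capture_output=True, text=True, check=True,
    )
    return parse_oneline_log(result.stdout)
```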

Step 3: Read bugfix.md if present

cat bugfix.md 2>/dev/null || cat BUGFIX.md 2>/dev/null

Extract any documented regression risks and key protected files.

Step 4: Generate catalog.md

Create .argus/catalog.md. For each discovered test case:

fix commit → Protection: locked
routes scan → Protection: regenerable
Set all Status: pending
Set last_scanned_commit to current HEAD SHA

git rev-parse HEAD

Step 5: Generate tests/backend/conftest.py

Read existing tests/ directory if present. If conftest.py exists, do not overwrite.

Generate a conftest.py with:

base_url fixture reading from env TEST_BASE_URL (default http://localhost:8000)
client fixture using httpx.AsyncClient
guest_client fixture (unauthenticated)
auth_headers fixture (reads TEST_AUTH_TOKEN from env)

Step 6: Install git hook

Write .argus/commit-hook.sh:

#!/bin/bash
# Argus post-commit hook
# Enriches insufficient commit messages and runs incremental tests

COMMIT_MSG=$(git log -1 --format="%s%n%b")
CHANGED_FILES=$(git diff HEAD~1 HEAD --name-only 2>/dev/null || echo "")

# Pass to argus for analysis
echo "[Argus] Analyzing commit..."
# Claude will be invoked here via: claude -p "argus post-commit"
# For now, log for manual review
echo "[Argus] Changed files: $CHANGED_FILES" >> .argus/commit-log.txt
echo "[Argus] Message: $COMMIT_MSG" >> .argus/commit-log.txt

Symlink or copy to .git/hooks/post-commit:

cp .argus/commit-hook.sh .git/hooks/post-commit
chmod +x .git/hooks/post-commit

Step 7: Confirm

Print summary:

Argus initialized.
  Endpoints discovered: {N}
  Fix commits mined: {N}
  Catalog entries created: {N}  (locked: {N}, regenerable: {N})
  Hook installed: .git/hooks/post-commit

Next: run /argus to generate test code and execute.

Phase 2 — Commit Monitoring + Enrichment

Triggered by: post-commit hook or manually reviewing the last commit.

Step 1: Read the last commit

git log -1 --format="%H%n%s%n%b"
git diff HEAD~1 HEAD --name-only
git diff HEAD~1 HEAD --stat

Step 2: Score the commit message

A commit message is INSUFFICIENT if any of these are true:

Subject line is fewer than 15 characters
Subject is generic: "update", "fix", "wip", "test", "changes", "misc", "cleanup" with nothing after
Diff touches ≥ 3 files but message gives no indication of what changed
Diff contains route/API changes but no endpoint is mentioned
Message contains "fix" or "bug" or "修复" but describes no specific behavior
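The mechanical parts of this scoring can be sketched as below. The function name, the route-path check, and the endpoint regex are assumptions made for illustration; the two judgment-based rules (a multi-file diff with no indication of what changed, and a fix that describes no specific behavior) are left to the agent's reading of the diff and are not encoded here.

```python
import re

GENERIC_SUBJECTS = {"update", "fix", "wip", "test", "changes", "misc", "cleanup"}

# Crude stand-in for "an endpoint is mentioned": any token that looks like a URL path.
ENDPOINT_RE = re.compile(r"/\w[\w{}\-/]*")


def insufficiency_reasons(subject: str, changed_files: list[str], body: str = "") -> list[str]:
    """Return the mechanical reasons a commit message counts as INSUFFICIENT."""
    reasons = []
    text = subject.strip()
    if len(text) < 15:
        reasons.append("subject shorter than 15 characters")
    if text.lower().rstrip(".!") in GENERIC_SUBJECTS:
        reasons.append("generic subject")
    # Assumes route files live under a routes/ or routers/ directory, as in Phase 1.
    touches_routes = any("/routes/" in f or "/routers/" in f for f in changed_files)
    if touches_routes and not ENDPOINT_RE.search(f"{subject}\n{body}"):
        reasons.append("route change without an endpoint mentioned")
    return reasons
```

An empty list means none of the mechanical rules fired, which is necessary but not sufficient for the message to count as SUFFICIENT.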

Step 3: If INSUFFICIENT — enrich

Analyze the diff deeply:

Which routes/endpoints changed?
What business logic was added or modified?
Is there a rate limit, auth check, or data validation change?
Is this a bug fix? What was the broken behavior?

Generate enrichment block. Amend the commit (only safe before push):

# Check if already pushed
LOCAL=$(git rev-parse HEAD)
REMOTE=$(git rev-parse --verify --quiet "origin/$(git branch --show-current)" 2>/dev/null || echo "none")

if [ "$LOCAL" != "$REMOTE" ]; then
  # Safe to amend
  git commit --amend -m "$(git log -1 --format='%s%n%n%b')

[Argus] Auto-enriched
Changed:
  {list of changed endpoints or files with brief description}

TESTABLE:
  endpoint: {most testable endpoint changed}
  scenario: {concrete behavior that should be verified}
  risk: {low|medium|high}"
fi

If already pushed: write enrichment to .argus/commit-notes/{SHA}.md instead, and note:

"Commit {SHA} already pushed. Enrichment saved to .argus/commit-notes/{SHA}.md"

Step 4: If SUFFICIENT

If the message already has a TESTABLE: block, extract it and queue it for Phase 3. If the message is clear but has no TESTABLE: block, generate one and append it by amending the commit, subject to the same pushed-commit check as in Step 3.

Phase 3 — Incremental Catalog Update

Step 1: Determine scan range

Read last_scanned_commit from .argus/catalog.md.

git log {last_scanned_commit}..HEAD --format="%H %s"

If last_scanned_commit is empty or not found, scan last 20 commits.

Step 2: Process each new commit

For each commit in range:

git show {SHA} --format="%s%n%b" --stat

Extract:

Any TESTABLE: block in the message body
Whether it's a fix/bug commit (even without TESTABLE block)
Which files changed

Step 3: For fix commits without TESTABLE block

Read the diff:

git show {SHA} --unified=5

Infer what should be tested from the code change. Generate a catalog entry with:

Source: fix commit {SHA}
Protection: locked
Status: pending

Step 4: For TESTABLE blocks

Parse each field. Create catalog entry:

Source: fix commit {SHA} — {commit subject}
Protection: locked
Covers: the endpoint from TESTABLE block
Status: pending
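Parsing the fields might look like the sketch below, assuming the block has the shape shown in Phase 2 (a `TESTABLE:` line followed by indented `key: value` lines). `parse_testable_block` is a hypothetical helper name.

```python
def parse_testable_block(message: str):
    """Return the fields of the first TESTABLE: block, or None if there is none."""
    lines = message.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != "TESTABLE:":
            continue
        fields = {}
        for follow in lines[i + 1:]:
            # The block ends at the first line that is not an indented "key: value".
            if not follow.startswith((" ", "\t")) or ":" not in follow:
                break
            key, value = follow.split(":", 1)
            fields[key.strip()] = value.strip()
        return fields
    return None
```

The `endpoint` field would then populate Covers, and the commit SHA and subject would populate Source, as listed above.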

Step 5: Deduplication

Before appending any entry, check if a test with the same function name or covering the same endpoint already exists in catalog. Skip duplicates.
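One way to express this check, assuming entries are held as dictionaries with a `name` key and an optional `Covers` key (a representation chosen for this sketch, not prescribed by Argus):

```python
def _normalize(value: str) -> str:
    # Collapse whitespace so "POST  /api/login" and "POST /api/login" compare equal.
    return " ".join(value.split())


def is_duplicate(candidate: dict, existing: list[dict]) -> bool:
    """True if the catalog already has this function name or covers the same endpoint."""
    candidate_covers = _normalize(candidate.get("Covers", ""))
    for entry in existing:
        if entry["name"] == candidate["name"]:
            return True
        if candidate_covers and _normalize(entry.get("Covers", "")) == candidate_covers:
            return True
    return False
```

Note that the rule as written keeps at most one entry per endpoint, so a second scenario for an already-covered endpoint is skipped.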

Step 6: Update catalog.md

Append new entries. Update last_scanned_commit to HEAD.

Print:

Catalog updated.
  New entries: {N}
  Skipped (duplicate): {N}
  last_scanned_commit → {SHA}

Phase 4 — Test Code Generation

Step 1: Find pending entries

Read catalog.md. Collect all entries where Status: pending.

Sort by priority:

locked + backend first
locked + frontend
regenerable + backend
regenerable + frontend
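The ordering above amounts to a sort on the (Protection, Type) pair. A small sketch, with entries represented as dictionaries keyed by the catalog field names (an assumption of this example):

```python
# Lower number means generated earlier; order copied from the list above.
PRIORITY = {
    ("locked", "backend"): 0,
    ("locked", "frontend"): 1,
    ("regenerable", "backend"): 2,
    ("regenerable", "frontend"): 3,
}


def sort_pending(entries: list[dict]) -> list[dict]:
    """Return only pending entries, highest priority first."""
    pending = [e for e in entries if e.get("Status") == "pending"]
    # Unknown combinations (for example deprecated entries) sort last.
    return sorted(
        pending,
        key=lambda e: PRIORITY.get((e.get("Protection"), e.get("Type")), len(PRIORITY)),
    )
```

Because `sorted` is stable, entries with equal priority keep their catalog order.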

Step 2: Read existing test files

Before generating, read the target test file if it exists. Identify existing function names. Never write a function that already exists.

Step 3: Generate backend test functions

For each pending backend entry:

# [Argus] {test_function_name}
# Source: {source}
# Protection: {protection} — {"DO NOT DELETE OR MODIFY" if locked else "auto-generated"}
# Intent: {what this test verifies}
async def {test_function_name}({fixtures}):
    # Arrange
    {setup}

    # Act
    response = await client.{method}("{path}", {params})

    # Assert
    assert response.status_code == {expected_status}
    {additional assertions derived from intent}

Use httpx.AsyncClient for all requests. Use fixtures from conftest.py.

For SSE endpoints, use client.stream().

For auth-required endpoints, use auth_headers fixture.

Step 4: Generate frontend test functions

For each pending frontend entry, generate a Playwright test outline:

# [Argus] {test_function_name}
# Source: {source}
# Protection: {protection}
# Intent: {what user flow this verifies}
def {test_function_name}():
    # This test requires: /argus test --frontend
    # Browser steps:
    # 1. {step}
    # 2. {step}
    # Assert: {what to verify in UI}
    pass  # Implemented via Playwright in Phase 6

Frontend test functions are stubs — actual execution uses Playwright in Phase 6.

Step 5: Write files

Append generated functions to the appropriate test file. Update catalog entries:

Status: generated
File: tests/{path}::{function_name}

Phase 5 — Backend Test Execution

Step 1: Check server is running

curl -s http://localhost:8000/health || curl -s http://localhost:8000/api/health || curl -s http://localhost:8000/docs

If no response: ask user to start the backend server.

Step 2: Determine which tests to run

/argus or /argus test --backend → all backend tests
/argus test --diff → scoped tests only

For --diff mode:

git diff main...HEAD --name-only

Match changed files against catalog Covers fields. Run only matched tests.

Special case: if any of these files changed, run ALL backend tests:

conftest.py, database.py, config.py, dependencies.py, main.py (These are foundational — changes affect everything)
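The scoping rule can be sketched as follows. It assumes the catalog has already been reduced to a mapping from backend test name to the list of file paths in its Covers field; `select_tests` and that mapping are hypothetical, and entries whose Covers field names an endpoint rather than files would need an extra endpoint-to-file lookup that is not shown.

```python
from pathlib import PurePosixPath

# Foundational files from the special case above: any change here runs everything.
FOUNDATIONAL = {"conftest.py", "database.py", "config.py", "dependencies.py", "main.py"}


def select_tests(changed_files: list[str], catalog: dict[str, list[str]]) -> list[str]:
    """Pick the tests to run for the output of `git diff main...HEAD --name-only`."""
    if any(PurePosixPath(path).name in FOUNDATIONAL for path in changed_files):
        return sorted(catalog)  # run all backend tests
    changed = set(changed_files)
    return sorted(name for name, covers in catalog.items() if changed & set(covers))
```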

Step 3: Run pytest

cd {project_root}
python -m pytest tests/backend/ -v --tb=short --no-header 2>&1

Or for scoped run:

python -m pytest {specific test files} -v --tb=short --no-header 2>&1

Step 4: Parse results

For each test, extract: function name, passed/failed, error message if failed.

Update catalog.md for each test:

Status: active ✅ or failing ❌
Last run: today's date + result

Step 5: For each FAILING test

Record in report:

BUG-{YYYY-MM-DD}-{NNN}
Test: {function_name}
Intent: {from catalog}
Source: {from catalog}
Error: {pytest output}
Covers: {endpoint}
Severity: high (if locked) | medium (if regenerable)

Do NOT attempt to fix bugs. Argus reports, does not repair.

Phase 6 — Frontend Browser Test Execution

Note: Frontend tests are NEVER run automatically on commit hook. Only on manual /argus or /argus test --frontend.

Step 1: Ensure test environment ready

Argus manages its own dependencies. Check and install if needed:

cd {project_root}

# Check if pytest-playwright is available
if ! python -c "import pytest_playwright" 2>/dev/null; then
    echo "[Argus] Installing browser testing dependencies..."
    pip install pytest-playwright playwright -q
    playwright install chromium 2>/dev/null || echo "[Argus] Chromium may need manual install: playwright install chromium"
fi

Step 2: Read frontend test stubs

Read all files in tests/frontend/. Collect test functions and their intent comments.

Step 3: Generate Playwright tests from stubs

For each frontend test stub, generate a Playwright test if not already generated:

File: tests/frontend/test_{flow}.py

"""Frontend browser tests — generated by Argus."""
import os

import pytest


# [Argus] {test_name}
# Source: {source}
# Protection: {protection}
# Intent: {intent}
def test_{name}(page):
    """{intent}"""
    # The `page` fixture from pytest-playwright is synchronous, so no async/await.
    # Navigate to app URL (from TEST_APP_URL env, default: http://localhost:3000)
    base_url = os.environ.get("TEST_APP_URL", "http://localhost:3000")
    page.goto(base_url)

    # Execute steps from intent:
    # {steps extracted from stub comments}

    # Screenshot on completion
    page.screenshot(path=".argus/reports/screenshots/{date}/{test_name}.png")

Step 4: Run Playwright tests

cd {project_root}
# Headless is the default; add --headed to watch the browser
python -m pytest tests/frontend/ -v --browser chromium \
    --screenshot=only-on-failure \
    --output=.argus/reports/screenshots/{date}/ 2>&1

Step 5: Record results

Parse pytest output:

Pass: catalog.md → Status: active ✅, Last run: today passed
Fail: catalog.md → Status: failing ❌, Last run: today failed, screenshot saved to .argus/reports/screenshots/{date}/{test_name}_fail.png

Phase 7 — Report Generation

Generate .argus/reports/{YYYY-MM-DD}.md:

# Argus Report — {YYYY-MM-DD}

## Health Score: {score}/100

| Category | Score | Weight |
|---|---|---|
| Locked tests passing | {X}/100 | 40% |
| Endpoint coverage | {X}/100 | 25% |
| High-risk paths covered | {X}/100 | 20% |
| Test stability (no flaky) | {X}/100 | 15% |

Previous: {prev_score} ({delta:+d})

## Summary
✅ Passed: {N}
❌ Failed: {N}
⚠️  Skipped: {N}
🔒 Locked tests: {N} ({N} passing)

## Failed Tests

{for each failing test:}
### BUG-{YYYY-MM-DD}-{NNN}
- Test: {function_name}
- Intent: {catalog intent}
- Source: {catalog source}
- Covers: {endpoint}
- Severity: {high|medium|low}
- Error:

{pytest error output}


## New Tests Added This Run
{list of new catalog entries}

## Coverage Gaps
{endpoints in routes with no catalog entry}

Health score calculation:

locked_score   = (locked_passing / total_locked) * 100
coverage_score = (endpoints_with_tests / total_endpoints) * 100
highrisk_score = (highrisk_covered / total_highrisk) * 100
stability_score = 100 if no_flaky else max(0, 100 - (flaky_count * 20))

health = (
  locked_score   * 0.40 +
  coverage_score * 0.25 +
  highrisk_score * 0.20 +
  stability_score * 0.15
)
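The formula above, written out as a runnable sketch. The source does not say what happens when a category is empty (for example, no locked tests yet); treating an empty category as 100 is an assumption of this example, as are the function and parameter names.

```python
def _ratio(numerator: int, denominator: int) -> float:
    # Assumption: an empty category counts as fully satisfied, avoiding division by zero.
    return 100.0 if denominator == 0 else numerator / denominator * 100


def health_score(locked_passing: int, total_locked: int,
                 endpoints_with_tests: int, total_endpoints: int,
                 highrisk_covered: int, total_highrisk: int,
                 flaky_count: int) -> int:
    """Weighted health score (0-100) using the 40/25/20/15 weights above."""
    locked_score = _ratio(locked_passing, total_locked)
    coverage_score = _ratio(endpoints_with_tests, total_endpoints)
    highrisk_score = _ratio(highrisk_covered, total_highrisk)
    # Equals 100 when there are no flaky tests, and loses 20 points per flaky test.
    stability_score = max(0, 100 - flaky_count * 20)
    health = (
        locked_score * 0.40
        + coverage_score * 0.25
        + highrisk_score * 0.20
        + stability_score * 0.15
    )
    return round(health)
```

For example, 8 of 10 locked tests passing, 12 of 20 endpoints covered, 3 of 4 high-risk paths covered, and one flaky test give 32 + 15 + 15 + 12 = 74.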

High-risk paths are endpoints that:

Handle authentication
Handle payments or subscriptions
Use SSE streaming
Write to database

Update baseline.json:

{
  "runs": [
    {"date": "YYYY-MM-DD", "score": 78, "passed": 12, "failed": 3},
    ...
  ]
}

If score dropped vs previous run, print:

"⚠️ Health score dropped {delta} points. Check failing tests above."

Print ASCII trend (last 5 runs):

Score trend (last 5):
  71 ██████████████
  74 ███████████████
  78 ████████████████ ← today
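The trend display could be rendered from the `runs` list in baseline.json along these lines. The scale of roughly one block per five points is inferred from the sample above rather than stated by the source, and `render_trend` is a hypothetical name.

```python
def render_trend(scores: list[int]) -> str:
    """Render up to the last five scores as an ASCII bar chart, oldest first."""
    recent = scores[-5:]
    lines = ["Score trend (last 5):"]
    for index, score in enumerate(recent):
        bar = "█" * round(score / 5)  # assumed scale: about one block per 5 points
        marker = " ← today" if index == len(recent) - 1 else ""
        lines.append(f"  {score} {bar}{marker}")
    return "\n".join(lines)
```

Calling `render_trend([run["score"] for run in baseline["runs"]])` would use the history stored in baseline.json, where `baseline` is the parsed file.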

Trigger Matrix

Trigger	Phases	Tests run	Max time
`post-commit` hook	2 → 3	Incremental backend only	30s
`/argus`	3 → 4 → 5 → 6 → 7	Full catalog	no limit
`/argus test --backend`	5 → 7	All backend	~2min
`/argus test --frontend`	6 → 7	All frontend	~5min
`/argus test --diff`	5 → 7	Diff-scoped	~1min
`/argus catalog`	3 only	None	~10s
`/argus report`	7 only	None	instant
`/argus init`	1 only	None	~30s

Rules

Never delete a locked test. Ever. Even if the endpoint no longer exists — mark it deprecated and ask the user.
Never fix bugs. Argus finds and reports. /qa fixes.
Never run frontend tests in the commit hook. Too slow.
Never overwrite an existing function. Check before writing.
Amend only before push. Check remote SHA before any git commit --amend.
Catalog is append-only for locked entries. Regenerable entries can be rewritten.
If server is down, report clearly and stop. Do not fail silently.

Usage Guidance

Before installing or running this skill:

1) Review every file the skill will write (especially .argus/commit-hook.sh and catalog.md) and do not install the hook until you are comfortable with its commands.
2) Confirm whether you want a post-commit hook that may amend commits. Amending history can be surprising; prefer manual enrichment, or require a pre-push or manual review.
3) Check whether a 'claude' (or other assistant) CLI is present or configured on your machine. The SKILL.md references invoking such a tool, which could cause repository contents to be sent to an external service.
4) Inspect generated tests (conftest.py) for references to TEST_AUTH_TOKEN or other secrets, and do not set those env vars unless necessary and safe.
5) Test Argus in a disposable clone or branch first (or on a repo without sensitive data).
6) Back up your repository and ensure your CI/push workflow is not automatically exposing .argus artifacts.
7) If you want to proceed, consider editing the hook to require manual confirmation before sending data externally, or remove the auto-amend behavior.

Capability Analysis

Type: OpenClaw Skill
Name: argus-qa
Version: 1.0.0

The Argus skill performs intrusive repository modifications, including the installation of a git post-commit hook (.git/hooks/post-commit) and the automatic amendment of git commit messages (git commit --amend). While these actions are documented as part of its QA and 'commit enrichment' functionality, they represent high-risk behaviors that modify the user's environment and version history. The skill also manages sensitive environment variables such as TEST_AUTH_TOKEN and performs automated package installations via pip and playwright.

Capability Assessment

ℹ Purpose & Capability

The skill's name and description (incremental backend/frontend testing, commit monitoring, generating tests) align with the actions described in SKILL.md: scanning routes, mining fix commits, generating tests, and installing a git hook. However, SKILL.md references invoking an external assistant ('claude -p "argus post-commit"') and includes generation of test fixtures that read env vars (TEST_BASE_URL, TEST_AUTH_TOKEN) even though requires.env lists none — minor mismatches to be aware of.

⚠ Instruction Scope

The instructions direct the agent to read large parts of the repository (routes, tests, git history, optional BUGFIX files), write and install a .git/hooks/post-commit hook, and—critically—describe enriching commit messages and (in places) amending commits. The SKILL.md also contains language indicating inclusion of 'Full source of all included files' for review. Reading/writing repository files and installing hooks is coherent for testing, but amending commits and the explicit suggestion to surface full source contents are high-impact actions and broaden scope beyond simple test generation. The hook is intended to run on every commit and could be used to trigger external analysis if an external tool (claude CLI) is present.

✓ Install Mechanism

This is an instruction-only skill with no install specification and no code files to execute. That lowers supply-chain risk: nothing will be automatically downloaded or installed by the platform beyond writing files in the repository per its instructions.

ℹ Credentials

The registry metadata declares no required environment variables or credentials, but SKILL.md-generated test fixtures reference TEST_BASE_URL and TEST_AUTH_TOKEN. The skill also logs commit messages and changed-file lists to .argus/commit-log.txt. There is a partial, implicit dependency on an external CLI ('claude') if present. While no explicit credentials are requested, the skill's behavior (reading and potentially exporting full source and commit data) could expose secrets if later sent to an external service — the absence of declared env var requirements is inconsistent with the generated test code and the external-invocation hints.

ℹ Persistence & Privilege

The skill does not request platform-level 'always' presence, but it does create persistent artifacts inside the repository (.argus/, .git/hooks/post-commit) and installs a post-commit hook that runs automatically on every commit. That persistence is plausible for a commit-monitoring testing tool, but because the hook is triggered on each commit and SKILL.md contemplates invoking an external assistant, this persistence increases the blast radius if the hook is configured or modified to call external services.

Version History

v1.0.0

Version 1.0.0 marks a major overhaul of the Argus skill's focus from blockchain risk analysis to automated incremental testing with persistent memory.

- Removed all blockchain intelligence and risk scanning features.
- Introduced a new workflow for automated backend and frontend testing, tied to git commit history and file changes.
- Implements cataloging of tests, commit monitoring, enrichment of insufficient commit messages, and command-based phase routing.
- Added support for persistent test catalogs, post-commit hooks, and protection rules for test coverage.
- No external API costs or blockchain-related environment variables remain.

Metadata

Slug argus-qa

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is argus?

argus is an AI Agent Skill for Claude Code / OpenClaw that provides incremental backend API and frontend browser testing with persistent memory. It monitors every commit and enriches insufficient commit messages. It has 93 downloads so far.

How do I install argus?

Run "/install argus-qa" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is argus free?

Yes, argus is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does argus support?

argus is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created argus?

It is built and maintained by tiansyao (@tiansyao); the current version is v1.0.0.
