Description

Performs a two-phase audit with a fast structural scan and a detailed expert review focusing on security, cron jobs, config, and skill quality.

README (SKILL.md)

ClawCheck

Name: test
Author: merlinrabens

Two-phase audit: a fast deterministic scan catches structural issues, then you (the agent) do a deep quality evaluation on the flagged areas.

When to Use

After initial setup or major config changes
Before publishing skills to ClawHub (quality gate)
Periodic health check (weekly cron or manual)
When something feels off but openclaw doctor says "ok"
After installing new skills or updating OpenClaw

What This Checks vs Built-in

This skill	`openclaw doctor` (built-in)
Secrets exposure + token hygiene	Config JSON schema validation
Cron ops health + prompt quality review	Plugin/skill eligibility
Config optimization + value assessment	Channel connectivity
Skill structural + content quality audit	State migrations, browser detection

How It Works: Two Phases

Phase 1: Deterministic Scan (fast, free)

Run the script to get a structural baseline:

python3 {baseDir}/scripts/audit.py

Individual modules:

python3 {baseDir}/scripts/audit.py --security
python3 {baseDir}/scripts/audit.py --cron
python3 {baseDir}/scripts/audit.py --config
python3 {baseDir}/scripts/audit.py --skills

This produces JSON with scores, findings, and the bottom/top skill lists. Use this as your triage map for Phase 2.

Phase 2: Deep Quality Audit (you, the agent)

After running the script, perform these evaluations. Budget your depth based on what the user asked for ("quick check" = Phase 1 only, "full audit" or "quality review" = both phases).

2a. Config Quality Review

Read ~/.openclaw/openclaw.json and evaluate:

Heartbeat prompt: Read agents.defaults.heartbeat.prompt. Is it specific enough to catch real issues? Does it avoid heavy operations? A good heartbeat prompt is < 200 words, checks 2-3 things, and has clear escalation criteria.
Model choices: Is the primary model appropriate for the workload? Are fallbacks a meaningful step-down (not the same tier)? Is the subagent model cheaper than primary?
Compaction thresholds: Are reserveTokens and keepRecentTokens reasonable for the context window size? Rule of thumb: reserve should be 15-20% of contextTokens.
Session maintenance: Are pruneAfter, maxEntries, rotateBytes set to values that match the usage pattern? Heavy cron usage needs more aggressive pruning.
Cron maxConcurrentRuns: Is it high enough for the number of frequent jobs? Count jobs with */ in their schedule expression.

Score each aspect 1-5. Report specific improvements.
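
Two of the checks above are plain arithmetic and can be sketched as follows. This is a minimal illustration, not code taken from audit.py: the function names are hypothetical, and the job layout (a `schedule.expr` field) is assumed from the remediation examples in this document.

```python
def reserve_ratio(reserve_tokens: int, context_tokens: int) -> float:
    """Fraction of the context window held back for compaction."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    return reserve_tokens / context_tokens


def reserve_in_range(reserve_tokens: int, context_tokens: int,
                     low: float = 0.15, high: float = 0.20) -> bool:
    """Apply the 15-20% rule of thumb from the compaction check."""
    return low <= reserve_ratio(reserve_tokens, context_tokens) <= high


def count_frequent_jobs(jobs: list) -> int:
    """Count jobs whose cron expression contains a step ("*/")."""
    return sum(
        1 for job in jobs
        if "*/" in job.get("schedule", {}).get("expr", "")
    )
```

For example, reserveTokens=150k against an 800k context gives a ratio of 0.1875, which falls inside the rule-of-thumb range. Compare the result of `count_frequent_jobs` with the configured maxConcurrentRuns to judge whether the limit is high enough.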

2b. Cron Prompt Quality Review

Read ~/.openclaw/cron/jobs.json. Select the 5 most important enabled jobs using this heuristic:

Any job in error state (from Phase 1 findings)
Jobs with the highest frequency × timeoutSeconds product (most resource-consuming)
Jobs running on expensive models (opus/primary)
If still under 5, pick by business impact (backups, monitoring, user-facing)

For each selected job evaluate:

Prompt clarity: Specific enough to execute without guessing? Clear steps, expected output format, error handling?
Safety: Has guardrails? ("NEVER run git push", "read-only", "do not edit files directly")
Efficiency: Token-efficient? Flag prompts > 1500 chars that run on expensive models. Could the prompt reference a skill file instead of inlining instructions?
Output value: Produces actionable output or just noise?
Timeout: payload.timeoutSeconds set and reasonable for scope?

Score each job 1-5 on: purpose, prompt quality, safety, efficiency. Flag jobs scoring below 3.

Cross-reference: Check if any cron prompts reference skills that scored below 70 in Phase 1. A cron job is only as reliable as the skills it depends on.
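
The mechanical parts of the efficiency and timeout checks can be pre-filtered before the qualitative review. The sketch below is an assumption-laden illustration: `payload.timeoutSeconds` comes from this document, but the `payload.prompt` and `payload.model` field names are hypothetical and may not match the real jobs.json layout.

```python
EXPENSIVE_MODEL_HINTS = ("opus", "primary")  # taken from the selection heuristic
MAX_PROMPT_CHARS = 1500                      # threshold from the efficiency check


def cron_flags(job: dict) -> list:
    """Return mechanical warnings for one cron job (no quality judgment)."""
    payload = job.get("payload", {})
    prompt = payload.get("prompt", "")             # hypothetical field name
    model = str(payload.get("model", "")).lower()  # hypothetical field name

    flags = []
    on_expensive_model = any(hint in model for hint in EXPENSIVE_MODEL_HINTS)
    if len(prompt) > MAX_PROMPT_CHARS and on_expensive_model:
        flags.append("long prompt on expensive model")
    # A missing or zero timeout is treated as "not set".
    if not payload.get("timeoutSeconds"):
        flags.append("missing timeoutSeconds")
    return flags
```

Clarity, safety, and output value still need a read of the prompt itself; this only narrows down which jobs to look at first.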

2c. Skill Content Quality Review

From the Phase 1 results, pick:

The 3 lowest-scoring skills (from bottom_5)
Any skills the user specifically asks about
Skills used by failing cron jobs (cross-reference cron findings)

For each selected skill, read its full SKILL.md and evaluate:

Accuracy (2x weight): Would following these instructions produce correct behavior? Are API references current? Are file paths real?
Completeness (1.5x): Are all use cases covered? Edge cases? What happens when dependencies are missing?
Clarity (1x): Can an agent follow this without ambiguity? No hedging, clear steps, good examples?
Efficiency (1x): Is the SKILL.md bloated? Could it be shorter without losing information? Does it suggest efficient patterns (batching, caching)?
Voice alignment (1x, content-producing skills only): Does the output match the brand/user's tone?

Scoring formula depends on skill type:

Content/marketing skills (has voice component): (accuracy*2 + completeness*1.5 + clarity + efficiency + voice) / 6.5
Utility/tool skills (no voice): (accuracy*2 + completeness*1.5 + clarity + efficiency) / 5.5

For skills scoring below 4.0, write specific improvement recommendations with concrete examples.
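
The two formulas above can be expressed as one function. This is a minimal sketch with a hypothetical function name; it assumes each criterion is scored on the same 1-5 scale used elsewhere in Phase 2.

```python
from typing import Optional


def skill_quality_score(accuracy: float, completeness: float,
                        clarity: float, efficiency: float,
                        voice: Optional[float] = None) -> float:
    """Weighted Phase 2 score; pass voice only for content-producing skills."""
    weighted = accuracy * 2 + completeness * 1.5 + clarity + efficiency
    if voice is None:
        return weighted / 5.5          # utility/tool skills
    return (weighted + voice) / 6.5    # content/marketing skills
```

A utility skill scored 3 on accuracy and 4 on everything else comes out at 20 / 5.5, roughly 3.64, so it falls below the 4.0 threshold and needs written recommendations.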

2d. Security Assessment

Phase 1 now scans workspace files for common secret patterns (sk-, ghp_, AIzaSy, Bearer tokens, hex private keys, etc.). In Phase 2, go deeper:

Review any secrets the script found in workspace files. Are they real credentials or false positives (e.g., example/placeholder values)?
Check if any skill scripts/ contain hardcoded credentials or API URLs with embedded tokens
Check if .env files exist inside skill directories
Look for credentials in cron job prompts (some prompts inline API keys instead of referencing env vars)
Check if any workspace knowledge files contain customer data, passwords, or access tokens
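
To make the pattern scan and the false-positive question concrete, here is a rough sketch. The regexes are approximations written for illustration and are not the ones used by audit.py; the placeholder hints are likewise hypothetical.

```python
import re

# Approximate patterns for illustration only; the real script may differ.
SECRET_PATTERNS = {
    "openai_style_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "google_api_key": re.compile(r"\bAIzaSy[A-Za-z0-9_-]{33}"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}=*"),
}
PLACEHOLDER_HINTS = ("example", "placeholder", "your_", "xxxx", "changeme")


def find_secret_candidates(text: str) -> list:
    """Return (line_no, pattern_name, likely_placeholder) for each match.

    The matched value itself is deliberately not returned, so it is not
    echoed into a report.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        likely_placeholder = any(h in line.lower() for h in PLACEHOLDER_HINTS)
        for name, pattern in SECRET_PATTERNS.items():
            for _ in pattern.finditer(line):
                hits.append((line_no, name, likely_placeholder))
    return hits
```

The `likely_placeholder` flag is only a hint. A match on a line that mentions "example" still deserves a look before it is dismissed as a false positive.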

Output Format

Phase 1 (script output)

{
  "score": 82,
  "score_type": "structural_hygiene",
  "status": "healthy",
  "sections": {
    "security": {"score": 65, "finding_count": 3},
    "cron": {"score": 95, "finding_count": 1},
    "config": {"score": 88, "finding_count": 2},
    "skills": {"score": 80, "finding_count": 1}
  },
  "findings": [...]
}
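
One way to turn this JSON into a triage map for Phase 2 is to order the sections from weakest to strongest. The helper below is a hypothetical sketch that assumes only the `sections` layout shown in the sample above.

```python
import json


def triage_order(report_json: str) -> list:
    """Return section names sorted by score, lowest (most urgent) first."""
    report = json.loads(report_json)
    sections = report.get("sections", {})
    return sorted(sections, key=lambda name: sections[name]["score"])


sample = json.dumps({
    "score": 82,
    "sections": {
        "security": {"score": 65, "finding_count": 3},
        "cron": {"score": 95, "finding_count": 1},
        "config": {"score": 88, "finding_count": 2},
        "skills": {"score": 80, "finding_count": 1},
    },
    "findings": [],
})
print(triage_order(sample))
```

With the sample scores, security comes first and cron last, which matches where the deep review should spend most of its budget.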

Phase 2 (your evaluation)

Present as a readable report to the user:

## ClawCheck Report

### Structural Baseline (Phase 1)
Overall: 82/100 (healthy)
Security: 65 | Cron: 95 | Config: 88 | Skills: 80

### Deep Quality Findings (Phase 2)

**Config:**
- Heartbeat prompt: 4/5 (clear but could add Telegram alert on critical)
- Model choices: 5/5 (opus primary, sonnet fallback, sonnet subagent)
- Compaction: 4/5 (reserveTokens=150k for 800k context = 19%, good)

**Cron (top concerns):**
- "Morning Brief" (3/5): prompt is 400 words but lacks output format spec
- "Bleeding Edge Scanner" (2/5): no safety guardrails, no error handling

**Skills (bottom 3):**
- marketing-automation: BROKEN (no SKILL.md)
- apple-notes (62/100 structural): [content evaluation]
- blucli (62/100 structural): [content evaluation]

### Recommended Actions (priority order)
1. [most impactful fix]
2. [next fix]
3. [next fix]

Scoring Weights (Phase 1 script)

Security 30%, cron 25%, config 20%, skills 25%.

Skill structure formula: (structure*2 + completeness*1.5 + clarity + efficiency) / 5.5 * 20
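
Both formulas can be written out directly. This is a sketch under assumptions, not the script's actual code: it assumes the overall score is a plain weighted average of the section scores, and that the skill sub-scores are on a 1-5 scale (which the final `* 20` would map onto 20-100).

```python
SECTION_WEIGHTS = {"security": 0.30, "cron": 0.25, "config": 0.20, "skills": 0.25}


def overall_score(section_scores: dict) -> float:
    """Weighted average of the four section scores (each 0-100)."""
    return sum(weight * section_scores[name]
               for name, weight in SECTION_WEIGHTS.items())


def skill_structure_score(structure: float, completeness: float,
                          clarity: float, efficiency: float) -> float:
    """Structural skill score scaled to 0-100."""
    weighted = structure * 2 + completeness * 1.5 + clarity + efficiency
    return weighted / 5.5 * 20
```

Applied to the section scores in the sample output (65, 95, 88, 80), the plain weighted average is about 80.85 rather than the 82 shown there, so the sample figures may be illustrative or the script may apply adjustments not described here.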

Remediation

For detailed fix patterns with real config examples, see {baseDir}/references/remediation.md.

Quick fixes for common findings:

Inline secrets

"GAMMA_API_KEY": {"source": "exec", "provider": "op-gamma", "id": "value"}

Plaintext bot token

"botToken": {"source": "exec", "provider": "op-telegram", "id": "value"}

Missing heartbeat

"heartbeat": {"every": "1h", "model": "sonnet", "prompt": "HEARTBEAT: Quick check..."}

Missing timezone on cron

"schedule": {"kind": "cron", "expr": "0 9 * * *", "tz": "Europe/Madrid"}

Error Handling

If OpenClaw dir not found: script exits with error JSON and exit code 1.
If openclaw.json is missing or invalid: script exits with error JSON.
If individual module fails: caught and reported as warning, other modules still run.
If bundled skills dir not accessible: skipped silently.
Phase 2 failures: if you can't read a file, note it and move on. Don't stop the whole audit.
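
The "one module fails, the others still run" behavior can be pictured with a small sketch. The function and key names here are hypothetical and only illustrate the pattern; they are not taken from audit.py.

```python
def run_modules(modules: dict) -> dict:
    """Run each audit module, turning exceptions into warnings."""
    results, warnings = {}, []
    for name, func in modules.items():
        try:
            results[name] = func()
        except Exception as exc:  # isolate the failure to this module
            warnings.append({
                "module": name,
                "warning": f"{type(exc).__name__}: {exc}",
            })
    return {"results": results, "warnings": warnings}
```

The same idea applies in Phase 2: an unreadable file becomes a note in the report rather than a reason to abort the audit.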

Non-Goals

No direct edits to config or skills (report only, user decides)
No network calls (everything is local file inspection)
No overlap with openclaw doctor schema validation or channel connectivity checks

Usage Guidance

What to check before installing or running:

- Confirm provenance: the package files identify themselves as 'clawcheck' (version 2.0.0) while registry metadata calls it 'test' / slug 'clawhealth' (v0.0.1). Ask the publisher which is correct or prefer an official source (homepage/GitHub) before trusting it.
- Review the included scripts locally (scripts/audit.py) before execution. The script scans your OpenClaw directory (~/.openclaw), cron jobs, and workspace files and will read any files it finds. This is expected, but verify you want those files read.
- Run Phase 1 (the deterministic scan) first; it is local and quick. Inspect Phase 1 output and any flagged files for true/false positives before running Phase 2 (LLM deep review), because Phase 2 will involve sending flagged content to your selected model and that may leak sensitive data to the model provider.
- Back up ~/.openclaw/openclaw.json (or any state files) before running anything that analyzes your config, and ensure no unintentional tokens are present in files you don’t want shared.
- If you want stronger assurance, ask the publisher for a verifiable release (signed or hosted on a public repo) and for the rationale behind the metadata/version mismatch.

Bottom line: the tool appears to do what it claims, but metadata/provenance inconsistencies and the fact Phase 2 will expose file contents to your agent/model are reasons to proceed cautiously.

Capability Analysis

Type: OpenClaw Skill
Name: clawhealth
Version: 0.0.1

The skill is a security and quality audit tool designed to scan the OpenClaw environment for configuration flaws and exposed credentials. The script `scripts/audit.py` implements a 'Security Audit' module that uses regex patterns to search for sensitive data, including OpenAI keys, GitHub tokens, Slack bot tokens, and private hex keys across the workspace and configuration files. While these high-risk capabilities are aligned with the stated purpose of a security health check, the automated discovery of secrets and broad file system access warrant a suspicious classification. No evidence of malicious intent, persistence, or data exfiltration was found, as the script operates locally using only the Python standard library.

Capability Assessment

ℹ Purpose & Capability

The code and SKILL.md implement a local audit (security, cron, config, skills) that aligns with the description. However, there are several metadata mismatches: the registry lists the skill as 'test' (slug 'clawhealth', version 0.0.1) while the packaged files are for 'clawcheck' (_meta.json name 'clawcheck', version 2.0.0, SKILL.md name 'clawcheck'). SKILL.md and _meta.json declare a python3 requirement, but the registry metadata lists no required binaries/env. These provenance/version inconsistencies are unexplained and worth verifying with the publisher before trusting the package.

ℹ Instruction Scope

The runtime instructions and scripts explicitly read local OpenClaw state (e.g., ~/.openclaw/openclaw.json, cron/jobs.json, workspace/*) and other skills' SKILL.md files. That is coherent with an audit tool, but Phase 2 (LLM deep review) will cause the agent to read and evaluate possibly sensitive files (including files that may contain secrets). The instructions do not instruct the agent to exfiltrate data externally, but by design Phase 2 will send content to whatever model the agent uses — confirm you are comfortable with sending the inspected content to your model provider.

✓ Install Mechanism

No install spec is provided (no arbitrary downloads). The package includes a local Python script using only the standard library. That lowers install risk — nothing is automatically written to disk beyond the files included in the skill bundle. The README asserts no network access and the script appears to use local filesystem only.

✓ Credentials

The skill requests no credentials or environment variables in registry metadata. The SKILL.md/_meta.json declare only a binary requirement (python3). The audit script scans for secrets but does not require any secret or external API key to run. This is proportionate for an offline audit tool.

✓ Persistence & Privilege

The skill is not forced-always (always: false) and allows user invocation. It does not request persistent platform privileges and the included script performs reads and analysis only; there are no obvious writes or modifications to system/agent configuration in the provided code. Autonomous invocation is permitted by default but not unusually privileged here.

Version History

v0.0.1

- Initial release of ClawCheck: a two-phase audit for OpenClaw skills and configuration.
- Phase 1: Fast deterministic scan for structural/security issues, produces actionable JSON findings.
- Phase 2: Deep quality review of config, cron jobs, skills, and secrets based on Phase 1 results.
- Provides scoring, improvement recommendations, and prioritization for remediation.
- Cross-references cron and skill health; enhances existing `openclaw doctor` checks.

Metadata

Slug clawhealth

Version 0.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is test?

Performs a two-phase audit with a fast structural scan and a detailed expert review focusing on security, cron jobs, config, and skill quality. It is an AI Agent Skill for Claude Code / OpenClaw, with 87 downloads so far.

How do I install test?

Run "/install clawhealth" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is test free?

Yes, test is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does test support?

test is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created test?

It is built and maintained by merlinrabens (@merlinrabens); the current version is v0.0.1.
