Description

Performs a two-phase audit combining a fast deterministic scan and a deep LLM quality review of security, cron jobs, config, and skills.

README (SKILL.md)

ClawCheck

Name: ClawCheck
Author: merlinrabens

Two-phase audit: a fast deterministic scan catches structural issues, then you (the agent) do a deep quality evaluation on the flagged areas.

When to Use

After initial setup or major config changes
Before publishing skills to ClawHub (quality gate)
Periodic health check (weekly cron or manual)
When something feels off but openclaw doctor says "ok"
After installing new skills or updating OpenClaw

What This Checks vs Built-in

This skill	`openclaw doctor` (built-in)
Secrets exposure + token hygiene	Config JSON schema validation
Cron ops health + prompt quality review	Plugin/skill eligibility
Config optimization + value assessment	Channel connectivity
Skill structural + content quality audit	State migrations, browser detection

How It Works: Two Phases

Phase 1: Deterministic Scan (fast, free)

Run the script to get a structural baseline:

python3 {baseDir}/scripts/audit.py

Individual modules:

python3 {baseDir}/scripts/audit.py --security
python3 {baseDir}/scripts/audit.py --cron
python3 {baseDir}/scripts/audit.py --config
python3 {baseDir}/scripts/audit.py --skills

This produces JSON with scores, findings, and the bottom/top skill lists. Use this as your triage map for Phase 2.
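
As a rough illustration of using the output as a triage map, the sketch below sorts sections from weakest to strongest. It assumes the JSON shape shown under "Output Format"; `triage_order` and `SAMPLE` are hypothetical names, not part of audit.py.

```python
import json

# Sample Phase 1 output, shaped like the example under "Output Format".
# The real script's output may contain more fields.
SAMPLE = """
{
  "score": 82,
  "status": "healthy",
  "sections": {
    "security": {"score": 65, "finding_count": 3},
    "cron": {"score": 95, "finding_count": 1},
    "config": {"score": 88, "finding_count": 2},
    "skills": {"score": 80, "finding_count": 1}
  },
  "findings": []
}
"""

def triage_order(report_json: str) -> list[str]:
    """Return section names sorted from lowest to highest score."""
    report = json.loads(report_json)
    sections = report.get("sections", {})
    return sorted(sections, key=lambda name: sections[name]["score"])

print(triage_order(SAMPLE))  # ['security', 'skills', 'config', 'cron']
```

Starting Phase 2 with the lowest-scoring section keeps the expensive review focused where the scan found the most problems.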

Phase 2: Deep Quality Audit (you, the agent)

After running the script, perform these evaluations. Budget your depth based on what the user asked for ("quick check" = Phase 1 only, "full audit" or "quality review" = both phases).

2a. Config Quality Review

Read ~/.openclaw/openclaw.json and evaluate:

Heartbeat prompt: Read agents.defaults.heartbeat.prompt. Is it specific enough to catch real issues? Does it avoid heavy operations? A good heartbeat prompt is < 200 words, checks 2-3 things, and has clear escalation criteria.
Model choices: Is the primary model appropriate for the workload? Are fallbacks a meaningful step-down (not the same tier)? Is the subagent model cheaper than primary?
Compaction thresholds: Are reserveTokens and keepRecentTokens reasonable for the context window size? Rule of thumb: reserve should be 15-20% of contextTokens.
Session maintenance: Are pruneAfter, maxEntries, rotateBytes set to values that match the usage pattern? Heavy cron usage needs more aggressive pruning.
Cron maxConcurrentRuns: Is it high enough for the number of frequent jobs? Count jobs with */ in their schedule expression.

Score each aspect 1-5. Report specific improvements.
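
Two of the checks above are simple arithmetic and can be sketched as below. The helper names are hypothetical, and the values are passed in directly because the exact nesting of reserveTokens and contextTokens inside openclaw.json is not shown here.

```python
def reserve_ratio_ok(reserve_tokens: int, context_tokens: int,
                     low: float = 0.15, high: float = 0.20) -> bool:
    """True when reserveTokens is 15-20% of contextTokens (rule of thumb)."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    ratio = reserve_tokens / context_tokens
    return low <= ratio <= high

def count_frequent_jobs(schedule_exprs: list[str]) -> int:
    """Count cron expressions that contain a step value such as */5."""
    return sum(1 for expr in schedule_exprs if "*/" in expr)

# 150k reserved out of an 800k context is 18.75%, inside the 15-20% band.
print(reserve_ratio_ok(150_000, 800_000))  # True
print(count_frequent_jobs(["*/5 * * * *", "0 9 * * *", "0 */2 * * *"]))  # 2
```

Compare the frequent-job count against maxConcurrentRuns by judgment; the source gives no fixed ratio between the two.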

2b. Cron Prompt Quality Review

Read ~/.openclaw/cron/jobs.json. Select the 5 most important enabled jobs using this heuristic:

Any job in error state (from Phase 1 findings)
Jobs with highest frequency × timeoutSeconds (most resource-consuming)
Jobs running on expensive models (opus/primary)
If still under 5, pick by business impact (backups, monitoring, user-facing)
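
The first three steps of this heuristic could be approximated with a sort key, as in the sketch below. The job fields `name`, `enabled`, `state`, `runs_per_day`, and `payload.model` are assumptions for illustration (only payload.timeoutSeconds is named in this document), and the business-impact step is left to your judgment.

```python
EXPENSIVE_MODELS = {"opus"}  # assumption: adjust to your primary model name

def pick_jobs(jobs: list[dict], limit: int = 5) -> list[str]:
    """Rank enabled jobs: errors first, then resource cost, then model tier."""
    enabled = [j for j in jobs if j.get("enabled", True)]

    def rank(job: dict):
        payload = job.get("payload", {})
        cost = job.get("runs_per_day", 0) * payload.get("timeoutSeconds", 0)
        return (
            0 if job.get("state") == "error" else 1,  # jobs in error state first
            -cost,                                    # then frequency x timeout
            0 if payload.get("model") in EXPENSIVE_MODELS else 1,  # tie-breaker
        )

    return [j["name"] for j in sorted(enabled, key=rank)[:limit]]

jobs = [
    {"name": "backup", "runs_per_day": 1,
     "payload": {"timeoutSeconds": 600, "model": "sonnet"}},
    {"name": "scanner", "runs_per_day": 48,
     "payload": {"timeoutSeconds": 120, "model": "opus"}},
    {"name": "brief", "state": "error", "runs_per_day": 1,
     "payload": {"timeoutSeconds": 300, "model": "opus"}},
    {"name": "old", "enabled": False, "runs_per_day": 24,
     "payload": {"timeoutSeconds": 60}},
]
print(pick_jobs(jobs))  # ['brief', 'scanner', 'backup']
```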

For each selected job evaluate:

Prompt clarity: Specific enough to execute without guessing? Clear steps, expected output format, error handling?
Safety: Has guardrails? ("NEVER run git push", "read-only", "do not edit files directly")
Efficiency: Token-efficient? Flag prompts > 1500 chars that run on expensive models. Could the prompt reference a skill file instead of inlining instructions?
Output value: Produces actionable output or just noise?
Timeout: payload.timeoutSeconds set and reasonable for scope?

Score each job 1-5 on: purpose, prompt quality, safety, efficiency. Flag jobs scoring below 3.

Cross-reference: Check if any cron prompts reference skills that scored below 70 in Phase 1. A cron job is only as reliable as the skills it depends on.
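
A minimal sketch of this cross-reference, assuming you have already collected job prompts and Phase 1 skill scores into plain dictionaries (both input shapes and the function name are hypothetical). It uses naive substring matching, so short skill names can produce false hits that you should confirm by reading the prompt.

```python
def jobs_using_weak_skills(job_prompts: dict[str, str],
                           skill_scores: dict[str, int],
                           threshold: int = 70) -> dict[str, list[str]]:
    """Map each job name to the low-scoring skills its prompt mentions."""
    weak = sorted(name for name, score in skill_scores.items()
                  if score < threshold)
    hits = {}
    for job, prompt in job_prompts.items():
        mentioned = [name for name in weak if name in prompt]
        if mentioned:
            hits[job] = mentioned
    return hits

prompts = {
    "Morning Brief": "Use the apple-notes skill to collect notes, then summarize.",
    "Backup": "Run the backup skill. Read-only.",
}
scores = {"apple-notes": 62, "blucli": 62, "backup": 90}
print(jobs_using_weak_skills(prompts, scores))
# {'Morning Brief': ['apple-notes']}
```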

2c. Skill Content Quality Review

From the Phase 1 results, pick:

The 3 lowest-scoring skills (from bottom_5)
Any skills the user specifically asks about
Skills used by failing cron jobs (cross-reference cron findings)

For each selected skill, read its full SKILL.md and evaluate:

Accuracy (2x weight): Would following these instructions produce correct behavior? Are API references current? Are file paths real?
Completeness (1.5x): Are all use cases covered? Edge cases? What happens when dependencies are missing?
Clarity (1x): Can an agent follow this without ambiguity? No hedging, clear steps, good examples?
Efficiency (1x): Is the SKILL.md bloated? Could it be shorter without losing information? Does it suggest efficient patterns (batching, caching)?
Voice alignment (1x, content-producing skills only): Does the output match the brand/user's tone?

Scoring formula depends on skill type:

Content/marketing skills (has voice component): (accuracy*2 + completeness*1.5 + clarity + efficiency + voice) / 6.5
Utility/tool skills (no voice): (accuracy*2 + completeness*1.5 + clarity + efficiency) / 5.5

For skills scoring below 4.0, write specific improvement recommendations with concrete examples.

2d. Security Assessment

Phase 1 now scans workspace files for common secret patterns (sk-, ghp_, AIzaSy, Bearer tokens, hex private keys, etc.). In Phase 2, go deeper:

Review any secrets the script found in workspace files. Are they real credentials or false positives (e.g., example/placeholder values)?
Check if any skill scripts/ contain hardcoded credentials or API URLs with embedded tokens
Check if .env files exist inside skill directories
Look for credentials in cron job prompts (some prompts inline API keys instead of referencing env vars)
Check if any workspace knowledge files contain customer data, passwords, or access tokens
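
To separate likely placeholders from real credentials, a pass like the one below can help. The regular expressions and placeholder hints are assumptions written for illustration; they are not the patterns audit.py uses, and a `False` placeholder flag is only a prompt to inspect the value yourself.

```python
import re

# Illustrative patterns only; token lengths are assumptions.
SECRET_PATTERNS = {
    "sk- key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "github token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "google api key": re.compile(r"AIzaSy[A-Za-z0-9_-]{33}"),
    "bearer token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]{20,}"),
}
PLACEHOLDER_HINTS = ("example", "placeholder", "your", "xxxx")

def scan_text(text: str) -> list[tuple[str, str, bool]]:
    """Return (label, matched value, looks_like_placeholder) for each hit."""
    hits = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group(0)
            fake = any(hint in value.lower() for hint in PLACEHOLDER_HINTS)
            hits.append((label, value, fake))
    return hits

sample = ('token = "ghp_' + "a" * 36 + '"\n'
          'key = "sk-your-api-key-goes-here-000"\n')
for label, value, fake in scan_text(sample):
    # Print only a short prefix so the report never echoes a full secret.
    print(label, value[:6] + "...", "placeholder" if fake else "inspect")
```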

Output Format

Phase 1 (script output)

{
  "score": 82,
  "score_type": "structural_hygiene",
  "status": "healthy",
  "sections": {
    "security": {"score": 65, "finding_count": 3},
    "cron": {"score": 95, "finding_count": 1},
    "config": {"score": 88, "finding_count": 2},
    "skills": {"score": 80, "finding_count": 1}
  },
  "findings": [...]
}

Phase 2 (your evaluation)

Present as a readable report to the user:

## ClawCheck Report

### Structural Baseline (Phase 1)
Overall: 82/100 (healthy)
Security: 65 | Cron: 95 | Config: 88 | Skills: 80

### Deep Quality Findings (Phase 2)

**Config:**
- Heartbeat prompt: 4/5 (clear but could add Telegram alert on critical)
- Model choices: 5/5 (opus primary, sonnet fallback, sonnet subagent)
- Compaction: 4/5 (reserveTokens=150k for 800k context = 19%, good)

**Cron (top concerns):**
- "Morning Brief" (3/5): prompt is 400 words but lacks output format spec
- "Bleeding Edge Scanner" (2/5): no safety guardrails, no error handling

**Skills (bottom 3):**
- marketing-automation: BROKEN (no SKILL.md)
- apple-notes (62/100 structural): [content evaluation]
- blucli (62/100 structural): [content evaluation]

### Recommended Actions (priority order)
1. [most impactful fix]
2. [next fix]
3. [next fix]

Scoring Weights (Phase 1 script)

Security 30%, cron 25%, config 20%, skills 25%.

Skill structure formula: (structure*2 + completeness*1.5 + clarity + efficiency) / 5.5 * 20
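
Read literally, the weights and formula above work out as in the sketch below. The function names are hypothetical, component ratings are assumed to be on a 0-5 scale (so the result lands on 0-100), and the real script may round or apply adjustments not described here, so its figures can differ.

```python
WEIGHTS = {"security": 0.30, "cron": 0.25, "config": 0.20, "skills": 0.25}

def overall_score(section_scores: dict[str, float]) -> float:
    """Weighted average of the four section scores (each 0-100)."""
    return round(sum(section_scores[name] * weight
                     for name, weight in WEIGHTS.items()), 2)

def skill_structure_score(structure: float, completeness: float,
                          clarity: float, efficiency: float) -> float:
    """Map four 0-5 ratings onto a 0-100 scale using the formula above."""
    weighted = structure * 2 + completeness * 1.5 + clarity + efficiency
    return round(weighted / 5.5 * 20, 1)

print(overall_score({"security": 60, "cron": 80, "config": 100, "skills": 80}))  # 78.0
print(skill_structure_score(3, 3, 3, 3))  # 60.0
print(skill_structure_score(5, 5, 5, 5))  # 100.0
```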

Remediation

For detailed fix patterns with real config examples, see {baseDir}/references/remediation.md.

Quick fixes for common findings:

Inline secrets

"GAMMA_API_KEY": {"source": "exec", "provider": "op-gamma", "id": "value"}

Plaintext bot token

"botToken": {"source": "exec", "provider": "op-telegram", "id": "value"}

Missing heartbeat

"heartbeat": {"every": "1h", "model": "sonnet", "prompt": "HEARTBEAT: Quick check..."}

Missing timezone on cron

"schedule": {"kind": "cron", "expr": "0 9 * * *", "tz": "Europe/Madrid"}

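To find which jobs need the timezone fix, a check along these lines may be enough. It assumes each job is a dictionary with a `schedule` object shaped like the fragment above and a `name` key; the `name` key and the function name are hypothetical.

```python
def jobs_missing_timezone(jobs: list[dict]) -> list[str]:
    """Return names of cron-scheduled jobs whose schedule has no tz value."""
    missing = []
    for job in jobs:
        schedule = job.get("schedule", {})
        if schedule.get("kind") == "cron" and not schedule.get("tz"):
            missing.append(job.get("name", "<unnamed>"))
    return missing

jobs = [
    {"name": "Morning Brief",
     "schedule": {"kind": "cron", "expr": "0 9 * * *", "tz": "Europe/Madrid"}},
    {"name": "Nightly Backup",
     "schedule": {"kind": "cron", "expr": "0 2 * * *"}},
]
print(jobs_missing_timezone(jobs))  # ['Nightly Backup']
```
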
Error Handling

If OpenClaw dir not found: script exits with error JSON and exit code 1.
If openclaw.json is missing or invalid: script exits with error JSON.
If individual module fails: caught and reported as warning, other modules still run.
If bundled skills dir not accessible: skipped silently.
Phase 2 failures: if you can't read a file, note it and move on. Don't stop the whole audit.
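
The "one module fails, the rest still run" behavior is a common isolation pattern. The sketch below shows the general idea under assumed names; it is not the code in audit.py.

```python
import json

def run_modules(modules: dict) -> dict:
    """Run each audit module; record a failure as a warning and continue."""
    report = {"sections": {}, "warnings": []}
    for name, func in modules.items():
        try:
            report["sections"][name] = func()
        except Exception as exc:  # isolate the failure to this one module
            report["warnings"].append(f"{name} module failed: {exc}")
    return report

def broken_cron_module():
    # Stand-in for a module that cannot read its input file.
    raise RuntimeError("jobs.json unreadable")

result = run_modules({
    "security": lambda: {"score": 65},
    "cron": broken_cron_module,
})
print(json.dumps(result, indent=2))
```

Apply the same principle in Phase 2: record the unreadable file in the report and continue with the remaining evaluations.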

Non-Goals

No direct edits to config or skills (report only, user decides)
No network calls (everything is local file inspection)
No overlap with openclaw doctor schema validation or channel connectivity checks

Usage Guidance

This skill appears internally consistent for performing a local OpenClaw audit, but it intentionally reads configuration, cron job definitions, skill SKILL.md files, and workspace content, some of which can contain sensitive secrets. Before installing or running a full audit:

(1) Review scripts/audit.py locally to confirm behavior (it's stdlib-only and outputs JSON).
(2) Run Phase 1 (python3 scripts/audit.py) first to get deterministic findings without using your LLM quota.
(3) When performing Phase 2, be mindful that the agent will read potentially sensitive files; avoid granting network or export privileges to the agent if you do not want findings sent off-host.
(4) Consider running the audit in an isolated environment or after removing/masking known secrets.
(5) If you use autonomous agent invocation, review agent policies/permissions to limit unintended data exfiltration.

Capability Analysis

Type: OpenClaw Skill
Name: clawcheck
Version: 2.0.1

The 'clawcheck' skill is a legitimate security and configuration auditing tool for the OpenClaw environment. It consists of a Python script (audit.py) that performs a local, deterministic scan for plaintext secrets, cron job health, and configuration optimization, alongside an agent-facing instruction set (SKILL.md) for deep quality review. The tool lacks network access, obfuscation, or malicious execution patterns, and its high-privilege file access is strictly necessary for its stated purpose of identifying and remediating security risks.

Capability Assessment

✓ Purpose & Capability

The skill name/description match its contents: SKILL.md and included scripts/audit.py perform structural scans and LLM-guided reviews of OpenClaw config, cron, and skills. Declared requirement (python3) matches the provided Python script. No unrelated binaries, env vars, or network downloads are requested.

✓ Instruction Scope

Runtime instructions explicitly tell the agent to run the included script and to read ~/.openclaw/openclaw.json, cron/jobs.json, and skill SKILL.md files for Phase 2 reviews. These reads are within scope for an audit. Note: Phase 2 intentionally examines workspace/skill files and may surface sensitive content (inline secrets, tokens) — this is by-design for a secrets audit.

✓ Install Mechanism

No install spec or remote downloads; this is instruction-only with a bundled Python script. The script uses only stdlib. No extract-from-URL or third-party package installation is present.

✓ Credentials

The skill declares no required environment variables or credentials and only expects python3. The script will read OPENCLAW_DIR / OPENCLAW_STATE_DIR if set (reasonable for locating config). The remediation docs reference 1Password examples but the skill does not demand any secret/provider variables itself.

ℹ Persistence & Privilege

always:false (normal). disable-model-invocation:false (agent may invoke autonomously) — this is the platform default. Because the skill instructs reading local configs and workspace files, granting an agent autonomous invocation could expose local secrets if the agent is allowed to exfiltrate data; this is an operational consideration rather than an incoherence in the skill itself.

Version History

v2.0.1

Fix skill name in frontmatter, update all references to clawcheck

v2.0.0

Metadata

Slug clawcheck

Version 2.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is ClawCheck?

Performs a two-phase audit combining a fast deterministic scan and a deep LLM quality review of security, cron jobs, config, and skills. It is an AI Agent Skill for Claude Code / OpenClaw, with 144 downloads so far.

How do I install ClawCheck?

Run "/install clawcheck" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is ClawCheck free?

Yes, ClawCheck is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does ClawCheck support?

ClawCheck is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created ClawCheck?

It is built and maintained by merlinrabens (@merlinrabens); the current version is v2.0.1.
