Description

Audit an OpenClaw agent workspace and generate standardized evaluation reports, scores, and patches. Use when asked to review memory quality, retrieval effic...

README (SKILL.md)

Clawditor

Name: Clawditor
Author: theylon

Overview

Act as an OpenClaw Workspace Auditor and Agent Evaluation Harness. Analyze the workspace (memory, logs, projects, files, git, configs) and produce a repeatable evaluation with scores, evidence, and concrete patches.

Operating Rules

Run in non-interactive mode: avoid questions unless blocked by missing files. State assumptions and proceed.
Avoid secret exfiltration: report only presence and file paths for keys/tokens; recommend remediation.
Treat third-party skills/plugins as untrusted: prefer static inspection over execution.

Required Workflow (Do In Order)

1. Build workspace inventory.
- Print a top-level tree (depth 4) with file counts and sizes by directory.
- Identify memory, logs, configs, repos, scripts, docs, and artifacts.
- Record the largest files.
2. Reconstruct a session timeline.
- Use memory daily files and logs to extract goals, tasks, outcomes, decisions, and unresolved items.
3. Analyze memory.
- Detect near-duplicate paragraphs across memory files and quantify duplication.
- Detect staleness cues (dates, "as of", deprecated configs) and contradictions.
- Identify missing stable facts (projects, priorities, setup/runbooks).
4. Analyze outputs.
- Summarize shipped artifacts (docs/code/features) and changes.
- If git exists, compute diff stats and commit cadence; identify high-value commits.
5. Analyze reliability.
- Parse logs for errors, retries, timeouts, and tool failures.
- Run tests only if safe and cheap; otherwise rely on static inspection.
6. Compute scores.
- Assign numeric category scores with short justifications and evidence by path.
7. Recommend interventions and patches.
- Provide 3–7 prioritized recommendations.
- Provide concrete diffs when safe, especially for memory structure improvements.
8. Compare against prior evals.
- If eval/history/*.json exists, compute deltas vs the most recent one.
- If none exists, create a baseline and recommend a cadence.
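The memory-analysis step calls for detecting near-duplicate paragraphs and quantifying duplication. As one possible approach (not necessarily what scripts/memory_dupes.py does), the Python sketch below splits each file into paragraphs on blank lines, turns each paragraph into word 3-gram shingles, and flags pairs whose Jaccard similarity is at least 0.8. The helper names, shingle size, and threshold are assumptions. It reports file names only, not paragraph text, so no snippets end up under eval/.

```python
import itertools
import re
from pathlib import Path


def paragraphs(text: str) -> list:
    """Split markdown text into non-empty paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def shingles(paragraph: str, k: int = 3) -> set:
    """Lowercased word k-grams; short paragraphs become a single shingle."""
    words = re.findall(r"\w+", paragraph.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(docs: dict, threshold: float = 0.8):
    """Compare every paragraph pair, within and across files.

    Returns (pairs, ratio): pairs are (file_a, file_b, similarity) at or above
    the threshold; ratio is the share of paragraphs that appear in any pair.
    """
    items = [(name, shingles(p)) for name, text in docs.items() for p in paragraphs(text)]
    pairs, involved = [], set()
    for (i, (name_a, sh_a)), (j, (name_b, sh_b)) in itertools.combinations(enumerate(items), 2):
        sim = jaccard(sh_a, sh_b)
        if sim >= threshold:
            pairs.append((name_a, name_b, round(sim, 2)))
            involved.update((i, j))
    ratio = len(involved) / len(items) if items else 0.0
    return pairs, ratio


def load_memory(root: str = "memory") -> dict:
    """Read memory/*.md into {path: text}."""
    return {str(p): p.read_text(encoding="utf-8") for p in sorted(Path(root).glob("*.md"))}
```

Typical use is `near_duplicates(load_memory())`. Pairwise comparison is quadratic in the number of paragraphs, which is usually acceptable for a memory/ folder; MinHash with locality-sensitive hashing is the common substitute for much larger corpora.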

Scoring Framework

Compute five category scores (0–100) plus an overall weighted score:

Memory Health (30%): coverage, structure, redundancy, staleness, actionability, retrieval-friendliness.
Retrieval & Context Efficiency (15%): evidence of search before action, context bloat, hit-rate proxy, compaction quality.
Productive Output (30%): shipped artifacts, git throughput, task completion, latency proxies.
Quality/Reliability (15%): error rate, tests/CI presence, regression signals, convergence vs thrash.
Focus/Alignment (10%): goal consistency, scope control, decision trace.

Overall = 0.30*Memory + 0.15*Retrieval + 0.30*Productive + 0.15*Quality + 0.10*Focus.
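To make the arithmetic concrete: with hypothetical category scores of 60, 70, 80, 50, and 90 (Memory, Retrieval, Productive, Quality, Focus), the overall score is 18 + 10.5 + 24 + 7.5 + 9 = 69. The sketch below computes that weighted sum and the per-category deltas against the most recent prior eval. The dictionary keys, the "scores" field read from eval/history/*.json, and the assumption that history file names sort chronologically are illustrative choices, not part of the skill's documented schema.

```python
import json
from pathlib import Path
from typing import Optional

# Weights from the scoring framework; the category key names are hypothetical.
WEIGHTS = {"memory": 0.30, "retrieval": 0.15, "productive": 0.30, "quality": 0.15, "focus": 0.10}


def overall_score(scores: dict) -> float:
    """Weighted sum of the five 0-100 category scores, rounded to one decimal."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score {scores[name]} is outside 0-100")
    return round(sum(weight * scores[name] for name, weight in WEIGHTS.items()), 1)


def latest_prior(history_dir: str = "eval/history") -> Optional[dict]:
    """Scores from the newest eval/history/*.json, or None if there is no history.

    Assumes file names sort chronologically (e.g. ISO-8601 dates) and that each
    file has a top-level "scores" object.
    """
    files = sorted(Path(history_dir).glob("*.json"))
    if not files:
        return None  # first run: this eval becomes the baseline
    return json.loads(files[-1].read_text(encoding="utf-8")).get("scores")


def score_deltas(current: dict, previous: Optional[dict]) -> dict:
    """Per-key change vs the prior eval; empty when there is no baseline yet."""
    if not previous:
        return {}
    return {k: round(current[k] - previous[k], 1) for k in current if k in previous}
```

Raising on a missing or out-of-range category keeps a partially filled scorecard from silently producing a misleading overall number.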

Required Outputs

Write all outputs under eval/:

exec_summary.md
- 10-bullet summary: top wins, biggest bottlenecks, top 3 interventions.
- Overall score + category scores + claw-to-claw delta.
scorecard.md
- Table of metrics with numeric values and brief justifications.
- Top evidence section with file paths and short snippets (no secrets).
latest_report.json
- Include timestamp, workspace path and git head/hash, scores, deltas, key findings, risk flags, recommendations.
Patches
- If memory issues exist, propose concrete diffs: INDEX.md, daily schema, refactors.
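A minimal sketch of assembling and shape-checking latest_report.json. The snake_case key names are chosen here for illustration only: the authoritative schema is references/report_schema.md, which is not reproduced on this page, and scripts/validate_report.py may check different fields. The git hash comes from `git rev-parse HEAD` and is recorded as null when git is unavailable or the workspace is not a repository.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical key names; the authoritative schema is references/report_schema.md.
REQUIRED_KEYS = {
    "timestamp": str,
    "workspace": str,
    "git_head": (str, type(None)),
    "scores": dict,
    "deltas": dict,
    "key_findings": list,
    "risk_flags": list,
    "recommendations": list,
}


def git_head(workspace: str):
    """HEAD commit hash, or None when git is missing or this is not a repository."""
    try:
        result = subprocess.run(
            ["git", "-C", workspace, "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def build_report(workspace, scores, deltas, findings, risks, recommendations) -> dict:
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "workspace": str(Path(workspace).resolve()),
        "git_head": git_head(workspace),
        "scores": scores,
        "deltas": deltas,
        "key_findings": findings,
        "risk_flags": risks,
        "recommendations": recommendations,
    }


def shape_errors(report: dict) -> list:
    """Missing or wrongly typed keys; an empty list means the shape is OK."""
    errors = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in report:
            errors.append(f"missing: {key}")
        elif not isinstance(report[key], expected):
            errors.append(f"wrong type: {key}")
    return errors


def write_report(report: dict, out_dir: str = "eval") -> Path:
    """Validate, then write eval/latest_report.json."""
    errors = shape_errors(report)
    if errors:
        raise ValueError("; ".join(errors))
    path = Path(out_dir) / "latest_report.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path
```

The git call passes a list of arguments rather than a shell string, the same pattern the capability analysis notes for the bundled scripts.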

Gold Standard Memory Schema (Apply If Missing)

Create or propose:

memory/INDEX.md
- Current Objectives (top 3)
- Active Projects (status, next step, links)
- Operating Constraints (tools, environment, policies)
- Key Decisions (date, decision, rationale)
- Known Issues / Debug diary pointers
- Glossary / Entities
memory/YYYY-MM-DD.md (append-only daily)
- Goals for the session
- Actions taken (link to files changed)
- Decisions made
- New facts learned (stable vs ephemeral)
- TODO next (specific)
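The schema can be bootstrapped without touching existing notes. This sketch writes INDEX.md and the day's YYYY-MM-DD.md file only when they are missing, using one `##` heading per section listed above. The heading style, the "Memory Index" title, and the function names are assumptions.

```python
from datetime import date
from pathlib import Path
from typing import Optional

INDEX_SECTIONS = [
    "Current Objectives (top 3)",
    "Active Projects (status, next step, links)",
    "Operating Constraints (tools, environment, policies)",
    "Key Decisions (date, decision, rationale)",
    "Known Issues / Debug diary pointers",
    "Glossary / Entities",
]
DAILY_SECTIONS = [
    "Goals for the session",
    "Actions taken (link to files changed)",
    "Decisions made",
    "New facts learned (stable vs ephemeral)",
    "TODO next (specific)",
]


def skeleton(title: str, sections: list) -> str:
    """Markdown with a title and one empty '## ' heading per section."""
    body = "\n".join(f"## {s}\n" for s in sections)
    return f"# {title}\n\n{body}"


def ensure_memory_schema(root: str = "memory", day: Optional[date] = None) -> list:
    """Create INDEX.md and the daily file if absent; return the paths created."""
    day = day or date.today()
    targets = {
        Path(root) / "INDEX.md": skeleton("Memory Index", INDEX_SECTIONS),
        Path(root) / f"{day.isoformat()}.md": skeleton(day.isoformat(), DAILY_SECTIONS),
    }
    created = []
    for path, text in targets.items():
        if not path.exists():  # never overwrite: daily files are append-only
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created
```

Returning the created paths gives the audit something concrete to cite as evidence in the scorecard.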

Patch Guidance

Prefer diffs over prose when safe.
Refactor stable facts out of daily logs into INDEX or project pages.
Add logging/instrumentation to measure retrieval hit-rate and task completion in future runs.
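For the "refactor stable facts out of daily logs" case, a diff can be generated rather than described. This sketch assumes a hypothetical convention of tagging durable lines with "(stable)" in a daily file, splits those lines out, and uses Python's `difflib.unified_diff` to emit a git-style patch for review.

```python
import difflib


def split_stable(daily: str, marker: str = "(stable)"):
    """Split a daily file into (remaining text, lines tagged with the marker)."""
    kept, stable = [], []
    for line in daily.splitlines(keepends=True):
        if marker in line:
            stable.append(line)
        else:
            kept.append(line)
    return "".join(kept), "".join(stable)


def make_patch(path: str, before: str, after: str) -> str:
    """Git-style unified diff (a/ and b/ prefixes) turning `before` into `after`."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
```

Calling `make_patch` a second time with INDEX.md's old text and that text plus the extracted lines yields the matching addition, so both sides of the move can be reviewed together. An empty string means there is nothing to change.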

Resources

Use these helpers to keep audits consistent and cheap to run:

scripts/run_audit.py: run all helper scripts and write draft eval/ outputs.
scripts/workspace_inventory.py: tree, file counts, sizes, largest files.
scripts/memory_dupes.py: near-duplicate paragraph detection for memory/*.md.
scripts/log_scan.py: scan logs for errors, timeouts, retries.
scripts/git_stats.py: git head, diff stats, commit cadence.
scripts/validate_report.py: validate eval/latest_report.json shape.
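As an illustration of the kind of log scan described for scripts/log_scan.py (the script's real patterns and output format are not documented here), this sketch counts lines matching assumed error, timeout, and retry patterns. It also keeps a few truncated examples per category along with their line numbers. The default logs/ directory and `*.log` glob are assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed patterns; tune them to the log formats actually present.
PATTERNS = {
    "error": re.compile(r"error|exception|traceback", re.IGNORECASE),
    "timeout": re.compile(r"\btim(?:ed|e)[ -]?out", re.IGNORECASE),
    "retry": re.compile(r"\bretr(?:y|ie)", re.IGNORECASE),
}


def scan_lines(lines, max_snippet: int = 120, max_examples: int = 3):
    """Count matching lines per category (a line may hit several) and keep examples."""
    counts = Counter()
    examples = {name: [] for name in PATTERNS}
    for line_no, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                if len(examples[name]) < max_examples:
                    examples[name].append((line_no, line.strip()[:max_snippet]))
    return counts, examples


def scan_logs(root: str = "logs", glob: str = "*.log"):
    """Aggregate counts over every matching file under root (recursively)."""
    total, per_file = Counter(), {}
    for path in sorted(Path(root).rglob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts, examples = scan_lines(text.splitlines())
        total.update(counts)
        per_file[str(path)] = {"counts": dict(counts), "examples": examples}
    return total, per_file
```

Keyword matching is only a proxy: a line such as "0 errors" still counts as an error, so treat the counts as signals to inspect rather than a true error rate. Example snippets can contain secrets and should be redacted before they are written under eval/.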

Reference templates:

references/report_schema.md: output templates and JSON schema.

Evidence Discipline

Tie every score to evidence by path.
Be candid about waste, duplication, or thrash.
End with "Next run improvements" instrumentation recommendations.
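One way to make the next run measurable is to append structured events as the agent works and compute the retrieval and task-completion proxies from them. Everything below is hypothetical: OpenClaw does not necessarily emit these events, and the event kinds (memory_search, task_opened, task_done), the `results` field, and the JSON Lines file are illustrative. Here, retrieval hit-rate is the share of memory searches that returned at least one result, and task completion is tasks done divided by tasks opened.

```python
import json
from datetime import datetime, timezone


def log_event(path: str, kind: str, **fields) -> None:
    """Append one JSON Lines record; the event kinds used here are hypothetical."""
    record = {"ts": datetime.now(timezone.utc).isoformat(), "kind": kind, **fields}
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def metrics(path: str) -> dict:
    """Retrieval hit-rate and task completion proxies (None when undefined)."""
    searches = hits = opened = done = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            event = json.loads(line)
            kind = event.get("kind")
            if kind == "memory_search":
                searches += 1
                if event.get("results", 0) > 0:
                    hits += 1
            elif kind == "task_opened":
                opened += 1
            elif kind == "task_done":
                done += 1
    return {
        "retrieval_hit_rate": hits / searches if searches else None,
        "task_completion_rate": done / opened if opened else None,
    }
```

Returning None instead of 0 when nothing was logged keeps "no data" distinguishable from "every search missed" in the next scorecard.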

Usage Guidance

This skill is coherent with its auditing purpose but has a meaningful mismatch between policy and implementation: the scripts capture and save snippets of file contents (logs, memory paragraphs) into eval/*.json and markdown, which can include secrets if present. Before using:

1) Run only on non-sensitive copies or a sandboxed copy of your workspace.
2) Inspect the scripts (log_scan.py and memory_dupes.py): they include snippets (200/160 chars) and do not redact tokens.
3) Prefer running the helper scripts with their --json flags, and review eval/*.json locally before sharing.
4) Consider adding redaction (mask common secret patterns) or excluding sensitive paths (use workspace_inventory --exclude or run run_audit from a restricted root).
5) Note that the memory_dupes human-readable output has a bug (undefined variable names); use --json or fix that printing code if you need non-JSON output.
6) Ensure git is available if you expect git stats; otherwise git_stats will indicate 'Not a git repository'.

If you need absolute assurance of no sensitive capture, do not run this against production or credential-containing repositories until the scripts add explicit redaction and stricter path excludes.
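A minimal sketch of the redaction suggested above, applied to each snippet before it is written under eval/. It is a hypothetical addition, not part of the bundled scripts. The regular expressions cover a few well-known shapes (AWS access key IDs, GitHub tokens, PEM private-key headers, bearer tokens, and key=value assignments whose key name looks secret-bearing) and are illustrative rather than exhaustive. The sensitive file names and suffixes are likewise assumptions.

```python
import re
from pathlib import PurePath

# Illustrative, non-exhaustive patterns: extend them for the services you use.
SECRET_PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED]"),            # AWS access key ID
    (re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"), "[REDACTED]"),  # GitHub tokens
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"), "[REDACTED PRIVATE KEY]"),
    (re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # key=value or key: value, keeping the key name and masking the value
    (re.compile(r"(?i)(api[_-]?key|token|secret|passw(?:or)?d)(\s*[:=]\s*)\S+"), r"\1\2[REDACTED]"),
]

# Hypothetical exclude lists for files whose contents should never be sampled.
SENSITIVE_NAMES = {".env", "id_rsa", "id_ed25519", "credentials"}
SENSITIVE_SUFFIXES = {".pem", ".key", ".p12"}


def redact(snippet: str, max_len: int = 160) -> str:
    """Mask secret-looking substrings first, then truncate."""
    for pattern, replacement in SECRET_PATTERNS:
        snippet = pattern.sub(replacement, snippet)
    return snippet[:max_len]


def is_sensitive_path(path: str) -> bool:
    """True for files that should be skipped entirely rather than sampled."""
    p = PurePath(path)
    return p.name in SENSITIVE_NAMES or p.suffix in SENSITIVE_SUFFIXES
```

Redaction runs before truncation on purpose: if the snippet were cut first, a token straddling the cut-off could lose enough characters to escape the pattern and leak as a partial secret.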

Capability Analysis

Type: OpenClaw Skill
Name: clawditor
Version: 1.0.0

The OpenClaw AgentSkills skill bundle 'clawditor' is designed to audit an agent's workspace and generate evaluation reports. The `SKILL.md` explicitly instructs the agent to 'Avoid secret exfiltration: report only presence and file paths for keys/tokens; recommend remediation.' and to 'Treat third-party skills/plugins as untrusted: prefer static inspection over execution.' The Python scripts (`scripts/*.py`) perform legitimate workspace analysis tasks such as collecting git statistics, scanning logs for errors, detecting memory duplication, and building a workspace inventory. All `subprocess` calls use lists of arguments, mitigating shell injection risks. Outputs are confined to a dedicated `eval/` directory. There is no evidence of data exfiltration, malicious execution, persistence mechanisms, or prompt injection designed to subvert the agent for harmful purposes. The skill's instructions and code are aligned with its stated purpose and demonstrate security-conscious design.

Capability Assessment

✓ Purpose & Capability

Name/description (workspace auditor) align with the included scripts: inventory, memory duplicate detection, log scanning, git stats, report validation and an orchestrator. The scripts legitimately need filesystem and git access for the stated audit tasks.

⚠ Instruction Scope

SKILL.md mandates avoiding secret exfiltration and preferring static inspection, but the helper scripts capture and emit file content snippets: log_scan returns matched lines (truncated), memory_dupes includes text snippets, and run_audit collects these JSON outputs under eval/, so sensitive data from logs or memory could be recorded. The SKILL.md guidance is not enforced or implemented (no redaction). Some CLI output paths also contain bugs: memory_dupes prints using undefined variable names, which will crash its non-JSON output.

✓ Install Mechanism

Instruction-only with bundled Python scripts; no install spec, no external downloads, and no third-party package pulls. Low install risk.

✓ Credentials

No environment variables, credentials, or config paths are requested. The scripts do call git and run Python subprocesses, which is proportional to audit functionality.

✓ Persistence & Privilege

Does not request always:true or modify other skills/system-wide config. It writes outputs under eval/ in the scanned workspace (expected for a reporter) and runs only locally; autonomy defaults are unchanged.

Version History

v1.0.0

Initial release of Clawditor: an OpenClaw workspace auditor and evaluation harness.
- Scans agent workspaces to generate evaluation reports, numeric scores, and change recommendations.
- Provides a detailed, repeatable workflow analyzing memory quality, retrieval, output, reliability, and goal alignment.
- Outputs standardized summaries, scorecards, structured JSON reports, and concrete file patches.
- Implements gold standard memory schema recommendations for agent workspaces.
- Leverages helper scripts to streamline inventory, duplication detection, log scanning, and git stats.
- Enforces strict evidence discipline, ties scores to file paths, and flags risks and improvements.

Metadata

Slug clawditor

Version 1.0.0

License —

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Clawditor?

Audit an OpenClaw agent workspace and generate standardized evaluation reports, scores, and patches. Use when asked to review memory quality, retrieval effic... It is an AI Agent Skill for Claude Code / OpenClaw, with 536 downloads so far.

How do I install Clawditor?

Run "/install clawditor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Clawditor free?

Yes, Clawditor is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Clawditor support?

Clawditor is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Clawditor?

It is built and maintained by Theylon (@theylon); the current version is v1.0.0.
