Review Llm Artifacts
/install review-llm-artifacts
LLM Artifacts Review
Detect common artifacts left behind by LLM coding agents: over-abstraction, dead code, DRY violations in tests, verbose comments, and defensive overkill.
Hard gates (sequence)
Advance only when each pass condition is objectively true (prevents “review complete” without artifacts):
| Gate | Pass condition |
|---|---|
| G1 — Scope | File list is non-empty or you exit with exactly the Step 1 message; scope is set to all or changed. |
| G2 — Four categories | Tests, dead code, abstraction, and style are each reviewed (four parallel subagent runs when supported, or four sequential passes covering the same categories). Stop if any category did not complete; do not write JSON or a summary that implies a full pass. |
| G3 — JSON before summary | .beagle/llm-artifacts-review.json exists and is valid JSON before Step 6 markdown. |
| G4 — Integrity | Step 7 checks pass before treating the run as complete. |
Arguments
Parse $ARGUMENTS for flags and optional path:
| Flag | Effect |
|---|---|
| (default) | Changed-files scope — only files changed since git merge-base HEAD main (PR-style scope) |
| --all | Full project scan: all matching source files under the target path |
| --parallel | Force parallel execution where subagents are supported (default when 4+ files in scope) |
| Path | Root directory to scan (default: current working directory) |
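The flag handling in the table can be sketched as follows. This is a minimal illustration, assuming the raw `$ARGUMENTS` value arrives as a single whitespace-separated string; the function name `parse_arguments` and the returned keys are hypothetical and not part of the skill.

```python
import shlex

def parse_arguments(raw: str) -> dict:
    """Split a raw argument string into scope, parallel flag, and target path."""
    scope = "changed"   # default: changed-files scope
    parallel = False    # set by --parallel
    target = "."        # default root: current working directory
    for token in shlex.split(raw):
        if token == "--all":
            scope = "all"
        elif token == "--parallel":
            parallel = True
        elif not token.startswith("--"):
            # Assumption: any non-flag token is the path; the last one wins.
            target = token
    return {"scope": scope, "parallel": parallel, "target": target}

print(parse_arguments("--all src/api"))
```

Unrecognized `--` flags are silently ignored in this sketch; a stricter parser might reject them instead.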
Step 1: Determine Scope
A. Changed files only (default):
Resolve the base ref explicitly and fail loudly if none exists — do not wrap the git merge-base call in || true, which would silently swallow a missing main/master ref and report "no files to scan" on repos that only have origin/main or use master. If no base ref is found, suggest the user pass --all instead of silently falling back.
BASE=$(for ref in main origin/main master origin/master; do
git rev-parse --verify "$ref" >/dev/null 2>&1 && { echo "$ref"; break; }
done)
if [ -z "$BASE" ]; then
echo "error: no main/master ref found (checked main, origin/main, master, origin/master). Pass --all for a full-project scan." >&2
exit 1
fi
MERGE_BASE=$(git merge-base HEAD "$BASE") || {
echo "error: git merge-base HEAD $BASE failed." >&2
exit 1
}
git diff --name-only "$MERGE_BASE..HEAD" | grep -E '\.(py|ts|tsx|js|jsx|go|rs|java|rb|swift|kt)$' || true
(The trailing || true on the grep is intentional — zero source-file matches is a legitimate empty-scope result, distinct from a failed base-ref resolution.)
B. Full project (--all):
From TARGET (default .), list source files and prune excluded dependency/build trees so find never descends into them. ! -path "*/foo/*" only filters the output; find still walks the tree (minutes of wasted I/O on large node_modules, target, etc.). Use -prune instead:
find "$TARGET" \
\( -type d \( \
-name node_modules -o -name .git -o -name vendor -o -name __pycache__ \
-o -name .venv -o -name venv -o -name dist -o -name build \
-o -name target -o -name .next -o -name coverage -o -name .turbo \
\) -prune \) -o \
\( -type f \( \
-name "*.py" -o -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \
-o -name "*.go" -o -name "*.rs" -o -name "*.java" -o -name "*.rb" \
-o -name "*.swift" -o -name "*.kt" \
\) -print \)
Large repos: The --all path can produce huge file lists. If file count exceeds 400, warn and suggest narrowing: pass a subdirectory as TARGET, or drop --all to fall back to the default changed-files scope. Still proceed unless the user explicitly cancels. (This warning does not fire on the default changed-files scope, which is already bounded by the PR diff.)
If no files are found, exit with:
No files to scan. Check the path, branch, or pass --all for a full-project scan.
Set scope in the report: "all" for --all, "changed" for the default changed-files scope.
Step 2: Detect Languages
Extract unique file extensions from the file list:
echo "$FILES" | sed 's/.*\.//' | sort -u
Map extensions to language names for the report:
- .py -> Python
- .ts, .tsx -> TypeScript
- .js, .jsx -> JavaScript
- .go -> Go
- .rs -> Rust
- .java -> Java
- .rb -> Ruby
- .swift -> Swift
- .kt -> Kotlin
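The extension-to-language mapping above can be expressed as a lookup table. The sketch below assumes the file list is already available as a list of path strings; `EXTENSION_TO_LANGUAGE` and `detect_languages` are illustrative names, not part of the skill.

```python
from pathlib import PurePosixPath

# Mapping taken from the list above.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".js": "JavaScript", ".jsx": "JavaScript",
    ".go": "Go", ".rs": "Rust", ".java": "Java",
    ".rb": "Ruby", ".swift": "Swift", ".kt": "Kotlin",
}

def detect_languages(files):
    """Return the sorted, de-duplicated language names for a list of file paths."""
    found = set()
    for path in files:
        suffix = PurePosixPath(path).suffix
        if suffix in EXTENSION_TO_LANGUAGE:
            found.add(EXTENSION_TO_LANGUAGE[suffix])
    return sorted(found)

print(detect_languages(["src/app.tsx", "src/util.ts", "cmd/main.go", "README.md"]))
# prints ['Go', 'TypeScript']
```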
Step 3: Review the Four Categories
Cover all four categories below. If the agent supports subagents and file count >= 4 (or --parallel is set), dispatch one subagent per category in parallel. Otherwise, run the four category reviews sequentially yourself, producing the same findings. Either way:
- Load the llm-artifacts-detection skill
- Review each category (one per subagent when parallel, one pass at a time when sequential)
- Collect findings in the structured format below
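The dispatch rule described above reduces to a small predicate. The following is a sketch only; `choose_execution_mode` and its parameters are hypothetical names, and how an agent actually reports subagent support is outside the scope of this document.

```python
def choose_execution_mode(file_count: int, supports_subagents: bool,
                          force_parallel: bool = False) -> str:
    """Return "parallel" or "sequential" for the four category reviews."""
    # Parallel needs subagent support plus either 4+ files or --parallel.
    if supports_subagents and (file_count >= 4 or force_parallel):
        return "parallel"
    # Otherwise the same four categories are reviewed one pass at a time.
    return "sequential"

print(choose_execution_mode(file_count=12, supports_subagents=True))
print(choose_execution_mode(file_count=2, supports_subagents=True))
```

Either mode must cover all four categories; only the scheduling differs.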
Category 1: Tests
Focus: Testing anti-patterns from LLM generation
- DRY violations (repeated setup code, duplicate assertions)
- Testing library/framework code instead of application logic
- Wrong mock boundaries (mocking too much or too little)
- Overly verbose test names that describe implementation
- Tests that just mirror the implementation
Category 2: Dead Code
Focus: Unused or obsolete code
- Unused imports, variables, functions, classes
- TODO/FIXME comments that should have been resolved
- Backwards compatibility code for removed features
- Orphaned test files for deleted code
- Commented-out code blocks
- Feature flags that are always on/off
Category 3: Abstraction
Focus: Over-engineering patterns
- Unnecessary abstraction layers (interfaces for single implementations)
- Copy-paste drift (similar code that diverged slightly)
- Over-configuration (configurable things that never change)
- Premature generalization
- Factory/Builder patterns for simple object creation
- Deep inheritance hierarchies
Category 4: Style
Focus: Verbose or defensive patterns
- Verbose comments explaining obvious code
- Defensive overkill (null checks on non-nullable values)
- Unnecessary type hints (dynamic languages with obvious types)
- Overly explicit error messages
- Redundant logging
- Documentation added to code that is already self-documenting
Step 4: Consolidate Findings
Prerequisite: G2 satisfied (all four category reviews finished successfully).
Once all four category reviews have completed (parallel subagents or sequential passes), then:
- Merge all findings into a single list
- Assign unique IDs (1, 2, 3...)
- Group by category for display
Echo before write (anti-confabulation): Every finding written to JSON MUST come from a category review's [FILE:LINE] ISSUE_TYPE output, not from the branch name, directory, or your own inference. After assigning ids, echo the consolidated table (id | category | file:line | description) and confirm each row traces to a specific category result. Do not add findings that no category review reported.
ID lock: Ids are contiguous 1..N with no gaps or duplicates. This 1..N set is the locked id set that downstream skills (verify-llm-artifacts, fix-llm-artifacts) bind to 1:1. summary.total MUST equal N, and summary.by_category counts MUST sum to N. State the id set before writing JSON.
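The merge, id assignment, and echo steps can be sketched as below. This assumes each category review returns a list of dicts with `file`, `line`, and `description` keys; the function names and the sample data are hypothetical and for illustration only.

```python
CATEGORIES = ("tests", "dead_code", "abstraction", "style")

def consolidate(results_by_category):
    """Merge per-category findings into one list with contiguous ids 1..N."""
    missing = [c for c in CATEGORIES if c not in results_by_category]
    if missing:
        # Mirrors gate G2: refuse to consolidate a partial review.
        raise ValueError(f"category reviews did not complete: {missing}")
    merged = []
    for category in CATEGORIES:
        for finding in results_by_category[category]:
            merged.append({**finding, "id": len(merged) + 1, "category": category})
    return merged

def echo_table(findings):
    """Render the id | category | file:line | description table for the echo step."""
    rows = ["id | category | file:line | description"]
    for f in findings:
        rows.append(f"{f['id']} | {f['category']} | {f['file']}:{f['line']} | {f['description']}")
    return "\n".join(rows)

# Hypothetical category results, for illustration only.
results = {
    "tests": [{"file": "tests/test_api.py", "line": 10, "description": "Repeated setup code"}],
    "dead_code": [{"file": "src/app.py", "line": 3, "description": "Unused import"}],
    "abstraction": [],
    "style": [{"file": "src/app.py", "line": 40, "description": "Comment restates the code"}],
}
findings = consolidate(results)
print(echo_table(findings))
```

Because ids are derived from the merged list's length, they are contiguous by construction, and an empty category contributes no rows rather than a gap.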
Step 5: Write JSON Report
Create .beagle directory if it doesn't exist:
mkdir -p .beagle
Write findings to .beagle/llm-artifacts-review.json:
{
"version": "1.0.0",
"created_at": "2024-01-15T10:30:00Z",
"git_head": "abc1234",
"scope": "all" | "changed",
"target": ".",
"files_scanned": 42,
"languages": ["Python", "TypeScript", "Go"],
"findings": [
{
"id": 1,
"category": "tests" | "dead_code" | "abstraction" | "style",
"type": "dry_violation" | "unused_import" | "over_abstraction" | "verbose_comment" | "...",
"file": "src/utils/helper.py",
"line": 42,
"description": "Repeated setup code in 5 test functions",
"suggestion": "Extract to a pytest fixture",
"risk": "Low" | "Medium" | "High",
"fix_safety": "Safe" | "Needs review",
"fix_action": "refactor" | "delete" | "simplify" | "extract"
}
],
"summary": {
"total": 15,
"by_category": {
"tests": 4,
"dead_code": 5,
"abstraction": 3,
"style": 3
},
"by_risk": {
"High": 2,
"Medium": 8,
"Low": 5
},
"by_fix_safety": {
"Safe": 10,
"Needs review": 5
}
}
}
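The `summary` block can be derived from the findings list instead of being written by hand, which keeps `total` and the per-bucket counts consistent. The sketch below assumes the field names and allowed values shown in the schema above; `build_summary` is a hypothetical helper.

```python
def build_summary(findings):
    """Count findings by category, risk, and fix safety; every bucket starts at zero."""
    summary = {
        "total": len(findings),
        "by_category": {"tests": 0, "dead_code": 0, "abstraction": 0, "style": 0},
        "by_risk": {"High": 0, "Medium": 0, "Low": 0},
        "by_fix_safety": {"Safe": 0, "Needs review": 0},
    }
    for f in findings:
        # A value outside the schema raises KeyError instead of being miscounted.
        summary["by_category"][f["category"]] += 1
        summary["by_risk"][f["risk"]] += 1
        summary["by_fix_safety"][f["fix_safety"]] += 1
    return summary

# Hypothetical findings, for illustration only.
sample = [
    {"id": 1, "category": "tests", "risk": "Low", "fix_safety": "Safe"},
    {"id": 2, "category": "dead_code", "risk": "Medium", "fix_safety": "Needs review"},
    {"id": 3, "category": "dead_code", "risk": "Low", "fix_safety": "Safe"},
]
summary = build_summary(sample)
print(summary["total"], summary["by_category"])
```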
Step 6: Display Summary
Prerequisite: G3 satisfied (JSON on disk and parseable).
## LLM Artifacts Review
**Scope:** Changed files since merge-base with main | Entire project under `<path>` (when `--all`)
**Files scanned:** 42
**Languages:** Python, TypeScript, Go
### Findings by Category
...
### Summary Table
...
### Next Steps
- Run the [verify-llm-artifacts](../verify-llm-artifacts/SKILL.md) skill to confirm findings and drop false positives before fixing.
- Run the [fix-llm-artifacts](../fix-llm-artifacts/SKILL.md) skill after verification (or to preview safe-only fixes).
- Review the JSON report at `.beagle/llm-artifacts-review.json`
Step 7: Verification (report integrity)
Before completing, verify the review executed correctly:
- JSON validity: Confirm .beagle/llm-artifacts-review.json exists and is parseable
- Category coverage: All 4 category reviews completed without errors (parallel subagents or sequential passes)
- Git HEAD captured: The git_head field is non-empty in the report
- Staleness check: If a previous report exists, compare stored git_head to current HEAD and warn if different
- ID + count integrity: Finding ids are contiguous 1..N; summary.total == N; summary.by_category sums to N. A mismatch means a finding was added, dropped, or duplicated; fix it before completing.
python3 -c "import json; json.load(open('.beagle/llm-artifacts-review.json'))" 2>/dev/null && echo "✓ Valid JSON" || echo "✗ Invalid JSON"
python3 - <<'PY'
import json
r = json.load(open('.beagle/llm-artifacts-review.json'))
ids = [x['id'] for x in r['findings']]
n = len(ids)
ok = ids == list(range(1, n + 1)) and r['summary']['total'] == n \
and sum(r['summary']['by_category'].values()) == n
print("✓ ids 1..N and counts consistent" if ok else f"✗ id/count mismatch: ids={ids} total={r['summary']['total']}")
PY
STORED_HEAD=$(jq -r '.git_head' .beagle/llm-artifacts-review.json 2>/dev/null)
CURRENT_HEAD=$(git rev-parse --short HEAD)
if [ "$STORED_HEAD" != "$CURRENT_HEAD" ]; then
echo "⚠️ Report was generated on $STORED_HEAD, current HEAD is $CURRENT_HEAD"
fi
If any verification fails, report the error and do not proceed.
Finding-level verification (precision, not JSON syntax) is a separate skill: verify-llm-artifacts — run it before mass deletes or --fix on risky items.
Output Format for Each Finding
[FILE:LINE] **ISSUE_TYPE** (Risk, Fix Safety)
- Description
- Suggestion: Specific fix recommendation
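A finding from the JSON report can be rendered in this format with a small helper. This sketch assumes the report's `type` field is what fills the ISSUE_TYPE slot, which the source does not state explicitly; `format_finding` is a hypothetical name.

```python
def format_finding(finding: dict) -> str:
    """Render one finding as the three-line display block."""
    header = (f"[{finding['file']}:{finding['line']}] **{finding['type']}** "
              f"({finding['risk']}, {finding['fix_safety']})")
    return "\n".join([
        header,
        f"- {finding['description']}",
        f"- Suggestion: {finding['suggestion']}",
    ])

# Example values taken from the schema in Step 5.
example = {
    "file": "src/utils/helper.py", "line": 42, "type": "dry_violation",
    "risk": "Low", "fix_safety": "Safe",
    "description": "Repeated setup code in 5 test functions",
    "suggestion": "Extract to a pytest fixture",
}
print(format_finding(example))
```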
Rules
- Follow Hard gates order; do not skip G3 (JSON before Step 6).
- Anti-confabulation: every finding must trace to a category review's [FILE:LINE] output (Step 4 echo); never invent findings from the branch name, directory, or inference. See the review-verification-protocol skill → Anti-confabulation (gate 0).
- Always load the llm-artifacts-detection skill first
- Use parallel subagents (when the agent supports them) for the four category reviews when >= 4 files; otherwise run them sequentially
- Every finding MUST have file:line reference
- Categorize risk honestly (don't inflate or deflate)
- Mark fix safety as "Safe" only if change is mechanical and reversible
- Create .beagle directory if needed
- Write JSON report before displaying summary
- Default scope is changed files since merge-base with main; pass --all for a full-project scan
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install review-llm-artifacts
- After installation, invoke the skill by name or use /review-llm-artifacts
/review-llm-artifacts - Provide required inputs per the skill's parameter spec and get structured output
What is Review Llm Artifacts?
Detects common LLM coding agent artifacts across four categories (tests, dead code, abstraction, style) over the project or changed files — using parallel su... It is an AI Agent Skill for Claude Code / OpenClaw, with 31 downloads so far.
How do I install Review Llm Artifacts?
Run "/install review-llm-artifacts" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Review Llm Artifacts free?
Yes, Review Llm Artifacts is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Review Llm Artifacts support?
Review Llm Artifacts is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Review Llm Artifacts?
It is built and maintained by Kevin Anderson (@anderskev); the current version is v1.0.0.