Description

Multi-dimensional code audit using structured subagent delegation. Use when reviewing a GitHub release, PR, or codebase. Systematically inspects security, co...

README (SKILL.md)

Code Review — Multi-Dimensional Audit Methodology

Name: Code Review — Multi-Dimensional Audit
Author: yinghaojia

Systematically audit a codebase release through five dimensions, using parallel subagent delegation for deep verification. Inspired by: Modern Code Review taxonomy research (Bavota & Russo 2015 "Four Eyes Are Better Than Two"), reviewdog's tool-agnostic harness pattern, Danger's pre-review gate philosophy, and community experience with AI-generated code quality issues.

Core Principles

Real code, not release notes. Every finding must be verified against actual source files by fetching them. The only acceptable evidence is file:line citations. The only acceptable conclusion labels are Confirmed / Mitigated / False Alarm.
Four Eyes on every Critical. Any finding classified as Critical severity MUST be independently verified by a second subagent before appearing in the final report. This is the "Four Eyes" principle from Bavota & Russo (2015): multiple reviewers independently examining the same issue catch 60%+ more real bugs than a single reviewer. See four-eyes.md.
Simplicity is a first-class dimension. AI-generated code often produces "massive overkill" — hundreds of lines for what should be a two-method change. Always ask: "Does the complexity of this solution match the complexity of the problem?" This dimension is inspired by community experience on Hacker News and Reddit (2025 State of AI Code Quality discussions).

Workflow

Phase 0: Pre-Review Gate (in main session, <2 min)

Run these quick checks before committing to a full audit. Inspired by Danger's "automated pre-review" philosophy.

PR/Diff size check: if the change exceeds 400 lines, flag it as high-risk and recommend splitting
Missing artifacts: is there a CHANGELOG entry? Updated README if API changed? Migration guide if schema changed?
File-level red flags: any committed .env, credentials, large binary files?
Test presence: does this change include or update tests? If a diff of more than 100 lines touches no tests, flag it.

Output: Gate report (pass/warn/fail) + recommended audit depth.
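The gate checks above can be sketched as a small function. This is a minimal illustration, not the skill's implementation: `ChangedFile`, `pre_review_gate`, the suffix list, and the mapping of each check to warn or fail are all assumptions, since the source only gives the 400-line and 100-line thresholds.

```python
from dataclasses import dataclass

# Thresholds come from the checklist above; every name here is illustrative.
MAX_DIFF_LINES = 400
TESTS_EXPECTED_ABOVE = 100
# Assumed examples of file names that suggest committed secrets.
RED_FLAG_SUFFIXES = (".env", ".pem", ".key")
_RANK = {"pass": 0, "warn": 1, "fail": 2}


@dataclass
class ChangedFile:
    path: str
    lines_changed: int


def pre_review_gate(files, has_changelog):
    """Return (status, reasons) with status in {'pass', 'warn', 'fail'}."""
    status, reasons = "pass", []

    def raise_to(level, reason):
        # Keep the worst level seen so far and record why.
        nonlocal status
        if _RANK[level] > _RANK[status]:
            status = level
        reasons.append(reason)

    total = sum(f.lines_changed for f in files)
    secrets = [f.path for f in files if f.path.endswith(RED_FLAG_SUFFIXES)]
    touches_tests = any("test" in f.path.lower() for f in files)

    if secrets:
        # Assumption: committed secrets fail the gate outright.
        raise_to("fail", "possible secrets committed: " + ", ".join(secrets))
    if total > MAX_DIFF_LINES:
        raise_to("warn", f"diff is {total} lines; recommend splitting")
    if total > TESTS_EXPECTED_ABOVE and not touches_tests:
        raise_to("warn", "no test changes on a diff over 100 lines")
    if not has_changelog:
        raise_to("warn", "no CHANGELOG entry")
    return status, reasons
```

The recommended audit depth could then be derived from the status, for example a full audit on "warn" and a stop-and-fix on "fail"; that mapping is a design choice the source leaves open.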

Phase 1: Surface Scan (in main session)

Read these in order — enough to understand architecture and identify candidate issues:

Release notes / CHANGELOG — what the authors claim changed
README — project purpose, architecture diagram, on-disk layout
ARCHITECTURE.md or equivalent — module decomposition, API contracts
Directory tree (via GitHub tree view) — file listing to map modules
Key source files — entry point, core state machine, critical paths (read ~3-8 files)

Output: A list of 10-20 candidate issues, categorized by dimension:

Security (SSRF, injection, auth, path traversal, credential leaks)
Concurrency & State Machine (race conditions, missing locks, TOCTOU, state corruption)
UX & Implementation Logic (feature semantics, error messages, recovery paths, access control)
Test Quality (mock fidelity, integration gaps, signature mismatches, coverage blind spots)
Simplicity & Over-Engineering (complexity-vs-problem mismatch, unnecessary abstraction, AI-bloat patterns)

Phase 2: Deep Audit (via subagents)

For each non-trivial dimension, spawn an isolated subagent. Each subagent:

Fetches every relevant source file via web_fetch — never infers from docs
Verifies each issue against actual code — cites specific lines
Constructs exploit scenarios (security) or race timelines (concurrency)
Returns structured findings with: Conclusion / Severity / Source Evidence / Risk / Fix
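One way to make the structured finding concrete is a record type that rejects evidence not written as a file:line citation. The sketch below is an assumption about shape only: the class and field names are hypothetical, and severity is left as free text because the actual levels live in severity-rubric.md, which is not reproduced here.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Conclusion(Enum):
    # The three labels the Core Principles allow.
    CONFIRMED = "Confirmed"
    MITIGATED = "Mitigated"
    FALSE_ALARM = "False Alarm"


# Accepts "path/to/file.py:42" or a range such as "path/to/file.py:42-57".
_CITATION = re.compile(r".+:\d+(-\d+)?")


@dataclass(frozen=True)
class Finding:
    title: str
    dimension: str
    conclusion: Conclusion
    severity: str              # free text; see severity-rubric.md for levels
    evidence: tuple[str, ...]  # one or more file:line citations
    risk: str
    fix: str

    def __post_init__(self):
        # Enforce "real code, not release notes": no citation, no finding.
        if not self.evidence:
            raise ValueError("a finding needs at least one file:line citation")
        for cite in self.evidence:
            if not _CITATION.fullmatch(cite):
                raise ValueError(f"not a file:line citation: {cite!r}")
```

Validating at construction time means a subagent report that cites a README sentence instead of source lines is rejected before it reaches synthesis.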

See subagent-templates.md for the exact prompt template. See audit-dimensions.md for dimension-specific question probes.

Model guidance: Use the same model for all subagents to ensure consistent judgment. Prefer high-reasoning models for complex audits.

Phase 3: Four-Eyes Cross-Verification (critical findings only)

For every finding classified as Critical by a subagent:

Spawn a second, independent subagent (different dimension focus) with the exact same issue prompt
If both confirm → Confirmed. The issue enters the final report with a 👁️ Four-Eyes Verified badge.
If they disagree → Flag as "Disputed" in the report with both conclusions quoted.
If the second finds the issue is Mitigated/False Alarm while the first said Critical → The second wins. But keep both in an appendix.
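The resolution rules can be written as a small decision function. The rules as stated overlap when the second reviewer downgrades a finding (that is both a disagreement and a "second wins" case), so this sketch assumes the more specific rule takes precedence and reserves "Disputed" for the remaining disagreements, such as agreement on the bug but not on its severity. The function name and the returned keys are hypothetical.

```python
def cross_verify(second_conclusion: str, second_severity: str) -> dict:
    """Resolve a finding the first subagent reported as Confirmed / Critical."""
    if second_conclusion in ("Mitigated", "False Alarm"):
        # Second reviewer wins; both conclusions are kept in an appendix.
        return {"label": second_conclusion, "four_eyes_badge": False,
                "appendix": True, "quote_both": False}
    if second_conclusion == "Confirmed" and second_severity == "Critical":
        # Both reviewers agree: the finding earns the Four-Eyes badge.
        return {"label": "Confirmed", "four_eyes_badge": True,
                "appendix": False, "quote_both": False}
    # Assumed reading: any other mismatch is reported as Disputed,
    # with both conclusions quoted in the report.
    return {"label": "Disputed", "four_eyes_badge": False,
            "appendix": False, "quote_both": True}
```

For example, `cross_verify("Confirmed", "High")` yields a "Disputed" entry under this reading, because the reviewers agree the issue is real but not that it is Critical.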

Phase 4: Synthesis (in main session)

When all subagent reports return:

Merge findings — deduplicate across dimensions, re-classify severity
Build summary table — all issues with conclusion + severity + source dimension + root cause
Build priority matrix — P0 (drop everything) through P5 (nice to have), with estimated work and blast radius
Write executive summary — overall quality assessment + top 3 action items
Add Simplicity Score — a subjective score 1-5 on whether the codebase's complexity matches its problem domain. 5 = elegantly simple, 1 = massively over-engineered.
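The merge step might look like the following sketch, which deduplicates findings that cite the same file:line and keeps the highest severity among duplicates. Both choices are assumptions: the source does not say how duplicates are detected, and the severity order below is a placeholder for the real rubric in severity-rubric.md.

```python
# Assumed ordering, most severe first; the actual levels are defined in
# severity-rubric.md, which this sketch does not reproduce.
SEVERITY_ORDER = ["Critical", "High", "Medium", "Low"]


def merge_findings(findings):
    """Deduplicate findings by evidence citation and sort by severity."""
    merged = {}
    for f in findings:
        # Findings that cite the same file:line are treated as one issue.
        entry = merged.setdefault(f["evidence"], {**f, "dimensions": []})
        if f["dimension"] not in entry["dimensions"]:
            entry["dimensions"].append(f["dimension"])
        # Keep the most severe classification seen for this issue.
        if SEVERITY_ORDER.index(f["severity"]) < SEVERITY_ORDER.index(entry["severity"]):
            entry["severity"] = f["severity"]
    return sorted(merged.values(),
                  key=lambda e: SEVERITY_ORDER.index(e["severity"]))
```

Recording every dimension that reported an issue preserves the "source dimension" column of the summary table after deduplication.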

See output-format.md for table and emoji conventions. See severity-rubric.md for severity classification rules.

Key Heuristics

Security Scan Heuristics

Every URL fetch path must be checked for SSRF: trace from user input → URL parsing → DNS resolution → HTTP request → redirect handling → response reading. Flag any step that skips IP validation.
Every subprocess call must be checked for injection: is shell=True used? Are user-controlled strings concatenated into the command? Are file paths sanitized?
Every external API call must be checked for credential leaks: are tokens/secrets logged? Do error messages include request bodies?
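For the SSRF heuristic, it helps to know what a correct IP validation step looks like so its absence is easy to spot. The sketch below is illustrative only: `check_url_target` and its injectable `resolve` parameter are hypothetical names, and it covers a single hop of the trace, not the full fetch path.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def check_url_target(url, resolve=None):
    """Return the resolved IPs for url, or raise ValueError if unsafe.

    `resolve` defaults to socket.getaddrinfo and is injectable for tests.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)

    infos = (resolve or socket.getaddrinfo)(
        parts.hostname, port, type=socket.SOCK_STREAM
    )
    ips = sorted({info[4][0] for info in infos})
    for ip in ips:
        # is_global is False for loopback, private, and link-local ranges,
        # which includes the 169.254.169.254 metadata address.
        if not ipaddress.ip_address(ip).is_global:
            raise ValueError(f"{parts.hostname} resolves to non-public {ip}")
    return ips
```

When auditing, two follow-up questions matter: does the code connect to the IP it validated (otherwise a second DNS lookup can return a different address), and is the same check repeated for every redirect target?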

Concurrency Scan Heuristics

For every .json / .jsonl write: check if it uses tmp-rename atomic pattern or flock. Direct overwrite without either = bug.
For every load → modify → save pattern: check if the entire block is lock-protected. If load happens outside the lock, it's a TOCTOU bug.
For every state machine transition: check if two concurrent events can both see the same "before" state and both advance. If yes, state corruption possible.
For every append-only log: verify flock(LOCK_EX) covers the full append operation.
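As a reference point for the first two heuristics, here is a minimal sketch of the tmp-rename pattern combined with a lock that covers the whole load, modify, save block. The function names and the ".lock" sidecar file are assumptions for illustration, and `fcntl` is available on Unix-like systems only.

```python
import fcntl
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write JSON to a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def locked_update(path, mutate):
    """Run load -> mutate -> save entirely under one exclusive lock."""
    # A sidecar lock file is used because os.replace swaps the data file's
    # inode, so a lock held on the data file itself would not carry over.
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            try:
                with open(path) as f:  # the load happens inside the lock
                    state = json.load(f)
            except FileNotFoundError:
                state = {}
            state = mutate(state)
            atomic_write_json(path, state)
            return state
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

The TOCTOU variant to look for in an audit is the same code with the `json.load` call moved above the `flock` call: two writers can then both read the old state and the later save silently discards the earlier one.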

UX/Logic Scan Heuristics

For every feature flag/mode: trace all branches. Does "mode=review-only" actually prevent non-review actions? Don't trust the name — verify the code.
For every error message: read it as a user would. Does it tell you what went wrong AND how to fix it? If it only says "X failed", flag it.
For every multi-step workflow: is there an undo/backtrack/revisit path? If not, flag it.
For every access-control check: look for what's NOT checked. Does a group chat require @-mention? Does a rate limit exist?

Test Quality Heuristics

Mocks that match wrong signatures: if a test monkeypatches call_llm with a fake that takes **kw and reads kw.get("old_param"), it will never catch a production code change to new_param. Flag these.
No integration test in CI: if CI only runs unit tests with mocks and there's no end-to-end smoke test, flag it.
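The mock-signature problem is easier to flag with a concrete picture. In the sketch below, `call_llm` is a stand-in with a hypothetical signature (the source names the function but not its parameters), `loose_fake` shows the anti-pattern, and `unittest.mock.create_autospec` shows one way a test double can be made to fail when the signature drifts.

```python
import inspect
from unittest import mock


def call_llm(prompt: str, *, new_param: int = 0) -> str:
    """Stand-in for the production function; this signature is hypothetical."""
    return f"{prompt}:{new_param}"


def loose_fake(**kw):
    # Anti-pattern: accepts anything and reads a parameter that production
    # code no longer passes, so the test keeps passing after a rename.
    return str(kw.get("old_param"))


def param_shape(fn):
    """List (name, kind) pairs so two callables can be compared."""
    return [(p.name, p.kind) for p in inspect.signature(fn).parameters.values()]


# An autospec'd double enforces the real signature at call time.
strict_fake = mock.create_autospec(call_llm, return_value="stub")
```

With the strict double, `strict_fake("hi", new_param=1)` returns "stub", while `strict_fake("hi", old_param=1)` raises `TypeError`. During an audit, comparing `param_shape(fake)` against `param_shape(real)` is a quick way to list the fakes that would not notice a signature change.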

Simplicity & Over-Engineering Heuristics (NEW — v1.1.0)

AI-bloat detection: does the change introduce a new service class, background worker, or framework dependency for what should be 1-2 methods in an existing file? (Inspired by the HN "batching = 2 methods, not 200 lines" incident.)
Abstraction without justification: does the code introduce interfaces, factories, or dependency injection where direct calls would suffice? For each abstraction layer, ask: "What concrete problem does this solve today?"
Dead code or "future-proofing": are there code paths, config options, or extension points that are not used by any existing feature?
Config sprawl: does this change add new config keys, env vars, or CLI flags? Is each one justified by a real use case?
Copy-paste detection: are there blocks of >10 lines that could be extracted? Flag both directions — missing DRY AND forced DRY where the two copies have different evolution paths.
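The copy-paste heuristic can be approximated mechanically before a human judges whether extraction is warranted. This is a rough sketch under stated assumptions: `duplicated_blocks` is a hypothetical helper, it compares whitespace-stripped lines only, and the default window of 11 reflects the ">10 lines" threshold above.

```python
from collections import defaultdict


def duplicated_blocks(source: str, window: int = 11) -> dict:
    """Map each repeated run of `window` lines to its 1-based start lines."""
    lines = [ln.strip() for ln in source.splitlines()]
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        chunk = lines[i:i + window]
        # Skip windows containing blank lines to reduce trivial matches.
        if not all(chunk):
            continue
        seen["\n".join(chunk)].append(i + 1)
    return {block: starts for block, starts in seen.items() if len(starts) > 1}
```

A duplicate longer than the window is reported as several overlapping entries, which is acceptable for triage. The tool only finds candidates; deciding between "missing DRY" and "forced DRY" still requires reading how the two copies are likely to evolve.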

Anti-Patterns (avoid)

❌ Filing an issue based on release notes alone (always verify against source)
❌ Accepting a docstring claim without checking the implementation
❌ Using "I think" / "probably" / "seems like"; every finding carries a verified conclusion label (Confirmed / Mitigated / False Alarm) or it is not a finding
❌ Leaving severity as "TBD" — classify immediately using the rubric
❌ Mentioning an issue in prose without filing it in the structured output table
❌ Trusting a single subagent's Critical finding without Four-Eyes cross-verification

Harness Compatibility

This skill is designed to work with existing code review tooling, not replace it. The recommended stack:

Layer	Tool	What it catches
Lint/formatter	ruff, eslint, gofmt	Style, basic bugs
Static analysis	SonarQube, Semgrep, CodeQL	Security vulns, code smells
Diff harness	reviewdog, Danger	Runs the above, posts inline comments
🆕 Deep audit	This skill	Cross-cutting: concurrency, over-engineering, UX logic, test gaps
Human review	Your team	Architecture, trade-offs, domain knowledge

The Pre-Review Gate (Phase 0) picks up what reviewdog/SonarQube would catch, so you don't waste subagent time on style issues. Subagents focus on what static analysis can't see.

Usage Guidance

This skill appears safe to use for structured code reviews. Before using it, make sure the repository contents can be shared with the agent/subagents, especially for private codebases, and expect additional tool/model usage from the multi-agent review process.

Capability Analysis

Type: OpenClaw Skill
Name: deep-code-review
Version: 1.1.1

The skill bundle provides a highly structured and professional framework for an AI agent to perform deep code reviews, including security audits, concurrency checks, and simplicity assessments. It utilizes advanced techniques like subagent delegation and a 'Four-Eyes' verification protocol (detailed in references/four-eyes.md) to minimize false positives. The instructions in SKILL.md and the supporting reference files are entirely consistent with the stated purpose of identifying vulnerabilities and logic flaws in external codebases, with no evidence of malicious intent, data exfiltration, or harmful prompt injection.

Capability Assessment

✓ Purpose & Capability

The stated purpose is a multi-dimensional code audit, and the included reference files consistently support code-review methodology, severity scoring, output formatting, and subagent templates.

ℹ Instruction Scope

The workflow asks the agent to fetch relevant source files and spawn review subagents; this is purpose-aligned for code review but should be used only on code the user is allowed to share with the agent.

✓ Install Mechanism

There is no install spec, no required binaries, no environment variables, and no code files; the skill is instruction-only.

ℹ Credentials

Fetching repository files via web_fetch and distributing review work to subagents is proportionate for a deep audit, but it may touch many source files and consume extra model/tool resources.

✓ Persistence & Privilege

The artifacts do not request credentials, elevated privileges, persistent memory, background workers, or ongoing activity after the review task.

Version History

v1.1.1

Added /deep-code-review, /code-review, /review-code trigger phrases for direct invocation.

v1.1.0

Added Phase 0 Pre-Review Gate (Danger-inspired), Simplicity & Over-Engineering dimension (from HN community experience), Four-Eyes Cross-Verification protocol (Bavota & Russo 2015), and Harness Compatibility guide.

v1.0.0

Initial release: Introduces a multi-dimensional code audit skill with structured subagent-based review and strict evidence requirements.
- Systematically audits codebases across security, concurrency, UX/logic, and test quality dimensions.
- Spawns parallel subagents for each audit dimension, each verifying findings via actual source files with line citations.
- Enforces strict classification for findings: Confirmed / Mitigated / False Alarm; no speculative reporting allowed.
- Synthesizes findings into a severity- and priority-ranked matrix for actionable review.
- Provides detailed heuristics and anti-patterns for consistent, high-quality code review processes.

Metadata

Slug deep-code-review

Version 1.1.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is Code Review — Multi-Dimensional Audit?

Multi-dimensional code audit using structured subagent delegation. Use when reviewing a GitHub release, PR, or codebase. Systematically inspects security, co... It is an AI Agent Skill for Claude Code / OpenClaw, with 58 downloads so far.

How do I install Code Review — Multi-Dimensional Audit?

Run "/install deep-code-review" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Code Review — Multi-Dimensional Audit free?

Yes, Code Review — Multi-Dimensional Audit is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Code Review — Multi-Dimensional Audit support?

It is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Code Review — Multi-Dimensional Audit?

It is built and maintained by YinghaoJia (@yinghaojia); the current version is v1.1.1.
