← 返回 Skills 市场
don-gbot

Cross Model Review

作者 Don-GBot · GitHub ↗ · v2.1.0
cross-platform ✓ 安全检测通过
750
总下载
0
收藏
3
当前安装
5
版本数
在 OpenClaw 中安装
/install cross-model-review
功能描述
Adversarial plan review using two different AI models. Supports static mode (fixed roles) and alternating mode (models swap writer/reviewer each round, fully...
使用说明 (SKILL.md)

cross-model-review

Metadata

name: cross-model-review
version: 2.0.0
description: >
  Adversarial plan review using two different AI models.
  v2: Alternating mode — models swap writer/reviewer each round.
  Fully autonomous loop — no human input between rounds.
  Use when: building features touching auth/payments/data models,
  plans that will take >1hr to implement.
  NOT for: simple one-file fixes, research tasks, quick scripts.
triggers:
  - "review this plan"
  - "cross review"
  - "challenge this"
  - "is this plan solid?"
  - "adversarial review"

When to Activate

Activate this skill when the user:

  • Says any trigger phrase above
  • Shares a plan and asks for adversarial/second-opinion review
  • Asks you to "sanity check" a multi-step implementation plan

Do NOT activate for: simple fixes, one-liners, pure research tasks.


Modes

Static Mode (v1 — backward compatible)

Fixed roles: planner always writes, reviewer always reviews. Requires human to trigger each round.

Alternating Mode (v2 — recommended)

Models swap roles each round. Fully autonomous — no human input between rounds.

Flow:

  • Round 1: Model A writes the plan. Model B reviews.
  • Round 2: Model B rewrites (based on its own review). Model A reviews.
  • Round 3: Model A rewrites (based on its own review). Model B reviews.
  • ...continues alternating until both agree (reviewer says APPROVED) or max rounds hit.

Why this works:

  • Each model must implement its own critique — can't nitpick without owning the fix
  • The other model catches over-engineering or proportionality issues
  • Natural convergence: each round addresses the other's concerns

Autonomous Orchestration (Alternating Mode)

You (the main agent) run this loop. It's fully autonomous after kickoff.

Step 1 — Save the plan and init

node review.js init \
  --plan /path/to/plan.md \
  --mode alternating \
  --model-a "anthropic/claude-opus-4-6" \
  --model-b "openai-codex/gpt-5.3-codex" \
  --project-context "Brief description for reviewer calibration" \
  --out /home/ubuntu/clawd/tasks/reviews

Captures workspace path from stdout.

Step 2 — The autonomous loop

while true:
  step = next-step(workspace)

  if step.action == "done":
    break  # APPROVED!

  if step.action == "max-rounds":
    ask user: override or manual fix
    break

  if step.action == "review":
    spawn sub-agent with step.model, step.prompt
    save response to workspace/round-N-response.json
    parse-round(workspace, round, response)
    continue

  if step.action == "revise":
    spawn sub-agent with step.model, step.prompt
    save output plan to temp file
    save-plan(workspace, temp file, version)
    continue

Step 3 — Finalize

When the loop exits with APPROVED:

node review.js finalize --workspace \x3Cworkspace>

Present: rounds taken, issues found/resolved, rubric scores, plan-final.md location.


CLI Reference

Commands:
  init           Create a review workspace
  next-step      Get next action for autonomous loop
  parse-round    Parse a reviewer response, update issue tracker
  save-plan      Save a revised plan version from writer output
  finalize       Generate plan-final.md, changelog.md, summary.json
  status         Print current workspace state

init options:
  --plan \x3Cfile>            Path to plan file (required)
  --mode \x3Cm>               "static" (default) or "alternating"
  --model-a \x3Cm>            Model A — writes first (alternating mode, required)
  --model-b \x3Cm>            Model B — reviews first (alternating mode, required)
  --reviewer-model \x3Cm>     Reviewer model (static mode, required)
  --planner-model \x3Cm>      Planner model (static mode, required)
  --project-context \x3Cs>    Brief project context for reviewer calibration
  --out \x3Cdir>              Output base dir (default: tasks/reviews)
  --max-rounds \x3Cn>         Max rounds (default: 5 static, 8 alternating)
  --token-budget \x3Cn>       Token budget for context (default: 8000)

next-step options:
  --workspace \x3Cdir>        Path to review workspace (required)
  Returns JSON: { action, model, round, prompt, planVersion, saveTo }
  Actions: "review", "revise", "done", "max-rounds"

parse-round options:
  --workspace \x3Cdir>        Path to review workspace (required)
  --round \x3Cn>              Round number (required)
  --response \x3Cfile>        Path to raw reviewer response (required)

save-plan options:
  --workspace \x3Cdir>        Path to review workspace (required)
  --plan \x3Cfile>            Path to revised plan markdown (required)
  --version \x3Cn>            Plan version number (required)

finalize options:
  --workspace \x3Cdir>        Path to review workspace (required)
  --override-reason \x3Cs>    Reason for force-approving with open issues
  --ci-force               Required in non-TTY mode when overriding

status options:
  --workspace \x3Cdir>        Path to review workspace (required)

Exit codes:
  0   Approved / OK
  1   Revise / max-rounds
  2   Error

Detailed Orchestration (for agent implementation)

Spawning reviewers

step = next-step(workspace)  # action: "review"
response = sessions_spawn(model=step.model, task=step.prompt, timeout=120s)
# Save raw response to workspace/round-{step.round}-response.json
parse-round(workspace, step.round, response_file)

System instruction for reviewer: "You are a senior engineering reviewer. Output ONLY valid JSON matching the schema. No tool calls. No markdown fences. No preamble."

Spawning writers

step = next-step(workspace)  # action: "revise"
revised_plan = sessions_spawn(model=step.model, task=step.prompt, timeout=300s)
# Save raw output as temp file
save-plan(workspace, temp_file, step.planVersion)

System instruction for writer: none needed — the prompt is self-contained.

Error handling

  • Reviewer timeout/failure: retry once, then ask user
  • Writer timeout/failure: retry once, then ask user
  • Parse error on review JSON: re-prompt reviewer once with "Your response was not valid JSON"
  • Max rounds hit: present status to user, ask for override or manual fix

Convergence

The loop converges when the reviewer says APPROVED with no open CRITICAL/HIGH blockers. The script enforces this — if reviewer says APPROVED but blockers remain, it overrides to REVISE.


Static Mode (v1 — backward compatible)

For static mode, the original orchestration from v1 still works:

Step 1 — Init

node review.js init --plan \x3Cfile> --reviewer-model \x3Cm> --planner-model \x3Cm>

Step 2 — Manual loop

For each round: build reviewer prompt from template, spawn reviewer, parse-round, revise plan yourself, continue.

Step 3 — Finalize

Same as alternating mode.


Integration with coding-agent

Before dispatching any plan to coding-agent that:

  • Touches auth, payments, or data models
  • Has 3+ implementation steps
  • The user hasn't already reviewed adversarially

Run cross-model-review first. Only proceed if exit code 0.


Notes

  • Workspace persists in tasks/reviews/ — referenceable later
  • issues.json tracks full lifecycle of all issues
  • meta.json stores mode, models, current round, verdict, needsRevision flag
  • next-step is the state machine — always call it to determine what to do
  • Dedup warnings help catch semantic drift across rounds
  • Models must be from different provider families (cross-provider enforcement)
  • --project-context is injected into reviewer prompts for calibration
安全使用建议
This skill appears to do what it says: it orchestrates an adversarial review loop between two different models and includes on-disk helpers and templates. Before installing or running it, consider the following: (1) Do NOT include secrets, credentials, or PII in plan content or codebase context — those are sent to third-party model APIs. (2) Prefer static/human‑mediated mode for sensitive plans (alternating mode is fully autonomous). (3) Ensure the platform's sessions_spawn uses trusted provider credentials and that you understand where reviewer responses are sent/stored. (4) Review the included scripts (scripts/review.js) and templates yourself — they are provided and straightforward, but you should confirm workspace paths and retention policies meet your security/compliance needs. (5) If you must review sensitive or regulated plans, run this in an isolated environment or redact sensitive fields before invoking the skill.
功能分析
Type: OpenClaw Skill Name: cross-model-review Version: 2.1.0 The cross-model-review skill bundle is a well-architected tool for adversarial AI plan reviews with a strong focus on security and robustness. The core logic in `scripts/review.js` is dependency-free and implements a structured state machine for multi-round reviews, including Jaccard-based issue deduplication and schema validation. Most importantly, the bundle includes explicit prompt-injection mitigations in its templates (e.g., `alternating-reviewer-prompt.md` and `writer-prompt.md`) by wrapping untrusted content in delimiters and providing strict system instructions to sub-agents to treat input as data rather than instructions. No evidence of data exfiltration, malicious execution, or unauthorized persistence was found.
能力评估
Purpose & Capability
Name/description, CLI, templates, and scripts all implement an adversarial cross-model review loop (static and alternating modes). The included Node.js helper (scripts/review.js) manages workspaces, parsing, dedup, and verdicts — this is expected and proportionate to the skill's purpose. There are no unrelated env vars, binaries, or surprising external services requested.
Instruction Scope
SKILL.md instructs the agent to spawn reviewer/writer sub-agents (sessions_spawn) and to save/parse JSON responses; templates explicitly wrap plan content in UNTRUSTED delimiters and require structured JSON output. This stays within the review orchestration scope, but the skill necessarily transmits plan content to third‑party models and relies on instruction-level sandboxing to mitigate prompt injection. The SKILL.md acknowledges that this is a prompt-level protection (not an API-level isolation) and warns of limitations.
Install Mechanism
No install spec; skill is instruction-first and ships helper scripts and tests that run under Node.js >=18. No downloads from external URLs or package-install steps. The codebase claims zero external dependencies and uses only Node stdlib. This is a low-risk install footprint.
Credentials
The skill declares no required environment variables or credentials. It does assume the platform's sessions_spawn mechanism will provide model access (so the platform will use whatever model/provider credentials it normally has). The absence of required secrets is appropriate; however, users must not include secrets/PII in plan or codebase_context because those values will be sent to external model APIs.
Persistence & Privilege
always:false and no special privileges. The skill writes run artifacts only to a workspace directory supplied at init (user-controlled path). It does not request system-wide changes or modify other skills' configurations.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install cross-model-review
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /cross-model-review 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v2.1.0
Round 0 criteria negotiation: Model A proposes 5 task-specific acceptance criteria, Model B challenges/refines. Agreed criteria injected into all reviewer prompts. New command: save-criteria. Backward compatible.
v2.0.1
v2.0: Alternating mode — models swap writer/reviewer each round. Fully autonomous loop. v2.0.1: Atomic meta writes, error states, save-plan validation. Dogfooded on itself (3-round stress test passed).
v1.2.0
Added Differentiation rubric dimension (#8): scores 0 if plan reads like generic LLM output, 5 if decisions grounded in project context. Verification gate catches baseline slop.
v1.1.0
v1.1.0: rubric-based multi-dimensional scoring (7 dimensions, 0-5 scale), backward compatible, 77 tests passing
v1.0.1
Adversarial plan review with stable issue tracking, dedup, fail-closed gating, 55 tests
元数据
Slug cross-model-review
版本 2.1.0
许可证
累计安装 3
当前安装数 3
历史版本数 5
常见问题

Cross Model Review 是什么?

Adversarial plan review using two different AI models. Supports static mode (fixed roles) and alternating mode (models swap writer/reviewer each round, fully... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 750 次。

如何安装 Cross Model Review?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install cross-model-review」即可一键安装,无需额外配置。

Cross Model Review 是免费的吗?

是的,Cross Model Review 完全免费(开源免费),可自由下载、安装和使用。

Cross Model Review 支持哪些平台?

Cross Model Review 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Cross Model Review?

由 Don-GBot(@don-gbot)开发并维护,当前版本 v2.1.0。

💬 留言讨论