Description

Enforces task completion: turns your goal into pass/fail criteria, runs a worker, judges the output, feeds back what's missing, and loops until every criteri...

Usage (SKILL.md)

Checkmate

Name: Checkmate
Author: insipidpoint

A deterministic Python loop (scripts/run.py) calls an LLM for worker and judge roles. Nothing leaves until it passes — and you stay in control at every checkpoint.

Requirements

OpenClaw platform CLI (openclaw): must be available in PATH. Used for:
- openclaw gateway call sessions.list: resolve session UUID for turn injection
- openclaw agent --session-id <UUID>: inject checkpoint messages into the live session
- openclaw message send: fallback channel delivery (e.g. Telegram, Signal)
Python 3: run.py is pure stdlib; no pip packages required
No separate API keys or env vars needed; routes through the gateway's existing OAuth

Security & Privilege Model

⚠️ This is a high-privilege skill. Read before using in batch/automated mode.

Spawned workers and judges inherit full host-agent runtime, including:

exec (arbitrary shell commands)
web_search, web_fetch
All installed skills (including those with OAuth-bound credentials — Gmail, Drive, etc.)
sessions_spawn (workers can spawn further sub-agents)

This means the task description you provide directly controls what the worker does — treat it like code you're about to run, not a message you're about to send.

Batch mode (--no-interactive) removes all human gates. In interactive mode (default), you approve criteria and each checkpoint before the loop continues. In batch mode, criteria are auto-approved and the loop runs to completion autonomously — only use this for tasks and environments you fully trust.

User-input bridging writes arbitrary content to disk. When you reply to a checkpoint, the main agent writes your reply verbatim to user-input.md in the workspace. The orchestrator reads it and acts on it. Don't relay untrusted third-party content as checkpoint replies.

When to Use

Use checkmate when correctness matters more than speed — when "good enough on the first try" isn't acceptable.

Good fits:

Code that must pass tests or meet a spec
Docs or reports that must hit a defined quality bar
Research that must be thorough and cover specific ground
Any task where you'd otherwise iterate manually until satisfied

Trigger phrases (say any of these):

checkmate: TASK
keep iterating until it passes
don't stop until done
until it passes
quality loop: TASK
iterate until satisfied
judge and retry
keep going until done

Architecture

scripts/run.py  (deterministic Python while loop — the orchestrator)
  ├─ Intake loop [up to max_intake_iter, default 5]:
  │    ├─ Draft criteria (intake prompt + task + refinement feedback)
  │    ├─ ⏸ USER REVIEW: show draft → wait for approval or feedback
  │    │     approved? → lock criteria.md
  │    │     feedback? → refine, next intake iteration
  │    └─ (non-interactive: criteria-judge gates instead of user)
  │
  ├─ ⏸ PRE-START GATE: show final task + criteria → user confirms "go"
  │         (edit task / cancel supported here)
  │
  └─ Main loop [up to max_iter, default 10]:
       ├─ Worker: spawn agent session → iter-N/output.md
       │          (full runtime: exec, web_search, all skills, OAuth auth)
       ├─ Judge:  spawn agent session → iter-N/verdict.md
       ├─ PASS?  → write final-output.md, notify user, exit
       └─ FAIL?  → extract gaps → ⏸ CHECKPOINT: show score + gaps to user
                     continue?  → next iteration (with judge gaps)
                     redirect:X → next iteration (with user direction appended)
                     stop?      → end loop, take best result so far

Interactive mode (default): user approves criteria, confirms pre-start, and reviews each FAIL checkpoint. Batch mode (--no-interactive): fully autonomous; criteria-judge gates intake, no checkpoints.
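
The main loop in the diagram above can be sketched as follows. This is a minimal illustration, not the code in scripts/run.py: the `worker` and `judge` callables, the `Verdict` tuple, and the return shape are all assumptions made for the example, and the user checkpoint between iterations is left out.

```python
from typing import Callable, List, Tuple

# Hypothetical verdict shape: (passed, list of gaps reported by the judge).
Verdict = Tuple[bool, List[str]]


def run_main_loop(
    worker: Callable[[List[str]], str],
    judge: Callable[[str], Verdict],
    max_iter: int = 10,
) -> Tuple[str, str, int]:
    """Run worker then judge until the judge passes or max_iter is reached."""
    feedback: List[str] = []  # accumulated judge gaps, fed to the next worker
    output = ""
    for iteration in range(1, max_iter + 1):
        output = worker(feedback)
        passed, gaps = judge(output)
        if passed:
            return ("PASS", output, iteration)
        feedback.extend(gaps)
    # Simplification: return the last output rather than tracking the best one.
    return ("FAIL", output, max_iter)
```

Because the loop is plain Python, the iteration cap is enforced by the `for` statement itself rather than by anything a model decides.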

User Input Bridge

When the orchestrator needs user input, it:

1. Writes workspace/pending-input.json (kind + workspace path)
2. Sends a notification via --recipient and --channel
3. Polls workspace/user-input.md every 5s (up to --checkpoint-timeout minutes)

The main agent acts as the bridge: when pending-input.json exists and the user replies, the agent writes their response to user-input.md. The orchestrator picks it up automatically.
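
The polling step described above could look roughly like this. It is a sketch under assumptions: the function name and signature are hypothetical, and the real orchestrator may handle encoding, partial writes, and cleanup differently.

```python
import time
from pathlib import Path
from typing import Optional


def wait_for_user_input(
    workspace: Path, timeout_s: float, poll_s: float = 5.0
) -> Optional[str]:
    """Poll workspace/user-input.md until it appears or the timeout elapses."""
    reply_path = workspace / "user-input.md"
    deadline = time.monotonic() + timeout_s
    while True:
        if reply_path.exists():
            text = reply_path.read_text(encoding="utf-8").strip()
            reply_path.unlink()  # consume the reply so it is not read twice
            # Clear the "waiting" marker if present (requires Python 3.8+).
            (workspace / "pending-input.json").unlink(missing_ok=True)
            return text
        if time.monotonic() >= deadline:
            return None  # caller decides what a timeout means
        time.sleep(poll_s)
```

With the documented defaults this would be called with `timeout_s=60 * 60` and `poll_s=5`.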

Each agent session is spawned via:

openclaw agent --session-id <isolated-id> --message <prompt> --timeout <N> --json

Routes through the gateway WebSocket using existing OAuth — no separate API key. Workers get full agent runtime: exec, web_search, web_fetch, all skills, sessions_spawn.
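
One way to issue that command from Python is to pass an argument list to `subprocess.run`, so the prompt is never interpreted by a shell. The helper names below are hypothetical, and treating `--timeout` as a number of seconds is an assumption; only the flags themselves come from the command shown above.

```python
import subprocess
from typing import List


def build_agent_command(session_id: str, prompt: str, timeout_s: int) -> List[str]:
    """Build the argv list for one isolated agent session."""
    return [
        "openclaw", "agent",
        "--session-id", session_id,
        "--message", prompt,
        "--timeout", str(timeout_s),  # assumption: value is in seconds
        "--json",
    ]


def run_agent(session_id: str, prompt: str, timeout_s: int) -> str:
    """Run the session and return raw stdout (expected to be JSON)."""
    result = subprocess.run(
        build_agent_command(session_id, prompt, timeout_s),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing a list instead of a single command string keeps quotes and newlines in the prompt from being parsed as shell syntax. It does nothing to limit what the spawned agent itself chooses to execute.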

Your Job (main agent)

When checkmate is triggered:

Get your session UUID (for direct agent-turn injection):

openclaw gateway call sessions.list --params '{"limit":1}' --json \
  | python3 -c "import json,sys; s=json.load(sys.stdin)['sessions'][0]; print(s['sessionId'])"

Also note your --recipient (channel user/chat ID) and --channel as fallback.
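
The inline one-liner above raises a bare IndexError or KeyError when no session is returned. A slightly more defensive version of the same parsing, assuming the same `{"sessions": [{"sessionId": ...}]}` shape that the one-liner relies on, might be:

```python
import json


def extract_session_id(sessions_json: str) -> str:
    """Return the sessionId of the first session in a sessions.list response."""
    data = json.loads(sessions_json)
    sessions = data.get("sessions") or []
    if not sessions:
        raise ValueError("sessions.list returned no sessions")
    return sessions[0]["sessionId"]
```

The function name is hypothetical; it is meant to be fed the stdout of the `openclaw gateway call sessions.list` command.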

Create workspace:
```
bash <skill-path>/scripts/workspace.sh /tmp "TASK"
```
Prints the workspace path. Write the full task to workspace/task.md if needed.

Run the orchestrator (background exec):

python3 <skill-path>/scripts/run.py \
  --workspace /tmp/checkmate-TIMESTAMP \
  --task "FULL TASK DESCRIPTION" \
  --max-iter 10 \
  --session-uuid YOUR_SESSION_UUID \
  --recipient YOUR_RECIPIENT_ID \
  --channel <your-channel>

Use exec with background=true. This runs for as long as needed. Add --no-interactive for fully autonomous runs (no user checkpoints).

Tell the user checkmate is running, what it's working on, and that they'll receive criteria drafts and checkpoint messages via your configured channel to review and approve.
Bridge user replies: When the user responds to a checkpoint message, check for pending-input.json and write their response to workspace/user-input.md.

Bridging User Input

When a checkpoint message arrives (the orchestrator sent the user a criteria/approval/checkpoint request), bridge their reply:

# Find active pending input
cat <workspace-parent>/checkmate-*/pending-input.json 2>/dev/null

# Route user's reply
echo "USER REPLY HERE" > /path/to/workspace/user-input.md

The orchestrator polls for this file every 5 seconds. Once written, it resumes automatically and deletes the file.

Accepted replies at each gate:

| Gate | Continue | Redirect | Cancel |
| --- | --- | --- | --- |
| Criteria review | "ok", "approve", "lgtm" | any feedback text | (none) |
| Pre-start | "go", "start", "ok" | "edit task: NEW TASK" | "cancel" |
| Iteration checkpoint | "continue", (empty) | "redirect: DIRECTION" | "stop" |
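
The reply table above maps naturally onto a small classifier. The sketch below is illustrative only: the gate keys (`"criteria"`, `"pre-start"`, `"checkpoint"`), the action names, and the case-insensitive matching are assumptions, not documented behavior of run.py.

```python
from typing import Optional, Tuple

# Replies that mean "continue" at each gate, taken from the table above.
CONTINUE_WORDS = {
    "criteria": {"ok", "approve", "lgtm"},
    "pre-start": {"go", "start", "ok"},
    "checkpoint": {"continue", ""},
}


def classify_reply(gate: str, reply: str) -> Tuple[str, Optional[str]]:
    """Map a raw user reply to (action, payload) for the given gate."""
    text = reply.strip()
    lowered = text.lower()
    if lowered in CONTINUE_WORDS[gate]:
        return ("continue", None)
    if gate == "criteria":
        # Any other non-empty text is treated as refinement feedback.
        return ("redirect", text) if text else ("unknown", text)
    if gate == "pre-start":
        if lowered == "cancel":
            return ("cancel", None)
        if lowered.startswith("edit task:"):
            return ("redirect", text[len("edit task:"):].strip())
    if gate == "checkpoint":
        if lowered == "stop":
            return ("cancel", None)
        if lowered.startswith("redirect:"):
            return ("redirect", text[len("redirect:"):].strip())
    return ("unknown", text)
```

Returning an explicit `"unknown"` action lets the caller ask the user again instead of guessing at an ambiguous reply.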

Parameters

| Flag | Default | Notes |
| --- | --- | --- |
| `--max-intake-iter` | 5 | Intake criteria refinement iterations |
| `--max-iter` | 10 | Main loop iterations (increase to 20 for complex tasks) |
| `--worker-timeout` | 3600s | Per worker session |
| `--judge-timeout` | 300s | Per judge session |
| `--session-uuid` | (none) | Agent session UUID (from `sessions.list`); used for direct turn injection, the primary notification path |
| `--recipient` | (none) | Channel recipient ID (e.g. user/chat ID, E.164 phone number); fallback if injection fails |
| `--channel` | (none) | Delivery channel for fallback notifications (e.g. `telegram`, `whatsapp`, `signal`) |
| `--no-interactive` | off | Disable user checkpoints (batch mode) |
| `--checkpoint-timeout` | 60 | Minutes to wait for user reply at each checkpoint |
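
For reference, the flags and defaults in the table could be declared with `argparse` along these lines. This is a reconstruction from the table and the invocation shown earlier, not the parser in run.py; the value types, and whether `--task` is required, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags with their documented defaults."""
    p = argparse.ArgumentParser(prog="run.py")
    p.add_argument("--workspace", required=True)
    p.add_argument("--task")
    p.add_argument("--max-intake-iter", type=int, default=5)
    p.add_argument("--max-iter", type=int, default=10)
    p.add_argument("--worker-timeout", type=int, default=3600)  # seconds
    p.add_argument("--judge-timeout", type=int, default=300)    # seconds
    p.add_argument("--session-uuid")
    p.add_argument("--recipient")
    p.add_argument("--channel")
    p.add_argument("--no-interactive", action="store_true")     # batch mode
    p.add_argument("--checkpoint-timeout", type=int, default=60)  # minutes
    return p
```

Note the mixed units: the two session timeouts are in seconds while `--checkpoint-timeout` is in minutes.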

Workspace layout

memory/checkmate-YYYYMMDD-HHMMSS/
├── task.md               # task description (user may edit pre-start)
├── criteria.md           # locked after intake
├── feedback.md           # accumulated judge gaps + user direction
├── state.json            # {iteration, status} — resume support
├── pending-input.json    # written when waiting for user; deleted after response
├── user-input.md         # agent writes user's reply here; read + deleted by orchestrator
├── intake-01/
│   ├── criteria-draft.md
│   ├── criteria-verdict.md  (non-interactive only)
│   └── user-feedback.md     (interactive: user's review comments)
├── iter-01/
│   ├── output.md         # worker output
│   └── verdict.md        # judge verdict
└── final-output.md       # written on completion

Resume

If the script is interrupted, just re-run it with the same --workspace. It reads state.json and skips completed steps. Locked criteria.md is reused; completed iter-N/output.md files are not re-run.
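
Resume logic of this kind can be approximated by checking which iteration directories already hold worker output. The sketch assumes the `iter-NN/output.md` naming from the workspace layout; the function names and the `"new"` status value are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def load_state(workspace: Path) -> dict:
    """Read state.json, or return a fresh state if the file is absent."""
    state_path = workspace / "state.json"
    if state_path.exists():
        return json.loads(state_path.read_text(encoding="utf-8"))
    return {"iteration": 0, "status": "new"}  # hypothetical initial state


def next_iteration(workspace: Path, max_iter: int) -> Optional[int]:
    """Return the first iteration without worker output, or None if all exist."""
    for n in range(1, max_iter + 1):
        if not (workspace / f"iter-{n:02d}" / "output.md").exists():
            return n
    return None
```

A fuller version would also check for a missing verdict.md, so that a run interrupted between worker and judge re-runs only the judge.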

Prompts

Active prompts called by run.py:

prompts/intake.md — converts task → criteria draft
prompts/criteria-judge.md — evaluates criteria quality (APPROVED / NEEDS_WORK) — used in non-interactive mode
prompts/worker.md — worker prompt (variables: TASK, CRITERIA, FEEDBACK, ITERATION, MAX_ITER, OUTPUT_PATH)
prompts/judge.md — evaluates output against criteria (PASS / FAIL)

Reference only (not called by run.py):

prompts/orchestrator.md — architecture documentation explaining the design rationale
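
The worker prompt variables are referenced elsewhere in this listing as `{{TASK}}` and `{{FEEDBACK}}`. Assuming that double-brace syntax holds for all variables, filling a template could be sketched as below; the function is hypothetical and is not taken from run.py.

```python
import re
from typing import Mapping

# Matches placeholders such as {{TASK}} or {{MAX_ITER}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_prompt(template: str, variables: Mapping[str, str]) -> str:
    """Replace each {{NAME}} with its value; fail loudly on a missing name."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder {name}")
        return variables[name]

    # Single pass: placeholder-like text inside a value is not expanded again.
    return PLACEHOLDER.sub(substitute, template)
```

Substitution is purely textual, so whatever the user puts in the task or feedback reaches the worker verbatim. That is why the task description needs to be treated like code.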

Security Recommendations

This skill appears to implement what it claims (a deterministic orchestration loop that spawns worker and judge agent sessions), but it requires high runtime privileges and implicitly uses the platform's OAuth and other installed skills. Before installing or running it:
- Confirm the 'openclaw' CLI and Python 3 requirement (registry metadata incorrectly lists no required binaries).
- Run only in interactive mode by default; avoid --no-interactive / batch mode unless you fully trust the task and environment.
- Do not pass secrets or sensitive credentials inside the task text or workspace.
- Audit the included scripts (scripts/run.py and workspace.sh) yourself (they are small and present in the package). Pay special attention to any places where the orchestrator injects messages or writes files.
- Consider running first in an isolated environment (a throwaway agent account or restricted OpenClaw instance) that does not have OAuth access to sensitive skills (email, Drive, cloud provider connectors).
- If you must run in production, restrict which skills/credentials are installed on that agent gateway or require manual checkpoints for every iteration.

Because workers inherit broad capabilities and the skill can bridge user replies to disk and inject live session turns, treat it like code execution: only run with explicit, limited trust and proper operational safeguards.

Functional Analysis

Type: OpenClaw Skill
Name: checkmate
Version: 2.0.4

The OpenClaw 'checkmate' skill is classified as suspicious due to a significant prompt injection vulnerability that could lead to Remote Code Execution (RCE). The skill is explicitly declared as 'high-privilege' in SKILL.md and README.md, granting spawned worker agents full host-agent runtime, including `exec` (arbitrary shell commands). User-provided input (initial task, task edits, and iteration feedback) is directly incorporated into the prompts for these worker agents (e.g., `{{TASK}}`, `{{FEEDBACK}}` in `prompts/worker.md`). A malicious user could craft input that, when injected into the agent's prompt, causes the agent to execute arbitrary commands on the host system. While the skill's documentation transparently warns about these risks and the `scripts/run.py` orchestrator uses `subprocess.run` safely for its own CLI calls, the design inherently allows for RCE via prompt injection against the agent, making it a critical vulnerability rather than intentional malware.

Capability Assessment

ℹ Purpose & Capability

The skill name/description (iterative worker→judge loop) align with its code and runtime instructions: it spawns agent sessions via the OpenClaw CLI, judges outputs, and notifies users. However the registry metadata claims no required binaries/env vars, while SKILL.md and run.py clearly require the 'openclaw' CLI and Python 3. This metadata mismatch is an incoherence you should confirm with the publisher.

⚠ Instruction Scope

The orchestrator injects turns into live sessions and spawns worker/judge agent sessions that 'inherit full host-agent runtime' (exec, web_search, web_fetch, all skills including OAuth-bound credentials, and sessions_spawn). The SKILL.md and run.py explicitly implement a bridging mechanism that instructs an agent to write user replies to disk. Those instructions are coherent with the skill's purpose, but they give any worker you spawn broad access to local and connected resources — enough to read or use other skills' credentials or execute arbitrary commands if the task or prompts are malicious or malformed.

✓ Install Mechanism

There is no automated install script or remote download; the skill is distributed as files and expects 'openclaw' in PATH and Python 3. No external URLs or archive extraction are used. That lowers install-time risk, but you still run a local Python script that performs networked actions.

⚠ Credentials

The skill declares no required environment variables, which is reasonable, but it depends on the platform's OAuth (gateway) and on other installed skills' credentials implicitly — and worker sessions explicitly inherit access to those OAuth-bound skills. That is a powerful capability: a worker could (if instructed) use Gmail/Drive/other skills or spawn further sessions. The implicit use of gateway OAuth and the absence of explicit credential disclaimers in metadata are notable and should be evaluated before granting trust.

⚠ Persistence & Privilege

The skill is not forced-always, but it spawns background processes and agent sessions that inherit full runtime privileges. It supports a batch (--no-interactive) mode that removes human checkpoints, enabling fully autonomous operation with access to OAuth-bound skills. Combined with the worker privilege model, this gives a high blast radius if a run is misconfigured or given an untrusted task.

Version History

v2.0.4

Redesign intake: goal statement (plain English) replaces checklist. Judge evaluates holistically against goal, not a spec sheet. User reviews a paragraph, not 10 bullet points.

v2.0.3

Fix checkpoint delivery: injection is now an action command instructing agent to relay via message tool and bridge user reply, not a passive notification

v2.0.2

Add Requirements and Security sections; declare openclaw CLI as required dependency; document worker privilege inheritance and batch mode risks

v2.0.1

Improved description clarity: emphasizes enforcement loop and criteria-based completion

v2.0.0

Complete rework: fixed notification delivery (openclaw message send), agent-turn injection via session UUID, fixed worker output capture bug, removed all env-specific refs, workspace.sh defaults to /tmp, description fits Telegram 256 char limit

v1.4.0

Default max-iter reduced to 10 (worst case ~10.8h). Use --max-iter up to 20 for complex tasks.
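
The "~10.8h" figure is consistent with every iteration running to the documented timeouts (3600s worker plus 300s judge). The source does not state how the number was derived, so the calculation below is an assumption that happens to match:

```python
def worst_case_hours(
    max_iter: int, worker_timeout_s: int = 3600, judge_timeout_s: int = 300
) -> float:
    """Upper bound on loop runtime if every session hits its timeout."""
    return max_iter * (worker_timeout_s + judge_timeout_s) / 3600


print(round(worst_case_hours(10), 1))  # 10.8
print(round(worst_case_hours(20), 1))  # 21.7
```

This bound ignores intake iterations and any time spent waiting at user checkpoints.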

v1.1.0

Full implementation: orchestrator sub-agent manages intake→worker→judge loop. Workers spawn as sub-agents per iteration. Judge runs inline. State persisted to workspace files for resume support. Supports up to 20 iterations (configurable), hours of runtime.

v1.0.0

Initial release: intake → worker → judge loop with PASS/FAIL criteria and gap feedback. Max 5 iterations by default.

Metadata

Slug checkmate

Version 2.0.4

License (not specified)

Total installs 3

Current installs 3

Historical versions 8

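FAQ

What is Checkmate?

Enforces task completion: turns your goal into pass/fail criteria, runs a worker, judges the output, feeds back what's missing, and loops until every criteri... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 894 cumulative downloads to date.

How do I install Checkmate?

Run the command "/install checkmate" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is required.

Is Checkmate free?

Yes. Checkmate is completely free (free and open source) and can be downloaded, installed, and used freely.

Which platforms does Checkmate support?

Checkmate is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Checkmate?

It is developed and maintained by Shiwei Song (@insipidpoint); the current version is v2.0.4.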
