Agent QA Gates
/install agent-qa-gates
A field-tested validation system for AI agent output. Born from production failures, not theory.
Quick Start
Before any agent delivers output, run the Pre-Ship Checklist:
- Accurate? — every number/date/metric has a source. Unsourced → prefix "estimated"
- Complete? — no missing pieces, no "I'll do that next"
- Actionable? — ends with clear next step or decision point
- Fits the channel? — check character limits for your delivery surface
- No leaks? — no internal context, private data, or secrets
- Not a duplicate? — verify no recent identical send
- Would the human be embarrassed? — if yes, don't ship
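Two of the checklist items above (channel fit and duplicate sends) are mechanical enough to automate. The following is a minimal sketch, not the skill's own implementation: the function name, the `CHANNEL_LIMITS` values, and the idea of tracking recent sends as SHA-256 hashes are all assumptions made for illustration.

```python
import hashlib

# Hypothetical limits; replace with the real limits of your delivery surfaces.
CHANNEL_LIMITS = {"sms": 160, "chat": 2000}


def pre_ship_issues(text, channel, recent_hashes):
    """Return mechanical checklist failures for a draft message."""
    issues = []
    body = text.strip()
    if not body:
        issues.append("empty output")
    limit = CHANNEL_LIMITS.get(channel)
    if limit is None:
        issues.append(f"unknown channel: {channel}")
    elif len(text) > limit:
        issues.append(f"too long for {channel}: {len(text)} > {limit}")
    # Duplicate check: compare a hash of the draft against recent sends.
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if digest in recent_hashes:
        issues.append("duplicate of a recent send")
    return issues
```

An empty list means only the mechanical items passed; accuracy, completeness, and the embarrassment test still need judgment.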
Gate Tiers
Four ascending tiers by risk level:
| Gate | Scope | Key Checks |
|---|---|---|
| Gate 0 | Internal (files, config, memory) | Mechanism changed, not just the text; no placeholders; file exists |
| Gate 1 | Human-facing (briefings, summaries) | Key info in first 2 lines, ≤3-line paragraphs, channel length limits |
| Gate 2 | External (email, public content, client materials) | No internal context leaked, recipient-appropriate tone, dedup check |
| Gate 3 | Code & technical | Builds clean, no secrets in code, error handling, tests pass |
See references/gates-detail.md for full gate checklists.
Severity Classification
Not all failures are equal:
- 🔴 BLOCK — cannot ship (secrets, privacy, hallucinated data, wrong recipient)
- 🟡 FIX — fix before shipping, <2 min (formatting, too long, missing citation)
- 🟢 NOTE — log and ship (style preference, minor optimization)
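One way to make the three levels operational is to let the worst finding decide the outcome. This is a sketch under that assumption; the `Severity` enum and the `ship_decision` helper are illustrative names, not part of the skill.

```python
from enum import IntEnum


class Severity(IntEnum):
    NOTE = 0   # log and ship
    FIX = 1    # fix before shipping
    BLOCK = 2  # cannot ship


def ship_decision(findings):
    """Map a list of Severity findings to one action; the worst finding wins."""
    worst = max(findings, default=Severity.NOTE)
    if worst == Severity.BLOCK:
        return "block"
    if worst == Severity.FIX:
        return "fix"
    return "ship"
```

Using an ordered enum keeps the rule explicit: a single BLOCK finding overrides any number of NOTE or FIX findings.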
Protocol Gates
Recurring failure modes need dedicated gates. These are the most common:
Heartbeat / Periodic Check Output
- Binary output: alert text ONLY or status-OK ONLY. Never mixed.
- Every data point verified by current-session tool call. No hallucinated metrics.
- No stale data from previous cycles or pre-compaction sessions.
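The binary-output rule can be checked mechanically. The sketch below assumes the OK status is a single fixed token; `STATUS_OK` is a hypothetical placeholder, so substitute whatever token your system actually emits.

```python
OK_TOKEN = "STATUS_OK"  # hypothetical token; use your system's own


def heartbeat_output_valid(output):
    """True if output is the OK token alone, or alert text with no OK token in it."""
    text = output.strip()
    if not text:
        return False          # empty output is neither an alert nor an OK
    if text == OK_TOKEN:
        return True           # status-OK only
    return OK_TOKEN not in text  # alert only; mixed output is rejected
```

This only enforces the shape of the output. Whether each data point came from a current-session tool call has to be verified separately.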
Post-Compaction / Context Reset
- Do not trust facts from the pre-reset session — verify from files and tools.
- Rerun pending checks from scratch.
- Zero carryover for periodic checks.
Scheduled Job / Cron Changes
- Explicit timeout set
- Explicit model set
- Verify schedule after creation
- Output fits destination channel limits
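The first three items can be sketched as a comparison between the job definition you submitted and what the scheduler reports back. The dictionary keys (`timeout_seconds`, `model`, `schedule`) are assumptions for illustration and will differ by scheduler.

```python
def cron_change_issues(requested, stored):
    """Compare a submitted job definition with the scheduler's read-back copy.

    requested: the job definition you submitted (dict).
    stored: what the scheduler returns when you read the job back (dict).
    """
    issues = []
    # Timeout and model must be set explicitly, never left to defaults.
    for key in ("timeout_seconds", "model"):
        if not requested.get(key):
            issues.append(f"{key} not set explicitly")
    # Verify the schedule after creation by comparing against the read-back.
    if stored.get("schedule") != requested.get("schedule"):
        issues.append("stored schedule does not match requested schedule")
    return issues
```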
Sub-Agent Output Review
- Does output match the brief's success criteria?
- Any uncertainty flags unresolved?
- Is the reasoning (not just the conclusion) sound?
Isolated Agent / Cron Output (real-world data)
For any cron or sub-agent that reports external data without orchestrator review:
- Did the agent make a verifiable live tool call? Is the raw response traceable?
- Any names, dates, amounts, or IDs that can't be traced to a tool result? → 🔴 BLOCK
- If tool call failed: output must be `DATA_UNAVAILABLE — [reason]`, not fabricated data
- Does the cron prompt include the Real-World Data Verification Rule?
Severity: Fabricated real-world data = 🔴 BLOCK. Same as hallucinated metrics.
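A rough traceability check is to pull every number-bearing token (IDs, amounts, dates) out of the report and confirm that each one appears verbatim in the raw tool response. The sketch below is a deliberately crude heuristic under that assumption. It does not cover names, and it will flag values the agent reformatted. Both function names are hypothetical.

```python
import re

# Tokens containing at least one digit: IDs, amounts, dates, and similar.
_TOKEN = re.compile(r"[A-Za-z0-9#$.,:/-]*\d[A-Za-z0-9#$.,:/-]*")


def untraceable_tokens(report, raw_tool_response):
    """Return number-bearing tokens in the report that are absent from the raw response."""
    cleaned = {t.strip(".,:") for t in _TOKEN.findall(report)}
    return sorted(t for t in cleaned if t and t not in raw_tool_response)


def unavailable(reason):
    """Format the failure output required when the tool call did not succeed."""
    return f"DATA_UNAVAILABLE — {reason}"
```

Any non-empty result from `untraceable_tokens` is a candidate 🔴 BLOCK that a reviewer should inspect before the output ships.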
Delegated Work Acceptance
For any non-trivial delegated task (especially builds, audits, config changes, or external deliverables):
- Does the handoff include a clear artifact path or proof object?
- Did the worker report exact commands run rather than vague claims?
- Did verification actually happen, with results stated?
- Is the output non-empty and specific, not just "done" or "completed successfully"?
- Are known gaps / next actions named explicitly?
- If the handoff is empty, artifact-free, or self-certifying without proof → 🔴 BLOCK
- Valid dispositions: `Done`, `Revision Needed`, `Blocked`, `Failed`, `Stale`
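The blocking conditions above can be approximated with a structured handoff record. This is a sketch only: the `Handoff` fields and the list of vague summaries are assumptions, and a real handoff format may carry more.

```python
from dataclasses import dataclass, field

# Summaries that claim completion without saying anything specific.
VAGUE_SUMMARIES = {"", "done", "completed successfully"}


@dataclass
class Handoff:
    summary: str = ""
    artifact_path: str = ""
    commands_run: list = field(default_factory=list)
    verification: str = ""


def handoff_blockers(h):
    """Return the reasons a handoff should be blocked; empty list means acceptable."""
    blockers = []
    if h.summary.strip().lower().rstrip(".") in VAGUE_SUMMARIES:
        blockers.append("summary is empty or just a completion claim")
    if not h.artifact_path.strip():
        blockers.append("no artifact path or proof object")
    if not h.commands_run:
        blockers.append("no exact commands reported")
    if not h.verification.strip():
        blockers.append("no verification results stated")
    return blockers
```

A handoff with any blocker would get a disposition other than `Done`, typically `Revision Needed`.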
Silent Worker / Stale Task Classification
For delegated work that appears to be running:
- Was the spawn actually accepted? If not, it is not running.
- No start signal within 10 minutes after accepted spawn → `Stale`
- No materially new output for 30 minutes on active work → `Stale` unless the task explicitly justifies a longer quiet window
- Stale work must be investigated, respawned, or escalated; never left as indefinite `In Progress`
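The timing rules above translate directly into a small classifier. The sketch assumes you record three timestamps per task (spawn accepted, first start signal, last materially new output). The function name and the `Not Running` label are illustrative, not part of the skill's disposition list.

```python
from datetime import datetime, timedelta

START_WINDOW = timedelta(minutes=10)  # start signal expected after accepted spawn
QUIET_WINDOW = timedelta(minutes=30)  # default limit on silence during active work


def classify(now, accepted_at, started_at, last_output_at, quiet_window=QUIET_WINDOW):
    """Classify delegated work as 'Not Running', 'Stale', or 'In Progress'."""
    if accepted_at is None:
        return "Not Running"  # spawn was never accepted
    if started_at is None:
        return "Stale" if now - accepted_at >= START_WINDOW else "In Progress"
    last = last_output_at or started_at
    return "Stale" if now - last >= quiet_window else "In Progress"
```

Pass a larger `quiet_window` only for tasks that explicitly justify a longer silence, such as a long build.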
Gate Evolution
Gates should evolve based on real failures, not imagination:
- When a failure occurs → log it with root cause
- Same failure class occurs 2+ times → add a gate item
- Monthly: prune gates that haven't caught anything in 60 days
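Both thresholds (two occurrences to add a gate, 60 idle days to prune one) can be computed from a simple failure log. The sketch assumes failures are logged as class labels and that each gate records the date it last caught something. The data shapes and function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def gates_to_add(failure_log, gated_classes):
    """Failure classes seen 2+ times that no existing gate covers."""
    counts = Counter(failure_log)
    return sorted(c for c, n in counts.items() if n >= 2 and c not in gated_classes)


def gates_to_prune(last_caught, today, max_idle_days=60):
    """Gates that have never caught anything, or not within max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(g for g, d in last_caught.items() if d is None or d < cutoff)
```

Treat the prune list as candidates for the monthly review rather than an automatic deletion, since a gate for a rare but severe failure may legitimately stay quiet.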
Anti-Patterns
- Gates that sound good but never catch anything → kill them
- Per-agent checklists that duplicate general gates → merge or reference
- "ADHD-friendly" or "high-quality" as gate items → not testable, replace with mechanical checks
- Aspirational gates nobody runs → either automate or cut
Adapting to Your System
This skill provides the pattern. Adapt it:
- Start with the Pre-Ship Checklist — it works for any agent system
- Add Protocol Gates for your top 3 recurring failure modes
- Set channel limits for your delivery surfaces
- Map real failures to gates — if a failure isn't gated, add the gate
- Kill gates that never fire — a shorter, sharper checklist wins
For the full reference implementation, see references/gates-detail.md.
For automation scripts, see scripts/qa-check.sh.
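- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: `/install agent-qa-gates`
- Once installed, call the Skill by name or trigger it with `/agent-qa-gates`
- Provide the inputs described in the Skill's parameter notes to get structured output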
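What is Agent QA Gates?
Output validation gates for AI agent systems. Prevents hallucinated data, leaked internal context, wrong formats, duplicate sends, post-compaction drift, and... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 319 downloads to date.
How do I install Agent QA Gates?
Run the command `/install agent-qa-gates` in the OpenClaw or Claude Code chat box for a one-step install. No extra configuration is needed.
Is Agent QA Gates free?
Yes. Agent QA Gates is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Agent QA Gates support?
Agent QA Gates is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Agent QA Gates?
It is developed and maintained by Don Zurbrick (@zurbrick). The current version is v1.2.0.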