Agent QA Gates
A field-tested validation system for AI agent output. Born from production failures, not theory.
Quick Start
Before any agent delivers output, run the Pre-Ship Checklist:
- Accurate? — every number/date/metric has a source. Unsourced → prefix "estimated"
- Complete? — no missing pieces, no "I'll do that next"
- Actionable? — ends with clear next step or decision point
- Fits the channel? — check character limits for your delivery surface
- No leaks? — no internal context, private data, or secrets
- Not a duplicate? — verify no recent identical send
- Would the human be embarrassed? — if yes, don't ship
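Two of the checklist items above (channel fit and duplicate detection) are mechanical enough to automate. The following is a minimal sketch, not the skill's own implementation: the `CHANNEL_LIMITS` values, the function name, and the idea of tracking recent sends as SHA-256 digests are all assumptions made for illustration. The remaining items need judgment or source tracing and are not covered here.

```python
import hashlib

# Hypothetical limits; substitute the real limits of your delivery surfaces.
CHANNEL_LIMITS = {"sms": 160, "chat": 2000}

def pre_ship_issues(message: str, channel: str, recent_hashes: set[str]) -> list[str]:
    """Return the mechanical pre-ship checks that fail for this message."""
    issues = []
    # Fits the channel? Unknown channels are not checked.
    limit = CHANNEL_LIMITS.get(channel)
    if limit is not None and len(message) > limit:
        issues.append(f"too long for {channel}: {len(message)} > {limit}")
    # Not a duplicate? Compare a digest of the trimmed text to recent sends.
    digest = hashlib.sha256(message.strip().encode("utf-8")).hexdigest()
    if digest in recent_hashes:
        issues.append("duplicate of a recent send")
    return issues
```

An empty list means the message passed these two checks only, not that it is ready to ship.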
Gate Tiers
Four tiers, in ascending order of risk:
| Gate | Scope | Key Checks |
|---|---|---|
| Gate 0 | Internal (files, config, memory) | Mechanism changed not just text, no placeholders, file exists |
| Gate 1 | Human-facing (briefings, summaries) | Key info in first 2 lines, ≤3-line paragraphs, channel length limits |
| Gate 2 | External (email, public content, client materials) | No internal context leaked, recipient-appropriate tone, dedup check |
| Gate 3 | Code & technical | Builds clean, no secrets in code, error handling, tests pass |
See references/gates-detail.md for full gate checklists.
Severity Classification
Not all failures are equal:
- 🔴 BLOCK — cannot ship (secrets, privacy, hallucinated data, wrong recipient)
- 🟡 FIX — fix before shipping, <2 min (formatting, too long, missing citation)
- 🟢 NOTE — log and ship (style preference, minor optimization)
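The three severities can be encoded so that the worst finding decides what happens to the output. This is a sketch under stated assumptions: the failure label strings are made up from the examples in the list above, and treating an unknown label as BLOCK (fail closed) is a design choice of this sketch, not something the skill specifies.

```python
from enum import Enum
from typing import Optional

class Severity(Enum):
    BLOCK = "block"  # cannot ship
    FIX = "fix"      # fix before shipping
    NOTE = "note"    # log and ship

# Illustrative labels derived from the examples in the severity list.
SEVERITY_BY_FAILURE = {
    "secrets": Severity.BLOCK,
    "privacy": Severity.BLOCK,
    "hallucinated_data": Severity.BLOCK,
    "wrong_recipient": Severity.BLOCK,
    "formatting": Severity.FIX,
    "too_long": Severity.FIX,
    "missing_citation": Severity.FIX,
    "style_preference": Severity.NOTE,
    "minor_optimization": Severity.NOTE,
}

def worst_severity(failures: list[str]) -> Optional[Severity]:
    """Return the most severe classification among the failures, or None if there are none."""
    # Assumption: unknown failure labels fail closed and count as BLOCK.
    found = {SEVERITY_BY_FAILURE.get(f, Severity.BLOCK) for f in failures}
    for severity in (Severity.BLOCK, Severity.FIX, Severity.NOTE):
        if severity in found:
            return severity
    return None
```

Output ships only when the result is `None` or `Severity.NOTE`.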
Protocol Gates
Recurring failure modes need dedicated gates. These are the most common:
Heartbeat / Periodic Check Output
- Binary output: alert text ONLY or status-OK ONLY. Never mixed.
- Every data point verified by current-session tool call. No hallucinated metrics.
- No stale data from previous cycles or pre-compaction sessions.
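The binary-output rule is easy to check mechanically. A minimal sketch, assuming the OK status is a single fixed token; `STATUS_OK` is a placeholder name, not a token defined by the skill:

```python
OK_TOKEN = "STATUS_OK"  # hypothetical token; use whatever your system emits

def is_binary_heartbeat(output: str) -> bool:
    """True if the output is the OK token alone or alert text alone, never both."""
    text = output.strip()
    if text == OK_TOKEN:
        return True  # status-OK only
    # Alert only: non-empty text that does not also claim OK status.
    return bool(text) and OK_TOKEN not in text
```

This covers only the first bullet. Whether each data point came from a current-session tool call has to be checked against the tool log.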
Post-Compaction / Context Reset
- Do not trust facts from the pre-reset session — verify from files and tools.
- Rerun pending checks from scratch.
- Zero carryover for periodic checks.
Scheduled Job / Cron Changes
- Explicit timeout set
- Explicit model set
- Verify schedule after creation
- Output fits destination channel limits
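The four checks above could be run whenever a job is created or changed. The sketch below assumes a job is described by a plain dictionary. The field names (`timeout_seconds`, `model`, `schedule`, `max_output_chars`) and the `read_back_schedule` argument are hypothetical and do not correspond to any particular scheduler's API.

```python
from typing import Optional

def cron_change_issues(job: dict, read_back_schedule: Optional[str],
                       channel_limit: int) -> list[str]:
    """Return the scheduled-job checks that fail for this job definition."""
    issues = []
    if job.get("timeout_seconds") is None:
        issues.append("no explicit timeout")
    if not job.get("model"):
        issues.append("no explicit model")
    # Verify after creation: the schedule read back from the scheduler
    # must equal the schedule that was requested.
    if read_back_schedule != job.get("schedule"):
        issues.append("schedule not verified after creation")
    # A job with no declared output cap is treated as exceeding the limit.
    if job.get("max_output_chars", channel_limit + 1) > channel_limit:
        issues.append("output may exceed destination channel limit")
    return issues
```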
Sub-Agent Output Review
- Does output match the brief's success criteria?
- Any uncertainty flags unresolved?
- Is the reasoning (not just the conclusion) sound?
Isolated Agent / Cron Output (real-world data)
For any cron or sub-agent that reports external data without orchestrator review:
- Did the agent make a verifiable live tool call? Is the raw response traceable?
- Any names, dates, amounts, or IDs that can't be traced to a tool result? → 🔴 BLOCK
- If tool call failed: output must be `DATA_UNAVAILABLE — [reason]`, not fabricated data
- Does the cron prompt include the Real-World Data Verification Rule?
Severity: Fabricated real-world data = 🔴 BLOCK. Same as hallucinated metrics.
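One rough way to approximate the traceability check is to confirm that every digit-bearing token in a report (dates, amounts, IDs) appears somewhere in the raw tool responses. The sketch below is deliberately crude and rests on assumptions: the function name is invented, matching is by plain substring, and names without digits are not checked at all. Treat a non-empty result as a reason to block and investigate, not as proof of fabrication.

```python
UNAVAILABLE_PREFIX = "DATA_UNAVAILABLE"

def untraceable_values(report: str, tool_results: list[str]) -> list[str]:
    """Return digit-bearing tokens in the report that appear in no raw tool result."""
    # An explicit unavailability report carries no data to trace.
    if report.startswith(UNAVAILABLE_PREFIX):
        return []
    evidence = "\n".join(tool_results)
    # Split on whitespace and trim surrounding punctuation from each token.
    tokens = [word.strip(".,;:()") for word in report.split()]
    return [
        token for token in tokens
        if any(ch.isdigit() for ch in token) and token not in evidence
    ]
```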
Delegated Work Acceptance
For any non-trivial delegated task (especially builds, audits, config changes, or external deliverables):
- Does the handoff include a clear artifact path or proof object?
- Did the worker report exact commands run rather than vague claims?
- Did verification actually happen, with results stated?
- Is the output non-empty and specific, not just "done" or "completed successfully"?
- Are known gaps / next actions named explicitly?
- If the handoff is empty, artifact-free, or self-certifying without proof → 🔴 BLOCK
- Valid dispositions: `Done`, `Revision Needed`, `Blocked`, `Failed`, `Stale`
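The blocking rule for handoffs can be sketched as a check over a small record. The `Handoff` fields and the set of "vague" summaries below are assumptions chosen to mirror the questions above, not a schema the skill defines.

```python
from dataclasses import dataclass, field

# Summaries that say nothing specific (illustrative; extend as needed).
VAGUE_REPORTS = {"", "done", "completed successfully"}

@dataclass
class Handoff:
    summary: str
    artifact_path: str = ""
    commands_run: list[str] = field(default_factory=list)
    verification_result: str = ""

def blocking_reasons(handoff: Handoff) -> list[str]:
    """Return the reasons this handoff must be blocked; empty means it can be reviewed."""
    reasons = []
    if handoff.summary.strip().lower().rstrip(".") in VAGUE_REPORTS:
        reasons.append("empty or non-specific summary")
    if not handoff.artifact_path:
        reasons.append("no artifact path or proof object")
    if not handoff.commands_run or not handoff.verification_result:
        reasons.append("self-certifying: missing commands or verification results")
    return reasons
```

An empty result only clears the handoff for review. Choosing among the dispositions still requires reading the artifact.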
Silent Worker / Stale Task Classification
For delegated work that appears to be running:
- Was the spawn actually accepted? If not, it is not running.
- No start signal within 10 minutes after accepted spawn → `Stale`
- No materially new output for 30 minutes on active work → `Stale` unless the task explicitly justifies a longer quiet window
- Stale work must be investigated, respawned, or escalated, never left as indefinite `In Progress`
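The timing rules above translate into a small classifier. This is a sketch with assumed inputs: the three timestamps are hypothetical fields a task tracker would need to record, `Not Running` is a label invented here for the "spawn not accepted" case, and the thresholds are treated as strict (stale only once the window is exceeded).

```python
from datetime import datetime, timedelta
from typing import Optional

START_WINDOW = timedelta(minutes=10)  # time allowed for a start signal
QUIET_WINDOW = timedelta(minutes=30)  # default time allowed without new output

def classify(now: datetime,
             accepted_at: Optional[datetime],
             started_at: Optional[datetime],
             last_output_at: Optional[datetime],
             quiet_window: timedelta = QUIET_WINDOW) -> str:
    """Classify delegated work as 'Not Running', 'In Progress', or 'Stale'."""
    if accepted_at is None:
        return "Not Running"  # the spawn was never accepted
    if started_at is None:
        return "Stale" if now - accepted_at > START_WINDOW else "In Progress"
    # With no output yet, measure quiet time from the start signal.
    last_activity = last_output_at or started_at
    return "Stale" if now - last_activity > quiet_window else "In Progress"
```

Pass a larger `quiet_window` for tasks that explicitly justify a longer silence.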
Gate Evolution
Gates should evolve based on real failures, not imagination:
- When a failure occurs → log it with root cause
- Same failure class occurs 2+ times → add a gate item
- Monthly: prune gates that haven't caught anything in 60 days
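The add and prune rules above can be driven from a failure log. A minimal sketch, assuming failures are logged as a list of class labels and each gate records the date it last caught something; both data shapes and both function names are assumptions for illustration.

```python
from collections import Counter
from datetime import date, timedelta

def gates_to_add(failure_classes: list[str]) -> list[str]:
    """Failure classes seen two or more times deserve a gate item."""
    counts = Counter(failure_classes)
    return sorted(cls for cls, n in counts.items() if n >= 2)

def gates_to_prune(last_catch: dict[str, date], today: date) -> list[str]:
    """Gates whose most recent catch is more than 60 days old are prune candidates."""
    cutoff = today - timedelta(days=60)
    return sorted(gate for gate, caught in last_catch.items() if caught < cutoff)
```

A gate that has never caught anything has no entry in `last_catch`, so a real version would also need each gate's creation date.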
Anti-Patterns
- Gates that sound good but never catch anything → kill them
- Per-agent checklists that duplicate general gates → merge or reference
- "ADHD-friendly" or "high-quality" as gate items → not testable, replace with mechanical checks
- Aspirational gates nobody runs → either automate or cut
Adapting to Your System
This skill provides the pattern. Adapt it:
- Start with the Pre-Ship Checklist — it works for any agent system
- Add Protocol Gates for your top 3 recurring failure modes
- Set channel limits for your delivery surfaces
- Map real failures to gates — if a failure isn't gated, add the gate
- Kill gates that never fire — a shorter, sharper checklist wins
For the full reference implementation, see references/gates-detail.md.
For automation scripts, see scripts/qa-check.sh.
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: `/install agent-qa-gates`
- After installation, invoke the skill by name or use `/agent-qa-gates`
- Provide required inputs per the skill's parameter spec and get structured output
What is Agent QA Gates?
Output validation gates for AI agent systems. Prevents hallucinated data, leaked internal context, wrong formats, duplicate sends, post-compaction drift, and... It is an AI Agent Skill for Claude Code / OpenClaw, with 319 downloads so far.
How do I install Agent QA Gates?
Run "/install agent-qa-gates" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Agent QA Gates free?
Yes, Agent QA Gates is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Agent QA Gates support?
Agent QA Gates is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Agent QA Gates?
It is built and maintained by Don Zurbrick (@zurbrick); the current version is v1.2.0.