Description

Generates evidence obligations for a claim or action, evaluates existing evidence against them, and returns a structured verdict (PASS / SOFT_PASS / BLOCK /...

README (SKILL.md)

Evidence Gate

Name: Evidence Gate
Author: shanicky

Use this skill to insert a lightweight evidence gate into an existing workflow without replacing the workflow.

Its purpose is not to make the caller more cautious — capable agents are already cautious. Its purpose is to make that caution structured, auditable, and actionable by answering a narrower question:

What evidence must exist before this conclusion or action is responsible enough to present, recommend, or execute?

Treat the caller's conclusion or action as tentative until the gate returns a verdict.

Keep the skill lightweight, selective, and non-blocking by default.

Scope

This skill gates the agent's own reasoning quality — not the user's intent.

It is not:

content moderation or policy enforcement
user intent classification (allow / refuse / clarify)
a legal, compliance, or safety advisory tool
a replacement for domain expertise

Core idea

Given a tentative claim or action, do three things:

1. Define the minimum evidence obligations for that claim/action.
2. Check what evidence already exists and what is still missing or conflicting.
3. Return a verdict and a safe next-step policy.

Do not fully own evidence collection. Recommend missing evidence for the caller to gather using its existing tools.

Operating model

Use a single-pass gate instead of taking over the full workflow:

1. The caller reaches a tentative claim, diagnosis, recommendation, or action.
2. Generate the evidence obligations for that candidate.
3. Evaluate only the evidence currently available in the invocation.
4. Return a final verdict for this invocation:
   - whether the current evidence is sufficient
   - how the caller should downgrade if it is not
   - which next evidence checks would be most valuable
5. Exit.

Assume no durable skill state across calls. Do not require a second gate pass unless the caller explicitly chooses to orchestrate one outside this skill.
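
The single-pass, stateless shape described above can be sketched as a pure function. This is an illustrative sketch only: the skill ships no code, and `GateResult`, `run_gate_once`, and the exact-match comparison between obligations and evidence are hypothetical simplifications, not part of the bundle.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class GateResult:
    # Hypothetical result shape; the real one is defined by references/output-template.md.
    claim: str
    sufficient: bool
    downgrade: Optional[str]
    next_checks: List[str] = field(default_factory=list)


def run_gate_once(claim: str, obligations: List[str], evidence: Set[str]) -> GateResult:
    """One pass: compare obligations to the evidence in this call, then exit.

    Nothing is stored between calls, so a second pass only happens if the
    caller invokes the function again with new evidence.
    """
    missing = [o for o in obligations if o not in evidence]
    if not missing:
        return GateResult(claim, True, None)
    # Assumed default downgrade; the source lists several options.
    return GateResult(claim, False, "provisional conclusion", missing)


result = run_gate_once(
    "the root cause is X",
    ["reproduction path", "code-path match", "falsified alternative"],
    {"reproduction path"},
)
```

Because every input arrives as an argument and nothing is cached, two calls with the same arguments always return the same result.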

When to use

Use this skill when one or more of the following are true:

The caller is about to make a strong claim such as:
- "the root cause is X"
- "this is safe"
- "this configuration should be changed"
- "the correct action is Y"
The caller is about to recommend or execute a high-impact step such as:
- rollback
- scale up/down
- delete/disable/quarantine
- approve/reject
- change production configuration
The current conclusion appears to rely on only one signal, one log line, one chart, or one tool result.
Competing explanations have not been checked.
The user explicitly asks for an evidence-backed answer.
The environment or workflow has a policy requiring stronger justification before action.

When NOT to use

Do not use this skill when:

The output is low-risk and easily reversible.
The task is simple summarization or formatting.
The caller is brainstorming possibilities and is not presenting a conclusion as established.
The additional delay or cost of gating would outweigh the value.
The caller already has an explicit evidence-validation layer for this exact step.

Design constraints

This skill must preserve the caller's original capability as much as possible.

It should:

be selective rather than always-on
avoid taking over the entire workflow
avoid forcing chain-of-thought disclosure
avoid blocking work unless a real risk threshold is crossed
prefer downgrade/fallback over hard failure
assume each invocation is stateless

Integration policy

Apply these defaults unless the caller provides stricter policy:

Run the gate only at conclusion points or before high-impact actions.
Generate only 2-5 concrete evidence obligations.
Evaluate only the evidence explicitly present in the current invocation.
Return one final verdict for the current invocation.
If evidence is insufficient, downgrade or defer instead of spinning.
Keep domain ownership with the caller.
Judge only explicit artifacts, not hidden reasoning.

Input contract

The only required input is the claim — the conclusion, diagnosis, recommendation, or action under consideration.

Invocation examples:

/evidence-gate "The root cause is a nil dereference in request parsing"
/evidence-gate "Safe to delete the staging database"
Agent self-trigger: the agent recognizes a gate-worthy moment and invokes the skill with the current claim from context.

When invoked with just a claim, the skill infers the remaining context:

claim_type: inferred from the claim language (e.g., "the cause is" → diagnosis, "safe to" → safety, "should delete" → action)
domain: inferred from the current working context
risk_level: inferred from the action's reversibility and blast radius
execution_mode: inferred from whether the caller is informing, recommending, or about to execute
target_strength: inferred from the claim's language strength

The caller may optionally provide any of these fields to override inference. Use references/input-template.md when a caller wants a canonical explicit input shape. See references/protocol.md for the full schema semantics.
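
As a rough illustration of inference with caller override, the sketch below maps claim language to a `claim_type`. The cue list extends the examples given above: the looser cue "cause is" (so that "the root cause is" also matches) and the `"unknown"` fallback are assumptions, and `infer_claim_type` is a hypothetical helper rather than anything defined by the skill.

```python
from typing import Optional

# Cue phrases adapted from the examples in the text; first match wins.
CLAIM_TYPE_CUES = [
    ("cause is", "diagnosis"),   # assumed loosening of "the cause is"
    ("safe to", "safety"),
    ("should delete", "action"),
]


def infer_claim_type(claim: str, override: Optional[str] = None) -> str:
    """Return the caller's explicit value if given, else infer from wording."""
    if override is not None:
        return override
    lowered = claim.lower()
    for cue, claim_type in CLAIM_TYPE_CUES:
        if cue in lowered:
            return claim_type
    return "unknown"  # assumed fallback when no cue matches
```

A real implementation would likely rely on the agent's own language understanding rather than substring matching; the point here is only that explicit fields take precedence over inference.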

Output contract

The skill should return a structured gate result containing:

whether a gate is required
why the gate is required
evidence requirements
per-requirement status
missing evidence
conflicting evidence
sufficiency rule
verdict
allowed next actions
blocked next actions
fallback behavior
suggested caller wording when evidence is insufficient
next evidence actions

Return JSON matching references/output-template.md. Use references/verdict-schema.json as the machine-checkable schema. Always include the gate_required field, even when the skill is invoked explicitly. Use gate_required = false as a fast exit when the claim is already low-risk, exploratory, or sufficiently bounded.
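
To make the list above concrete, here is one possible shape for a gate result, built as a Python dict and serialized to JSON. Only `gate_required`, the verdict names, and the status values appear in this document; every other key name and all the example evidence items are guesses for illustration. The authoritative field names live in references/output-template.md and references/verdict-schema.json.

```python
import json


def example_gate_result() -> dict:
    """Illustrative result for the claim 'Safe to delete the staging database'."""
    return {
        "gate_required": True,
        "gate_reason": "high-impact, hard-to-reverse action",       # key name assumed
        "evidence_requirements": ["backup exists", "no active connections"],
        "requirement_status": {                                      # key name assumed
            "backup exists": "satisfied",
            "no active connections": "missing",
        },
        "missing_evidence": ["no active connections"],
        "conflicting_evidence": [],
        "sufficiency_rule": "all requirements satisfied",           # wording assumed
        "verdict": "BLOCK",
        "allowed_next_actions": ["gather missing evidence"],
        "blocked_next_actions": ["delete database"],
        "fallback_behavior": "human approval required",
        "suggested_wording": "Evidence is currently insufficient for automatic execution.",
        "next_evidence_actions": ["check for active connections"],
    }


print(json.dumps(example_gate_result(), indent=2))
```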

Verdict states

Use exactly these verdicts:

PASS
- Evidence is sufficient for the intended claim/action.
SOFT_PASS
- Evidence is incomplete, but sufficient for a weaker claim, advisory output, or low-risk continuation.
BLOCK
- Evidence is insufficient for the intended strength or risk level. High-impact continuation should not proceed.
CONFLICT
- Evidence materially disagrees or supports multiple competing interpretations. The caller should not present a strong conclusion as settled.

Required behavior

1. Normalize the candidate

Reduce the caller's current position to a tentative, explicit candidate. If the caller already states the final conclusion as settled, rewrite it internally as tentative before gating it.

2. Define evidence obligations

Translate the candidate claim/action into a small set of concrete evidence requirements.

Good evidence requirements are:

specific
externally checkable
operationally gatherable
tied to the claim, not generic boilerplate

Bad evidence requirements are vague, such as:

"get more proof"
"verify better"
"be more certain"

3. Evaluate sufficiency

Determine whether currently known evidence satisfies the requirements.

The skill should explicitly mark:

satisfied
missing
conflicting
not_applicable

4. Produce a final verdict for the current invocation

Return a verdict immediately after evaluating known evidence. If evidence is missing, identify only the smallest set of additional checks that would materially change the verdict.
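
One way to turn per-requirement statuses into a verdict is sketched below. The status and verdict names come from this document, but the decision rule itself is an assumption: the skill leaves the sufficiency rule to each invocation, and `decide` is a hypothetical helper.

```python
from enum import Enum
from typing import List


class Status(str, Enum):
    SATISFIED = "satisfied"
    MISSING = "missing"
    CONFLICTING = "conflicting"
    NOT_APPLICABLE = "not_applicable"


class Verdict(str, Enum):
    PASS = "PASS"
    SOFT_PASS = "SOFT_PASS"
    BLOCK = "BLOCK"
    CONFLICT = "CONFLICT"


def decide(statuses: List[Status], high_risk: bool) -> Verdict:
    """Assumed rule: conflicts dominate, then gaps, with risk choosing BLOCK vs SOFT_PASS."""
    if Status.CONFLICTING in statuses:
        return Verdict.CONFLICT
    if Status.MISSING not in statuses:
        # Everything is satisfied or not applicable.
        return Verdict.PASS
    return Verdict.BLOCK if high_risk else Verdict.SOFT_PASS
```

Checking for conflicts first reflects the CONFLICT definition above: when evidence materially disagrees, the claim should not be presented as settled regardless of what else is satisfied.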

5. Prefer downgrade over dead stop

If evidence is insufficient, prefer one of:

provisional conclusion
candidate hypotheses
advisory-only output
ask-for-human-review
request-more-evidence plan

Do not hard-block low-risk work unnecessarily.

6. Assume stateless execution

Assume every call is fresh. Do not depend on remembering prior requirements, prior verdicts, or prior collection attempts unless the caller explicitly embeds them in the current input.

7. Avoid hidden-reasoning dependence

Do not require access to hidden chain-of-thought. Judge only from explicit claim, explicit evidence, explicit policy, and explicit outputs.

Suggested workflow

1. Receive normalized candidate claim/action.
2. Decide whether gating is required.
3. If no gate is required, return PASS with rationale.
4. If a gate is required:
   - generate evidence requirements
   - evaluate known evidence
   - identify gaps and conflicts
   - apply a sufficiency rule
   - produce a final verdict for this invocation
   - produce fallback and next-step guidance
5. Return a structured result without taking over execution.

Default trigger heuristics

Bias toward using this skill when any of the following are present:

risk_level = high
execution_mode = auto
claim language is strong or definitive
only one evidence source supports the claim
no competing hypothesis check exists
action is costly, irreversible, or externally visible

Bias away from using this skill when:

risk_level = low
the output is exploratory, not conclusive
the result is easy to reverse
the task is primarily formatting or summarization
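
The heuristics above could be combined as a simple tally, as in the sketch below. The signals mirror the two lists, but `GateContext`, its default values, and the rule "gate when toward-signals outnumber away-signals" are assumptions made for illustration; the skill only says to bias one way or the other.

```python
from dataclasses import dataclass


@dataclass
class GateContext:
    # Field names and defaults are hypothetical; "high", "low" and "auto" come from the text.
    risk_level: str = "medium"
    execution_mode: str = "advisory"
    strong_language: bool = False
    evidence_sources: int = 2
    competing_hypothesis_checked: bool = True
    costly_or_irreversible: bool = False
    exploratory: bool = False
    easy_to_reverse: bool = False
    formatting_or_summary: bool = False


def should_gate(ctx: GateContext) -> bool:
    """Gate when signals favoring the gate outnumber signals against it."""
    toward = [
        ctx.risk_level == "high",
        ctx.execution_mode == "auto",
        ctx.strong_language,
        ctx.evidence_sources <= 1,          # one (or zero) supporting source
        not ctx.competing_hypothesis_checked,
        ctx.costly_or_irreversible,
    ]
    away = [
        ctx.risk_level == "low",
        ctx.exploratory,
        ctx.easy_to_reverse,
        ctx.formatting_or_summary,
    ]
    return sum(toward) > sum(away)
```

A stricter policy could instead gate whenever any single high-risk signal is present; the tally is just one reasonable reading of "bias toward" and "bias away".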

Default fallback policy

When the gate does not fully pass, prefer these downgrades:

intended strong conclusion -> provisional conclusion
automatic action -> advisory recommendation
settled diagnosis -> candidate hypotheses
irreversible operation -> human approval required
insufficient current evidence -> stop and return a bounded next-evidence plan
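
These downgrades amount to a lookup table. The sketch below encodes the five mappings listed above; the `downgrade` helper, its string keys, and the default of falling back to advisory-only output for unlisted cases are assumptions.

```python
# Mappings taken from the list above; keys are shortened labels.
DOWNGRADES = {
    "strong conclusion": "provisional conclusion",
    "automatic action": "advisory recommendation",
    "settled diagnosis": "candidate hypotheses",
    "irreversible operation": "human approval required",
    "insufficient current evidence": "stop and return a bounded next-evidence plan",
}


def downgrade(intended: str, verdict: str) -> str:
    """Return the intended output unchanged on PASS, otherwise its downgraded form."""
    if verdict == "PASS":
        return intended
    # Assumed default for anything not listed in the table.
    return DOWNGRADES.get(intended, "advisory-only output")
```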

Output style guidance

When the verdict is not PASS, the caller should avoid overstating certainty.

Good examples:

"Current evidence suggests X, but this is not yet sufficiently established."
"This is a plausible diagnosis, not a confirmed root cause."
"Evidence is currently insufficient for automatic execution."
"Additional evidence is needed before recommending Y with confidence."

Bad examples:

"This is definitely the cause" when key evidence is missing
"Safe to proceed" when competing evidence exists

Example use cases

SRE: Before recommending scale-up, verify that bottleneck evidence is real and alternative explanations were checked.
Coding: Before claiming a bug root cause, verify reproduction path, code-path match, and at least one falsified alternative.
Security: Before declaring an action safe, require policy match, scope confirmation, and risk checks.
Research: Before presenting a strong conclusion, require source support and contradiction checks.

Non-goals

This skill is not:

a universal orchestrator
a replacement for domain expertise
a guarantee of correctness
a hidden chain-of-thought inspector
a mandatory wrapper around every agent step

Its job is narrower: make evidence obligations explicit, assess whether they are met, and enforce safe downgrade behavior when they are not.

Usage Guidance

This skill appears coherent and low-risk because it's instruction-only, stateless, and asks for no credentials. Before installing or enabling it broadly, consider: (1) whether you want agents to invoke it implicitly — if not, disable implicit invocation in your agent policy; (2) ensure callers pass explicit known_evidence when you need precise, auditable checks (relying on inference from ambient context can produce weaker or surprising results); (3) test the gate on non-critical, reversible claims first so you can confirm its downgrade behavior and phrasing fit your workflow; and (4) remember the skill is not a compliance/legal/safety oracle — keep human approval in the loop for high-impact actions.

Capability Analysis

Type: OpenClaw Skill
Name: evidence-gate
Version: 0.0.2

The 'evidence-gate' skill is a reasoning framework designed to improve agent reliability by requiring structured evidence for claims and high-risk actions. It defines a clear protocol for evaluating evidence and providing verdicts (PASS/BLOCK/etc.) without performing any sensitive system operations, network calls, or data access. The bundle consists entirely of documentation, schemas, and templates (SKILL.md, protocol.md, verdict-schema.json) aimed at enforcing logical rigor and safety guardrails.

Capability Assessment

✓ Purpose & Capability

The name and description (creating evidence obligations, evaluating provided evidence, returning a verdict) match the provided files (protocol, templates) and the skill requests nothing (no env vars, binaries, or installs). The included templates/schemas support the stated purpose.

ℹ Instruction Scope

SKILL.md confines evaluation to evidence explicitly provided in the invocation and instructs the gate to be stateless and non-invasive. One area to watch: the skill will 'infer' context when only a claim is passed, which could produce broader inferences if the caller's agent supplies a lot of ambient context. The skill itself does not instruct reading system files, secrets, or external endpoints.

✓ Install Mechanism

No install spec and no code files to execute; instruction-only skills are low-risk from an install perspective.

✓ Credentials

The skill requires no environment variables, credentials, or config paths and does not ask for secrets; requested scope is minimal and appropriate for an opinionated, stateless reasoning helper.

✓ Persistence & Privilege

always is false and there are no requests to modify other skills or persist state. agents/openai.yaml allows implicit invocation (allow_implicit_invocation: true), which means an agent may call the gate automatically at conclusion points — this is consistent with the skill's intended use but is an operational choice the integrator should consider.

Version History

v0.0.2

evidence-gate 0.0.2
- Add Scope section to SKILL.md to clarify this skill gates agent reasoning quality, not user intent or content moderation.
- Add "what it does" to frontmatter description (generates evidence obligations, evaluates evidence, returns structured verdict).
- Add TRIGGER / DO NOT TRIGGER phrases to frontmatter description for better automatic invocation.

v0.0.1

evidence-gate 0.0.1 – Initial release
- Introduces a lightweight, stateless, single-pass evidence gating skill for responsible claims, diagnoses, recommendations, and actions.
- Generates evidence obligations for a claim or action, evaluates existing explicit evidence, identifies gaps or conflicts, and returns a single structured verdict (`PASS`, `SOFT_PASS`, `BLOCK`, `CONFLICT`).
- Provides clear, actionable downgrade, defer, or next-evidence guidance when evidence is insufficient.
- Designed for selective use at high-impact decision points, with fast exit behavior for low-risk or already bounded cases to minimize workflow disruption.
- Includes canonical input/output templates, a machine-checkable verdict schema, and OpenAI agent metadata for discovery.

Metadata

Slug evidence-gate

Version 0.0.2

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Evidence Gate?

Generates evidence obligations for a claim or action, evaluates existing evidence against them, and returns a structured verdict (PASS / SOFT_PASS / BLOCK /... It is an AI Agent Skill for Claude Code / OpenClaw, with 223 downloads so far.

How do I install Evidence Gate?

Run "/install evidence-gate" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Evidence Gate free?

Yes, Evidence Gate is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Evidence Gate support?

Evidence Gate is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Evidence Gate?

It is built and maintained by Shanicky Chen (@shanicky); the current version is v0.0.2.
