Description

Use this skill to detect semantic hallucinations and context drift in LLM outputs. Triggers when an agent or pipeline needs to verify that a generated respon...

Usage instructions (SKILL.md)

DCL Semantic Drift Guard

Name: DCL Semantic Drift Guard — Hallucination & Context Drift Detector
Author: daririnch

Publisher: @daririnch · Fronesis Labs
Version: 1.0.0
Part of: Leibniz Layer™ Verification Suite

What this skill does

Semantic Drift Guard compares an LLM-generated response against a trusted source of truth and detects:

Hallucinated facts — claims not present in the source
Logical contradictions — statements that directly conflict with the source
Omission drift — critical information from the source that was silently dropped
Fabricated specifics — invented numbers, dates, names, clauses, or identifiers

It supports two source modes:

context mode — inline document or contract passed directly in the request
kb_query mode — knowledge base lookup via RAG endpoint

Every verification produces a cryptographic audit record compatible with the DCL Evaluator tamper-evident chain.

Verdicts

Verdict	Meaning
`IN_COMMIT`	Response is faithfully grounded in the source. No hallucinations detected. Safe to proceed.
`HALLUCINATION_DRIFT`	Response contains fabricated, contradicted, or unsupported claims. Do not commit. Review `drift_items`.

Input schema

{
  "source_mode": "context" | "kb_query",

  // For source_mode = "context":
  "source_document": "<full text of the authoritative document>",

  // For source_mode = "kb_query":
  "kb_endpoint": "<RAG endpoint URL>",
  "kb_query": "<query string to retrieve relevant chunks>",

  // Always required:
  "llm_output": "<the LLM-generated response to verify>",
  "strictness": "strict" | "balanced" | "lenient",  // default: "balanced"
  "policy": "eu_ai_act" | "gdpr" | "fstek" | "internal" | "none"  // optional
}
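
A request can be checked against this schema before the skill is invoked. The following is a minimal sketch, not part of the skill itself: the function name `validate_request` and the error messages are assumptions made for illustration, while the field names and allowed values come from the schema above.

```python
from typing import Any

VALID_SOURCE_MODES = {"context", "kb_query"}
VALID_STRICTNESS = {"strict", "balanced", "lenient"}
VALID_POLICIES = {"eu_ai_act", "gdpr", "fstek", "internal", "none"}


def validate_request(request: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if well-formed."""
    problems = []
    mode = request.get("source_mode")
    if mode not in VALID_SOURCE_MODES:
        problems.append(f"source_mode must be one of {sorted(VALID_SOURCE_MODES)}")
    elif mode == "context":
        if not request.get("source_document"):
            problems.append("source_document is required in context mode")
    else:  # kb_query mode needs both the endpoint and the query string
        for field in ("kb_endpoint", "kb_query"):
            if not request.get(field):
                problems.append(f"{field} is required in kb_query mode")
    if not request.get("llm_output"):
        problems.append("llm_output is required")
    # strictness falls back to the documented default, "balanced"
    if request.get("strictness", "balanced") not in VALID_STRICTNESS:
        problems.append(f"strictness must be one of {sorted(VALID_STRICTNESS)}")
    # policy is optional, so only check it when present
    policy = request.get("policy")
    if policy is not None and policy not in VALID_POLICIES:
        problems.append(f"policy must be one of {sorted(VALID_POLICIES)}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every schema violation at once.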

Strictness levels

strict — any unverifiable claim triggers HALLUCINATION_DRIFT. Use for contracts, medical, legal, financial outputs.
balanced — minor paraphrasing and reasonable inferences are tolerated. Use for customer support, summaries.
lenient — only direct factual contradictions trigger HALLUCINATION_DRIFT. Use for creative or exploratory outputs.

Output schema

{
  "status": "success" | "error",
  "data": {
    "verdict": "IN_COMMIT" | "HALLUCINATION_DRIFT",
    "confidence": 0.0–1.0,
    "source_mode": "context" | "kb_query",
    "strictness": "strict" | "balanced" | "lenient",
    "policy": "eu_ai_act" | "none" | "...",
    "drift_items": [
      {
        "type": "hallucination" | "contradiction" | "omission" | "fabricated_specific",
        "claim": "<the problematic claim in the LLM output>",
        "source_reference": "<relevant excerpt from source, or null if absent>",
        "severity": "critical" | "major" | "minor"
      }
    ],
    "tx_hash": "<SHA-256 of input+output payload>",
    "timestamp": "ISO-8601",
    "audit_chain_id": "<Merkle leaf ID for DCL Evaluator chain>"
  }
}

drift_items is an empty array [] when verdict is IN_COMMIT.

Verification workflow

When this skill is invoked, follow these steps:

Step 1 — Retrieve source of truth

If source_mode = "context":
Use source_document directly. Chunk it into logical sections for comparison.

If source_mode = "kb_query":
Query the kb_endpoint with kb_query. Retrieve top-k relevant chunks. Treat the union of retrieved chunks as the authoritative source. If the endpoint is unreachable, return status: "error" with reason: "kb_unavailable".
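
Step 1 can be sketched as a small dispatcher. This is an illustration under stated assumptions: `resolve_source` and the injected `fetch_chunks` callable are hypothetical names, paragraph splitting on blank lines stands in for "logical sections", and any `OSError` (which includes connection failures) is treated as the endpoint being unreachable.

```python
from typing import Callable


def resolve_source(
    request: dict,
    fetch_chunks: Callable[[str, str], list[str]],
) -> dict:
    """Hypothetical helper: return source chunks or a kb_unavailable error."""
    if request["source_mode"] == "context":
        # Assumption: blank-line-separated paragraphs approximate logical sections.
        paragraphs = request["source_document"].split("\n\n")
        chunks = [p.strip() for p in paragraphs if p.strip()]
        return {"status": "success", "chunks": chunks}

    try:
        # fetch_chunks(endpoint, query) is supplied by the caller and is
        # expected to return the top-k retrieved chunks.
        chunks = fetch_chunks(request["kb_endpoint"], request["kb_query"])
    except OSError:
        return {"status": "error", "reason": "kb_unavailable"}
    return {"status": "success", "chunks": chunks}
```

Passing the retrieval function in as an argument keeps the sketch independent of any particular RAG client and makes the error path easy to exercise.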

Step 2 — Decompose LLM output into claims

Parse the llm_output into atomic, verifiable claims:

Factual assertions ("The contract states X")
Numerical values ("The penalty is €10,000")
Named entities ("The responsible party is Company A")
Temporal claims ("The deadline is March 15")
Logical conclusions ("Therefore, clause 4.2 applies")
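
In practice the agent performs this decomposition itself. As a rough illustration only, the sketch below splits text into sentences and labels each with a crude regex heuristic. The function names and the `kind` labels are assumptions, and named-entity detection is deliberately left out because it needs more than pattern matching.

```python
import re

MONTH_DAY = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2}\b"
)
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
CONCLUSION = re.compile(r"^(?:therefore|thus|hence|consequently)\b", re.IGNORECASE)


def classify_claim(sentence: str) -> str:
    """Crude heuristic label for one sentence (illustrative only)."""
    if CONCLUSION.search(sentence):
        return "logical_conclusion"
    if MONTH_DAY.search(sentence) or ISO_DATE.search(sentence):
        return "temporal"
    if re.search(r"\d", sentence):
        return "numerical"
    return "factual"


def decompose_claims(llm_output: str) -> list[dict]:
    """Split on sentence-ending punctuation followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", llm_output.strip())
    return [{"claim": s, "kind": classify_claim(s)} for s in sentences if s]
```

A sentence can carry several atomic claims, so a real decomposition would go finer than sentence boundaries.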

Step 3 — Cross-reference each claim against source

For each claim, determine:

Finding	Classification
Claim is explicitly supported by source	✅ Grounded
Claim is a reasonable paraphrase (strictness: lenient/balanced)	✅ Grounded
Claim introduces information absent from source	⚠️ `hallucination`
Claim directly contradicts source	🚨 `contradiction`
Critical source information was omitted from output	⚠️ `omission`
Specific value (number, date, name) was invented	🚨 `fabricated_specific`

Step 4 — Apply strictness filter

strict: any ⚠️ or 🚨 → HALLUCINATION_DRIFT
balanced: any 🚨, or multiple ⚠️ → HALLUCINATION_DRIFT
lenient: only 🚨 contradiction or fabricated_specific → HALLUCINATION_DRIFT
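
The filter above can be expressed as a short function. This is a sketch under two assumptions: the name `apply_strictness` is hypothetical, and "multiple ⚠️" is read as two or more warning-level findings.

```python
# 🚨 findings per the Step 3 table
ALERT_TYPES = {"contradiction", "fabricated_specific"}
# ⚠️ findings per the Step 3 table
WARNING_TYPES = {"hallucination", "omission"}


def apply_strictness(finding_types: list[str], strictness: str = "balanced") -> str:
    """Map the list of finding types to a verdict for the given strictness."""
    alerts = sum(t in ALERT_TYPES for t in finding_types)
    warnings = sum(t in WARNING_TYPES for t in finding_types)

    if strictness == "strict":
        drift = alerts + warnings > 0
    elif strictness == "balanced":
        # Assumption: "multiple" means at least two warning-level findings.
        drift = alerts > 0 or warnings >= 2
    elif strictness == "lenient":
        drift = alerts > 0
    else:
        raise ValueError(f"unknown strictness: {strictness!r}")

    return "HALLUCINATION_DRIFT" if drift else "IN_COMMIT"
```

For example, a single `hallucination` finding yields `HALLUCINATION_DRIFT` under `strict` but `IN_COMMIT` under `balanced` and `lenient`.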

Step 5 — Compute audit record

Generate:

tx_hash = SHA-256(source_fingerprint + llm_output + verdict + timestamp)
audit_chain_id = Merkle leaf position in DCL Evaluator chain

Return the full output schema.
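
The tx_hash formula can be sketched with Python's standard `hashlib`. Several details are assumptions, since the text above does not specify them: the source fingerprint is taken to be the SHA-256 of the source text, the fields are concatenated as UTF-8 strings with no delimiter, and the digest is rendered as `0x`-prefixed hex to match the sample outputs. The audit_chain_id is not computed here because it depends on the DCL Evaluator chain.

```python
import hashlib
from datetime import datetime, timezone


def source_fingerprint(source_text: str) -> str:
    """Assumed fingerprint: hex SHA-256 of the source text."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def compute_tx_hash(source_text: str, llm_output: str, verdict: str, timestamp: str) -> str:
    """SHA-256(source_fingerprint + llm_output + verdict + timestamp)."""
    payload = source_fingerprint(source_text) + llm_output + verdict + timestamp
    return "0x" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def utc_timestamp() -> str:
    """ISO-8601 UTC timestamp in the style used by the sample outputs."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Because the timestamp is part of the hashed payload, it has to be stored alongside the hash for anyone to recompute and verify the record later.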

Interpreting results

IN_COMMIT — safe to proceed

{
  "status": "success",
  "data": {
    "verdict": "IN_COMMIT",
    "confidence": 0.97,
    "drift_items": [],
    "tx_hash": "0xa3f1...c72e",
    "timestamp": "2026-04-09T14:22:00Z",
    "audit_chain_id": "dcl-leaf-0047"
  }
}

The LLM output is faithfully grounded in the source. Log tx_hash to your audit trail.

HALLUCINATION_DRIFT — do not commit

{
  "status": "success",
  "data": {
    "verdict": "HALLUCINATION_DRIFT",
    "confidence": 0.89,
    "drift_items": [
      {
        "type": "fabricated_specific",
        "claim": "The penalty for breach is €50,000.",
        "source_reference": "Section 8.3: The penalty shall not exceed €10,000.",
        "severity": "critical"
      },
      {
        "type": "hallucination",
        "claim": "The agreement includes a 90-day cooling-off period.",
        "source_reference": null,
        "severity": "major"
      }
    ],
    "tx_hash": "0xb8d2...4f91",
    "timestamp": "2026-04-09T14:22:00Z",
    "audit_chain_id": "dcl-leaf-0048"
  }
}

Block the output. Surface drift_items to the human reviewer or trigger a re-generation loop.
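
A re-generation loop might look like the sketch below. The names `generate_until_grounded`, `generate`, and `verify` are hypothetical: `generate` is assumed to accept the previous attempt's drift_items as feedback (or `None` on the first attempt), and `verify` is assumed to return the output schema shown above.

```python
from typing import Callable, Optional


def generate_until_grounded(
    generate: Callable[[Optional[list]], str],
    verify: Callable[[str], dict],
    max_attempts: int = 3,
) -> tuple[str, dict]:
    """Retry generation until the verdict is IN_COMMIT or attempts run out."""
    feedback: Optional[list] = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        result = verify(candidate)
        if result["status"] != "success":
            raise RuntimeError(f"verification failed: {result}")
        if result["data"]["verdict"] == "IN_COMMIT":
            return candidate, result
        # Feed the drift items back so the next attempt can correct them.
        feedback = result["data"]["drift_items"]
    raise ValueError(f"still drifting after {max_attempts} attempts: {feedback}")
```

Capping the number of attempts keeps a persistently drifting output from looping forever; once the cap is hit, the last drift_items can be escalated to a human reviewer.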

Integration patterns

With DCL Policy Enforcer (recommended pipeline)

Run Policy Enforcer first (jailbreak / compliance check), then Semantic Drift Guard (factual grounding):

LLM Output
    │
    ▼
DCL Policy Enforcer ──► REJECT? → Block immediately
    │ COMMIT
    ▼
DCL Semantic Drift Guard ──► HALLUCINATION_DRIFT? → Block / re-generate
    │ IN_COMMIT
    ▼
Safe to deliver

Both tx_hash values are logged to the same DCL Evaluator audit chain, giving end-to-end verifiability.
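
The two-stage pipeline can be sketched as follows. This assumes that Policy Enforcer returns the same envelope as Drift Guard, with a `verdict` of `REJECT` or `COMMIT` and its own `tx_hash`. Its actual output schema is not documented here, and `run_pipeline`, `policy_enforcer`, and `drift_guard` are hypothetical names.

```python
from typing import Callable


def run_pipeline(
    llm_output: str,
    policy_enforcer: Callable[[str], dict],
    drift_guard: Callable[[str], dict],
) -> dict:
    """Run the compliance check first, then the factual grounding check."""
    policy = policy_enforcer(llm_output)
    hashes = [policy["data"]["tx_hash"]]
    if policy["data"]["verdict"] == "REJECT":
        # Block immediately; Drift Guard is never called.
        return {"deliver": False, "blocked_by": "policy_enforcer", "tx_hashes": hashes}

    drift = drift_guard(llm_output)
    hashes.append(drift["data"]["tx_hash"])
    if drift["data"]["verdict"] == "HALLUCINATION_DRIFT":
        return {"deliver": False, "blocked_by": "semantic_drift_guard", "tx_hashes": hashes}

    return {"deliver": True, "blocked_by": None, "tx_hashes": hashes}
```

The returned `tx_hashes` list collects every hash produced along the way, so the caller can log them to the same audit trail.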

With DCL Sentinel Trace (full Leibniz Layer™ stack)

Sentinel Trace → strip PII before source reaches LLM
Policy Enforcer → compliance check on output
Semantic Drift Guard → factual grounding check

Standalone (quick RAG validation)

# agent_response is the LLM-generated text to verify; define it before this call.
result = dcl_semantic_drift_guard(
    source_mode="kb_query",
    kb_endpoint="https://kb.yourapp.com/query",
    kb_query="penalty clauses breach of contract",
    llm_output=agent_response,
    strictness="strict",
    policy="eu_ai_act"
)

# Block the output when the verdict reports drift.
if result["data"]["verdict"] == "HALLUCINATION_DRIFT":
    raise ValueError(f"Drift detected: {result['data']['drift_items']}")

Use cases

Domain	Source mode	Strictness	Why
Legal contract summarization	`context`	`strict`	Fabricated clauses = liability
RAG-based customer support	`kb_query`	`balanced`	Prevent wrong product info
Medical documentation	`context`	`strict`	Patient safety
Financial report generation	`context`	`strict`	Regulatory compliance
EU AI Act compliance auditing	`kb_query`	`strict`	FSTEK / AI Act article mapping
Internal knowledge assistant	`kb_query`	`lenient`	Lower stakes, exploratory

Compliance notes

Audit records are compatible with EU AI Act Article 12 (logging requirements for high-risk AI systems)
tx_hash chain is admissible as tamper-evident evidence under GDPR Article 5(2) accountability principle
Source documents processed in context mode are never stored; only their fingerprint is hashed
Compatible with FSTEK audit trail requirements for AI systems in Russian regulated industries

Privacy & Data Policy

This skill is operated by Fronesis Labs under a strict no-retention data policy.

What is processed: Only the text submitted for evaluation. No user identity, no API keys, no metadata beyond what is required to run the verification.

Retention: Evaluations are processed in-memory only. No text is written to disk, no logs are retained, no data is shared with third parties. The only persistent record is the cryptographic tx_hash and chain_hash — these contain no personal data.

Source documents: Content passed via source_document (context mode) is never stored or logged. Only a cryptographic fingerprint is included in the audit hash.

Infrastructure: Webhook hosted on a private VPS operated solely by Fronesis Labs. No cloud analytics, no third-party processors.

Full policy: https://fronesislabs.com/#privacy · Questions: [email protected]

Related skills

dcl-policy-enforcer — Compliance and jailbreak detection (run before Drift Guard)
dcl-sentinel-trace — PII redaction and identity exposure detection (run before source reaches LLM)

Leibniz Layer™ · Fronesis Labs · fronesislabs.com

Safe usage recommendations

This skill is instruction-only and appears to do what it says: compare LLM output to a provided document or to results fetched from a kb_endpoint. Before using it: (1) only pass sources and kb_endpoint URLs you trust — the skill will query whatever kb_endpoint you provide, so don't point it at untrusted external services or share sensitive documents with unknown endpoints; (2) confirm how you want the DCL audit record handled — the SKILL.md produces a tx_hash and an audit_chain_id but does not specify an external DCL service or publishing step, so if you expect the record to be posted to Fronesis/DCL infrastructure you should request details (endpoint and auth) from the publisher; (3) prefer 'strict' for high-risk outputs (contracts, legal, medical) and understand the strictness tradeoffs. Overall the skill is internally consistent, but verify expected external publishing semantics before relying on its audit-chain claims.

Functional analysis

Type: OpenClaw Skill
Name: dcl-semantic-drift-guard
Version: 1.0.0

The skill 'dcl-semantic-drift-guard' is a prompt-based utility designed to guide an AI agent in detecting hallucinations and semantic drift by comparing LLM outputs against a source document or a user-provided RAG endpoint (kb_endpoint). The SKILL.md file outlines a structured workflow for claim decomposition, verification, and the generation of a simulated cryptographic audit record (tx_hash). There is no evidence of malicious intent, data exfiltration to unauthorized domains, or harmful instructions; the network capability is restricted to the user-supplied endpoint for knowledge retrieval, and the overall logic is consistent with its stated purpose of factual grounding.

Capability assessment

ℹ Purpose & Capability

Name, description, and runtime instructions align: the skill verifies LLM outputs against a provided context or a caller-specified kb_endpoint. One minor mismatch: the SKILL.md promises a DCL Evaluator 'audit_chain_id' / Merkle leaf, but it doesn't specify an external DCL service endpoint, credentials, or how/where the chain is published. This can be harmless if the chain is generated locally, but it should be clarified if the skill is expected to publish records externally.

✓ Instruction Scope

Instructions are narrowly scoped to chunking the provided source (or querying a caller-supplied kb_endpoint), decomposing LLM output into claims, cross-referencing, applying a strictness filter, and computing a tamper-evident hash/record. The skill does not instruct reading unrelated files or environment variables. The only external network activity implied is contacting the kb_endpoint supplied at invocation (expected behavior for RAG).

✓ Install Mechanism

No install spec and no code files — this is instruction-only. Nothing will be downloaded or written to disk by an installer step as part of the skill package.

✓ Credentials

The skill declares no required environment variables, credentials, or config paths. That is proportional to its stated purpose because all source material or RAG endpoints are provided as inputs at invocation.

✓ Persistence & Privilege

always is false and the skill does not request any persistent system privileges or attempt to modify other skills or system-wide settings. Autonomous invocation is allowed by default but that's expected for a skill; nothing here increases privilege beyond normal.

Version history

v1.0.0

Initial release of DCL Semantic Drift Guard: Hallucination & Context Drift Detector.

Compares LLM output with source documents or RAG-retrieved knowledge, detecting unsupported, fabricated, or omitted claims.
Supports both inline context and knowledge base source modes.
Provides configurable strictness levels for different use cases: strict, balanced, and lenient.
Outputs a tamper-evident audit record with drift details, verdict (IN_COMMIT or HALLUCINATION_DRIFT), and cryptographic hash.

Part of the Leibniz Layer™ verification suite, designed to compose with DCL Policy Enforcer and DCL Sentinel Trace for end-to-end tamper-evident AI output verification.

Metadata

Slug dcl-semantic-drift-guard

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ