daririnch

DCL Semantic Drift Guard — Hallucination & Context Drift Detector

Author: Dari Rinch · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
161 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install dcl-semantic-drift-guard
Description
Use this skill to detect semantic hallucinations and context drift in LLM outputs. Triggers when an agent or pipeline needs to verify that a generated respon...
Usage (SKILL.md)

DCL Semantic Drift Guard

Publisher: @daririnch · Fronesis Labs
Version: 1.0.0
Part of: Leibniz Layer™ Verification Suite


What this skill does

Semantic Drift Guard compares an LLM-generated response against a trusted source of truth and detects:

  • Hallucinated facts — claims not present in the source
  • Logical contradictions — statements that directly conflict with the source
  • Omission drift — critical information from the source that was silently dropped
  • Fabricated specifics — invented numbers, dates, names, clauses, or identifiers

It supports two source modes:

  • context mode — inline document or contract passed directly in the request
  • kb_query mode — knowledge base lookup via RAG endpoint

Every verification produces a cryptographic audit record compatible with the DCL Evaluator tamper-evident chain.


Verdicts

| Verdict | Meaning |
|---|---|
| IN_COMMIT | Response is faithfully grounded in the source. No hallucinations detected. Safe to proceed. |
| HALLUCINATION_DRIFT | Response contains fabricated, contradicted, or unsupported claims. Do not commit. Review drift_items. |

Input schema

{
  "source_mode": "context" | "kb_query",

  // For source_mode = "context":
  "source_document": "<full text of the authoritative document>",

  // For source_mode = "kb_query":
  "kb_endpoint": "<RAG endpoint URL>",
  "kb_query": "<query string to retrieve relevant chunks>",

  // Always required:
  "llm_output": "<the LLM-generated response to verify>",
  "strictness": "strict" | "balanced" | "lenient",  // default: "balanced"
  "policy": "eu_ai_act" | "gdpr" | "fstek" | "internal" | "none"  // optional
}
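The schema above can be checked before a request is sent. The following is a minimal sketch, assuming the request arrives as a plain Python dict; `validate_request` and the constant names are hypothetical helpers, not part of the skill.

```python
from typing import Any

# Allowed values copied from the input schema.
VALID_MODES = {"context", "kb_query"}
VALID_STRICTNESS = {"strict", "balanced", "lenient"}
VALID_POLICIES = {"eu_ai_act", "gdpr", "fstek", "internal", "none"}


def validate_request(req: dict[str, Any]) -> dict[str, Any]:
    """Return a normalized copy of the request, or raise ValueError."""
    mode = req.get("source_mode")
    if mode not in VALID_MODES:
        raise ValueError(f"source_mode must be one of {sorted(VALID_MODES)}")

    # llm_output is always required; the remaining fields depend on the mode.
    required = ["llm_output"]
    required += ["source_document"] if mode == "context" else ["kb_endpoint", "kb_query"]
    missing = [key for key in required if not req.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    out = dict(req)
    out.setdefault("strictness", "balanced")  # documented default
    if out["strictness"] not in VALID_STRICTNESS:
        raise ValueError(f"unknown strictness: {out['strictness']!r}")
    # policy is optional, so it is only checked when present.
    if "policy" in out and out["policy"] not in VALID_POLICIES:
        raise ValueError(f"unknown policy: {out['policy']!r}")
    return out
```

Rejecting a malformed request early keeps schema errors separate from genuine verification failures.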

Strictness levels

  • strict — any unverifiable claim triggers HALLUCINATION_DRIFT. Use for contracts, medical, legal, financial outputs.
  • balanced — minor paraphrasing and reasonable inferences are tolerated. Use for customer support, summaries.
  • lenient — only direct factual contradictions or fabricated specifics trigger HALLUCINATION_DRIFT. Use for creative or exploratory outputs.

Output schema

{
  "status": "success" | "error",
  "data": {
    "verdict": "IN_COMMIT" | "HALLUCINATION_DRIFT",
    "confidence": 0.0–1.0,
    "source_mode": "context" | "kb_query",
    "strictness": "strict" | "balanced" | "lenient",
    "policy": "eu_ai_act" | "none" | "...",
    "drift_items": [
      {
        "type": "hallucination" | "contradiction" | "omission" | "fabricated_specific",
        "claim": "<the problematic claim in the LLM output>",
        "source_reference": "<relevant excerpt from source, or null if absent>",
        "severity": "critical" | "major" | "minor"
      }
    ],
    "tx_hash": "<SHA-256 of input+output payload>",
    "timestamp": "ISO-8601",
    "audit_chain_id": "<Merkle leaf ID for DCL Evaluator chain>"
  }
}

drift_items is an empty array [] when verdict is IN_COMMIT.
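A consumer may want to confirm that a response respects the schema and the empty-array rule before trusting the verdict. The sketch below assumes the response has already been parsed into a dict; `check_result_shape` is a hypothetical helper name.

```python
def check_result_shape(result: dict) -> list[str]:
    """Return a list of problems found in a response; an empty list means it looks consistent."""
    if result.get("status") != "success":
        return [f"status is {result.get('status')!r}, no verdict to inspect"]

    problems = []
    data = result.get("data", {})
    verdict = data.get("verdict")
    items = data.get("drift_items")

    if verdict not in ("IN_COMMIT", "HALLUCINATION_DRIFT"):
        problems.append(f"unknown verdict: {verdict!r}")
    if not isinstance(items, list):
        problems.append("drift_items must be a list")
    elif verdict == "IN_COMMIT" and items:
        # The schema states drift_items is [] whenever the verdict is IN_COMMIT.
        problems.append("IN_COMMIT must carry an empty drift_items list")

    confidence = data.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number between 0.0 and 1.0")
    return problems
```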


Verification workflow

When this skill is invoked, follow these steps:

Step 1 — Retrieve source of truth

If source_mode = "context":
Use source_document directly. Chunk it into logical sections for comparison.

If source_mode = "kb_query":
Query the kb_endpoint with kb_query. Retrieve top-k relevant chunks. Treat the union of retrieved chunks as the authoritative source. If the endpoint is unreachable, return status: "error" with reason: "kb_unavailable".
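Step 1 can be sketched as follows. The text does not define what a "logical section" is or how the RAG endpoint is called, so this example assumes blank-line-separated paragraphs as sections and takes the retrieval call as an injected function; `chunk_document`, `resolve_source`, and `fetch_chunks` are hypothetical names.

```python
from typing import Callable


def chunk_document(text: str) -> list[str]:
    """Split a document into sections. Assumption: paragraphs separated by blank lines."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def resolve_source(request: dict, fetch_chunks: Callable[[str, str], list[str]]) -> dict:
    """Return the authoritative chunks, or a kb_unavailable error for an unreachable endpoint."""
    if request["source_mode"] == "context":
        return {"status": "success", "chunks": chunk_document(request["source_document"])}

    try:
        # fetch_chunks(endpoint, query) stands in for the real top-k retrieval call.
        chunks = fetch_chunks(request["kb_endpoint"], request["kb_query"])
    except OSError:
        # Network failures such as ConnectionError and urllib's URLError subclass OSError.
        return {"status": "error", "reason": "kb_unavailable"}
    return {"status": "success", "chunks": chunks}
```

Passing the retrieval function in as an argument keeps the sketch independent of any particular knowledge base API and makes the error path easy to exercise.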

Step 2 — Decompose LLM output into claims

Parse the llm_output into atomic, verifiable claims:

  • Factual assertions ("The contract states X")
  • Numerical values ("The penalty is €10,000")
  • Named entities ("The responsible party is Company A")
  • Temporal claims ("The deadline is March 15")
  • Logical conclusions ("Therefore, clause 4.2 applies")
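In practice the agent performs this decomposition itself. Purely as an illustration of the claim categories listed above, here is a rough heuristic sketch: it splits on sentence boundaries and tags each sentence with simple pattern checks. The function name, the tags, and the patterns are all assumptions, and named-entity detection is left out.

```python
import re

# Assumption: English month names are enough to spot temporal claims.
MONTHS = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\b"
)


def decompose_claims(llm_output: str) -> list[dict]:
    """Split text into sentence-level claims and tag each with a coarse kind."""
    # Split after ., ! or ? only when whitespace follows, so "4.2" stays intact.
    sentences = re.split(r"(?<=[.!?])\s+", llm_output.strip())
    claims = []
    for sentence in sentences:
        if not sentence:
            continue
        if re.match(r"(?i)(?:therefore|thus|hence)\b", sentence):
            kind = "logical"
        elif MONTHS.search(sentence):
            kind = "temporal"
        elif re.search(r"\d", sentence):
            kind = "numerical"
        else:
            kind = "factual"
        claims.append({"text": sentence, "kind": kind})
    return claims
```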

Step 3 — Cross-reference each claim against source

For each claim, determine:

| Finding | Classification |
|---|---|
| Claim is explicitly supported by source | ✅ Grounded |
| Claim is a reasonable paraphrase (strictness: lenient/balanced) | ✅ Grounded |
| Claim introduces information absent from source | ⚠️ hallucination |
| Claim directly contradicts source | 🚨 contradiction |
| Critical source information was omitted from output | ⚠️ omission |
| Specific value (number, date, name) was invented | 🚨 fabricated_specific |
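Most of this cross-referencing requires semantic judgment, but one narrow case lends itself to a mechanical check: numbers that appear in a claim and nowhere in the source. The sketch below covers only that case; `numbers_in` and `unsupported_numbers` are hypothetical helpers, and normalizing by stripping thousands separators is an assumption.

```python
import re

# Matches digit runs that may contain , or . internally, e.g. "10,000" or "8.3".
NUMBER = re.compile(r"\d[\d,.]*\d|\d")


def numbers_in(text: str) -> set[str]:
    """Extract numeric tokens, dropping thousands separators so "10,000" equals "10000"."""
    return {match.group().replace(",", "") for match in NUMBER.finditer(text)}


def unsupported_numbers(claim: str, source: str) -> set[str]:
    """Return numbers present in the claim but absent from the source text."""
    return numbers_in(claim) - numbers_in(source)
```

A non-empty result is a candidate for the fabricated_specific classification; an empty result does not prove the claim is grounded, since a correct number can still be attached to the wrong fact.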

Step 4 — Apply strictness filter

  • strict: any ⚠️ or 🚨 → HALLUCINATION_DRIFT
  • balanced: any 🚨, or multiple ⚠️ → HALLUCINATION_DRIFT
  • lenient: only 🚨 contradiction or fabricated_specific → HALLUCINATION_DRIFT
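The filter can be written as a small function over the classified findings. This sketch reads "multiple ⚠️" as two or more, which is an assumption, and maps the four drift types to the ⚠️ and 🚨 groups used in Step 3; `apply_strictness` is a hypothetical name.

```python
ALERT_TYPES = {"contradiction", "fabricated_specific"}  # the 🚨 findings
WARNING_TYPES = {"hallucination", "omission"}           # the ⚠️ findings


def apply_strictness(drift_items: list[dict], strictness: str = "balanced") -> str:
    """Map classified findings to a verdict according to the strictness level."""
    alerts = sum(1 for item in drift_items if item["type"] in ALERT_TYPES)
    warnings = sum(1 for item in drift_items if item["type"] in WARNING_TYPES)

    if strictness == "strict":
        drift = alerts + warnings > 0
    elif strictness == "balanced":
        # Assumption: "multiple" warnings means at least two.
        drift = alerts > 0 or warnings >= 2
    elif strictness == "lenient":
        drift = alerts > 0
    else:
        raise ValueError(f"unknown strictness: {strictness!r}")
    return "HALLUCINATION_DRIFT" if drift else "IN_COMMIT"
```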

Step 5 — Compute audit record

Generate:

tx_hash = SHA-256(source_fingerprint + llm_output + verdict + timestamp)
audit_chain_id = Merkle leaf position in DCL Evaluator chain

Return the full output schema.
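The tx_hash formula can be sketched with the standard library. The text does not specify how the fields are joined or encoded, so this example assumes UTF-8, a newline separator, and the "0x" prefix seen in the sample outputs; `fingerprint` and `compute_tx_hash` are hypothetical names. The audit_chain_id is not computed here because it depends on the DCL Evaluator chain, which this document does not describe.

```python
import hashlib


def fingerprint(source_text: str) -> str:
    """SHA-256 fingerprint of the source, so the source text itself is not part of the record."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def compute_tx_hash(source_text: str, llm_output: str, verdict: str, timestamp: str) -> str:
    """SHA-256 over source_fingerprint + llm_output + verdict + timestamp."""
    # Assumption: fields are joined with newlines so that field boundaries are unambiguous.
    payload = "\n".join([fingerprint(source_text), llm_output, verdict, timestamp])
    return "0x" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the timestamp is part of the payload, the caller must store the exact timestamp string alongside the hash in order to recompute and verify it later.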


Interpreting results

IN_COMMIT — safe to proceed

{
  "status": "success",
  "data": {
    "verdict": "IN_COMMIT",
    "confidence": 0.97,
    "drift_items": [],
    "tx_hash": "0xa3f1...c72e",
    "timestamp": "2026-04-09T14:22:00Z",
    "audit_chain_id": "dcl-leaf-0047"
  }
}

The LLM output is faithfully grounded in the source. Log tx_hash to your audit trail.

HALLUCINATION_DRIFT — do not commit

{
  "status": "success",
  "data": {
    "verdict": "HALLUCINATION_DRIFT",
    "confidence": 0.89,
    "drift_items": [
      {
        "type": "fabricated_specific",
        "claim": "The penalty for breach is €50,000.",
        "source_reference": "Section 8.3: The penalty shall not exceed €10,000.",
        "severity": "critical"
      },
      {
        "type": "hallucination",
        "claim": "The agreement includes a 90-day cooling-off period.",
        "source_reference": null,
        "severity": "major"
      }
    ],
    "tx_hash": "0xb8d2...4f91",
    "timestamp": "2026-04-09T14:22:00Z",
    "audit_chain_id": "dcl-leaf-0048"
  }
}

Block the output. Surface drift_items to the human reviewer or trigger a re-generation loop.
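The two outcomes above suggest a simple control loop: deliver on IN_COMMIT, otherwise re-generate a bounded number of times and then hand the drift_items to a reviewer. The sketch below assumes `generate` and `verify` are callables supplied by the caller; both names, the retry limit, and `deliver_or_retry` itself are hypothetical.

```python
from typing import Callable


def deliver_or_retry(
    generate: Callable[[], str],
    verify: Callable[[str], dict],
    max_attempts: int = 3,
) -> tuple[str, str]:
    """Return (output, tx_hash) for the first grounded output, or raise after max_attempts."""
    last_items: list = []
    for _ in range(max_attempts):
        output = generate()
        result = verify(output)
        if result["status"] != "success":
            raise RuntimeError(f"verification failed: {result.get('reason')}")
        data = result["data"]
        if data["verdict"] == "IN_COMMIT":
            # Safe to proceed; the caller logs tx_hash to its audit trail.
            return output, data["tx_hash"]
        last_items = data["drift_items"]
    # Still drifting after all attempts: surface the findings for human review.
    raise ValueError(f"drift persisted after {max_attempts} attempts: {last_items}")
```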


Integration patterns

With DCL Policy Enforcer (recommended pipeline)

Run Policy Enforcer first (jailbreak / compliance check), then Semantic Drift Guard (factual grounding):

LLM Output
    │
    ▼
DCL Policy Enforcer ──► REJECT? → Block immediately
    │ COMMIT
    ▼
DCL Semantic Drift Guard ──► HALLUCINATION_DRIFT? → Block / re-generate
    │ IN_COMMIT
    ▼
Safe to deliver

Both tx_hash values are logged to the same DCL Evaluator audit chain, giving end-to-end verifiability.
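The pipeline above could be wired together roughly as follows. This is a sketch under assumptions: each check is modeled as a callable returning a dict with "verdict" and "tx_hash" keys, the Policy Enforcer verdicts are taken from the diagram (REJECT / COMMIT), and `run_pipeline` and its return shape are hypothetical.

```python
from typing import Callable


def run_pipeline(
    llm_output: str,
    policy_check: Callable[[str], dict],
    drift_check: Callable[[str], dict],
) -> dict:
    """Run Policy Enforcer, then Semantic Drift Guard, collecting both tx_hash values."""
    audit_log = []

    policy = policy_check(llm_output)
    audit_log.append(policy["tx_hash"])
    if policy["verdict"] == "REJECT":
        # Block immediately; the drift check is never run.
        return {"deliver": False, "blocked_by": "policy_enforcer", "audit_log": audit_log}

    drift = drift_check(llm_output)
    audit_log.append(drift["tx_hash"])
    if drift["verdict"] == "HALLUCINATION_DRIFT":
        return {"deliver": False, "blocked_by": "semantic_drift_guard", "audit_log": audit_log}

    return {"deliver": True, "blocked_by": None, "audit_log": audit_log}
```

Running the cheaper compliance gate first means a rejected output never reaches the grounding check, which matches the ordering shown in the diagram.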

With DCL Sentinel Trace (full Leibniz Layer™ stack)

Sentinel Trace → strip PII before source reaches LLM
Policy Enforcer → compliance check on output
Semantic Drift Guard → factual grounding check

Standalone (quick RAG validation)

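# Assumes the skill is exposed as a callable and agent_response holds the text to verify.
result = dcl_semantic_drift_guard(
    source_mode="kb_query",
    kb_endpoint="https://kb.yourapp.com/query",
    kb_query="penalty clauses breach of contract",
    llm_output=agent_response,
    strictness="strict",
    policy="eu_ai_act",
)

# An error response (e.g. kb_unavailable) may carry no "data" block, so check status first.
if result["status"] != "success":
    raise RuntimeError(f"Verification failed: {result.get('reason')}")
if result["data"]["verdict"] == "HALLUCINATION_DRIFT":
    raise ValueError(f"Drift detected: {result['data']['drift_items']}")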

Use cases

| Domain | Source mode | Strictness | Why |
|---|---|---|---|
| Legal contract summarization | context | strict | Fabricated clauses = liability |
| RAG-based customer support | kb_query | balanced | Prevent wrong product info |
| Medical documentation | context | strict | Patient safety |
| Financial report generation | context | strict | Regulatory compliance |
| EU AI Act compliance auditing | kb_query | strict | FSTEK / AI Act article mapping |
| Internal knowledge assistant | kb_query | lenient | Lower stakes, exploratory |

Compliance notes

  • Audit records are compatible with EU AI Act Article 12 (logging requirements for high-risk AI systems)
  • tx_hash chain is admissible as tamper-evident evidence under GDPR Article 5(2) accountability principle
  • All source documents processed in context mode are never stored — only their fingerprint is hashed
  • Compatible with FSTEK audit trail requirements for AI systems in Russian regulated industries

Privacy & Data Policy

This skill is operated by Fronesis Labs under a strict no-retention data policy.

What is processed: Only the text submitted for evaluation. No user identity, no API keys, no metadata beyond what is required to run the verification.

Retention: Evaluations are processed in-memory only. No text is written to disk, no logs are retained, no data is shared with third parties. The only persistent record is the cryptographic tx_hash and chain_hash — these contain no personal data.

Source documents: Content passed via source_document (context mode) is never stored or logged. Only a cryptographic fingerprint is included in the audit hash.

Infrastructure: Webhook hosted on a private VPS operated solely by Fronesis Labs. No cloud analytics, no third-party processors.

Full policy: https://fronesislabs.com/#privacy


Related skills

  • dcl-policy-enforcer — Compliance and jailbreak detection (run before Drift Guard)
  • dcl-sentinel-trace — PII redaction and identity exposure detection (run before source reaches LLM)

Leibniz Layer™ · Fronesis Labs · fronesislabs.com

Security recommendations
This skill is instruction-only and appears to do what it says: compare LLM output to a provided document or to results fetched from a kb_endpoint. Before using it: (1) only pass sources and kb_endpoint URLs you trust — the skill will query whatever kb_endpoint you provide, so don't point it at untrusted external services or share sensitive documents with unknown endpoints; (2) confirm how you want the DCL audit record handled — the SKILL.md produces a tx_hash and an audit_chain_id but does not specify an external DCL service or publishing step, so if you expect the record to be posted to Fronesis/DCL infrastructure you should request details (endpoint and auth) from the publisher; (3) prefer 'strict' for high-risk outputs (contracts, legal, medical) and understand the strictness tradeoffs. Overall the skill is internally consistent, but verify expected external publishing semantics before relying on its audit-chain claims.
Functional analysis
Type: OpenClaw Skill
Name: dcl-semantic-drift-guard
Version: 1.0.0
The skill 'dcl-semantic-drift-guard' is a prompt-based utility designed to guide an AI agent in detecting hallucinations and semantic drift by comparing LLM outputs against a source document or a user-provided RAG endpoint (kb_endpoint). The SKILL.md file outlines a structured workflow for claim decomposition, verification, and the generation of a simulated cryptographic audit record (tx_hash). There is no evidence of malicious intent, data exfiltration to unauthorized domains, or harmful instructions; the network capability is restricted to the user-supplied endpoint for knowledge retrieval, and the overall logic is consistent with its stated purpose of factual grounding.
Capability assessment
Purpose & Capability
Name, description, and runtime instructions align: the skill verifies LLM outputs against a provided context or a caller-specified kb_endpoint. One minor mismatch: the SKILL.md promises a DCL Evaluator 'audit_chain_id' / Merkle leaf, but it doesn't specify an external DCL service endpoint, credentials, or how/where the chain is published. This can be harmless if the chain is generated locally, but it should be clarified if the skill is expected to publish records externally.
Instruction Scope
Instructions are narrowly scoped to chunking the provided source (or querying a caller-supplied kb_endpoint), decomposing LLM output into claims, cross-referencing, applying a strictness filter, and computing a tamper-evident hash/record. The skill does not instruct reading unrelated files or environment variables. The only external network activity implied is contacting the kb_endpoint supplied at invocation (expected behavior for RAG).
Install Mechanism
No install spec and no code files — this is instruction-only. Nothing will be downloaded or written to disk by an installer step as part of the skill package.
Credentials
The skill declares no required environment variables, credentials, or config paths. That is proportional to its stated purpose because all source material or RAG endpoints are provided as inputs at invocation.
Persistence & Privilege
always is false and the skill does not request any persistent system privileges or attempt to modify other skills or system-wide settings. Autonomous invocation is allowed by default but that's expected for a skill; nothing here increases privilege beyond normal.
How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install dcl-semantic-drift-guard
  3. Once installation completes, invoke the Skill by name or trigger it with /dcl-semantic-drift-guard
  4. Provide the inputs described in the Skill's parameter documentation to receive structured output
Version history
v1.0.0
Initial release of DCL Semantic Drift Guard — Hallucination & Context Drift Detector

  • Compares LLM output with source documents or RAG-retrieved knowledge, detecting unsupported, fabricated, or omitted claims.
  • Supports both inline context and knowledge base source modes.
  • Provides configurable strictness levels for different use cases: strict, balanced, and lenient.
  • Outputs a tamper-evident audit record with drift details, verdict (IN_COMMIT or HALLUCINATION_DRIFT), and cryptographic hash.

Part of the Leibniz Layer™ verification suite, designed to compose with DCL Policy Enforcer and DCL Sentinel Trace for end-to-end tamper-evident AI output verification.
Metadata
Slug: dcl-semantic-drift-guard
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 1
FAQ

What is DCL Semantic Drift Guard — Hallucination & Context Drift Detector?

Use this skill to detect semantic hallucinations and context drift in LLM outputs. Triggers when an agent or pipeline needs to verify that a generated respon... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 161 total downloads to date.

How do I install DCL Semantic Drift Guard — Hallucination & Context Drift Detector?

Run the command "/install dcl-semantic-drift-guard" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is DCL Semantic Drift Guard — Hallucination & Context Drift Detector free?

Yes. DCL Semantic Drift Guard — Hallucination & Context Drift Detector is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does DCL Semantic Drift Guard — Hallucination & Context Drift Detector support?

DCL Semantic Drift Guard — Hallucination & Context Drift Detector is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed DCL Semantic Drift Guard — Hallucination & Context Drift Detector?

It is developed and maintained by Dari Rinch (@daririnch); the current version is v1.0.0.
