Sharpagent Content Safety

Author: yezhaowang888-stack · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 20 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install sharpagent-content-safety
Description
SharpAgent Content Safety Engine — Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports...
Usage (SKILL.md)

SharpAgent Content Safety Engine v1.0.0

The last line of defense for content output. It's not about "should we say it" — it's "how should it be said in this jurisdiction." Independent from five-factor review (credibility ≠ compliance). Layer 3 of the four-layer architecture.

Architecture Position

Layer 1: Five-Factor Review     ← Trust verification (global, immutable)
Layer 2: Calibration Framework  ← Output adaptation (warm/professional/deep)
Layer 3: Content Safety Engine  ← Compliance interception (per-jurisdiction rules) ← YOU ARE HERE
Layer 4: Final Output

Why independent? Five-factor review asks "can I trust this?" Safety engine asks "can I say this?" The first is information quality, the second is compliance and safety. Mixing them contaminates both judgments.

Contract

contract:
  name: sharpagent-content-safety
  version: "1.0.0"
  category: analysis
  trust_level: verified
  reads:
    - Content
    - CompliancePolicy
  writes:
    - SafetyVerdict
  preconditions:
    - "At least one compliance policy loaded"
    - "Content is not empty"
  postconditions:
    - "Verdict is one of: pass | flag | block"
    - "If flag or block, reason and rule reference are provided"
  calibration:
    default_mode: professional
    modes_supported: [warm, professional, deep]
  compliance:
    jurisdiction: global
    safety_level: strict
  lifecycle:
    status: active
    publish_as: SharpAgent
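
The pre- and postconditions in the contract can be checked mechanically. The following is a minimal sketch, not the skill's actual implementation: the `SafetyVerdict` field names and both function names are hypothetical, and treating whitespace-only content as empty is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ALLOWED_VERDICTS = ("pass", "flag", "block")


@dataclass
class SafetyVerdict:
    # Hypothetical shape for the SafetyVerdict the contract writes.
    verdict: str
    reason: Optional[str] = None
    rule_ref: Optional[str] = None


def check_preconditions(content: str, policies: Sequence[str]) -> None:
    """Raise if the contract's preconditions do not hold."""
    if not policies:
        raise ValueError("at least one compliance policy must be loaded")
    # Assumption: whitespace-only content counts as empty.
    if not content or not content.strip():
        raise ValueError("content must not be empty")


def check_postconditions(result: SafetyVerdict) -> None:
    """Raise if the verdict violates the contract's postconditions."""
    if result.verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {result.verdict!r}")
    if result.verdict != "pass" and not (result.reason and result.rule_ref):
        raise ValueError("flag/block verdicts need a reason and a rule reference")
```

Raising on a violated contract keeps a malformed verdict from reaching Layer 4 silently.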

Core Design

Pluggable Rule Engine

rules:
  - id: "global/PII-001"
    type: "block"
    description: "Detect and block personally identifiable information"
    patterns:
      - "email"
      - "phone_number"
      - "id_card"
      - "address"
    severity: "high"

  - id: "cn/content-001"
    type: "block"
    description: "Block prohibited content per China Internet regulations"
    jurisdiction: "cn"
    severity: "critical"

  - id: "us/export-001"
    type: "flag"
    description: "Flag export-controlled technology references"
    jurisdiction: "us"
    severity: "medium"

  - id: "global/hate-speech-001"
    type: "block"
    description: "Block hate speech and discriminatory content"
    severity: "high"

  - id: "global/privacy-003"
    type: "flag"
    description: "Flag privacy-sensitive content for human review"
    severity: "medium"

Rule Structure

rule:
  id: "{jurisdiction}/{name}-{seq}"   # Unique identifier
  type: "block" | "flag" | "pass"      # Action
  description: "..."                   # Human-readable
  jurisdiction: "cn" | "us" | "eu" | "global"  # Applicable jurisdiction
  patterns: [regex...]                 # Match patterns (optional)
  keywords: [string...]                # Keyword matching (optional)
  severity: "low" | "medium" | "high" | "critical"
  exemptions: [                        # Exceptions
    "educational context",
    "news reporting"
  ]
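
As one way to make the rule structure concrete, the sketch below models a rule as a Python dataclass and validates the documented fields. The class name, the ID regular expression, and the fallback that derives `jurisdiction` from the ID prefix when the field is omitted (as in the `global/...` examples) are assumptions, not part of the published skill.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed reading of "{jurisdiction}/{name}-{seq}", e.g. "global/hate-speech-001".
RULE_ID = re.compile(r"^(?P<jur>[a-z]+)/(?P<name>[A-Za-z][A-Za-z-]*)-(?P<seq>\d+)$")
TYPES = {"block", "flag", "pass"}
SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass
class Rule:
    id: str
    type: str
    description: str
    severity: str
    jurisdiction: str = ""
    patterns: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    exemptions: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        parsed = RULE_ID.match(self.id)
        if parsed is None:
            raise ValueError(f"malformed rule id: {self.id!r}")
        if self.type not in TYPES:
            raise ValueError(f"unknown rule type: {self.type!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        # Assumption: an omitted jurisdiction falls back to the id prefix.
        if not self.jurisdiction:
            self.jurisdiction = parsed.group("jur")
```

Validating at construction time means a malformed rule fails when it is loaded rather than in the middle of a check.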

Jurisdiction Configuration

Runtime selection (multi-select):

safety_engine.load_policies(jurisdictions=["cn", "us", "eu"])

Each loaded jurisdiction stacks its rules. Conflicting rules: strictest wins.

Rule priority (high to low):
1. block → 2. flag → 3. pass
Cross-jurisdiction: take max severity
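
The "strictest wins" rule can be sketched with two rank tables. This is an illustrative reading of the priority list above: the function name `resolve` and the `(rule_type, severity)` tuple input are hypothetical.

```python
from typing import Iterable, Optional, Tuple

VERDICT_RANK = {"pass": 0, "flag": 1, "block": 2}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def resolve(matches: Iterable[Tuple[str, str]]) -> Tuple[str, Optional[str]]:
    """Combine (rule_type, severity) pairs from all active jurisdictions.

    The strictest rule type becomes the verdict and the highest severity
    is reported alongside it. No matches at all means "pass".
    """
    matches = list(matches)
    if not matches:
        return "pass", None
    verdict = max((t for t, _ in matches), key=VERDICT_RANK.__getitem__)
    severity = max((s for _, s in matches), key=SEVERITY_RANK.__getitem__)
    return verdict, severity
```

Under this reading, a medium-severity `flag` from `us` combined with a critical `block` from `cn` resolves to `("block", "critical")`.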

Workflow

Step 1: Pre-Flight

  • Content empty?
  • Content too long? Chunk at ≤4096 chars.
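
A minimal sketch of the pre-flight step follows. It assumes the 200-character overlap listed under Edge Cases is applied between consecutive chunks, so a phrase that straddles a boundary still appears whole in one chunk. The function name and parameters are hypothetical.

```python
from typing import List


def chunk(content: str, size: int = 4096, overlap: int = 200) -> List[str]:
    """Split content into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters. Empty content is
    rejected, matching the contract's "Content is not empty" precondition.
    """
    if not content:
        raise ValueError("content is empty")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(content[start:start + size])
        if start + size >= len(content):
            break
        start += step
    return chunks
```

Content shorter than `size` comes back as a single chunk, so short outputs pay no chunking cost.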

Step 2: Rule Matching

For each chunk:
    for each loaded rule:
        skip if jurisdiction not active
        check patterns/keywords
        check exemptions
        record match
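
The matching loop above might look like this in Python. It is a sketch under assumptions: rules are plain dicts, a rule without a `jurisdiction` field is treated as `global`, keywords match case-insensitively, and exemptions are checked against caller-supplied context tags (the source does not say how exemption conditions are detected).

```python
import re
from typing import Dict, Iterable, List, Set


def match_rules(chunks: Iterable[str], rules: List[Dict], active: Set[str],
                context_tags: Iterable[str] = ()) -> List[Dict]:
    """Return one record per (chunk, rule) hit for rules in active jurisdictions."""
    tags = set(context_tags)
    matches = []
    for index, text in enumerate(chunks):
        lowered = text.lower()
        for rule in rules:
            # Skip rules whose jurisdiction is not active.
            if rule.get("jurisdiction", "global") not in active:
                continue
            hit = any(re.search(p, text) for p in rule.get("patterns", [])) or any(
                k.lower() in lowered for k in rule.get("keywords", []))
            if not hit:
                continue
            # Hypothetical exemption check: first exemption present in the tags.
            exempt = next((e for e in rule.get("exemptions", []) if e in tags), None)
            matches.append({
                "rule": rule["id"],
                "type": rule["type"],
                "severity": rule["severity"],
                "chunk": index,
                "exempted_by": exempt,  # None means the match counts
            })
    return matches
```

Exempted hits are recorded rather than dropped so that the exemption reason can still be logged.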

Step 3: Verdict

| Verdict | Meaning | Action |
| --- | --- | --- |
| ✅ pass | No matches | Let through to output |
| ⚠️ flag | Low severity match | Tag + allow + log |
| 🚫 block | High severity match | Block + return alternative content |

Block replacement:

[Content blocked by safety engine]
Reason: {top_reason}
Contact administrator for full content.
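
Filling in the replacement template could be sketched as follows. The source does not define how `{top_reason}` is chosen, so this example assumes it is the rule ID and description of the highest-severity match. The function name and match fields are hypothetical.

```python
from typing import Dict, List

BLOCK_TEMPLATE = (
    "[Content blocked by safety engine]\n"
    "Reason: {top_reason}\n"
    "Contact administrator for full content."
)
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def render_block_notice(matches: List[Dict]) -> str:
    """Build the replacement text from the most severe match (assumption)."""
    top = max(matches, key=lambda m: SEVERITY_RANK[m["severity"]])
    return BLOCK_TEMPLATE.format(top_reason=f'{top["rule"]}: {top["description"]}')
```

Only the rule reference and description reach the notice, so the blocked content itself is never echoed back.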

Step 4: Logging

{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [
    {"rule": "cn/content-001", "severity": "critical"}
  ],
  "verdict": "block",
  "timestamp": "2026-05-11T06:10:00Z",
  "agent": "sharpagent"
}
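
A log record in this shape can be assembled with the standard library alone. The sketch below assumes UTC timestamps at second resolution and sorted jurisdictions for stable output. `build_log_event` and its parameters are hypothetical names.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


def build_log_event(jurisdictions: Iterable[str], matches: List[Dict], verdict: str,
                    agent: str = "sharpagent",
                    now: Optional[datetime] = None) -> Dict:
    """Return a JSON-serializable audit record for one safety check."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "event": "safety_check",
        "jurisdictions": sorted(jurisdictions),
        # Keep only the fields shown in the documented log format.
        "rules_matched": [{"rule": m["rule"], "severity": m["severity"]} for m in matches],
        "verdict": verdict,
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
    }


def to_log_line(event: Dict) -> str:
    """Serialize the record as a single line, e.g. for an append-only log."""
    return json.dumps(event, ensure_ascii=False)
```

Passing `now` explicitly makes the record reproducible in tests. Omitting it stamps the current UTC time.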

Ruleset Management

Built-in Rulesets

| Ruleset | Coverage | File |
| --- | --- | --- |
| global | Universal safety (hate speech/PII/privacy) | rules/global.yaml |
| cn | China internet content regulations | rules/cn.yaml |
| us | US export control/safe harbor | rules/us.yaml |
| eu | GDPR related | rules/eu.yaml |

Custom Rules

rules/custom/
├── my-company-policy.yaml
├── my-project-policy.yaml
└── README.md

Edge Cases

| Situation | Action |
| --- | --- |
| Conflicting jurisdiction rules | Strictest wins (block > flag > pass) |
| Rule false positive | Add exemption, log false positive |
| Cross-chunk sensitive phrase | Overlap scanning (±200 chars) |
| No jurisdiction configured | Load global only |
| Corrupt rule file | Skip + log error, don't crash engine |
| Exemption conditions met | Skip rule, log exemption reason |
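
The "corrupt rule file" case can be illustrated with a loader that logs and skips any file it cannot read or parse. This is a sketch under assumptions: the parser is passed in so the example stays standard-library only (in practice it would likely be a YAML loader such as PyYAML's `yaml.safe_load`), and each file is assumed to hold a top-level `rules` list as in the examples above.

```python
import logging
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Union

log = logging.getLogger("safety_engine")  # hypothetical logger name


def load_rulesets(paths: Iterable[Union[str, Path]],
                  parse: Callable[[str], Dict]) -> List[Dict]:
    """Load rules from every readable file; skip and log the rest."""
    rules: List[Dict] = []
    for path in paths:
        try:
            data = parse(Path(path).read_text(encoding="utf-8"))
            found = data["rules"]
            if not isinstance(found, list):
                raise ValueError("'rules' must be a list")
        except Exception as exc:  # missing, unparsable, or malformed file
            log.error("Skipping rule file %s: %s", path, exc)
            continue
        rules.extend(found)
    return rules
```

Catching broadly here is deliberate: one bad file should cost only its own rules, not the whole engine.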

Quality Gates

| Check | What | Fail action |
| --- | --- | --- |
| At least 1 ruleset | No rules = nothing blocked | Don't start |
| Verdict unambiguous | pass/flag/block | Default block |
| Block provides reason | User knows why | Add reason |
| Complete audit log | Every check recorded | Backfill |
| Rules versioned | Updates don't break running checks | Semver rules |
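
The first three gates lend themselves to a small fail-safe wrapper. The sketch below is illustrative only: the function name, argument order, and fallback reason strings are assumptions.

```python
from typing import Optional, Tuple

VALID_VERDICTS = ("pass", "flag", "block")


def apply_quality_gates(verdict: str, reason: Optional[str],
                        rulesets_loaded: int) -> Tuple[str, Optional[str]]:
    """Apply the fail actions for the ruleset, verdict, and reason gates."""
    # Gate: at least 1 ruleset, otherwise don't start.
    if rulesets_loaded < 1:
        raise RuntimeError("no rulesets loaded; refusing to start")
    # Gate: verdict unambiguous, otherwise default to block.
    if verdict not in VALID_VERDICTS:
        return "block", "ambiguous verdict; defaulted to block"
    # Gate: block provides reason, otherwise add a generic one.
    if verdict == "block" and not reason:
        reason = "blocked by safety engine (no rule reason supplied)"
    return verdict, reason
```

Defaulting to `block` makes an unexpected value fail closed rather than open.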

Integration Points

Five-Factor Review

  • Safety engine output (compliance_check: fail) can trigger a five-factor review
  • Independent but cooperative

Calibration Framework

  • Safety engine sits between Layer 2 (calibration) and Layer 4 (output)
  • Calibration compliance field maps to safety engine rule selection

Self-Evolving

  • Safety false positives/negatives trigger self-evolving reflection
  • New rules as improvement hypotheses

Layered Memory

  • Safety logs go to L6 archive (legal compliance)

Version History

  • v1.0.0 — Initial release. Pluggable multi-jurisdiction content safety engine.

SharpAgent · MIT-0 · 2026-05-11

Safe Usage Advice
Before installing, confirm that you want policy-based content filtering, review the actual rulesets and jurisdictions that will be loaded, and clarify audit-log handling if sensitive content may be checked.
Functional Analysis
Type: OpenClaw Skill
Name: sharpagent-content-safety
Version: 1.0.0
The skill bundle contains documentation and instructions for a 'Content Safety Engine' designed to filter and flag content based on multi-jurisdictional compliance rules (Global, US, CN, EU). The SKILL.md file defines a structured logic for PII detection, hate speech filtering, and regulatory compliance without any executable code, data exfiltration patterns, or malicious prompt injection. All content is strictly aligned with the stated purpose of providing a safety and policy enforcement layer for AI outputs.
Capability Assessment
Purpose & Capability
The stated purpose and instructions align: the skill is meant to evaluate content and return pass, flag, or block decisions. This is not inherently suspicious, but it can change or suppress agent output.
Instruction Scope
The instructions are scoped to content policy enforcement, including strictest-rule-wins behavior and default blocking on ambiguous verdicts. That is purpose-aligned but should be enabled only when the user wants policy-based output filtering.
Install Mechanism
There is no install spec or code, but the documentation references ruleset files such as rules/global.yaml and rules/cn.yaml that are not present in the supplied manifest, so the actual policy details cannot be verified from these artifacts.
Credentials
The artifacts declare no required binaries, environment variables, credentials, network access, or local system access.
Persistence & Privilege
The skill describes audit logging of safety checks, but does not specify storage location, retention, or who can read logs. No privileged access or background persistence is shown.
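How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install sharpagent-content-safety
  3. Once installed, invoke the Skill by name or trigger it with /sharpagent-content-safety
  4. Provide the inputs described in the Skill's parameter notes to get structured output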
Version History
v1.0.0
SharpAgent Content Safety Engine v1.0.0: initial release.
  • Introduces a pluggable rule engine for multi-jurisdiction content safety enforcement (supports global, CN, US, and EU).
  • Provides concurrent jurisdiction support and strict rule conflict resolution (block > flag > pass).
  • Integrates with calibration framework and five-factor review; operates independently as Layer 3 of the SharpAgent architecture.
  • Enforces compliance by blocking, flagging, or passing content with detailed logging and rule-based reasoning.
  • Includes support for custom and built-in rulesets, comprehensive audit logging, and robust edge case handling.
Metadata
Slug: sharpagent-content-safety
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
FAQ

What is Sharpagent Content Safety?

SharpAgent Content Safety Engine: Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 20 total downloads so far.

How do I install Sharpagent Content Safety?

Run the command "/install sharpagent-content-safety" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.

Is Sharpagent Content Safety free?

Yes. Sharpagent Content Safety is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Sharpagent Content Safety support?

Sharpagent Content Safety is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Sharpagent Content Safety?

It is developed and maintained by yezhaowang888-stack (@yezhaowang888-stack); the current version is v1.0.0.
