Sharpagent Content Safety

By yezhaowang888-stack · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install sharpagent-content-safety

Description

SharpAgent Content Safety Engine — Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports...

Usage (SKILL.md)

SharpAgent Content Safety Engine v1.0.0

The last line of defense for content output. The question is not "should we say it" but "how should it be said in this jurisdiction." The engine runs independently of the five-factor review (credibility ≠ compliance) and is Layer 3 of the four-layer architecture.

Architecture Position

Layer 1: Five-Factor Review     ← Trust verification (global, immutable)
Layer 2: Calibration Framework  ← Output adaptation (warm/professional/deep)
Layer 3: Content Safety Engine  ← Compliance interception (per-jurisdiction rules) ← YOU ARE HERE
Layer 4: Final Output

Why independent? Five-factor review asks "can I trust this?" Safety engine asks "can I say this?" The first is information quality, the second is compliance and safety. Mixing them contaminates both judgments.

Contract

contract:
  name: sharpagent-content-safety
  version: "1.0.0"
  category: analysis
  trust_level: verified
  reads:
    - Content
    - CompliancePolicy
  writes:
    - SafetyVerdict
  preconditions:
    - "At least one compliance policy loaded"
    - "Content is not empty"
  postconditions:
    - "Verdict is one of: pass | flag | block"
    - "If flag or block, reason and rule reference are provided"
  calibration:
    default_mode: professional
    modes_supported: [warm, professional, deep]
  compliance:
    jurisdiction: global
    safety_level: strict
  lifecycle:
    status: active
    publish_as: SharpAgent

Core Design

Pluggable Rule Engine

rules:
  - id: "global/PII-001"
    type: "block"
    description: "Detect and block personal identifiable information"
    patterns:
      - "email"
      - "phone_number"
      - "id_card"
      - "address"
    severity: "high"

  - id: "cn/content-001"
    type: "block"
    description: "Block prohibited content per China Internet regulations"
    jurisdiction: "cn"
    severity: "critical"

  - id: "us/export-001"
    type: "flag"
    description: "Flag export-controlled technology references"
    jurisdiction: "us"
    severity: "medium"

  - id: "global/hate-speech-001"
    type: "block"
    description: "Block hate speech and discriminatory content"
    severity: "high"

  - id: "global/privacy-003"
    type: "flag"
    description: "Flag privacy-sensitive content for human review"
    severity: "medium"

Rule Structure

rule:
  id: "{jurisdiction}/{name}-{seq}"   # Unique identifier
  type: "block" | "flag" | "pass"      # Action
  description: "..."                   # Human-readable
  jurisdiction: "cn" | "us" | "eu" | "global"  # Applicable jurisdiction
  patterns: [regex...]                 # Match patterns (optional)
  keywords: [string...]                # Keyword matching (optional)
  severity: "low" | "medium" | "high" | "critical"
  exemptions: [                        # Exceptions
    "educational context",
    "news reporting"
  ]
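
As an illustration of the structure above, a rule loaded from YAML could be validated roughly as follows. This is a sketch under assumptions: `Rule` and `parse_rule` are hypothetical names, and falling back to the id prefix when `jurisdiction` is omitted (as in the `global/...` examples) is a guess at the intended behavior.

```python
import re
from dataclasses import dataclass, field

# Matches "{jurisdiction}/{name}-{seq}", e.g. "global/hate-speech-001".
RULE_ID = re.compile(r"^(?P<jurisdiction>[a-z]+)/(?P<name>[A-Za-z-]+)-(?P<seq>\d+)$")
VALID_TYPES = ("block", "flag", "pass")
VALID_SEVERITIES = ("low", "medium", "high", "critical")


@dataclass
class Rule:
    id: str
    type: str
    description: str
    severity: str
    jurisdiction: str
    patterns: list = field(default_factory=list)
    keywords: list = field(default_factory=list)
    exemptions: list = field(default_factory=list)


def parse_rule(raw: dict) -> Rule:
    """Validate one raw rule mapping and return a Rule, or raise ValueError."""
    match = RULE_ID.match(raw.get("id", ""))
    if match is None:
        raise ValueError(f"Malformed rule id: {raw.get('id')!r}")
    if raw.get("type") not in VALID_TYPES:
        raise ValueError(f"Unknown rule type: {raw.get('type')!r}")
    if raw.get("severity") not in VALID_SEVERITIES:
        raise ValueError(f"Unknown severity: {raw.get('severity')!r}")
    return Rule(
        id=raw["id"],
        type=raw["type"],
        description=raw.get("description", ""),
        severity=raw["severity"],
        # Assumption: an omitted jurisdiction defaults to the id prefix.
        jurisdiction=raw.get("jurisdiction", match.group("jurisdiction")),
        patterns=list(raw.get("patterns", [])),
        keywords=list(raw.get("keywords", [])),
        exemptions=list(raw.get("exemptions", [])),
    )
```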

Jurisdiction Configuration

Runtime selection (multi-select):

safety_engine.load_policies(jurisdictions=["cn", "us", "eu"])

Each loaded jurisdiction stacks its rules. Conflicting rules: strictest wins.

Rule priority (high to low):
1. block → 2. flag → 3. pass
Cross-jurisdiction: take max severity
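
The "strictest wins" and "max severity" rules can be sketched as two rankings. The function name `resolve` and the shape of the match dictionaries are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Higher number = stricter action / more severe match.
VERDICT_RANK = {"pass": 0, "flag": 1, "block": 2}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def resolve(matches: List[dict]) -> Tuple[str, Optional[str]]:
    """Combine matches from all loaded jurisdictions into one outcome.

    Each match is assumed to carry the matched rule's "type" and "severity".
    Returns (verdict, severity); with no matches the verdict is "pass".
    """
    if not matches:
        return "pass", None
    verdict = max((m["type"] for m in matches), key=VERDICT_RANK.__getitem__)
    severity = max((m["severity"] for m in matches), key=SEVERITY_RANK.__getitem__)
    return verdict, severity
```

For example, a `flag` match with `critical` severity from one jurisdiction combined with a `block` match with `medium` severity from another yields `("block", "critical")`: the action and the severity are maximized independently.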

Workflow

Step 1: Pre-Flight

Is the content empty? (The contract requires non-empty content.)
Is the content too long? If so, split it into chunks of at most 4096 characters.
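
A possible chunking routine is sketched below. It combines the 4096-character limit with the 200-character overlap listed under Edge Cases; reading "±200 chars" as a 200-character overlap between neighboring chunks is one interpretation, and `chunk_content` is a hypothetical name.

```python
from typing import List


def chunk_content(content: str, max_chars: int = 4096, overlap: int = 200) -> List[str]:
    """Split content into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a phrase
    straddling a boundary still appears whole in one of the chunks.
    """
    if not content:
        raise ValueError("Content must not be empty")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    if len(content) <= max_chars:
        return [content]

    step = max_chars - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(content[start:start + max_chars])
        if start + max_chars >= len(content):
            break  # this chunk reached the end of the content
        start += step
    return chunks
```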

Step 2: Rule Matching

For each chunk:
    for each loaded rule:
        skip if jurisdiction not active
        check patterns/keywords
        check exemptions
        record match
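
The loop above might look like this in Python. This is a sketch, not the engine's implementation: rules are plain dictionaries, `match_rules` is a hypothetical name, and exemptions are evaluated against caller-supplied context tags, which is an assumption because the source does not say how exemption conditions are detected.

```python
import re
from typing import FrozenSet, Iterable, List


def match_rules(chunks: List[str], rules: Iterable[dict],
                active_jurisdictions: Iterable[str],
                context_tags: FrozenSet[str] = frozenset()) -> List[dict]:
    """Return one record per (chunk, rule) match."""
    active = set(active_jurisdictions)
    rules = list(rules)
    matches = []
    for index, chunk in enumerate(chunks):
        lowered = chunk.lower()
        for rule in rules:
            # Skip rules whose jurisdiction is not active.
            if rule.get("jurisdiction", "global") not in active:
                continue
            # A rule matches if any regex pattern or any keyword hits.
            hit = any(re.search(p, chunk) for p in rule.get("patterns", []))
            hit = hit or any(k.lower() in lowered for k in rule.get("keywords", []))
            if not hit:
                continue
            # Assumption: an exemption applies when the caller tagged the
            # content with that exemption's label.
            if any(e in context_tags for e in rule.get("exemptions", [])):
                continue
            matches.append({"rule": rule["id"], "type": rule["type"],
                            "severity": rule["severity"], "chunk": index})
    return matches
```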

Step 3: Verdict

Verdict	Meaning	Action
✅ pass	No matches	Let through to output
⚠️ flag	Low severity match	Tag + allow + log
🚫 block	High severity match	Block + return alternative content

Block replacement:

[Content blocked by safety engine]
Reason: {top_reason}
Contact administrator for full content.
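
Filling the template could be done as follows. The source does not define how `{top_reason}` is chosen; this sketch assumes it is the description of the highest-severity match, and `render_block_notice` and the `description` key on each match are hypothetical.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

BLOCK_TEMPLATE = (
    "[Content blocked by safety engine]\n"
    "Reason: {top_reason}\n"
    "Contact administrator for full content."
)


def render_block_notice(matches: list) -> str:
    """Build the replacement text shown instead of blocked content."""
    if not matches:
        raise ValueError("A block notice needs at least one matched rule")
    # Assumption: the "top" reason is the match with the highest severity.
    top = max(matches, key=lambda m: SEVERITY_RANK[m["severity"]])
    top_reason = f"{top['description']} ({top['rule']})"
    return BLOCK_TEMPLATE.format(top_reason=top_reason)
```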

Step 4: Logging

{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [
    {"rule": "cn/content-001", "severity": "critical"}
  ],
  "verdict": "block",
  "timestamp": "2026-05-11T06:10:00Z",
  "agent": "sharpagent"
}
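
A record of this shape can be assembled with the standard library. `build_log_record` is a hypothetical helper; the field names follow the sample above, which carries rule ids and severities but not the checked content itself.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def build_log_record(jurisdictions: Iterable[str], matches: List[dict],
                     verdict: str, agent: str = "sharpagent",
                     now: Optional[datetime] = None) -> dict:
    """Return a JSON-serializable audit record for one safety check."""
    # `now` should be timezone-aware; it is converted to UTC for the log.
    now = now or datetime.now(timezone.utc)
    return {
        "event": "safety_check",
        "jurisdictions": sorted(jurisdictions),
        "rules_matched": [
            {"rule": m["rule"], "severity": m["severity"]} for m in matches
        ],
        "verdict": verdict,
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
    }


if __name__ == "__main__":
    record = build_log_record(
        ["cn", "global"],
        [{"rule": "cn/content-001", "severity": "critical"}],
        "block",
    )
    print(json.dumps(record, indent=2))
```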

Ruleset Management

Built-in Rulesets

Ruleset	Coverage	File
`global`	Universal safety (hate speech/PII/privacy)	`rules/global.yaml`
`cn`	China internet content regulations	`rules/cn.yaml`
`us`	US export control/safe harbor	`rules/us.yaml`
`eu`	GDPR related	`rules/eu.yaml`

Custom Rules

rules/custom/
├── my-company-policy.yaml
├── my-project-policy.yaml
└── README.md

Edge Cases

Situation	Action
Conflicting jurisdiction rules	Strictest wins (block > flag > pass)
Rule false positive	Add exemption, log false positive
Cross-chunk sensitive phrase	Overlap scanning (±200 chars)
No jurisdiction configured	Load `global` only
Corrupt rule file	Skip + log error, don't crash engine
Exemption conditions met	Skip rule, log exemption reason
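
Two of these cases, the `global` fallback and the corrupt rule file, can be sketched in one loader. `load_rulesets` and the `read_ruleset` callable are hypothetical; the callable stands in for whatever parses `rules/<name>.yaml`, so the sketch stays independent of any particular YAML library. Refusing to start when nothing loads follows the first quality gate.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("safety_engine")


def load_rulesets(jurisdictions: Iterable[str],
                  read_ruleset: Callable[[str], List[dict]]) -> Dict[str, List[dict]]:
    """Load the selected rulesets, skipping any that fail to parse."""
    # No jurisdiction configured: load `global` only.
    selected = list(jurisdictions) or ["global"]
    loaded = {}
    for name in selected:
        try:
            loaded[name] = read_ruleset(name)
        except Exception as exc:
            # Corrupt rule file: skip and log the error, don't crash the engine.
            logger.error("Skipping ruleset %r: %s", name, exc)
    if not loaded:
        # With no rules nothing would be blocked, so refuse to start.
        raise RuntimeError("No ruleset could be loaded")
    return loaded
```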

Quality Gates

Check	What	Fail action
At least 1 ruleset	No rules = nothing blocked	Don't start
Verdict unambiguous	pass/flag/block	Default block
Block provides reason	User knows why	Add reason
Complete audit log	Every check recorded	Backfill
Rules versioned	Updates don't break running checks	Semver rules

Integration Points

Five-Factor Review

Safety engine output (compliance_check: fail) can trigger a five-factor review.
The two layers are independent but cooperative.

Calibration Framework

Safety engine sits between Layer 2 (calibration) and Layer 4 (output)
Calibration compliance field maps to safety engine rule selection

Self-Evolving

Safety false positives and false negatives trigger self-evolving reflection.
New rules are proposed as improvement hypotheses.

Layered Memory

Safety logs go to L6 archive (legal compliance)

Version History

v1.0.0 — Initial release. Pluggable multi-jurisdiction content safety engine.

SharpAgent · MIT-0 · 2026-05-11

Safe-Use Advice

Before installing, confirm that you want policy-based content filtering, review the actual rulesets and jurisdictions that will be loaded, and clarify audit-log handling if sensitive content may be checked.

Functional Analysis

Type: OpenClaw Skill
Name: sharpagent-content-safety
Version: 1.0.0

The skill bundle contains documentation and instructions for a 'Content Safety Engine' designed to filter and flag content based on multi-jurisdictional compliance rules (Global, US, CN, EU). The SKILL.md file defines a structured logic for PII detection, hate speech filtering, and regulatory compliance without any executable code, data exfiltration patterns, or malicious prompt injection. All content is strictly aligned with the stated purpose of providing a safety and policy enforcement layer for AI outputs.

Capability Assessment

ℹ Purpose & Capability

The stated purpose and instructions align: the skill is meant to evaluate content and return pass, flag, or block decisions. This is not inherently suspicious, but it can change or suppress agent output.

ℹ Instruction Scope

The instructions are scoped to content policy enforcement, including strictest-rule-wins behavior and default blocking on ambiguous verdicts. That is purpose-aligned but should be enabled only when the user wants policy-based output filtering.

ℹ Install Mechanism

There is no install spec or code, but the documentation references ruleset files such as rules/global.yaml and rules/cn.yaml that are not present in the supplied manifest, so the actual policy details cannot be verified from these artifacts.

✓ Credentials

The artifacts declare no required binaries, environment variables, credentials, network access, or local system access.

ℹ Persistence & Privilege

The skill describes audit logging of safety checks, but does not specify storage location, retention, or who can read logs. No privileged access or background persistence is shown.

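How to Use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install sharpagent-content-safety
Once installed, invoke the Skill by name or trigger it with /sharpagent-content-safety
Provide the required input according to the Skill's parameter description to get structured output.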

Version History

v1.0.0

SharpAgent Content Safety Engine v1.0.0: initial release.
- Introduces a pluggable rule engine for multi-jurisdiction content safety enforcement (supports global, CN, US, and EU).
- Provides concurrent jurisdiction support and strict rule conflict resolution (block > flag > pass).
- Integrates with calibration framework and five-factor review; operates independently as Layer 3 of the SharpAgent architecture.
- Enforces compliance by blocking, flagging, or passing content with detailed logging and rule-based reasoning.
- Includes support for custom and built-in rulesets, comprehensive audit logging, and robust edge case handling.

Metadata

Slug sharpagent-content-safety

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

FAQ