
Sharpagent Content Safety

by yezhaowang888-stack · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
20 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install sharpagent-content-safety
Description
SharpAgent Content Safety Engine — Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports...
README (SKILL.md)

SharpAgent Content Safety Engine v1.0.0

The last line of defense for content output. The question is not "should we say it?" but "how should it be said in this jurisdiction?" The engine is independent of the five-factor review, because credibility is not the same as compliance. It is Layer 3 of the four-layer architecture.

Architecture Position

Layer 1: Five-Factor Review     ← Trust verification (global, immutable)
Layer 2: Calibration Framework  ← Output adaptation (warm/professional/deep)
Layer 3: Content Safety Engine  ← Compliance interception (per-jurisdiction rules) ← YOU ARE HERE
Layer 4: Final Output

Why independent? Five-factor review asks "can I trust this?" Safety engine asks "can I say this?" The first is information quality, the second is compliance and safety. Mixing them contaminates both judgments.

Contract

contract:
  name: sharpagent-content-safety
  version: "1.0.0"
  category: analysis
  trust_level: verified
  reads:
    - Content
    - CompliancePolicy
  writes:
    - SafetyVerdict
  preconditions:
    - "At least one compliance policy loaded"
    - "Content is not empty"
  postconditions:
    - "Verdict is one of: pass | flag | block"
    - "If flag or block, reason and rule reference are provided"
  calibration:
    default_mode: professional
    modes_supported: [warm, professional, deep]
  compliance:
    jurisdiction: global
    safety_level: strict
  lifecycle:
    status: active
    publish_as: SharpAgent
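
The pre- and postconditions in the contract can be checked mechanically. The following is a minimal sketch, not the skill's actual implementation: the `SafetyVerdict` field names (`verdict`, `reason`, `rule_id`) and the `check_preconditions` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

VALID_VERDICTS = ("pass", "flag", "block")


@dataclass(frozen=True)
class SafetyVerdict:
    """Hypothetical shape for the SafetyVerdict the contract writes."""
    verdict: str
    reason: Optional[str] = None
    rule_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Postcondition: verdict is one of pass | flag | block.
        if self.verdict not in VALID_VERDICTS:
            raise ValueError(f"invalid verdict: {self.verdict!r}")
        # Postcondition: flag and block carry a reason and a rule reference.
        if self.verdict != "pass" and not (self.reason and self.rule_id):
            raise ValueError("flag/block verdicts need a reason and a rule_id")


def check_preconditions(content: str, policies: list) -> None:
    """Raise ValueError if either contract precondition is violated."""
    if not policies:
        raise ValueError("at least one compliance policy must be loaded")
    if not content or not content.strip():
        raise ValueError("content must not be empty")
```

Treating whitespace-only content as empty is an assumption of this sketch; the contract only says "Content is not empty".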

Core Design

Pluggable Rule Engine

rules:
  - id: "global/PII-001"
    type: "block"
    description: "Detect and block personally identifiable information"
    patterns:
      - "email"
      - "phone_number"
      - "id_card"
      - "address"
    severity: "high"

  - id: "cn/content-001"
    type: "block"
    description: "Block prohibited content per China Internet regulations"
    jurisdiction: "cn"
    severity: "critical"

  - id: "us/export-001"
    type: "flag"
    description: "Flag export-controlled technology references"
    jurisdiction: "us"
    severity: "medium"

  - id: "global/hate-speech-001"
    type: "block"
    description: "Block hate speech and discriminatory content"
    severity: "high"

  - id: "global/privacy-003"
    type: "flag"
    description: "Flag privacy-sensitive content for human review"
    severity: "medium"

Rule Structure

rule:
  id: "{jurisdiction}/{name}-{seq}"   # Unique identifier
  type: "block" | "flag" | "pass"      # Action
  description: "..."                   # Human-readable
  jurisdiction: "cn" | "us" | "eu" | "global"  # Applicable jurisdiction
  patterns: [regex...]                 # Match patterns (optional)
  keywords: [string...]                # Keyword matching (optional)
  severity: "low" | "medium" | "high" | "critical"
  exemptions: [                        # Exceptions
    "educational context",
    "news reporting"
  ]
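
As an illustration of the rule structure above, a rule can be validated when it is loaded. This is a sketch under assumptions: the `Rule` class and the `RULE_ID` pattern are hypothetical, and deriving a missing `jurisdiction` from the id prefix is inferred from the example rules (the `global/...` rules omit the field), not stated by the source.

```python
import re
from dataclasses import dataclass, field

# Matches "{jurisdiction}/{name}-{seq}", e.g. "global/hate-speech-001".
RULE_ID = re.compile(r"^(?P<jurisdiction>[a-z]+)/(?P<name>[A-Za-z][A-Za-z-]*)-(?P<seq>\d+)$")
TYPES = {"block", "flag", "pass"}
SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass
class Rule:
    id: str
    type: str
    description: str
    severity: str
    jurisdiction: str = ""
    patterns: list = field(default_factory=list)
    keywords: list = field(default_factory=list)
    exemptions: list = field(default_factory=list)

    def __post_init__(self) -> None:
        match = RULE_ID.match(self.id)
        if match is None:
            raise ValueError(f"rule id {self.id!r} is not '{{jurisdiction}}/{{name}}-{{seq}}'")
        if self.type not in TYPES:
            raise ValueError(f"unknown rule type: {self.type!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        # Assumption: fall back to the id prefix when jurisdiction is omitted.
        if not self.jurisdiction:
            self.jurisdiction = match["jurisdiction"]
```

For example, `Rule(id="global/PII-001", type="block", description="...", severity="high")` would end up with `jurisdiction == "global"`.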

Jurisdiction Configuration

Runtime selection (multi-select):

safety_engine.load_policies(jurisdictions=["cn", "us", "eu"])

Rules from every loaded jurisdiction are stacked. When rules conflict, the strictest wins.

Rule priority (high to low):
1. block → 2. flag → 3. pass
Cross-jurisdiction: take max severity
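
The "strictest wins" resolution can be sketched as two independent maxima, one over rule types and one over severities. The function name `resolve` and the `(rule_type, severity)` pair format are assumptions made for this example.

```python
VERDICT_RANK = {"pass": 0, "flag": 1, "block": 2}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def resolve(matches):
    """Combine (rule_type, severity) pairs from all active jurisdictions.

    Returns (verdict, severity). With no matches the verdict is "pass"
    and the severity is None.
    """
    matches = list(matches)
    if not matches:
        return "pass", None
    # Rule priority: block > flag > pass.
    verdict = max((t for t, _ in matches), key=VERDICT_RANK.__getitem__)
    # Cross-jurisdiction: take the maximum severity.
    severity = max((s for _, s in matches), key=SEVERITY_RANK.__getitem__)
    return verdict, severity
```

With a `us` flag rule at medium severity and a `cn` block rule at critical severity both matching, this sketch yields a block verdict at critical severity.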

Workflow

Step 1: Pre-Flight

  • Content empty?
  • Content too long? Chunk at ≤4096 chars.
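
A minimal sketch of the pre-flight step, assuming a simple fixed-size split. The 200-character overlap between consecutive chunks is one possible reading of the "overlap scanning (±200 chars)" edge case listed later in this README; `chunk_content` is a hypothetical helper name.

```python
def chunk_content(text, max_chars=4096, overlap=200):
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a phrase
    straddling a chunk boundary still appears whole in one chunk.
    """
    if not text or not text.strip():
        raise ValueError("content is empty")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    if len(text) <= max_chars:
        return [text]
    step = max_chars - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk reached the end of the text
        start += step
    return chunks
```

A phrase longer than the overlap can still be split across two chunks, so the overlap should be at least as long as the longest pattern the rules are expected to match.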

Step 2: Rule Matching

For each chunk:
    for each loaded rule:
        skip if jurisdiction not active
        check patterns/keywords
        check exemptions
        record match
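
The loop above could be written in Python roughly as follows. This is a sketch, not the engine's code: rules are plain dicts, `patterns` are treated as regular expressions, and exemptions are checked against a caller-supplied set of context tags, which is an assumption because the source does not say how exemption conditions are evaluated.

```python
import re


def match_rules(chunks, rules, active_jurisdictions, context_tags=()):
    """Return one record per (chunk, rule) match."""
    active = set(active_jurisdictions)
    tags = set(context_tags)
    matches = []
    for index, chunk in enumerate(chunks):
        lowered = chunk.lower()
        for rule in rules:
            # Assumption: a rule without a jurisdiction field takes its id prefix.
            jurisdiction = rule.get("jurisdiction") or rule["id"].split("/", 1)[0]
            if jurisdiction not in active:
                continue  # skip if jurisdiction not active
            hit = any(re.search(p, chunk) for p in rule.get("patterns", [])) or any(
                k.lower() in lowered for k in rule.get("keywords", [])
            )
            if not hit:
                continue
            # Record which exemptions apply so the caller can skip and log them.
            exempted_by = sorted(tags & set(rule.get("exemptions", [])))
            matches.append({
                "rule": rule["id"],
                "chunk": index,
                "type": rule["type"],
                "severity": rule["severity"],
                "exempted_by": exempted_by,
            })
    return matches
```

Records with a non-empty `exempted_by` list correspond to the "skip rule, log exemption reason" case; the caller would leave them out of the verdict but keep them in the audit log.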

Step 3: Verdict

| Verdict | Meaning | Action |
|---|---|---|
| ✅ pass | No matches | Let through to output |
| ⚠️ flag | Low severity match | Tag + allow + log |
| 🚫 block | High severity match | Block + return alternative content |

Block replacement:

[Content blocked by safety engine]
Reason: {top_reason}
Contact administrator for full content.
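
Filling the block replacement template might look like the sketch below. How `top_reason` is chosen is not specified in the source; picking the highest-severity match and appending its rule id is an assumption, as are the `render_block` name and the match dict keys.

```python
BLOCK_TEMPLATE = (
    "[Content blocked by safety engine]\n"
    "Reason: {top_reason}\n"
    "Contact administrator for full content."
)
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def render_block(matches):
    """Build the replacement text from the matched rules of a block verdict."""
    if not matches:
        raise ValueError("a block verdict needs at least one matched rule")
    # Assumption: the top reason is the match with the highest severity.
    top = max(matches, key=lambda m: SEVERITY_RANK[m["severity"]])
    top_reason = f"{top['description']} (rule {top['rule']})"
    return BLOCK_TEMPLATE.format(top_reason=top_reason)
```

Including the rule id in the reason keeps the output consistent with the contract postcondition that a block provides both a reason and a rule reference.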

Step 4: Logging

{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [
    {"rule": "cn/content-001", "severity": "critical"}
  ],
  "verdict": "block",
  "timestamp": "2026-05-11T06:10:00Z",
  "agent": "sharpagent"
}
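
A log record with the same fields as the example above can be produced with the standard library. This is only a sketch: `build_log_record` is a hypothetical helper, and where the record is written (file, archive, log service) is left open because the source does not specify it.

```python
import json
from datetime import datetime, timezone


def build_log_record(jurisdictions, matches, verdict, agent="sharpagent", now=None):
    """Serialize one safety check as a JSON string.

    `now` must be a timezone-aware datetime; it defaults to the current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        "event": "safety_check",
        "jurisdictions": list(jurisdictions),
        "rules_matched": [
            {"rule": m["rule"], "severity": m["severity"]} for m in matches
        ],
        "verdict": verdict,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
    }
    return json.dumps(record)
```

The record stores rule ids and severities only, not the checked content itself, which limits how much sensitive text ends up in the audit log.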

Ruleset Management

Built-in Rulesets

| Ruleset | Coverage | File |
|---|---|---|
| global | Universal safety (hate speech/PII/privacy) | rules/global.yaml |
| cn | China internet content regulations | rules/cn.yaml |
| us | US export control/safe harbor | rules/us.yaml |
| eu | GDPR related | rules/eu.yaml |

Custom Rules

rules/custom/
├── my-company-policy.yaml
├── my-project-policy.yaml
└── README.md

Edge Cases

| Situation | Action |
|---|---|
| Conflicting jurisdiction rules | Strictest wins (block > flag > pass) |
| Rule false positive | Add exemption, log false positive |
| Cross-chunk sensitive phrase | Overlap scanning (±200 chars) |
| No jurisdiction configured | Load global only |
| Corrupt rule file | Skip + log error, don't crash engine |
| Exemption conditions met | Skip rule, log exemption reason |
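
The "corrupt rule file" case can be sketched as a loader that logs and skips any file it cannot parse. Two assumptions apply: the parser is injectable, and the example defaults to `json.loads` as a standard-library stand-in, since the real rulesets are YAML and would need a YAML parser; the top-level `rules` key follows the rule listing shown earlier in this README.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("safety_engine")


def load_rulesets(paths, parse=json.loads):
    """Load rules from several files, skipping files that fail to load.

    Returns (rules, skipped_paths). `parse` turns file text into a dict
    that has a top-level "rules" list.
    """
    rules, skipped = [], []
    for path in map(Path, paths):
        try:
            data = parse(path.read_text(encoding="utf-8"))
            file_rules = data["rules"]
            if not isinstance(file_rules, list):
                raise TypeError("'rules' must be a list")
        except Exception as exc:  # corrupt or missing file: skip, never crash
            logger.error("skipping rule file %s: %s", path, exc)
            skipped.append(str(path))
            continue
        rules.extend(file_rules)
    return rules, skipped
```

Returning the skipped paths lets the caller decide whether enough rulesets remain to start the engine.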

Quality Gates

| Check | What | Fail action |
|---|---|---|
| At least 1 ruleset | No rules = nothing blocked | Don't start |
| Verdict unambiguous | pass/flag/block | Default block |
| Block provides reason | User knows why | Add reason |
| Complete audit log | Every check recorded | Backfill |
| Rules versioned | Updates don't break running checks | Semver rules |
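
The first three gates lend themselves to a short fail-closed check. The sketch below is illustrative only: `require_rulesets`, `gate_verdict`, and the fallback reason strings are hypothetical and not taken from the source.

```python
VALID_VERDICTS = {"pass", "flag", "block"}


def require_rulesets(rulesets):
    """Gate: at least one ruleset, otherwise the engine must not start."""
    if not rulesets:
        raise RuntimeError("no rulesets loaded; refusing to start")


def gate_verdict(verdict, reason=None):
    """Gates: an ambiguous verdict defaults to block; a block always has a reason."""
    if verdict not in VALID_VERDICTS:
        verdict = "block"
        # Placeholder wording, assumed for this example.
        reason = reason or "ambiguous verdict, blocked by default"
    if verdict == "block" and not reason:
        reason = "blocked by safety engine (no rule reason recorded)"
    return verdict, reason
```

Defaulting to block means an engine fault suppresses output rather than letting unchecked content through, which matches the "last line of defense" role described at the top of this README.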

Integration Points

Five-Factor Review

  • Safety engine output (compliance_check: fail) can trigger a five-factor review
  • Independent but cooperative

Calibration Framework

  • Safety engine sits between Layer 2 (calibration) and Layer 4 (output)
  • The calibration compliance field determines which safety engine rules are selected

Self-Evolving

  • Safety false positives and false negatives trigger self-evolving reflection
  • New rules are proposed as improvement hypotheses

Layered Memory

  • Safety logs go to L6 archive (legal compliance)

Version History

  • v1.0.0 — Initial release. Pluggable multi-jurisdiction content safety engine.

SharpAgent · MIT-0 · 2026-05-11

Usage Guidance
Before installing, confirm that you want policy-based content filtering, review the actual rulesets and jurisdictions that will be loaded, and clarify audit-log handling if sensitive content may be checked.
Capability Analysis
Type: OpenClaw Skill
Name: sharpagent-content-safety
Version: 1.0.0
The skill bundle contains documentation and instructions for a 'Content Safety Engine' designed to filter and flag content based on multi-jurisdictional compliance rules (Global, US, CN, EU). The SKILL.md file defines a structured logic for PII detection, hate speech filtering, and regulatory compliance without any executable code, data exfiltration patterns, or malicious prompt injection. All content is strictly aligned with the stated purpose of providing a safety and policy enforcement layer for AI outputs.
Capability Assessment
Purpose & Capability
The stated purpose and instructions align: the skill is meant to evaluate content and return pass, flag, or block decisions. This is not inherently suspicious, but it can change or suppress agent output.
Instruction Scope
The instructions are scoped to content policy enforcement, including strictest-rule-wins behavior and default blocking on ambiguous verdicts. That is purpose-aligned but should be enabled only when the user wants policy-based output filtering.
Install Mechanism
There is no install spec or code, but the documentation references ruleset files such as rules/global.yaml and rules/cn.yaml that are not present in the supplied manifest, so the actual policy details cannot be verified from these artifacts.
Credentials
The artifacts declare no required binaries, environment variables, credentials, network access, or local system access.
Persistence & Privilege
The skill describes audit logging of safety checks, but does not specify storage location, retention, or who can read logs. No privileged access or background persistence is shown.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install sharpagent-content-safety
  3. After installation, invoke the skill by name or use /sharpagent-content-safety
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
SharpAgent Content Safety Engine v1.0.0: initial release.
  • Introduces a pluggable rule engine for multi-jurisdiction content safety enforcement (supports global, CN, US, and EU).
  • Provides concurrent jurisdiction support and strict rule conflict resolution (block > flag > pass).
  • Integrates with calibration framework and five-factor review; operates independently as Layer 3 of the SharpAgent architecture.
  • Enforces compliance by blocking, flagging, or passing content with detailed logging and rule-based reasoning.
  • Includes support for custom and built-in rulesets, comprehensive audit logging, and robust edge case handling.
Metadata
Slug sharpagent-content-safety
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Sharpagent Content Safety?

SharpAgent Content Safety Engine — Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports... It is an AI Agent Skill for Claude Code / OpenClaw, with 20 downloads so far.

How do I install Sharpagent Content Safety?

Run "/install sharpagent-content-safety" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Sharpagent Content Safety free?

Yes, Sharpagent Content Safety is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Sharpagent Content Safety support?

Sharpagent Content Safety is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Sharpagent Content Safety?

It is built and maintained by yezhaowang888-stack (@yezhaowang888-stack); the current version is v1.0.0.
