Sharpagent Content Safety

Name: Sharpagent Content Safety
Author: yezhaowang888-stack

by yezhaowang888-stack · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install sharpagent-content-safety

Description

SharpAgent Content Safety Engine — Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports...

README (SKILL.md)

SharpAgent Content Safety Engine v1.0.0

The last line of defense for content output. The question is not "should we say it" but "how should it be said in this jurisdiction." The engine is independent of the five-factor review (credibility ≠ compliance) and forms Layer 3 of the four-layer architecture.

Architecture Position

Layer 1: Five-Factor Review     ← Trust verification (global, immutable)
Layer 2: Calibration Framework  ← Output adaptation (warm/professional/deep)
Layer 3: Content Safety Engine  ← Compliance interception (per-jurisdiction rules) ← YOU ARE HERE
Layer 4: Final Output

Why independent? The five-factor review asks "can I trust this?" The safety engine asks "can I say this?" The first concerns information quality; the second concerns compliance and safety. Mixing them contaminates both judgments.

Contract

contract:
  name: sharpagent-content-safety
  version: "1.0.0"
  category: analysis
  trust_level: verified
  reads:
    - Content
    - CompliancePolicy
  writes:
    - SafetyVerdict
  preconditions:
    - "At least one compliance policy loaded"
    - "Content is not empty"
  postconditions:
    - "Verdict is one of: pass | flag | block"
    - "If flag or block, reason and rule reference are provided"
  calibration:
    default_mode: professional
    modes_supported: [warm, professional, deep]
  compliance:
    jurisdiction: global
    safety_level: strict
  lifecycle:
    status: active
    publish_as: SharpAgent
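
The two postconditions in the contract can be checked mechanically. The following is a minimal sketch, not the engine's actual implementation: the `SafetyVerdict` name comes from the contract's `writes` list, but its fields (`verdict`, `reason`, `rule_id`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_VERDICTS = ("pass", "flag", "block")


@dataclass(frozen=True)
class SafetyVerdict:
    """Hypothetical shape of the SafetyVerdict output; field names are assumed."""

    verdict: str
    reason: Optional[str] = None
    rule_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Postcondition 1: verdict is one of pass | flag | block.
        if self.verdict not in ALLOWED_VERDICTS:
            raise ValueError(f"verdict must be one of {ALLOWED_VERDICTS}")
        # Postcondition 2: flag and block carry a reason and a rule reference.
        if self.verdict != "pass" and not (self.reason and self.rule_id):
            raise ValueError("flag and block verdicts need a reason and a rule reference")
```

Constructing `SafetyVerdict("flag")` without a reason raises `ValueError`, so an incomplete verdict cannot silently reach Layer 4.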

Core Design

Pluggable Rule Engine

rules:
  - id: "global/PII-001"
    type: "block"
    description: "Detect and block personally identifiable information"
    patterns:
      - "email"
      - "phone_number"
      - "id_card"
      - "address"
    severity: "high"

  - id: "cn/content-001"
    type: "block"
    description: "Block prohibited content per China Internet regulations"
    jurisdiction: "cn"
    severity: "critical"

  - id: "us/export-001"
    type: "flag"
    description: "Flag export-controlled technology references"
    jurisdiction: "us"
    severity: "medium"

  - id: "global/hate-speech-001"
    type: "block"
    description: "Block hate speech and discriminatory content"
    severity: "high"

  - id: "global/privacy-003"
    type: "flag"
    description: "Flag privacy-sensitive content for human review"
    severity: "medium"

Rule Structure

rule:
  id: "{jurisdiction}/{name}-{seq}"   # Unique identifier
  type: "block" | "flag" | "pass"      # Action
  description: "..."                   # Human-readable
  jurisdiction: "cn" | "us" | "eu" | "global"  # Applicable jurisdiction
  patterns: [regex...]                 # Match patterns (optional)
  keywords: [string...]                # Keyword matching (optional)
  severity: "low" | "medium" | "high" | "critical"
  exemptions: [                        # Exceptions
    "educational context",
    "news reporting"
  ]
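
The rule structure above maps naturally onto a small validated record. This sketch is one possible in-memory representation, assuming that rules without a `jurisdiction` field (as in the `global/...` examples) default to `"global"`; the `Rule` class and `parse_rule_id` helper are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

VALID_TYPES = {"block", "flag", "pass"}
VALID_SEVERITIES = {"low", "medium", "high", "critical"}


def parse_rule_id(rule_id: str) -> Tuple[str, str, str]:
    """Split "{jurisdiction}/{name}-{seq}" into its three parts."""
    jurisdiction, slash, rest = rule_id.partition("/")
    # rpartition keeps hyphens inside the name, e.g. "hate-speech-001".
    name, dash, seq = rest.rpartition("-")
    if not (slash and dash and jurisdiction and name and seq.isdigit()):
        raise ValueError(f"malformed rule id: {rule_id!r}")
    return jurisdiction, name, seq


@dataclass
class Rule:
    id: str
    type: str
    description: str
    severity: str
    jurisdiction: str = "global"  # assumed default when the field is omitted
    patterns: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    exemptions: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        parse_rule_id(self.id)  # raises ValueError on a malformed id
        if self.type not in VALID_TYPES:
            raise ValueError(f"unknown rule type: {self.type!r}")
        if self.severity not in VALID_SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
```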

Jurisdiction Configuration

Runtime selection (multi-select):

safety_engine.load_policies(jurisdictions=["cn", "us", "eu"])

Each loaded jurisdiction stacks its rules. Conflicting rules: strictest wins.

Rule priority (high to low):
1. block → 2. flag → 3. pass
Cross-jurisdiction: take max severity
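
The "strictest wins" and "take max severity" rules can be sketched as two independent maximums over the matched rules. The function name `resolve` and the `(action, severity)` tuple layout are assumptions for illustration; the rankings are taken from the priority order and severity values stated in this document.

```python
from typing import Iterable, Optional, Tuple

ACTION_RANK = {"pass": 0, "flag": 1, "block": 2}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def resolve(matches: Iterable[Tuple[str, str]]) -> Tuple[str, Optional[str]]:
    """Combine (action, severity) pairs from all active jurisdictions.

    Returns the strictest action and the highest severity seen.
    With no matches the verdict is "pass" and the severity is None.
    """
    matches = list(matches)
    if not matches:
        return "pass", None
    action = max((a for a, _ in matches), key=ACTION_RANK.__getitem__)
    severity = max((s for _, s in matches), key=SEVERITY_RANK.__getitem__)
    return action, severity
```

For example, a `flag`/`medium` match from `us` combined with a `block`/`critical` match from `cn` resolves to `("block", "critical")`.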

Workflow

Step 1: Pre-Flight

Content empty? Stop: the contract requires non-empty content.
Content too long? Chunk at ≤4096 chars.
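
A minimal sketch of the pre-flight step, assuming that empty input is rejected with an exception and that the 200-character overlap listed under Edge Cases is applied between consecutive chunks. The function name and parameters are hypothetical.

```python
from typing import List


def chunk_content(text: str, max_chars: int = 4096, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a phrase
    straddling a chunk boundary still appears whole in one chunk.
    """
    if not text.strip():
        raise ValueError("content is empty")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    step = max_chars - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += step
    return chunks
```

Content that already fits within `max_chars` comes back as a single chunk.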

Step 2: Rule Matching

For each chunk:
    for each loaded rule:
        skip if jurisdiction not active
        check patterns/keywords
        check exemptions
        record match
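
The loop above could look like this in Python. It is a sketch under several assumptions: rules are plain dictionaries, a missing `jurisdiction` means `"global"`, patterns are regular expressions, keywords match case-insensitively, and an exemption applies when the caller-supplied `context` label equals one of the rule's exemption strings. The document does not specify how exemptions are evaluated.

```python
import re
from typing import Dict, Iterable, List, Set


def match_rules(
    chunks: List[str],
    rules: Iterable[Dict],
    active: Set[str],
    context: str = "",
) -> List[Dict]:
    """Return one record per (chunk, rule) match."""
    rules = list(rules)
    matches = []
    for index, chunk in enumerate(chunks):
        lowered = chunk.lower()
        for rule in rules:
            # Skip rules whose jurisdiction is not active.
            if rule.get("jurisdiction", "global") not in active:
                continue
            pattern_hit = any(re.search(p, chunk) for p in rule.get("patterns", []))
            keyword_hit = any(k.lower() in lowered for k in rule.get("keywords", []))
            if not (pattern_hit or keyword_hit):
                continue
            # Skip the rule when the stated context is exempt.
            if context and context in rule.get("exemptions", []):
                continue
            matches.append({
                "rule": rule["id"],
                "type": rule["type"],
                "severity": rule["severity"],
                "chunk": index,
            })
    return matches
```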

Step 3: Verdict

Verdict	Meaning	Action
✅ pass	No matches	Let through to output
⚠️ flag	Low severity match	Tag + allow + log
🚫 block	High severity match	Block + return alternative content

Block replacement:

[Content blocked by safety engine]
Reason: {top_reason}
Contact administrator for full content.
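
Filling the `{top_reason}` placeholder could work as sketched below. The document does not define how the top reason is chosen; this example assumes it is the reason attached to the highest-severity match, and the `reason` key on each match is a hypothetical field.

```python
from typing import Dict, List

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

BLOCK_TEMPLATE = (
    "[Content blocked by safety engine]\n"
    "Reason: {top_reason}\n"
    "Contact administrator for full content."
)


def render_block_notice(matches: List[Dict[str, str]]) -> str:
    """Build the replacement text shown in place of blocked content."""
    if not matches:
        raise ValueError("a block notice needs at least one matched rule")
    # Assumption: the "top" reason belongs to the most severe match.
    top = max(matches, key=lambda m: SEVERITY_RANK[m["severity"]])
    return BLOCK_TEMPLATE.format(top_reason=top["reason"])
```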

Step 4: Logging

{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [
    {"rule": "cn/content-001", "severity": "critical"}
  ],
  "verdict": "block",
  "timestamp": "2026-05-11T06:10:00Z",
  "agent": "sharpagent"
}
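
A record with the shape shown above can be assembled as follows. This is a sketch only: the function name and parameters are assumptions, and where the record is stored is left open because the document does not specify it.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


def build_log_record(
    jurisdictions: Iterable[str],
    matches: List[Dict[str, str]],
    verdict: str,
    agent: str = "sharpagent",
    now: Optional[datetime] = None,
) -> Dict:
    """Build one safety_check audit record as a JSON-serializable dict.

    `now` should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "event": "safety_check",
        "jurisdictions": list(jurisdictions),
        "rules_matched": [
            {"rule": m["rule"], "severity": m["severity"]} for m in matches
        ],
        "verdict": verdict,
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
    }


if __name__ == "__main__":
    record = build_log_record(
        ["cn", "global"],
        [{"rule": "cn/content-001", "severity": "critical"}],
        "block",
    )
    print(json.dumps(record, indent=2))
```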

Ruleset Management

Built-in Rulesets

Ruleset	Coverage	File
`global`	Universal safety (hate speech/PII/privacy)	`rules/global.yaml`
`cn`	China internet content regulations	`rules/cn.yaml`
`us`	US export control/safe harbor	`rules/us.yaml`
`eu`	GDPR related	`rules/eu.yaml`

Custom Rules

rules/custom/
├── my-company-policy.yaml
├── my-project-policy.yaml
└── README.md

Edge Cases

Situation	Action
Conflicting jurisdiction rules	Strictest wins (block > flag > pass)
Rule false positive	Add exemption, log false positive
Cross-chunk sensitive phrase	Overlap scanning (±200 chars)
No jurisdiction configured	Load `global` only
Corrupt rule file	Skip + log error, don't crash engine
Exemption conditions met	Skip rule, log exemption reason
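
Two of the edge cases above, the `global` fallback and the corrupt-file handling, can be sketched together. The `read_ruleset` callable is a hypothetical stand-in for whatever reads and parses a ruleset file such as `rules/global.yaml`; it is injected so the sketch does not assume a particular YAML library. Refusing to start with zero rules follows the "At least 1 ruleset" quality gate.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("safety_engine")


def load_rulesets(
    jurisdictions: Iterable[str],
    read_ruleset: Callable[[str], List[Dict]],
) -> List[Dict]:
    """Load rules for each jurisdiction, tolerating unreadable rulesets."""
    # No jurisdiction configured: load `global` only.
    selected = list(jurisdictions) or ["global"]
    rules: List[Dict] = []
    for name in selected:
        try:
            rules.extend(read_ruleset(name))
        except Exception as exc:
            # Corrupt rule file: skip and log, do not crash the engine.
            logger.error("skipping ruleset %s: %s", name, exc)
    if not rules:
        # With no rules nothing would be blocked, so do not start.
        raise RuntimeError("no rules loaded; refusing to start")
    return rules
```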

Quality Gates

Check	What	Fail action
At least 1 ruleset	No rules = nothing blocked	Don't start
Verdict unambiguous	pass/flag/block	Default block
Block provides reason	User knows why	Add reason
Complete audit log	Every check recorded	Backfill
Rules versioned	Updates don't break running checks	Semver rules

Integration Points

Five-Factor Review

Safety engine output (compliance_check: fail) can trigger a five-factor review
The two layers are independent but cooperative

Calibration Framework

Safety engine sits between Layer 2 (calibration) and Layer 4 (output)
Calibration compliance field maps to safety engine rule selection

Self-Evolving

Safety false positives/negatives trigger self-evolving reflection
New rules as improvement hypotheses

Layered Memory

Safety logs go to L6 archive (legal compliance)

Version History

v1.0.0 — Initial release. Pluggable multi-jurisdiction content safety engine.

SharpAgent · MIT-0 · 2026-05-11

Usage Guidance

Before installing, confirm that you want policy-based content filtering, review the actual rulesets and jurisdictions that will be loaded, and clarify audit-log handling if sensitive content may be checked.

Capability Analysis

Type: OpenClaw Skill
Name: sharpagent-content-safety
Version: 1.0.0

The skill bundle contains documentation and instructions for a 'Content Safety Engine' designed to filter and flag content based on multi-jurisdictional compliance rules (Global, US, CN, EU). The SKILL.md file defines a structured logic for PII detection, hate speech filtering, and regulatory compliance without any executable code, data exfiltration patterns, or malicious prompt injection. All content is strictly aligned with the stated purpose of providing a safety and policy enforcement layer for AI outputs.

Capability Assessment

ℹ Purpose & Capability

The stated purpose and instructions align: the skill is meant to evaluate content and return pass, flag, or block decisions. This is not inherently suspicious, but it can change or suppress agent output.

ℹ Instruction Scope

The instructions are scoped to content policy enforcement, including strictest-rule-wins behavior and default blocking on ambiguous verdicts. That is purpose-aligned but should be enabled only when the user wants policy-based output filtering.

ℹ Install Mechanism

There is no install spec or code, but the documentation references ruleset files such as rules/global.yaml and rules/cn.yaml that are not present in the supplied manifest, so the actual policy details cannot be verified from these artifacts.

✓ Credentials

The artifacts declare no required binaries, environment variables, credentials, network access, or local system access.

ℹ Persistence & Privilege

The skill describes audit logging of safety checks, but does not specify storage location, retention, or who can read logs. No privileged access or background persistence is shown.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install sharpagent-content-safety
3. After installation, invoke the skill by name or use /sharpagent-content-safety
4. Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.0.0

SharpAgent Content Safety Engine v1.0.0: initial release.
- Introduces a pluggable rule engine for multi-jurisdiction content safety enforcement (supports global, CN, US, and EU).
- Provides concurrent jurisdiction support and strict rule conflict resolution (block > flag > pass).
- Integrates with calibration framework and five-factor review; operates independently as Layer 3 of the SharpAgent architecture.
- Enforces compliance by blocking, flagging, or passing content with detailed logging and rule-based reasoning.
- Includes support for custom and built-in rulesets, comprehensive audit logging, and robust edge case handling.

Metadata

Slug sharpagent-content-safety

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Sharpagent Content Safety?

SharpAgent Content Safety Engine — Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports... It is an AI Agent Skill for Claude Code / OpenClaw, with 20 downloads so far.

How do I install Sharpagent Content Safety?

Run "/install sharpagent-content-safety" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Sharpagent Content Safety free?

Yes, Sharpagent Content Safety is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Sharpagent Content Safety support?

Sharpagent Content Safety is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Sharpagent Content Safety?

It is built and maintained by yezhaowang888-stack (@yezhaowang888-stack); the current version is v1.0.0.
