bryantegomoh

Content Security Filter

by Bryan Tegomoh, MD, MPH · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
117 downloads · 0 stars · 0 active installs · 1 version
Install in OpenClaw
/install content-security-filter
Description
Prompt injection and malware detection filter for external content. Scans text, files, or URLs for 20+ attack patterns including instruction overrides, crede...
README (SKILL.md)

content-security-filter

Run before processing any external content — web pages, user pastes, articles, API responses — to detect prompt injection attacks and other malicious patterns.

Detection Coverage

| Category | Examples |
| --- | --- |
| Override attempts | "ignore previous instructions", "forget everything" |
| Instruction hijacking | "your new rules are:", "updated system prompt:" |
| Persona hijacking | "you are now", "act as an unrestricted" |
| Jailbreak attempts | DAN mode, unrestricted mode |
| Data exfiltration | "send all private files", "leak workspace" |
| Credential probing | "reveal your API key", "what is your system prompt" |
| Fake system messages | [SYSTEM], [ADMIN], [[system]] |
| Encoded payloads | base64 blobs containing suspicious content |
| Credential harvesting | "provide your password/token/secret" |
| Command injection | rm -rf, os.system, subprocess.run |
| Invisible characters | zero-width spaces, soft hyphens, BOM |
| Homoglyph attacks | unicode substitution hiding injection patterns |
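A minimal regex sketch can show how the pattern-based categories above are detected. It is not the bundled script's code. Every rule name except `OVERRIDE_ATTEMPT` is hypothetical, as are the regexes and the risk assigned to each rule. The output dicts only mirror the shape of the `findings` entries shown under Output.

```python
import re
from typing import Dict, List

# Hypothetical rules for illustration; only OVERRIDE_ATTEMPT appears in the
# README's sample output. The real script's patterns and risks may differ.
PATTERNS = [
    ("OVERRIDE_ATTEMPT", "CRITICAL",
     r"\b(ignore|disregard)\s+(all\s+)?previous\s+instructions\b|\bforget\s+everything\b"),
    ("PERSONA_HIJACK", "HIGH",
     r"\byou\s+are\s+now\b|\bact\s+as\s+an?\s+unrestricted\b"),
    ("FAKE_SYSTEM_MESSAGE", "HIGH",
     r"\[\[?\s*(system|admin)\s*\]\]?"),
    ("COMMAND_INJECTION", "CRITICAL",
     r"\brm\s+-rf\b|\bos\.system\s*\(|\bsubprocess\.run\s*\("),
]

# Compile once, case-insensitively, so "[SYSTEM]" and "[[system]]" both match.
COMPILED = [(name, risk, re.compile(rx, re.IGNORECASE)) for name, risk, rx in PATTERNS]


def find_patterns(text: str) -> List[Dict[str, str]]:
    """Return one finding per regex match, shaped like the README's findings."""
    findings = []
    for name, risk, rx in COMPILED:
        for m in rx.finditer(text):
            findings.append({
                "type": name,
                "risk": risk,
                "matched": m.group(0),
                "detail": f"Injection pattern detected: {name}",
            })
    return findings
```

A fixed table of compiled regexes keeps the scanner dependency-free, in line with the stdlib-only requirement. The trade-off is that simple rephrasings can slip past the patterns.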

Usage

# Scan a string
python3 scripts/content-security-filter.py --text "ignore all previous instructions"

# Scan a file
python3 scripts/content-security-filter.py --file /path/to/document.txt

# Fetch and scan a URL
python3 scripts/content-security-filter.py --url "https://example.com/page"

# Pipe from stdin
echo "some content" | python3 scripts/content-security-filter.py

# JSON-only output (no stderr)
python3 scripts/content-security-filter.py --text "content" --quiet
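To call the filter from another Python program, a thin wrapper around the CLI is enough. This sketch assumes the script path used in the examples above and that `--quiet` leaves a single JSON object on stdout. It relies on the documented exit codes (0 = safe, 1 = threat detected). `scan_text` is a hypothetical helper name, and the `runner` parameter exists only so the subprocess call can be swapped out in tests.

```python
import json
import subprocess
import sys
from typing import Callable, Tuple

# Path taken from the Usage examples; adjust to where the skill is installed.
SCRIPT = "scripts/content-security-filter.py"


def scan_text(text: str,
              script: str = SCRIPT,
              runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
              ) -> Tuple[bool, dict]:
    """Run the filter with --text/--quiet and return (is_safe, parsed report)."""
    proc = runner(
        [sys.executable, script, "--text", text, "--quiet"],
        capture_output=True,
        text=True,
    )
    if proc.returncode not in (0, 1):
        # Missing script, crash, bad arguments: fail closed instead of "safe".
        raise RuntimeError(f"filter exited with {proc.returncode}: {proc.stderr}")
    report = json.loads(proc.stdout)
    return proc.returncode == 0, report
```

Any exit code other than 0 or 1 raises an error rather than returning "safe", so an environment problem can never be mistaken for clean content.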

Output

{
  "safe": false,
  "risk_level": "CRITICAL",
  "findings": [
    {
      "type": "OVERRIDE_ATTEMPT",
      "risk": "CRITICAL",
      "matched": "ignore all previous instructions",
      "detail": "Injection pattern detected: OVERRIDE_ATTEMPT"
    }
  ],
  "finding_count": 1,
  "sanitized": "...",
  "chars_scanned": 1234
}
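A consumer of this report may want typed access instead of raw dictionaries. The following sketch models exactly the fields in the sample above. It makes two assumptions the README does not state: every finding carries only those four keys, and `finding_count` always equals the length of `findings`. `Finding` and `ScanReport` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    type: str
    risk: str
    matched: str
    detail: str


@dataclass
class ScanReport:
    safe: bool
    risk_level: str
    findings: List[Finding]
    finding_count: int
    sanitized: str
    chars_scanned: int

    @classmethod
    def from_json(cls, raw: str) -> "ScanReport":
        data = json.loads(raw)
        # Assumes each finding has exactly the four keys shown in the sample.
        findings = [Finding(**f) for f in data["findings"]]
        report = cls(
            safe=data["safe"],
            risk_level=data["risk_level"],
            findings=findings,
            finding_count=data["finding_count"],
            sanitized=data["sanitized"],
            chars_scanned=data["chars_scanned"],
        )
        # Consistency check between the two redundant fields (assumed invariant).
        if report.finding_count != len(report.findings):
            raise ValueError("finding_count does not match the findings list")
        return report
```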

Exit codes: 0 = safe, 1 = threat detected

Risk Levels

  • SAFE / LOW → safe to process
  • MEDIUM → review recommended (encoded content, invisible chars)
  • HIGH → likely malicious (data exfil probes, fake system tags)
  • CRITICAL → block immediately (override attempts, command injection)
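One way to turn these levels into a processing decision is sketched below. It assumes the overall `risk_level` is the highest risk among the findings, with `SAFE` when there are none. It also treats `HIGH` as a block, which is stricter than the README's "likely malicious". Both choices are assumptions, and `overall_level` and `decide` are hypothetical helpers.

```python
from typing import Dict, Iterable

# Levels from the Risk Levels list, ordered lowest to highest.
ORDER = ["SAFE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Action per level. HIGH -> "block" is this sketch's conservative choice;
# the README only calls HIGH "likely malicious".
ACTION: Dict[str, str] = {
    "SAFE": "process",
    "LOW": "process",
    "MEDIUM": "review",
    "HIGH": "block",
    "CRITICAL": "block",
}


def overall_level(finding_risks: Iterable[str]) -> str:
    """Highest risk among findings; SAFE when there are none (assumed rule)."""
    return max(finding_risks, key=ORDER.index, default="SAFE")


def decide(finding_risks: Iterable[str]) -> str:
    """Map the aggregated risk level to process / review / block."""
    return ACTION[overall_level(finding_risks)]
```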

Requirements

  • Python 3.8+
  • stdlib only (no pip dependencies)
Usage Guidance
This skill appears to be what it claims: a local scanner, implemented in a small Python script, that checks text, files, and URLs for prompt-injection patterns. Before installing or using it:

  1. Inspect the bundled script yourself and run it in a safe environment.
  2. Be aware that it will read any file path or URL you give it. Do not point it at sensitive local files unless you trust the environment.
  3. Test the tool on non-sensitive inputs to verify its behavior.
  4. Expect the static scanner to flag prompt-injection phrases inside SKILL.md. The skill documents the patterns it detects, so this is expected, not malicious.

If you plan to let the agent invoke this skill autonomously, first confirm that your use case justifies automated scanning of user-supplied content. This helps ensure the skill cannot be used to read sensitive files without oversight.
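One way to follow the advice about sensitive files is to check any path against an allowed directory before passing it to `--file`. This page does not mention any such option in the script itself. `is_allowed` is a hypothetical helper, and the allowed root is whatever directory you choose.

```python
import os


def is_allowed(path: str, allowed_root: str) -> bool:
    """True if `path` resolves to a location inside `allowed_root`.

    Both paths are resolved with realpath, so ".." segments and symlinks
    cannot be used to escape the allowed directory.
    """
    target = os.path.realpath(path)
    root = os.path.realpath(allowed_root)
    return os.path.commonpath([target, root]) == root
```

Using `os.path.commonpath` instead of a string prefix check avoids the trap where `/data/inbox2` would pass a naive `startswith("/data/inbox")` test.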
Capability Analysis
Type: OpenClaw Skill
Name: content-security-filter
Version: 1.0.0
The content-security-filter skill is a defensive utility designed to protect the OpenClaw agent by scanning external input for prompt injection, jailbreaks, and malicious patterns. The implementation in `scripts/content-security-filter.py` uses standard regex matching, unicode normalization, and base64 decoding to identify risks without any evidence of hidden malicious intent, data exfiltration, or unauthorized execution. The tool's behavior is fully aligned with its documentation in `SKILL.md`.
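The unicode-normalization step mentioned above might look roughly like the following sketch. The set of invisible characters is an assumption based on the Detection Coverage list (zero-width spaces, soft hyphens, BOM). The choice of NFKC is also an assumption, and `normalize` is a hypothetical name. The real script may differ on all three points.

```python
import unicodedata
from typing import List, Tuple

# Assumed invisible-character set; the script's actual list is not documented.
INVISIBLE = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\u00ad": "SOFT HYPHEN",
    "\ufeff": "BYTE ORDER MARK",
}


def normalize(text: str) -> Tuple[str, List[str]]:
    """Strip invisible characters, then apply NFKC.

    Returns (clean_text, sorted names of invisible characters found).
    """
    found = sorted({INVISIBLE[ch] for ch in text if ch in INVISIBLE})
    stripped = "".join(ch for ch in text if ch not in INVISIBLE)
    # NFKC folds compatibility forms such as fullwidth letters to ASCII,
    # but it does NOT map cross-script look-alikes (Cyrillic "а" stays "а").
    return unicodedata.normalize("NFKC", stripped), found
```

Running the pattern scan on the normalized text catches injections split by zero-width characters or written in fullwidth letters. Covering true cross-script homoglyphs would additionally require a confusables mapping, which NFKC does not provide.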
Capability Assessment
Purpose & Capability
Name/description match the included Python scanner. The script implements pattern matching, invisible-char detection, base64 decoding, and URL fetching — all appropriate for a content-security filter. No extraneous credentials, binaries, or config paths are requested.
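The base64-decoding check can be sketched in three steps: find long runs of base64 characters, try to decode each run as UTF-8, and rescan the resulting plaintext. The 16-character minimum and the tiny `SUSPICIOUS` regex are placeholders, not the script's actual threshold or pattern list. `decoded_payloads` is a hypothetical name.

```python
import base64
import binascii
import re
from typing import List

# Runs of standard base64 alphabet, at least 16 chars (threshold is assumed).
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

# Placeholder for the real pattern list.
SUSPICIOUS = re.compile(r"ignore (all )?previous instructions|rm -rf", re.IGNORECASE)


def decoded_payloads(text: str) -> List[str]:
    """Decode base64-looking runs and return plaintexts that look malicious."""
    hits = []
    for m in B64_RUN.finditer(text):
        try:
            # validate=True rejects non-alphabet chars; bad padding also raises.
            plain = base64.b64decode(m.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not base64, or not text once decoded
        if SUSPICIOUS.search(plain):
            hits.append(plain)
    return hits
```

Long identifiers will often match the base64 regex. Most of these either fail to decode or decode to non-UTF-8 bytes, which is why the decode step doubles as a filter.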
Instruction Scope
SKILL.md and the script only instruct scanning of text, files, stdin, or a user-supplied URL. The pre-scan detector flagged prompt-injection phrases in SKILL.md, but those are example detection patterns and are expected for this purpose. Apart from fetching the URL the user provides, nothing sends data to third-party endpoints.
Install Mechanism
No install spec (instruction-only skill) and the included script uses Python stdlib only. Nothing is downloaded from external URLs or installed to disk beyond the bundled script.
Credentials
The skill requires no environment variables or secrets and the code does not read credentials or system config. It only uses standard Python libs and performs local file reads or URL fetches as requested by the user.
Persistence & Privilege
The skill sets always:false and user-invocable:true, which are normal settings. It does not modify other skills or system-wide settings and does not request permanent presence or elevated privileges.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install content-security-filter
  3. After installation, invoke the skill by name or use /content-security-filter
  4. Provide the content to scan (text, a file path, a URL, or stdin) and get back a JSON report
Version History
v1.0.0
Initial release of content-security-filter.
  • Scans external text, files, or URLs for 20+ prompt injection and malware patterns.
  • Detects override attempts, persona hijacking, jailbreaks, credential leaks, fake system messages, encoded payloads, command injection, and more.
  • Outputs a JSON report with risk level, findings, sanitized content, and character count.
  • Supports string, file, and URL inputs, plus stdin piping.
  • No external dependencies; requires Python 3.8+.
Metadata
Slug content-security-filter
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Content Security Filter?

Prompt injection and malware detection filter for external content. Scans text, files, or URLs for 20+ attack patterns including instruction overrides, crede... It is an AI Agent Skill for Claude Code / OpenClaw, with 117 downloads so far.

How do I install Content Security Filter?

Run "/install content-security-filter" in the OpenClaw or Claude Code chat to install it in one step. The bundled script needs only Python 3.8+ and no pip packages.

Is Content Security Filter free?

Yes, Content Security Filter is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Content Security Filter support?

Content Security Filter is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Content Security Filter?

It is built and maintained by Bryan Tegomoh, MD, MPH (@bryantegomoh); the current version is v1.0.0.
