Reef Prompt Guard

Name: Reef Prompt Guard
Author: staybased

Description

Detect and filter prompt injection attacks in untrusted input. Use when processing external content (emails, web scrapes, API inputs, Discord messages, sub-agent outputs) or when building systems that accept user-provided text that will be passed to an LLM. Covers direct injection, jailbreaks, data exfiltration, privilege escalation, and context manipulation.

Usage Guidance

This skill appears to be what it claims: a regex-based prompt-injection filter implemented as a local Python script. Before installing/use: 1) Prefer invoking the filter module directly (import/call) rather than interpolating untrusted text into shell commands — the Node.js execSync example in SKILL.md can be unsafe and lead to command injection if input isn't properly escaped. 2) Understand the tool's limitation: it's regex-based and will miss novel/semantic attacks; consider adding a classifier or anomaly/perplexity checks for ambiguous inputs. 3) Review and test the pattern lists in scripts/filter.py and references/attack-patterns.md to ensure no false-positives block legitimate content and to tune context multipliers. 4) Keep the script on a secure path and avoid running it with elevated privileges. If you need stronger guarantees (e.g., in production-facing pipelines or multi-agent systems), perform adversarial testing and consider layered defenses (sandboxed processing, dual-LLM architecture, strict escaping when calling subprocesses).

Capability Analysis

Type: OpenClaw Skill Name: reef-prompt-guard Version: 1.0.0 This skill bundle, 'reef-prompt-guard', is a security tool designed to detect and filter prompt injection attacks. The `scripts/filter.py` script uses regular expressions to identify patterns associated with various prompt injection techniques (e.g., instruction override, data exfiltration, command execution attempts) and then sanitizes the input. The `SKILL.md` and `references/attack-patterns.md` files serve as documentation, explaining the purpose, usage, and underlying attack patterns this skill defends against. There is no evidence of malicious intent, data exfiltration, unauthorized execution, or prompt injection against the OpenClaw agent itself; rather, the skill actively works to prevent these types of attacks.

Capability Assessment

✓ Purpose & Capability

Name/description match the included artifacts: a Python filter script and a reference doc about attack patterns. No credentials, external downloads, or unrelated binaries are requested — everything present is proportional to a local prompt-filtering tool.

ℹ Instruction Scope

SKILL.md stays within scope (scanning/sanitizing untrusted text, sandwich defense, integration examples). One integration example runs the Python script via a shell exec (Node.js execSync with a JSON string embedded), which if used as shown could introduce command-injection risk when untrusted text is interpolated into a shell command. The SKILL.md also intentionally contains injection examples (e.g., “ignore previous instructions”) — this is expected for a detector but was flagged by the pre-scan.

✓ Install Mechanism

No install spec or remote downloads; the skill is instruction + a local Python script. That is low-risk compared with installers that fetch/extract remote archives.

✓ Credentials

No environment variables, credentials, or config paths are requested. The tool does not ask for unrelated secrets and operates on local input only.

✓ Persistence & Privilege

always:false and normal user-invocable/autonomous invocation defaults are used. The skill does not request permanent system presence or attempt to modify other skills or global agent settings.

Version History

v1.0.0

Initial release — injection detection, 5 threat categories, CLI filter

Metadata

Slug reef-prompt-guard

Version 1.0.0

License —

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Reef Prompt Guard?

Detect and filter prompt injection attacks in untrusted input. Use when processing external content (emails, web scrapes, API inputs, Discord messages, sub-agent outputs) or when building systems that accept user-provided text that will be passed to an LLM. Covers direct injection, jailbreaks, data exfiltration, privilege escalation, and context manipulation. It is an AI Agent Skill for Claude Code / OpenClaw, with 895 downloads so far.

How do I install Reef Prompt Guard?

Run "/install reef-prompt-guard" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Reef Prompt Guard free?

Yes, Reef Prompt Guard is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Reef Prompt Guard support?

Reef Prompt Guard is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Reef Prompt Guard?

It is built and maintained by staybased (@staybased); the current version is v1.0.0.

More Skills

What is Reef Prompt Guard?

How do I install Reef Prompt Guard?

Is Reef Prompt Guard free?

Which platforms does Reef Prompt Guard support?

Who created Reef Prompt Guard?

💬 Comments