Reef Prompt Guard

Name: Reef Prompt Guard
Author: staybased

功能描述

Detect and filter prompt injection attacks in untrusted input. Use when processing external content (emails, web scrapes, API inputs, Discord messages, sub-agent outputs) or when building systems that accept user-provided text that will be passed to an LLM. Covers direct injection, jailbreaks, data exfiltration, privilege escalation, and context manipulation.

安全使用建议

This skill appears to be what it claims: a regex-based prompt-injection filter implemented as a local Python script. Before installing/use: 1) Prefer invoking the filter module directly (import/call) rather than interpolating untrusted text into shell commands — the Node.js execSync example in SKILL.md can be unsafe and lead to command injection if input isn't properly escaped. 2) Understand the tool's limitation: it's regex-based and will miss novel/semantic attacks; consider adding a classifier or anomaly/perplexity checks for ambiguous inputs. 3) Review and test the pattern lists in scripts/filter.py and references/attack-patterns.md to ensure no false-positives block legitimate content and to tune context multipliers. 4) Keep the script on a secure path and avoid running it with elevated privileges. If you need stronger guarantees (e.g., in production-facing pipelines or multi-agent systems), perform adversarial testing and consider layered defenses (sandboxed processing, dual-LLM architecture, strict escaping when calling subprocesses).

功能分析

Type: OpenClaw Skill Name: reef-prompt-guard Version: 1.0.0 This skill bundle, 'reef-prompt-guard', is a security tool designed to detect and filter prompt injection attacks. The `scripts/filter.py` script uses regular expressions to identify patterns associated with various prompt injection techniques (e.g., instruction override, data exfiltration, command execution attempts) and then sanitizes the input. The `SKILL.md` and `references/attack-patterns.md` files serve as documentation, explaining the purpose, usage, and underlying attack patterns this skill defends against. There is no evidence of malicious intent, data exfiltration, unauthorized execution, or prompt injection against the OpenClaw agent itself; rather, the skill actively works to prevent these types of attacks.

能力评估

✓ Purpose & Capability

Name/description match the included artifacts: a Python filter script and a reference doc about attack patterns. No credentials, external downloads, or unrelated binaries are requested — everything present is proportional to a local prompt-filtering tool.

ℹ Instruction Scope

SKILL.md stays within scope (scanning/sanitizing untrusted text, sandwich defense, integration examples). One integration example runs the Python script via a shell exec (Node.js execSync with a JSON string embedded), which if used as shown could introduce command-injection risk when untrusted text is interpolated into a shell command. The SKILL.md also intentionally contains injection examples (e.g., “ignore previous instructions”) — this is expected for a detector but was flagged by the pre-scan.

✓ Install Mechanism

No install spec or remote downloads; the skill is instruction + a local Python script. That is low-risk compared with installers that fetch/extract remote archives.

✓ Credentials

No environment variables, credentials, or config paths are requested. The tool does not ask for unrelated secrets and operates on local input only.

✓ Persistence & Privilege

always:false and normal user-invocable/autonomous invocation defaults are used. The skill does not request permanent system presence or attempt to modify other skills or global agent settings.

版本历史

v1.0.0

Initial release — injection detection, 5 threat categories, CLI filter

元数据

Slug reef-prompt-guard

版本 1.0.0

许可证 —

累计安装 0

当前安装数 0

历史版本数 1

常见问题

Reef Prompt Guard 是什么？

Detect and filter prompt injection attacks in untrusted input. Use when processing external content (emails, web scrapes, API inputs, Discord messages, sub-agent outputs) or when building systems that accept user-provided text that will be passed to an LLM. Covers direct injection, jailbreaks, data exfiltration, privilege escalation, and context manipulation. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 895 次。

如何安装 Reef Prompt Guard？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install reef-prompt-guard」即可一键安装，无需额外配置。

Reef Prompt Guard 是免费的吗？

是的，Reef Prompt Guard 完全免费（开源免费），可自由下载、安装和使用。

Reef Prompt Guard 支持哪些平台？

Reef Prompt Guard 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Reef Prompt Guard？

由 staybased（@staybased）开发并维护，当前版本 v1.0.0。

Reef Prompt Guard 是什么？

如何安装 Reef Prompt Guard？

Reef Prompt Guard 是免费的吗？

Reef Prompt Guard 支持哪些平台？

谁开发了 Reef Prompt Guard？

💬 留言讨论