← 返回 Skills 市场
895
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install reef-prompt-guard
功能描述
Detect and filter prompt injection attacks in untrusted input. Use when processing external content (emails, web scrapes, API inputs, Discord messages, sub-agent outputs) or when building systems that accept user-provided text that will be passed to an LLM. Covers direct injection, jailbreaks, data exfiltration, privilege escalation, and context manipulation.
安全使用建议
This skill appears to be what it claims: a regex-based prompt-injection filter implemented as a local Python script. Before installing/use: 1) Prefer invoking the filter module directly (import/call) rather than interpolating untrusted text into shell commands — the Node.js execSync example in SKILL.md can be unsafe and lead to command injection if input isn't properly escaped. 2) Understand the tool's limitation: it's regex-based and will miss novel/semantic attacks; consider adding a classifier or anomaly/perplexity checks for ambiguous inputs. 3) Review and test the pattern lists in scripts/filter.py and references/attack-patterns.md to ensure no false-positives block legitimate content and to tune context multipliers. 4) Keep the script on a secure path and avoid running it with elevated privileges. If you need stronger guarantees (e.g., in production-facing pipelines or multi-agent systems), perform adversarial testing and consider layered defenses (sandboxed processing, dual-LLM architecture, strict escaping when calling subprocesses).
功能分析
Type: OpenClaw Skill
Name: reef-prompt-guard
Version: 1.0.0
This skill bundle, 'reef-prompt-guard', is a security tool designed to detect and filter prompt injection attacks. The `scripts/filter.py` script uses regular expressions to identify patterns associated with various prompt injection techniques (e.g., instruction override, data exfiltration, command execution attempts) and then sanitizes the input. The `SKILL.md` and `references/attack-patterns.md` files serve as documentation, explaining the purpose, usage, and underlying attack patterns this skill defends against. There is no evidence of malicious intent, data exfiltration, unauthorized execution, or prompt injection against the OpenClaw agent itself; rather, the skill actively works to prevent these types of attacks.
能力评估
Purpose & Capability
Name/description match the included artifacts: a Python filter script and a reference doc about attack patterns. No credentials, external downloads, or unrelated binaries are requested — everything present is proportional to a local prompt-filtering tool.
Instruction Scope
SKILL.md stays within scope (scanning/sanitizing untrusted text, sandwich defense, integration examples). One integration example runs the Python script via a shell exec (Node.js execSync with a JSON string embedded), which if used as shown could introduce command-injection risk when untrusted text is interpolated into a shell command. The SKILL.md also intentionally contains injection examples (e.g., “ignore previous instructions”) — this is expected for a detector but was flagged by the pre-scan.
Install Mechanism
No install spec or remote downloads; the skill is instruction + a local Python script. That is low-risk compared with installers that fetch/extract remote archives.
Credentials
No environment variables, credentials, or config paths are requested. The tool does not ask for unrelated secrets and operates on local input only.
Persistence & Privilege
always:false and normal user-invocable/autonomous invocation defaults are used. The skill does not request permanent system presence or attempt to modify other skills or global agent settings.
如何使用
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install reef-prompt-guard - 安装完成后,直接呼叫该 Skill 的名称或使用
/reef-prompt-guard触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
Initial release — injection detection, 5 threat categories, CLI filter
元数据
常见问题
Reef Prompt Guard 是什么?
Detect and filter prompt injection attacks in untrusted input. Use when processing external content (emails, web scrapes, API inputs, Discord messages, sub-agent outputs) or when building systems that accept user-provided text that will be passed to an LLM. Covers direct injection, jailbreaks, data exfiltration, privilege escalation, and context manipulation. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 895 次。
如何安装 Reef Prompt Guard?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install reef-prompt-guard」即可一键安装,无需额外配置。
Reef Prompt Guard 是免费的吗?
是的,Reef Prompt Guard 完全免费(开源免费),可自由下载、安装和使用。
Reef Prompt Guard 支持哪些平台?
Reef Prompt Guard 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Reef Prompt Guard?
由 staybased(@staybased)开发并维护,当前版本 v1.0.0。
推荐 Skills