Prompt Injection Defense
by AdrianTeng · v1.0.0 · MIT-0
126 Downloads · 0 Stars · 1 Active Install · 1 Version
Install in OpenClaw
/install prompt-injection-defense
Description
Harden agent sessions against prompt injection from untrusted content. Use when the agent reads web search results, emails, downloaded files, PDFs, or any ex...
Usage Guidance
This skill appears to do what it says: tag untrusted outputs, scan them for prompt-injection patterns, and quarantine or accept content before writing to memory. Before installing, consider:
- Set OPENCLAW_WORKSPACE explicitly if you don't want files in your home directory, and review filesystem permissions on that workspace.
- Do not allow the agent to construct shell commands from untrusted input and then pass them to tag-untrusted.sh (that script will execute whatever command you give it).
- Regularly review the quarantine directory for false positives and for any sensitive data captured there.
- Treat the scanner as a defense-in-depth tool: it can miss sophisticated attacks, so combine it with read-only API permissions and human review for risky actions.
If you want higher assurance, audit the scripts locally and run them in a sandboxed environment first.
Capability Analysis
Type: OpenClaw Skill
Name: prompt-injection-defense
Version: 1.0.0
The bundle is a defensive security toolkit designed to protect OpenClaw agents from prompt injection attacks. It implements a multi-layered defense strategy including content tagging (scripts/tag-untrusted.sh), heuristic-based scanning for adversarial patterns (scripts/scan-content.py), and a gated memory-writing pipeline (scripts/safe-memory-write.sh) that quarantines suspicious input. The instructions in SKILL.md and the logic in the scripts are consistently aligned with the stated purpose of hardening the agent against external adversarial content.
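The scan-then-gate flow described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the regex patterns, the function names `scan` and `gated_write`, and the `memory`/`quarantine` subdirectory names are all assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical detection patterns for illustration only;
# the real rule set in scan-content.py is not shown in this listing.
PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
]


def scan(text: str) -> list[str]:
    """Return the source of every pattern that matches somewhere in text."""
    return [p.pattern for p in PATTERNS if p.search(text)]


def gated_write(text: str, workspace: Path, name: str) -> Path:
    """Write text under memory/ if the scan is clean, else under quarantine/."""
    subdir = "quarantine" if scan(text) else "memory"  # assumed directory names
    target = workspace / subdir / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

A heuristic list like this only catches phrasings it already knows about, which is why the listing recommends treating the scanner as one layer among several.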
Capability Assessment
Purpose & Capability
Name/description match the provided assets: SKILL.md documents tagging, scanning, memory guardrails and canaries; scripts implement scanning (scan-content.py), safe memory writes (safe-memory-write.sh), and tagging (tag-untrusted.sh). No unrelated credentials, binaries, or install steps are requested.
Instruction Scope
Runtime instructions are focused on scanning/tagging/quarantine. tag-untrusted.sh runs an arbitrary command and echoes its output wrapped in tags — this is expected for capturing tool output, but be careful: do not pass untrusted user-supplied strings as executable commands (that would execute them). The SKILL.md itself contains the injection phrases the scanner looks for (hence pre-scan hits); this is expected because the doc teaches detection rules.
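To illustrate the caution about commands, here is a minimal sketch of the same tagging idea that takes an argument list and never invokes a shell, so metacharacters in an argument are passed through literally rather than executed. The tag name `untrusted`, the `source` attribute, and the function `tag_untrusted` are hypothetical; the real script's output format is not shown in this listing.

```python
import subprocess
import sys

TAG = "untrusted"  # hypothetical tag name, chosen for this example


def tag_untrusted(argv: list[str], source: str = "tool") -> str:
    """Run argv without a shell and wrap its stdout in untrusted-content tags."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    # Neutralize any closing tag inside the output so content cannot end the wrapper early.
    body = result.stdout.replace(f"</{TAG}>", f"&lt;/{TAG}&gt;")
    return f'<{TAG} source="{source}">\n{body}</{TAG}>'


if __name__ == "__main__":
    # The command itself must still come from a trusted caller, never from untrusted input.
    print(tag_untrusted([sys.executable, "-c", "print('hello')"]))
```

Passing a list avoids shell interpretation of arguments, but it does not make an attacker-chosen program safe to run; the executable and its flags should still be fixed by the operator.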
Install Mechanism
Instruction-only with small local scripts; no download/install mechanism, package managers, or network fetches embedded in the install. Low installation risk.
Credentials
The skill requests no credentials or required env vars. Scripts write to a workspace path (OPENCLAW_WORKSPACE or default $HOME/.openclaw/workspace) and create memory/quarantine files there — this is consistent with purpose but means the skill will create persistent files on the user's filesystem and may store sanitized or quarantined copies of untrusted content (which could include secrets if such content contained them).
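As a rough sketch of the workspace behavior described above, the path can be resolved from OPENCLAW_WORKSPACE with a fallback to $HOME/.openclaw/workspace, and the storage directories can be restricted to the current user. The helper names and the `memory`/`quarantine` subdirectory names are assumptions for illustration; the skill's scripts may lay files out differently.

```python
import os
from pathlib import Path


def resolve_workspace(env=None) -> Path:
    """Return OPENCLAW_WORKSPACE if set and non-empty, else $HOME/.openclaw/workspace."""
    env = os.environ if env is None else env
    default = Path.home() / ".openclaw" / "workspace"
    return Path(env.get("OPENCLAW_WORKSPACE") or default)


def ensure_private_dirs(workspace: Path) -> list[Path]:
    """Create the (assumed) memory/ and quarantine/ directories, owner-only."""
    dirs = [workspace / "memory", workspace / "quarantine"]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
        # 0o700 keeps quarantined content, which may contain secrets,
        # away from other local users (limited effect on Windows).
        os.chmod(d, 0o700)
    return dirs
```

Restricting permissions matters here because quarantined copies of untrusted content can include whatever sensitive data that content carried.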
Persistence & Privilege
always:false (not force-installed) and user-invocable:true. The skill writes its own memory/quarantine files (expected). It does not modify other skills or request elevated system privileges.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install prompt-injection-defense
- After installation, invoke the skill by name or use /prompt-injection-defense
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release focused on agent prompt injection defense.
- Adds layered defense scripts: content tagging, scanning, memory write guardrails, and canary pattern detection.
- New scripts for tagging untrusted input, scanning for attack patterns, and safely writing to memory.
- Includes comprehensive checklist, hardening rules for agents, and practical usage examples.
- Provides reference detection patterns and strong usage guidance for handling any untrusted external content.
- Replaces the earlier prompt skill with a security-focused module.
Frequently Asked Questions
What is Prompt Injection Defense?
Harden agent sessions against prompt injection from untrusted content. Use when the agent reads web search results, emails, downloaded files, PDFs, or any ex... It is an AI Agent Skill for Claude Code / OpenClaw, with 126 downloads so far.
How do I install Prompt Injection Defense?
Run "/install prompt-injection-defense" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.
Is Prompt Injection Defense free?
Yes, Prompt Injection Defense is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Prompt Injection Defense support?
Prompt Injection Defense is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Prompt Injection Defense?
It is built and maintained by AdrianTeng (@adrianteng); the current version is v1.0.0.