Prompt Shield Publish

Name: Prompt Shield Publish
Author: stlas

Description

Prompt Injection Firewall for AI agents. 113 detection patterns, 14 threat categories, zero dependencies. Protects against fake authority, command injection, memory poisoning, skill malware, crypto spam, and more. Hash-chain tamper-proof whitelist with mandatory peer review. Claude Code hook integration.

Usage Guidance

This package appears to implement a local prompt-scanner and hook for Claude, but exercise caution before enabling it:

- Provenance: the registry lists the source as 'unknown' even though SKILL.md references a GitHub repo. Verify the upstream project and author (download from a trusted repo or vendor) before installing.
- Dependencies: despite the claim of "zero dependencies," shield.py requires Python3 and PyYAML. Install those from trusted package sources and inspect installed packages.
- Whitelist / peer review: the hash-chain whitelist exists, but the "peer review" protections are enforced only via names in the YAML (strings). There is no cryptographic identity or external approval service in the shipped code, so a local attacker or misconfigured workflow could add approvals or edit the file. Do not enable the whitelist or give it trust until you understand the approval workflow.
- File writes: the tool writes whitelist.yaml and whitelist-audit.log in its directory. Consider file permissions, location (use an isolated path), and backups before enabling.
- Hook integration: adding the hook requires editing ~/.claude/settings.json; verify what your Claude client does with hook exit codes. Test the scanner locally first (dry-run on benign data) to evaluate false positives and behaviour.
- Review for network behavior: the included files show no explicit network calls, but you should search the full shield.py (especially truncated parts) for any HTTP/exec calls before trusting it in production.

If you want to proceed: run the code in an isolated environment, inspect the full shield.py for external calls, install PyYAML from a trusted source, and only enable the Claude hook after confirming behaviour and whitelist governance. If you want me to, I can re-scan the remaining truncated parts of shield.py for network or obfuscated behavior (upload the full file text).
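The advice to search shield.py for HTTP/exec calls can be partly automated. The following is a minimal sketch using Python's standard `ast` module; the function name `audit_source` and both watch lists are hypothetical choices for illustration, not part of the package.

```python
import ast

# Hypothetical watch lists for illustration; extend them for a real review.
SUSPICIOUS_MODULES = {"socket", "urllib", "http", "requests", "subprocess", "ftplib"}
SUSPICIOUS_CALLS = {"eval", "exec", "system", "popen", "urlopen", "__import__"}


def audit_source(source: str) -> list[str]:
    """Return findings (sorted by line) for imports and calls worth a manual look."""
    findings = []  # (line number, message) pairs
    for node in ast.walk(ast.parse(source)):
        # Collect module names from "import x" and "from x import y".
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for name in modules:
            if name.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"imports {name}"))
        # Flag calls by bare name (eval(...)) or attribute name (os.system(...)).
        if isinstance(node, ast.Call):
            func = node.func
            called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if called in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"calls {called}()"))
    return [f"line {line}: {message}" for line, message in sorted(findings)]
```

To use it, pass the full file text, for example `audit_source(open("shield.py").read())`. A static check like this only narrows down where to look: it will not catch dynamically constructed or obfuscated calls, so it complements a manual read rather than replacing it.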

Capability Analysis

Type: OpenClaw Skill
Name: prompt-shield
Version: 3.0.6

This skill bundle, 'prompt-shield', is a security tool designed to act as a Prompt Injection Firewall for AI agents. All files (SKILL.md, prompt-shield-hook.sh, shield.py, patterns.yaml, whitelist.yaml, SCORING.md) consistently describe and implement a defensive mechanism. The `shield.py` script contains logic for pattern matching, heuristic scoring, and a hash-chain-based whitelist to detect and mitigate various attack types (e.g., command injection, reverse shells, data exfiltration attempts, memory poisoning). The `patterns.yaml` file defines the signatures of these malicious activities that the tool *detects*, not performs. The `prompt-shield-hook.sh` integrates this scanner into an agent's input pipeline to block or warn about malicious input. There is no evidence of intentional harmful behavior, data exfiltration, unauthorized execution, or prompt injection *by* this skill against the agent. The design explicitly includes safeguards against false positives, reinforcing its defensive intent.
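For readers unfamiliar with the term, a hash-chain whitelist can be sketched as follows. This is a generic illustration of the technique, not the code in shield.py: the entry layout (`payload`, `prev_hash`, `hash`), the `GENESIS` value, and the `approved_by` field are all assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry's payload together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], payload: dict) -> None:
    """Append a payload, linking it to the current chain head."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    })


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any in-place edit breaks the chain from there on."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this makes in-place edits detectable, because changing one entry invalidates its stored hash and every later link. It does not by itself establish who made an entry: anyone with write access to the file can rebuild the whole chain from modified payloads, and the rebuilt chain will verify.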

Capability Assessment

ℹ Purpose & Capability

The files (shield.py, patterns.yaml, whitelist.yaml, hook) implement a local prompt-scanner/whitelist as described. However the top-level description claims "zero dependencies" while SKILL.md and shield.py require Python3 + PyYAML. SKILL.md references a GitHub repo (https://github.com/stlas/PromptShield) but the skill source is marked 'unknown' in registry metadata — this mismatch reduces provenance/trust. Overall the required artifacts (pattern DB, CLI, Claude hook) are coherent with the stated purpose, but the provenance and dependency claim are inconsistent.

ℹ Instruction Scope

Runtime instructions and the provided hook are narrowly scoped: they read input (stdin/JSON) and run local pattern scans, then exit with codes/messages to let Claude accept/warn/block. The skill does not request or read arbitrary system environment variables. It does read and write local files (whitelist.yaml, whitelist-audit.log) in its directory. SKILL.md promotes integrating the hook into ~/.claude/settings.json (requires user edit). Documentation mentions external mechanisms (e.g., SYNAPSE peer approval) and 'peer review' processes that are not implemented or cryptographically enforced in the supplied code — the approval model is just string entries in the YAML, not authenticated peers. That inconsistency could give a false sense of protection.
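The read-scan-exit flow described above can be sketched in a few lines. Everything specific here is an assumption for illustration: the two regular expressions stand in for the real entries in patterns.yaml, the `prompt` field name may not match what your client sends, and treating exit code 2 as "block" should be checked against your Claude client's hook documentation.

```python
import json
import re

# Hypothetical patterns for illustration; the real ones live in patterns.yaml.
EXAMPLE_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"curl\s+[^|]+\|\s*(ba)?sh", re.IGNORECASE),
]

EXIT_ALLOW = 0
EXIT_BLOCK = 2  # assumption: the client treats exit code 2 as "block"


def decide(raw_stdin: str) -> tuple[int, str]:
    """Map a JSON hook payload to an (exit_code, message) pair."""
    try:
        payload = json.loads(raw_stdin)
    except json.JSONDecodeError:
        # This sketch fails open; a stricter policy could block instead.
        return EXIT_ALLOW, "unparseable input, not scanned"
    if not isinstance(payload, dict):
        return EXIT_ALLOW, "unexpected payload shape, not scanned"
    text = str(payload.get("prompt", ""))  # "prompt" is an assumed field name
    for pattern in EXAMPLE_PATTERNS:
        match = pattern.search(text)
        if match:
            return EXIT_BLOCK, f"blocked: matched {match.group(0)!r}"
    return EXIT_ALLOW, "clean"
```

A wrapper script would read `sys.stdin`, call `decide`, print the message to stderr, and pass the code to `sys.exit`. Keeping the decision in a pure function makes the dry run on benign data straightforward: feed it sample payloads and count the false positives before wiring it into settings.json.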

ℹ Install Mechanism

There is no install spec (instruction-only) but implementation files are included. This means installing is a manual file placement and running shield.py locally. There are no network downloads or packaged installers in the manifest, which lowers supply-chain risk, but the code requires PyYAML (pip). The absence of an automated install step is coherent but the skill's description saying 'zero dependencies' contradicts the actual dependency on PyYAML.
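Because nothing installs the dependency automatically, a small preflight check can confirm the environment before running the scanner. This is a sketch: the function name `missing_requirements` and its defaults are hypothetical, and the only fact relied on is that PyYAML is imported under the module name `yaml`.

```python
import importlib.util
import sys


def missing_requirements(modules=("yaml",), min_python=(3, 0)) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module {name!r} is not importable")
    return problems
```

Calling `missing_requirements()` with no arguments checks for PyYAML only; it does not verify where the package came from, so installing from a trusted source remains a separate step.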

✓ Credentials

The skill asks for no environment variables, no credentials, and no config paths outside its own directory. It writes its own whitelist and audit log files locally. That level of access is proportionate for a local prompt-scanner.

✓ Persistence & Privilege

always:false (default) and disable-model-invocation:false are set — normal for skills. The skill does not request system-wide privileges or modify other skills. It does create/modify whitelist.yaml and whitelist-audit.log in its own directory if used; enabling the hook requires editing the user's Claude settings.json (a manual step by the user).

Version History

v3.0.6

Audit fixes: version consistency, dependencies corrected (PyYAML), test_shield.py removed from package, SKILL.md metadata updated

v3.0.5

Fix OpenClaw scanner crash: Removed inline fallback patterns from shield.py. All patterns now exclusively loaded from patterns.yaml. QA passed: 29/29 core tests, 73/135 GUARDIAN tests, no regression.

v3.0.4

Fix metadata structure: moved bins from openclaw to requires section for OpenClaw compatibility.

v3.0.3

Clean release: Removed test_shield.py with embedded attack samples to pass security scanners. Runtime files only.

v3.0.2

Initial ClawHub release. 113 patterns, 14 categories, hash-chain whitelist v2, Claude Code hook, zero dependencies.

Metadata

Slug: prompt-shield
Version: 3.0.6
License: —
All-time Installs: 2
Active Installs: 2
Total Versions: 5

Frequently Asked Questions

What is Prompt Shield Publish?

Prompt Injection Firewall for AI agents. 113 detection patterns, 14 threat categories, zero dependencies. Protects against fake authority, command injection, memory poisoning, skill malware, crypto spam, and more. Hash-chain tamper-proof whitelist with mandatory peer review. Claude Code hook integration. It is an AI Agent Skill for Claude Code / OpenClaw, with 1207 downloads so far.

How do I install Prompt Shield Publish?

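Run "/install prompt-shield" in the OpenClaw or Claude Code chat to install it. Per the capability assessment above, shield.py also needs Python 3 and PyYAML, and enabling the Claude hook requires manually editing ~/.claude/settings.json.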

Is Prompt Shield Publish free?

Yes, Prompt Shield Publish is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Prompt Shield Publish support?

Prompt Shield Publish is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Prompt Shield Publish?

It is built and maintained by stlas (@stlas); the current version is v3.0.6.
