Agent Firewall
by Adnane Arharbi · GitHub · v1.0.0 · MIT-0
90 downloads · 0 stars · 0 active installs · 1 version
Install in OpenClaw: /install agent-firewall
Description
Real-time input/output filtering for agent communications. Block prompt injection, data exfiltration, and unauthorized commands before they reach the model.
README (SKILL.md)
Agent Firewall — Input/Output Guardian
Architecture
[Channel Input] → [INPUT FILTER] → [Agent/Model] → [OUTPUT FILTER] → [Channel Output]
                        ↓                                 ↓
                ┌──────────────┐                  ┌──────────────┐
                │ Block List   │                  │ Secret Scan  │
                │ Pattern DB   │                  │ PII Redact   │
                │ Rate Limit   │                  │ Path Scrub   │
                │ Encoding Det │                  │ URL Checker  │
                └──────────────┘                  └──────────────┘
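The diagram describes two filter chains wrapped around the model call. Below is a minimal Python sketch of that flow. It is not the skill's index.js. The names `Verdict`, `run_chain`, `guarded_call`, and the example filters are all hypothetical. Each filter may rewrite the text or stop the chain.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    text: str                   # text after this filter (possibly rewritten)
    blocked: bool = False       # True stops the chain
    rule: Optional[str] = None  # name of the rule that fired, if any


Filter = Callable[[str], Verdict]


def run_chain(text: str, filters: List[Filter]) -> Verdict:
    """Apply filters in order; stop at the first one that blocks."""
    for f in filters:
        verdict = f(text)
        if verdict.blocked:
            return verdict
        text = verdict.text  # later filters see the rewritten text
    return Verdict(text)


def guarded_call(user_msg: str, model: Callable[[str], str],
                 input_filters: List[Filter],
                 output_filters: List[Filter]) -> str:
    """Channel input -> input filters -> model -> output filters -> channel output."""
    inbound = run_chain(user_msg, input_filters)
    if inbound.blocked:
        return f"[blocked by input rule: {inbound.rule}]"
    outbound = run_chain(model(inbound.text), output_filters)
    if outbound.blocked:
        return f"[blocked by output rule: {outbound.rule}]"
    return outbound.text


# Hypothetical filters and model stub, for illustration only.
def block_ignore_previous(text: str) -> Verdict:
    hit = "ignore previous instructions" in text.lower()
    return Verdict(text, blocked=hit, rule="ignore_previous" if hit else None)


def redact_demo_password(text: str) -> Verdict:
    return Verdict(text.replace("hunter2", "[REDACTED]"))


def echo_model(prompt: str) -> str:
    return f"echo: {prompt} (password is hunter2)"


if __name__ == "__main__":
    print(guarded_call("hello", echo_model, [block_ignore_previous], [redact_demo_password]))
    # echo: hello (password is [REDACTED])
```

The model never sees a blocked input, and output filters run on whatever the model returns. The two chains are therefore independent, and either one can be extended without touching the other.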
Input Filters
| # | Filter | Description |
|---|---|---|
| 1 | Injection patterns | Regex + heuristic match for "ignore previous", "you are now", role confusion |
| 2 | Unicode sanitizer | Strip zero-width chars, control characters, RTL overrides |
| 3 | Encoding detector | Detect Base64, hex, ROT13 encoded payloads in user messages |
| 4 | Role confusion | Detect fake system messages, assistant impersonation |
| 5 | Rate limiter | Max messages per user per channel per minute |
| 6 | Size limiter | Reject inputs exceeding token budget |
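The following Python sketch shows how several of these input filters could compose. It sanitizes Unicode, enforces a size budget, and then runs the injection patterns against the raw text and against its ROT13- and Base64-decoded variants. Several parts are assumptions: the function names, the role-confusion regex, case-insensitive matching, and the whitespace-based token estimate. The rate limiter is left out.

```python
import base64
import binascii
import codecs
import re
import unicodedata
from typing import Iterator, Optional, Tuple

# The first two patterns come from the example config; IGNORECASE and the
# third (role-confusion) pattern are assumptions added for illustration.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now (?!helping)", re.IGNORECASE),
    re.compile(r"^\s*\[?(system|assistant)\]?\s*:", re.IGNORECASE | re.MULTILINE),
]
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def sanitize_unicode(text: str) -> str:
    """Drop format chars (zero-width, bidi overrides: category Cf) and control
    chars (Cc) except newline, carriage return and tab. This also removes the
    zero-width joiner used inside some emoji sequences."""
    return "".join(
        ch for ch in text
        if ch in "\n\r\t" or unicodedata.category(ch) not in ("Cf", "Cc")
    )


def base64_payloads(text: str) -> Iterator[str]:
    """Yield UTF-8 text decoded from Base64-looking runs in the input."""
    for m in B64_RUN.finditer(text):
        token = m.group(0)
        if len(token) % 4:
            continue
        try:
            yield base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue


def match_injection(text: str) -> Optional[str]:
    """Return the pattern of the first rule that matches, else None."""
    for rx in INJECTION_PATTERNS:
        if rx.search(text):
            return rx.pattern
    return None


def check_input(text: str, max_tokens: int = 4096) -> Tuple[str, Optional[str]]:
    clean = sanitize_unicode(text)
    # Crude token estimate (whitespace split); a real budget would use the
    # model's own tokenizer.
    if len(clean.split()) > max_tokens:
        return "BLOCK", "size_limit"
    # Check the text plus decoded variants (ROT13 and Base64 shown; hex would
    # follow the same shape using bytes.fromhex).
    candidates = [clean, codecs.decode(clean, "rot_13"), *base64_payloads(clean)]
    for candidate in candidates:
        rule = match_injection(candidate)
        if rule:
            return "BLOCK", rule
    return "ALLOW", None
```

Sanitizing first matters. Without it, a zero-width space inside "ig​nore" would slip past the plain regex.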
Output Filters
| # | Filter | Description |
|---|---|---|
| 1 | Secret scanner | High-entropy strings + known patterns (AWS key, GitHub token) |
| 2 | PII redactor | Email, phone, SSN, credit card → [REDACTED] |
| 3 | Path scrubber | Remove internal filesystem paths from outputs |
| 4 | URL checker | Block responses containing known malicious URLs |
| 5 | Consistency check | Verify output doesn't contradict system prompt directives |
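Here is a comparable Python sketch for the output side. It is not taken from index.js. It covers the known secret patterns from the config, an entropy check for unlabeled high-randomness tokens, email and SSN redaction, path scrubbing, and a host blocklist for URLs. The path prefixes, the 4.0 bits-per-character threshold, and `BLOCKED_HOSTS` are hypothetical. Phone and card-number redaction and the consistency check are left out.

```python
import math
import re
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

SECRET_PATTERNS = [                            # from the example config
    re.compile(r"AKIA[0-9A-Z]{16}"),           # aws_key
    re.compile(r"gh[ps]_[A-Za-z0-9_]{36,}"),   # github_token
]
PII_PATTERNS = [                               # email and US SSN only
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
]
# Hypothetical internal path prefixes; the lookbehind skips paths inside URLs.
PATH_RX = re.compile(r"(?<![\w.])(?:/home|/etc|/var|/srv|/Users)/[^\s'\"]+")
URL_RX = re.compile(r"https?://\S+")
TOKEN_RX = re.compile(r"[A-Za-z0-9+/_=-]{20,}")
BLOCKED_HOSTS = {"malware.example"}            # hypothetical blocklist
ENTROPY_THRESHOLD = 4.0                        # bits per char; an assumed value


def shannon_entropy(s: str) -> float:
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def filter_output(text: str) -> Optional[str]:
    """Return the scrubbed text, or None if the response must be blocked."""
    for m in URL_RX.finditer(text):
        if urlparse(m.group(0)).hostname in BLOCKED_HOSTS:
            return None
    for rx in SECRET_PATTERNS + PII_PATTERNS:
        text = rx.sub("[REDACTED]", text)
    text = PATH_RX.sub("[PATH]", text)
    # Catch unknown secrets: long token-like runs with high character entropy.
    return TOKEN_RX.sub(
        lambda m: "[REDACTED]" if shannon_entropy(m.group(0)) > ENTROPY_THRESHOLD else m.group(0),
        text,
    )
```

Named patterns run before the entropy pass so that known secrets are labeled reliably. The entropy check then acts as a fallback. Its threshold trades missed secrets against false positives on long identifiers.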
Configuration
# .security/firewall-rules.yaml
input:
  injection_patterns:
    - pattern: "ignore (all )?previous instructions"
      action: BLOCK
      severity: CRITICAL
    - pattern: "you are now (?!helping)"
      action: BLOCK
      severity: HIGH
  rate_limit:
    max_per_minute: 30
    max_per_hour: 500
  max_input_tokens: 4096
output:
  secret_patterns:
    - name: aws_key
      pattern: "AKIA[0-9A-Z]{16}"
      action: REDACT
    - name: github_token
      pattern: "gh[ps]_[A-Za-z0-9_]{36,}"
      action: REDACT
  pii_redaction: true
  path_scrubbing: true
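The `rate_limit` block could be enforced with a sliding-window counter per user and channel. This Python sketch assumes the two limits above. It keeps state in process memory, so counts are lost on restart and are not shared between processes. `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """In-memory sliding-window limiter keyed by (user, channel)."""

    def __init__(self, max_per_minute: int = 30, max_per_hour: int = 500) -> None:
        self.windows = ((60.0, max_per_minute), (3600.0, max_per_hour))
        self.horizon = max(seconds for seconds, _ in self.windows)
        self.events: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, user: str, channel: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.events[(user, channel)]
        # Forget events older than the longest window.
        while q and now - q[0] >= self.horizon:
            q.popleft()
        for seconds, limit in self.windows:
            if sum(1 for t in q if now - t < seconds) >= limit:
                return False  # rejected messages are not counted
        q.append(now)
        return True
```

Under these limits, the 31st message from one user within a minute on the same channel is rejected. The same user is still allowed on a different channel.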
Guardrails
- Firewall rules are append-only in production — deletion requires human approval
- False positives → log, alert, pass through with warning (don't silently drop)
- All blocks are logged with: timestamp, rule matched, full context, channel, user hash
- Firewall itself cannot be disabled by agent instructions
- Rules file is read-only from the agent's perspective
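Two of these guardrails translate directly into checks. The Python sketch below uses hypothetical helper names. The first helper flags any rule change that removes or edits an existing rule, so the change can be routed to a human. The second builds a block-log line with the listed fields and hashes the user ID with HMAC-SHA256 under an assumed secret key.

```python
import hashlib
import hmac
import json
import time
from typing import Any, Iterable, List, Mapping


def rule_key(rule: Mapping[str, Any]) -> str:
    """Canonical form of a rule, so key order in the file doesn't matter."""
    return json.dumps(rule, sort_keys=True)


def deleted_rules(old: Iterable[Mapping[str, Any]],
                  new: Iterable[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Rules in the old set that are missing from the new set.

    Editing a rule shows up as a deletion plus an addition, so any non-empty
    result means the change is not append-only and needs human approval.
    """
    new_keys = {rule_key(r) for r in new}
    return [r for r in old if rule_key(r) not in new_keys]


def block_log_record(rule: str, context: str, channel: str,
                     user_id: str, hash_key: bytes) -> str:
    """One JSON log line with the fields listed in the guardrails.

    The user is stored as a keyed hash (HMAC-SHA256): the raw ID is never
    written, and guessed IDs cannot be checked against it without the key.
    """
    user_hash = hmac.new(hash_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({
        "timestamp": time.time(),
        "rule": rule,
        "context": context,
        "channel": channel,
        "user_hash": user_hash,
    })
```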
Usage Guidance
This skill appears to implement the advertised filtering features, but there are important mismatches and risks to verify before installing:
- Confirm rule handling: SKILL.md says the skill reads .security/firewall-rules.yaml and enforces an append-only rule lifecycle, but index.js currently does not parse external YAML and always uses built-in defaults. Ask the author whether YAML parsing and rule-lifecycle enforcement are intentionally omitted or planned.
- Logging and data retention: the skill logs actions and context to .security/firewall-logs and returns originalData in responses. Determine what exactly is logged, whether logs are encrypted, who can read them, and how long they're retained. In high-sensitivity environments, store logs securely or disable logging of full payloads.
- File locations & permissions: the skill reads/writes under process.cwd() (.security/*). Decide whether that path is acceptable and ensure filesystem permissions prevent unauthorized reads/writes. The skill itself cannot enforce 'read-only' rules — use OS-level permissions or an external policy engine.
- Test in a sandbox: run the skill in an isolated environment with representative inputs to confirm redaction behavior (including edge cases) and to verify no external exfiltration occurs.
- Request hardening details from the author: YAML parsing implementation, rate-limiter/global state design, how the skill avoids accidental exposure (e.g., returning originalData), and whether there are configuration options for log encryption/retention.
If the author can provide a version that actually parses and validates external rules, avoids returning originalData (or makes that configurable), and documents log access/retention/encryption, the concerns would be materially reduced.
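Validating external rules could look like the following Python sketch. It assumes a YAML library (not shown) has already parsed the file into a dict. It accepts only the `BLOCK` and `REDACT` actions that appear in the example config. The `MEDIUM` and `LOW` severities are assumptions. Patterns are compiled with Python's `re`. A Node.js loader would use `RegExp` instead, and its syntax differs in places.

```python
import re
from typing import Any, Dict, List

ALLOWED_ACTIONS = {"BLOCK", "REDACT"}                      # actions used in the example
ALLOWED_SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}  # MEDIUM/LOW are assumptions


def validate_rules(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems in an already-parsed rules document.

    An empty list means the rules are usable; otherwise the caller should
    refuse to start (fail closed) rather than fall back to defaults.
    """
    problems: List[str] = []
    sections = {
        "input.injection_patterns": cfg.get("input", {}).get("injection_patterns", []),
        "output.secret_patterns": cfg.get("output", {}).get("secret_patterns", []),
    }
    for where, rules in sections.items():
        for i, rule in enumerate(rules):
            label = f"{where}[{i}]"
            pattern = rule.get("pattern")
            if not isinstance(pattern, str) or not pattern:
                problems.append(f"{label}: missing pattern")
            else:
                try:
                    re.compile(pattern)
                except re.error as exc:
                    problems.append(f"{label}: invalid regex ({exc})")
            if rule.get("action") not in ALLOWED_ACTIONS:
                problems.append(f"{label}: unknown action {rule.get('action')!r}")
            if "severity" in rule and rule["severity"] not in ALLOWED_SEVERITIES:
                problems.append(f"{label}: unknown severity {rule['severity']!r}")
    limits = cfg.get("input", {}).get("rate_limit", {})
    for key in ("max_per_minute", "max_per_hour"):
        value = limits.get(key)
        if value is not None and (isinstance(value, bool) or not isinstance(value, int) or value <= 0):
            problems.append(f"input.rate_limit.{key}: must be a positive integer")
    return problems
```

Returning problems instead of substituting defaults makes a bad rules file fail closed. That is the opposite of the fallback behavior described for loadRules.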
Capability Assessment
Purpose & Capability
Name, description, and code align with an input/output firewall: the code implements injection detection, secret/PII redaction, path scrubbing, and related filters. It requests no unrelated credentials or binaries. However, SKILL.md presents features that the code does not actually implement: reading and applying .security/firewall-rules.yaml, an append-only rule lifecycle, and a guarantee that the firewall cannot be disabled. In fact, loadRules explicitly warns that it does not parse YAML and always falls back to built-in defaults. That mismatch should be explained.
Instruction Scope
SKILL.md and index.js operate only on provided data and local files; they do not call external endpoints. However, the skill promises logging of 'full context', and in the code execute() returns originalData alongside processedData. As a result, unredacted inputs and outputs may be written to disk and included in responses. The docs' claims about making the rules read-only and preventing the agent from disabling the firewall are not enforced in code. These gaps increase the risk of accidental sensitive-data persistence or misconfiguration.
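One way to close the originalData gap is to make echoing the raw input opt-in. In this Python sketch, the `processedData` and `originalData` keys mirror the names mentioned above. `ResponseOptions`, `include_original`, and `originalDigest` are hypothetical additions.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ResponseOptions:
    include_original: bool = False  # hypothetical opt-in, off by default


def build_response(processed: str, original: str, blocked: bool,
                   opts: ResponseOptions = ResponseOptions()) -> Dict[str, Any]:
    """Return filtered data; the raw input is only echoed back on explicit opt-in."""
    resp: Dict[str, Any] = {
        "processedData": processed,
        "blocked": blocked,
        # A digest lets operators correlate events without storing the raw text.
        "originalDigest": hashlib.sha256(original.encode("utf-8")).hexdigest(),
    }
    if opts.include_original:
        resp["originalData"] = original
    return resp
```

A plain SHA-256 digest of a short, guessable value (an SSN, for example) can be reversed by brute force. In sensitive deployments, a keyed hash or omitting the field entirely may be preferable.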
Install Mechanism
There is no install spec and there are no external downloads, which is the lowest-risk install mechanism. The skill ships as plain code (index.js + SKILL.md) and uses only the standard Node.js fs and path modules.
Credentials
The skill requests no credentials or env vars (appropriate), but it writes logs containing actions and context under .security/firewall-logs and returns originalData in responses. That creates a local storage surface for potentially sensitive secrets/PII (which the skill is supposed to detect). There is no encryption, rotation, retention policy, or config/comments explaining access controls for those logs.
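If logs must stay on disk, they can at least be created as owner-only files. This Python sketch assumes POSIX permissions, and `firewall.log` is a hypothetical file name. It appends JSON lines to a directory created with mode 0700 and a file created with mode 0600. It does not add encryption, rotation, or retention, which would still need separate handling.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def append_log(log_dir: Path, record: Dict[str, Any]) -> Path:
    """Append one JSON line to an owner-only log file (POSIX permissions).

    Modes apply only when the directory or file is first created (and the
    umask can only tighten them); existing paths keep their current modes.
    """
    log_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    path = log_dir / "firewall.log"  # hypothetical file name
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp()) / "firewall-logs"
    print(append_log(demo_dir, {"rule": "aws_key", "action": "REDACT"}))
```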
Persistence & Privilege
always:false and no autonomous-privilege flags are fine. However the skill creates and writes to .security/firewall-logs and will read a rules file path under process.cwd(). Those filesystem writes are normal for a firewall but may conflict with expectations in multi-tenant or restricted environments. The SKILL.md's statements about 'firewall cannot be disabled by agent instructions' and 'rules file read-only from agent perspective' are policy claims not enforced by the code — the skill cannot on its own prevent other processes/skills or users from modifying those files.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install agent-firewall
- After installation, invoke the skill by name or use /agent-firewall
- Provide the required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of agent-firewall: real-time input/output filtering for agent communications.
- Blocks prompt injections, data exfiltration, and unauthorized commands before reaching the model.
- Includes layered input filters: injection detection, Unicode sanitization, encoding checks, rate/size limits, and role confusion detection.
- Adds output filters: secret scanning, PII redaction, internal path scrubbing, malicious URL blocking, and consistency checks.
- YAML-based configuration with clear, customizable rules for both input and output.
- Built-in guardrails: append-only rules, logging for all blocks, human approval for rules deletion, and resistance to agent tampering.
Frequently Asked Questions
What is Agent Firewall?
Real-time input/output filtering for agent communications. Block prompt injection, data exfiltration, and unauthorized commands before they reach the model. It is an AI Agent Skill for Claude Code / OpenClaw, with 90 downloads so far.
How do I install Agent Firewall?
Run "/install agent-firewall" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Agent Firewall free?
Yes, Agent Firewall is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Agent Firewall support?
Agent Firewall is cross-platform and runs anywhere OpenClaw or Claude Code is available.
Who created Agent Firewall?
It is built and maintained by Adnane Arharbi (@arhadnane); the current version is v1.0.0.