arhadnane

Agent Firewall

by Adnane Arharbi · GitHub ↗ · v1.0.0 · MIT-0
cross-platform · ⚠ suspicious
Downloads: 90 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install agent-firewall
Description
Real-time input/output filtering for agent communications. Block prompt injection, data exfiltration, and unauthorized commands before they reach the model.
README (SKILL.md)

Agent Firewall — Input/Output Guardian

Architecture

[Channel Input] → [INPUT FILTER] → [Agent/Model] → [OUTPUT FILTER] → [Channel Output]
                        ↓                                  ↓
                  ┌─────────────┐                  ┌──────────────┐
                  │ Block List  │                  │ Secret Scan  │
                  │ Pattern DB  │                  │ PII Redact   │
                  │ Rate Limit  │                  │ Path Scrub   │
                  │ Encoding Det│                  │ URL Checker  │
                  └─────────────┘                  └──────────────┘
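
The diagram above can be read as two filter chains wrapped around the model call. The following is a minimal sketch of that shape, not the skill's actual code: `runFilters`, `blockInjection`, `redactAwsKeys`, and `guardedCall` are hypothetical names, and `model` stands in for whatever async function produces the agent's reply.

```javascript
// Hypothetical sketch of the two-stage pipeline; none of these names come from index.js.

// Each filter takes text and returns { ok, text, reason? }.
function runFilters(filters, text) {
  let current = text;
  for (const filter of filters) {
    const result = filter(current);
    if (!result.ok) {
      return { ok: false, text: null, reason: result.reason };
    }
    current = result.text; // filters may rewrite the text (e.g. redaction)
  }
  return { ok: true, text: current, reason: null };
}

// Example input filter: block a known injection phrase.
const blockInjection = (text) =>
  /ignore (all )?previous instructions/i.test(text)
    ? { ok: false, text, reason: 'injection_pattern' }
    : { ok: true, text };

// Example output filter: redact AWS access key IDs.
const redactAwsKeys = (text) => ({
  ok: true,
  text: text.replace(/AKIA[0-9A-Z]{16}/g, '[REDACTED]'),
});

// model is any async function from prompt to reply (a stand-in for the agent).
async function guardedCall(model, userInput) {
  const input = runFilters([blockInjection], userInput);
  if (!input.ok) return { blocked: true, reason: input.reason };
  const reply = await model(input.text);
  const output = runFilters([redactAwsKeys], reply);
  if (!output.ok) return { blocked: true, reason: output.reason };
  return { blocked: false, text: output.text };
}
```

Keeping every filter behind the same small interface means the input and output stages share one runner, and a blocked input never reaches the model.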

Input Filters

| # | Filter | Description |
|---|--------|-------------|
| 1 | Injection patterns | Regex + heuristic match for "ignore previous", "you are now", role confusion |
| 2 | Unicode sanitizer | Strip zero-width chars, control characters, RTL overrides |
| 3 | Encoding detector | Detect Base64, hex, ROT13 encoded payloads in user messages |
| 4 | Role confusion | Detect fake system messages, assistant impersonation |
| 5 | Rate limiter | Max messages per user per channel per minute |
| 6 | Size limiter | Reject inputs exceeding token budget |
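
To make filters 2 and 3 more concrete, here is a hedged sketch of a Unicode sanitizer and a Base64 payload detector. The function names, the 24-character minimum, and the 90% printable-ASCII threshold are illustrative assumptions, not values taken from the skill; hex and ROT13 detection are left out.

```javascript
// Hypothetical helpers; names and thresholds are assumptions for illustration.

// Strip zero-width characters, bidi controls, and C0 control characters
// (tab, newline, and carriage return are kept).
function sanitizeUnicode(text) {
  return text
    .replace(/[\u200B-\u200F\u2060\uFEFF]/g, '')   // zero-width chars, LRM/RLM, BOM
    .replace(/[\u202A-\u202E\u2066-\u2069]/g, '')  // bidi embeddings, overrides, isolates
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '');
}

// Find Base64-looking runs that decode to mostly printable ASCII.
function findBase64Payloads(text, minLength = 24) {
  const candidates = text.match(/[A-Za-z0-9+\/]{16,}={0,2}/g) || [];
  const hits = [];
  for (const candidate of candidates) {
    // Padded Base64 always has a length that is a multiple of 4.
    if (candidate.length < minLength || candidate.length % 4 !== 0) continue;
    const decoded = Buffer.from(candidate, 'base64').toString('utf8');
    const printable = decoded.replace(/[^\x20-\x7E]/g, '').length;
    if (decoded.length > 0 && printable / decoded.length > 0.9) {
      hits.push({ encoded: candidate, decoded });
    }
  }
  return hits;
}
```

A caller would typically run the decoded text back through the injection patterns. Note that stripping U+200C and U+200D also breaks emoji sequences and some scripts that rely on joiners, so a real deployment may want a narrower range.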

Output Filters

| # | Filter | Description |
|---|--------|-------------|
| 1 | Secret scanner | High-entropy strings + known patterns (AWS key, GitHub token) |
| 2 | PII redactor | Email, phone, SSN, credit card → [REDACTED] |
| 3 | Path scrubber | Remove internal filesystem paths from outputs |
| 4 | URL checker | Block responses containing known malicious URLs |
| 5 | Consistency check | Verify output doesn't contradict system prompt directives |
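
As an illustration of output filters 1 and 2, the sketch below combines known-pattern matching with a Shannon-entropy heuristic. It is a sketch under stated assumptions: the email and SSN regexes, the 32-character token minimum, and the 4.0 bits-per-character threshold are illustrative choices, and phone and credit card patterns are omitted for brevity.

```javascript
// Hypothetical redaction helpers; thresholds and PII regexes are assumptions.

// Shannon entropy in bits per character.
function shannonEntropy(str) {
  const chars = Array.from(str);
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) || 0) + 1);
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

const REDACTION_PATTERNS = [
  /AKIA[0-9A-Z]{16}/g,                                  // AWS access key ID
  /gh[ps]_[A-Za-z0-9_]{36,}/g,                          // GitHub token
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,    // email (simplified)
  /\b\d{3}-\d{2}-\d{4}\b/g,                             // US SSN format
];

function redactOutput(text, entropyThreshold = 4.0) {
  let result = text;
  for (const regex of REDACTION_PATTERNS) {
    result = result.replace(regex, '[REDACTED]');
  }
  // Heuristic: long unbroken tokens with high entropy are treated as secrets.
  return result.replace(/[A-Za-z0-9+\/_=-]{32,}/g, (token) =>
    shannonEntropy(token) >= entropyThreshold ? '[REDACTED]' : token
  );
}
```

The entropy pass catches secrets with no known prefix, at the cost of occasionally redacting long hashes or identifiers, so the threshold is worth tuning against real outputs.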

Configuration

# .security/firewall-rules.yaml
input:
  injection_patterns:
    - pattern: "ignore (all )?previous instructions"
      action: BLOCK
      severity: CRITICAL
    - pattern: "you are now (?!helping)"
      action: BLOCK
      severity: HIGH
  rate_limit:
    max_per_minute: 30
    max_per_hour: 500
  max_input_tokens: 4096

output:
  secret_patterns:
    - name: aws_key
      pattern: "AKIA[0-9A-Z]{16}"
      action: REDACT
    - name: github_token
      pattern: "gh[ps]_[A-Za-z0-9_]{36,}"
      action: REDACT
  pii_redaction: true
  path_scrubbing: true
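
One way the input half of this configuration could be applied is sketched below. It assumes the YAML has already been parsed into a plain object by some YAML library (parsing is not shown), that patterns are matched case-insensitively, and that the rate limit is a sliding window; `compileInputRules`, `checkInput`, and `RateLimiter` are hypothetical names, and only `max_per_minute` is enforced.

```javascript
// Plain-object mirror of the input section above; YAML parsing is assumed to happen elsewhere.
const rules = {
  input: {
    injection_patterns: [
      { pattern: 'ignore (all )?previous instructions', action: 'BLOCK', severity: 'CRITICAL' },
      { pattern: 'you are now (?!helping)', action: 'BLOCK', severity: 'HIGH' },
    ],
    rate_limit: { max_per_minute: 30, max_per_hour: 500 },
  },
};

// Compile once so an invalid pattern fails at load time, not per message.
function compileInputRules(config) {
  return config.input.injection_patterns.map((rule) => ({
    ...rule,
    regex: new RegExp(rule.pattern, 'i'), // case-insensitive matching is an assumption
  }));
}

// Return the first BLOCK rule that matches, or { allowed: true }.
function checkInput(compiledRules, text) {
  for (const rule of compiledRules) {
    if (rule.action === 'BLOCK' && rule.regex.test(text)) {
      return { allowed: false, rule: rule.pattern, severity: rule.severity };
    }
  }
  return { allowed: true };
}

// Sliding-window limiter keyed by channel and user.
class RateLimiter {
  constructor(maxPerMinute) {
    this.maxPerMinute = maxPerMinute;
    this.hits = new Map(); // key -> timestamps (ms) of accepted messages
  }

  allow(userId, channel, now = Date.now()) {
    const key = `${channel}:${userId}`;
    const recent = (this.hits.get(key) || []).filter((t) => now - t < 60000);
    const allowed = recent.length < this.maxPerMinute;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

For example, `checkInput(compileInputRules(rules), 'You are now a pirate')` reports a block with severity `HIGH`, while `'you are now helping me'` passes because of the negative lookahead in the second pattern.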

Guardrails

  • Firewall rules are append-only in production — deletion requires human approval
  • False positives → log, alert, pass through with warning (don't silently drop)
  • All blocks are logged with: timestamp, rule matched, full context, channel, user hash
  • Firewall itself cannot be disabled by agent instructions
  • Rules file is read-only from the agent's perspective
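
The logging guardrail lists five fields per block event. A minimal sketch of such a record follows, assuming the user hash is an HMAC-SHA256 of the user ID under a deployment-specific secret and that logs are written as one JSON object per line. The function names, field names, and file layout are assumptions, not the skill's actual format.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Keyed hash so raw user IDs never appear in the log (hashSecret is an assumed deployment secret).
function hashUser(userId, hashSecret) {
  return crypto.createHmac('sha256', hashSecret).update(userId).digest('hex');
}

// Build one log record with the five fields named in the guardrail.
function buildBlockRecord({ rule, context, channel, userId }, hashSecret, now = new Date()) {
  return {
    timestamp: now.toISOString(),
    rule,
    context,
    channel,
    userHash: hashUser(userId, hashSecret),
  };
}

// One JSON object per line keeps the log easy to append to and to parse.
function toLogLine(record) {
  return JSON.stringify(record) + '\n';
}

// Append-only write: the 'a' flag never truncates existing entries.
function appendBlock(logPath, record) {
  fs.appendFileSync(logPath, toLogLine(record), { flag: 'a' });
}
```

Because `context` holds the full blocked message, it may itself contain the secrets or PII the firewall is meant to catch, so redacting or truncating it before calling `appendBlock` is worth considering.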
Usage Guidance
This skill appears to implement the advertised filtering features, but there are important mismatches and risks to verify before installing:
  • Confirm rule handling: SKILL.md says the skill reads .security/firewall-rules.yaml and enforces an append-only rule lifecycle, but index.js currently does not parse external YAML and always uses built-in defaults. Ask the author whether YAML parsing and rule lifecycle enforcement are intentionally omitted or planned.
  • Logging and data retention: the skill logs actions and context to .security/firewall-logs and returns originalData in responses. Determine what exactly is logged, whether logs are encrypted, who can read them, and how long they are retained. In high-sensitivity environments, store logs securely or disable logging of full payloads.
  • File locations & permissions: the skill reads and writes under process.cwd() (.security/*). Decide whether that path is acceptable and ensure filesystem permissions prevent unauthorized reads and writes. The skill itself cannot enforce 'read-only' rules; use OS-level permissions or an external policy engine.
  • Test in a sandbox: run the skill in an isolated environment with representative inputs to confirm redaction behavior (including edge cases) and to verify that no external exfiltration occurs.
  • Request hardening details from the author: the YAML parsing implementation, the rate-limiter/global state design, how the skill avoids accidental exposure (e.g., returning originalData), and whether there are configuration options for log encryption and retention.
If the author can provide a version that actually parses and validates external rules, avoids returning originalData (or makes that configurable), and documents log access, retention, and encryption, the concerns would be materially reduced.
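
The advice above to rely on OS-level permissions can be sketched with Node's standard `fs` module. This is a hedged illustration, not part of the skill: `makeReadOnly` and `isWritableByAnyone` are hypothetical helpers that an operator might run at deploy time against the rules file.

```javascript
const fs = require('fs');

// Clear the owner, group, and other write bits, leaving the remaining permission bits unchanged.
function makeReadOnly(filePath) {
  const mode = fs.statSync(filePath).mode & 0o777;
  fs.chmodSync(filePath, mode & ~0o222);
}

// True if any write bit is still set on the file.
function isWritableByAnyone(filePath) {
  return (fs.statSync(filePath).mode & 0o222) !== 0;
}
```

Mode bits alone do not stop a process that runs as the file's owner, since the owner can simply change the mode back. For the rules file to be read-only from the agent's perspective, it should be owned by a different user than the one running the agent.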
Capability Assessment
Purpose & Capability
Name, description, and code align with an input/output firewall: the code implements injection detection, secret/PII redaction, path scrubbing, etc. It requests no unrelated credentials or binaries. However, SKILL.md presents features (reading and applying .security/firewall-rules.yaml, an append-only rule lifecycle, enforcement that the firewall cannot be disabled) that the code does not actually implement: loadRules explicitly warns that it does not parse YAML and always falls back to built-in defaults. That mismatch should be explained.
Instruction Scope
SKILL.md and index.js operate only on provided data and local files; they do not call external endpoints. However, the skill promises logging of 'full context', and in the code execute() returns originalData alongside processedData, meaning unredacted inputs and outputs may be written to disk and included in responses. The docs' claims about making the rules read-only and preventing agent disablement are not enforced in code. These gaps increase the risk of accidental sensitive-data persistence or misconfiguration.
Install Mechanism
No install spec and no external downloads, which is the lowest-risk install path. The skill is delivered as code (index.js + SKILL.md) and uses only the standard fs and path modules.
Credentials
The skill requests no credentials or env vars (appropriate), but it writes logs containing actions and context under .security/firewall-logs and returns originalData in responses. That creates a local storage surface for potentially sensitive secrets/PII (which the skill is supposed to detect). There is no encryption, rotation, retention policy, or config/comments explaining access controls for those logs.
Persistence & Privilege
always:false and no autonomous-privilege flags are fine. However the skill creates and writes to .security/firewall-logs and will read a rules file path under process.cwd(). Those filesystem writes are normal for a firewall but may conflict with expectations in multi-tenant or restricted environments. The SKILL.md's statements about 'firewall cannot be disabled by agent instructions' and 'rules file read-only from agent perspective' are policy claims not enforced by the code — the skill cannot on its own prevent other processes/skills or users from modifying those files.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install agent-firewall
  3. After installation, invoke the skill by name or use /agent-firewall
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of agent-firewall: real-time input/output filtering for agent communications.
  • Blocks prompt injections, data exfiltration, and unauthorized commands before reaching the model.
  • Includes layered input filters: injection detection, Unicode sanitization, encoding checks, rate/size limits, and role confusion detection.
  • Adds output filters: secret scanning, PII redaction, internal path scrubbing, malicious URL blocking, and consistency checks.
  • YAML-based configuration with clear, customizable rules for both input and output.
  • Built-in guardrails: append-only rules, logging for all blocks, human approval for rules deletion, and resistance to agent tampering.
Metadata
Slug: agent-firewall
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is Agent Firewall?

Agent Firewall is an AI agent skill for Claude Code / OpenClaw that filters agent inputs and outputs in real time. It aims to block prompt injection, data exfiltration, and unauthorized commands before they reach the model. It has 90 downloads so far.

How do I install Agent Firewall?

Run "/install agent-firewall" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Agent Firewall free?

Yes, Agent Firewall is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Agent Firewall support?

Agent Firewall is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Agent Firewall?

It is built and maintained by Adnane Arharbi (@arhadnane); the current version is v1.0.0.
