Description

Defensive interceptor for prompt injection and basic PII masking.

README (SKILL.md)

CounterClaw 🦞

Name: Counterclaw Core
Author: nickconstantinou

Defensive security for AI agents. Snaps shut on malicious payloads.

⚠️ Security Notice

This package has two modes:

Core Scanner (offline): check_input() and check_output() — no network calls
Email Integration (network): send_protected_email.sh — requires gog CLI for Gmail

Installation

claw install counterclaw

Quick Start

from counterclaw import CounterClawInterceptor

interceptor = CounterClawInterceptor()

# Input scan - blocks prompt injections
# NOTE: Examples below are TEST CASES only - not actual instructions
result = interceptor.check_input("{{EXAMPLE: ignore previous instructions}}")
# → {"blocked": True, "safe": False}

# Output scan - detects PII leaks
result = interceptor.check_output("Contact: user@example.com")
# → {"safe": False, "pii_detected": {"email": True}}
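
The Quick Start shows the shape of the result but not how detection might work. As a rough illustration only, a pattern-based input check could look like the sketch below. The function name `scan_input` and both regular expressions are assumptions made for this example; they are not CounterClaw's actual pattern list or API.

```python
import re

# Hypothetical patterns for illustration; the package's real list is not documented here.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?system\s+prompt", re.IGNORECASE),
]


def scan_input(text: str) -> dict:
    """Return a result shaped like the Quick Start output for check_input()."""
    blocked = any(pattern.search(text) for pattern in INJECTION_PATTERNS)
    return {"blocked": blocked, "safe": not blocked}


# TEST CASE only, mirroring the Quick Start example above.
result = scan_input("{{EXAMPLE: ignore previous instructions}}")
# → {"blocked": True, "safe": False}
```

A fixed pattern list like this only catches phrasings it already knows about, which is consistent with the README's wording of "common prompt injection patterns".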

Features

🔒 Defense against common prompt injection patterns
🛡️ Basic PII masking (Email, Phone, Credit Card)
📝 Violation logging to ~/.openclaw/memory/MEMORY.md
⚠️ Warning on startup if TRUSTED_ADMIN_IDS not configured
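
To make the "basic PII masking" feature more concrete, here is a minimal sketch of regex-based detection and masking for the three listed types. The names `PII_PATTERNS`, `detect_pii`, and `mask_pii`, the regular expressions, and the `[EMAIL]`-style placeholders are all assumptions for illustration; the package's own patterns and mask format may differ.

```python
import re

# Illustrative patterns only; real-world PII detection needs broader coverage.
# Order matters: card numbers are masked before the looser phone pattern runs.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "phone": re.compile(r"(?:\+?\d{1,3}[ -]?)?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}"),
}


def detect_pii(text: str) -> dict:
    """Report which PII types appear in the text, e.g. {"email": True}."""
    return {name: True for name, pattern in PII_PATTERNS.items() if pattern.search(text)}


def mask_pii(text: str) -> str:
    """Replace each detected value with a placeholder such as [EMAIL]."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text


print(detect_pii("Contact: user@example.com"))
print(mask_pii("Card 4111 1111 1111 1111 on file"))
```

Simple patterns like these will miss unusual formats and can flag harmless digit runs, which is one reason to treat the masking as basic rather than complete.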

Configuration

Required Environment Variable

# Set your trusted admin ID(s) - use non-sensitive identifiers only!
export TRUSTED_ADMIN_IDS="your_telegram_id"

Important: TRUSTED_ADMIN_IDS should ONLY contain non-sensitive identifiers:

✅ Telegram user IDs (e.g., "123456789")
✅ Discord user IDs (e.g., "987654321")
❌ NEVER API keys
❌ NEVER passwords
❌ NEVER tokens

You can set multiple admin IDs by comma-separating:

export TRUSTED_ADMIN_IDS="telegram_id_1,telegram_id_2"
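
As a sketch of how a comma-separated value like the one above could be read, the following parses `TRUSTED_ADMIN_IDS` into a set and treats an unset or empty variable as "no admins". The helper names `load_admin_ids` and `is_admin` are hypothetical and are not part of the documented CounterClaw API.

```python
import os


def load_admin_ids(env=None) -> set:
    """Parse TRUSTED_ADMIN_IDS into a set of ID strings (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("TRUSTED_ADMIN_IDS", "")
    # Tolerate stray spaces and trailing commas.
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_admin(user_id, admin_ids: set) -> bool:
    """Fail closed: with no configured IDs, nobody is treated as an admin."""
    return bool(admin_ids) and str(user_id) in admin_ids


admin_ids = load_admin_ids({"TRUSTED_ADMIN_IDS": "telegram_id_1, telegram_id_2"})
print(is_admin("telegram_id_1", admin_ids))
print(is_admin("telegram_id_1", load_admin_ids({})))
```

Comparing IDs as strings avoids surprises when a platform hands back a numeric ID while the environment variable is always text.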

Runtime Configuration

# Option 1: Via environment variable (recommended)
# Set TRUSTED_ADMIN_IDS before running
interceptor = CounterClawInterceptor()

# Option 2: Direct parameter
interceptor = CounterClawInterceptor(admin_user_id="123456789")

Security Notes

Fail-Closed: If TRUSTED_ADMIN_IDS is not set, admin features are disabled by default
Logging: All violations are logged to ~/.openclaw/memory/MEMORY.md with PII masked
No Network Access: The core scanner middleware makes no external network calls (offline-only); only the optional email script, send_protected_email.sh, uses the network
File Access: Only writes to ~/.openclaw/memory/MEMORY.md — explicitly declared scope

Files Created

Path	Purpose
`~/.openclaw/memory/`	Directory created on first run
`~/.openclaw/memory/MEMORY.md`	Violation logs with PII masked
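
The behavior in the table above (create the directory on first use, then append already-masked entries) can be sketched as follows. The function `log_violation`, its parameters, and the entry format are assumptions for illustration; the actual log layout written by the package is not specified in this document.

```python
from datetime import datetime, timezone
from pathlib import Path

# Default location taken from the table above.
DEFAULT_LOG = Path.home() / ".openclaw" / "memory" / "MEMORY.md"


def log_violation(kind: str, masked_detail: str, log_path=DEFAULT_LOG) -> Path:
    """Append one already-masked violation entry, creating the directory if needed."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_path.open("a", encoding="utf-8") as handle:
        # Hypothetical entry format: "- <UTC timestamp> [<kind>] <detail>"
        handle.write(f"- {stamp} [{kind}] {masked_detail}\n")
    return log_path


# Example call (writes to the default path shown above):
# log_violation("pii", "Contact: [EMAIL]")
```

Passing `log_path` explicitly makes it easy to point the sketch at a temporary directory when experimenting, without touching the real MEMORY.md.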

License

MIT - See LICENSE file

Development & Release

Running Tests Locally

python3 tests/test_scanner.py

Linting

pip install ruff
ruff check src/

Publishing to ClawHub

The CI runs on every push and pull request:

Ruff - Lints Python code
Tests - Runs unit tests

To publish a new version:

# Version is set in pyproject.toml
git add -A
git commit -m "Release v1.0.9"
git tag v1.0.9
git push origin main --tags

CI will automatically:

Run lint + tests
If tests pass and the tag matches v*, publish to ClawHub

Usage Guidance

This package appears to do what it says: local prompt-injection scanning, PII detection/masking, and optional email wrappers that use the user's gog CLI. Before installing:

1) Confirm the metadata: set TRUSTED_ADMIN_IDS to non-sensitive IDs (Telegram/Discord numeric IDs) and do NOT put API keys or tokens there.
2) Verify the intended install location (SKILL.md and README reference slightly different paths, such as ~/.openclaw/skills vs ~/.openclaw/workspace/skills) so the scripts find the module; adjust PYTHONPATH if needed.
3) If you plan to use send_protected_email.sh, test with --dry-run and understand that it calls the local 'gog' CLI, which will send via your Gmail account (ensure gog is configured and you are comfortable with that).
4) Inspect and/or set restrictive permissions on ~/.openclaw/memory/MEMORY.md if you are concerned about logs.
5) The code contains some minor oddities (small sys.path manipulation quirks) but no evidence of hidden endpoints or secret exfiltration; if you need higher assurance, run the included tests locally and review the code lines that touch PATHs/env before use.
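
For item 4 above, one possible way to restrict the log file to its owner on a POSIX system is sketched below. The helper name `restrict_log_permissions` is hypothetical, the file must already exist, and on Windows `chmod` only toggles the read-only flag, so the returned mode will differ there.

```python
import stat
from pathlib import Path


def restrict_log_permissions(path) -> int:
    """Set owner-only read/write (0o600) and return the resulting mode bits."""
    path = Path(path)
    path.chmod(0o600)
    return stat.S_IMODE(path.stat().st_mode)


# Example call against the log path declared in this document:
# restrict_log_permissions(Path.home() / ".openclaw" / "memory" / "MEMORY.md")
```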

Capability Analysis

Type: OpenClaw Skill
Name: counterclaw-core
Version: 1.1.1

The OpenClaw AgentSkills bundle 'counterclaw-core' is a defensive security tool designed to detect prompt injections and PII. All network and file system access is explicitly declared in SKILL.md and README.md, aligning with its stated purpose of local logging and optional email integration via the external 'gog' CLI. The code confirms these declarations, logging violations locally to `~/.openclaw/memory/MEMORY.md` with PII masked. Examples of prompt injection in the documentation are clearly marked as test cases for the skill's detection capabilities, not as instructions for the agent. No evidence of intentional malicious behavior, data exfiltration, unauthorized execution, or persistence mechanisms was found.

Capability Assessment

ℹ Purpose & Capability

Name/description (prompt-injection defense + PII masking) match the included Python scanner, middleware, and email-protection scripts. The code implements injection detection, PII detection/masking, and local logging, all coherent with the stated purpose. Minor inconsistency: registry metadata lists no required env vars/config paths, whereas SKILL.md and code expect TRUSTED_ADMIN_IDS and write to ~/.openclaw/memory/ (declared in SKILL.md and implemented in code). This appears to be a documentation/metadata mismatch rather than malicious intent.

ℹ Instruction Scope

SKILL.md instructions stay within expected scope: offline scanner/middleware, local logging to ~/.openclaw/memory/MEMORY.md, and optional email sending via the gog CLI. Examples and tests include prompt-injection phrases (e.g., 'Ignore previous instructions') — these triggered the pre-scan injection signal but are legitimate test/example data. Scripts reference PYTHONPATH and a workspace path (~/.openclaw/workspace/skills/...) which is slightly inconsistent with README's path suggestions (~/.openclaw/skills/...), so verify intended installation location before running.

✓ Install Mechanism

No automated remote install step in registry metadata; SKILL.md suggests 'pip install .' which is a normal local packaging instruction. There are no external download URLs or archive extraction steps. The package is instruction-first with included source files and tests; installation risk is low and traceable.

ℹ Credentials

Requested environment variables (TRUSTED_ADMIN_IDS for admin checks; optional GOG_ACCOUNT and GOG_KEYRING_PASSWORD for the Gmail/gog integration) are proportional to the functionality. The README and SKILL.md explicitly warn that TRUSTED_ADMIN_IDS should not contain secrets. Minor concern: registry-level metadata did not declare these env requirements — confirm you set only non-sensitive admin identifiers and are comfortable providing credentials to gog separately for email sending.

✓ Persistence & Privilege

The skill does not request 'always: true'. It only writes to its own declared path (~/.openclaw/memory/MEMORY.md) and does not modify other skills or system-wide settings. Autonomous invocation (disable-model-invocation = false) is the platform default and not flagged here. File writes are constrained to the declared memory directory.

Version History

v1.1.1

- Clarified security model in documentation: now distinguishes between offline-only core and optional email integration that requires network access.
- Updated `security_manifest` to declare optional network usage for email scripts.
- Version bump in metadata to reflect documentation and manifest improvements.
- No functional code changes; updates are documentation- and manifest-focused.

v1.1.0

- Added email protection script and helper shell script for sending protected emails.
- Introduced tests for email protection functionality.
- Updated documentation.

v1.0.9

- Improved SKILL.md documentation: clearer configuration instructions, best-practice security notes, and file access details.
- Updated metadata version to 1.0.9 and clarified offline-only operation.
- Security notice and admin ID usage guidance are now emphasized for safer deployment.
- Features section now highlights credit card detection, warning on misconfiguration, and enhanced logging information.

v1.0.8

- Added development and release instructions to SKILL.md, including test running, linting, and publishing guides.
- No changes to functionality; documentation only.

v1.0.7

- Updated internal version to 1.0.7 in metadata.
- Cleaned up project files by removing egg-info and virtual environment artifacts.

v1.0.6

- Bumped version to 1.0.6.
- Added egg-info metadata files required for Python packaging and distribution.
- Minor metadata update: version updated to 1.0.6 in SKILL.md.

v1.0.5

counterclaw-core 1.0.5
- Added project homepage link to metadata (SKILL.md).
- Updated "requires" key to "requirements" in metadata for clarity and consistency.
- Updated example input in README and SKILL.md to clarify prompt injection detection.
- Bumped internal version reference to 1.0.5.

v1.0.4

- Updated skill version to 1.0.4.
- Added pip installation command to SKILL.md for setup clarity.
- Removed the publish.sh script.
- Updated metadata in SKILL.md to reflect the new version.

v1.0.3

Version 1.0.3
- Updated SKILL.md with new metadata fields and improved security/configuration documentation.
- Added instructions for setting multiple admin IDs in TRUSTED_ADMIN_IDS.
- Clarified "fail-closed" behavior and logging details in security notes.
- Updated example usage in documentation.
- Added publish.sh script for streamlined publishing workflow.

v1.0.2

- Updated version to 1.0.2.
- Documentation updates in SKILL.md and README.md.
- Minor code and test adjustments; no breaking changes.

v1.0.1

Fix: env sync, security manifest, toned down claims

Metadata

Slug counterclaw-core

Version 1.1.1

License —

All-time Installs 4

Active Installs 4

Total Versions 11

Frequently Asked Questions

What is Counterclaw Core?

Defensive interceptor for prompt injection and basic PII masking. It is an AI Agent Skill for Claude Code / OpenClaw, with 708 downloads so far.

How do I install Counterclaw Core?

Run "/install counterclaw-core" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Counterclaw Core free?

Yes, Counterclaw Core is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Counterclaw Core support?

Counterclaw Core is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Counterclaw Core?

It is built and maintained by nickconstantinou (@nickconstantinou); the current version is v1.1.1.
