Description

Inference-based intrusion detection for AI agents. Pattern matching + LLM analysis for jailbreaks, prompt injection, credential theft, social engineering. 108 detection patterns, OpenClaw plugin, auto-scan, quarantine. Commands: hopeid scan, hopeid test, hopeid setup, hopeid stats, hopeid doctor.

README (SKILL.md)

hopeIDS Security Skill

Name: Openclaw Plugin
Author: emberdesire

Inference-based intrusion detection for AI agents with quarantine and human-in-the-loop.

Security Invariants

These are non-negotiable design principles:

Block = full abort — Blocked messages never reach jasper-recall or the agent
Metadata only — No raw malicious content is ever stored
Approve ≠ re-inject — Approval changes future behavior, doesn't resurrect messages
Alerts are programmatic — Telegram alerts built from metadata, no LLM involved

Features

Auto-scan — Scan messages before agent processing
Quarantine — Block threats with metadata-only storage
Human-in-the-loop — Telegram alerts for review
Per-agent config — Different thresholds for different agents
Commands — /approve, /reject, /trust, /quarantine

The Pipeline

Message arrives
    ↓
hopeIDS.autoScan()
    ↓
┌─────────────────────────────────────────┐
│  risk >= threshold?                     │
│                                         │
│  BLOCK (strictMode):                    │
│     → Create QuarantineRecord           │
│     → Send Telegram alert               │
│     → ABORT (no recall, no agent)       │
│                                         │
│  WARN (non-strict):                     │
│     → Inject <security-alert>           │
│     → Continue to jasper-recall         │
│     → Continue to agent                 │
│                                         │
│  ALLOW:                                 │
│     → Continue normally                 │
└─────────────────────────────────────────┘
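
The branching in the diagram can be sketched as a small pure function. This is an illustration only, not the plugin's actual code: the names `Decision`, `ScanPolicy`, and `decide` are hypothetical, and the sketch assumes that both the risk score and the threshold are numbers between 0 and 1, as the configuration examples suggest.

```typescript
// Hypothetical names for illustration; not part of the hopeid API.
type Decision = "BLOCK" | "WARN" | "ALLOW";

interface ScanPolicy {
  riskThreshold: number; // e.g. 0.7
  strictMode: boolean;   // true = block, false = warn
}

// Maps a risk score to the three outcomes shown in the pipeline diagram.
function decide(risk: number, policy: ScanPolicy): Decision {
  if (risk < policy.riskThreshold) {
    return "ALLOW"; // below threshold: continue normally
  }
  // At or above threshold: strict agents abort, others get a warning injected.
  return policy.strictMode ? "BLOCK" : "WARN";
}

const strictResult = decide(0.85, { riskThreshold: 0.7, strictMode: true });  // "BLOCK"
const warnResult = decide(0.85, { riskThreshold: 0.7, strictMode: false });   // "WARN"
const allowResult = decide(0.4, { riskThreshold: 0.7, strictMode: true });    // "ALLOW"
```

Keeping the decision separate from side effects (creating the record, sending the alert) makes the threshold behavior easy to test on its own.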

Configuration

{
  "plugins": {
    "entries": {
      "hopeids": {
        "enabled": true,
        "config": {
          "autoScan": true,
          "defaultRiskThreshold": 0.7,
          "strictMode": false,
          "telegramAlerts": true,
          "agents": {
            "moltbook-scanner": {
              "strictMode": true,
              "riskThreshold": 0.7
            },
            "main": {
              "strictMode": false,
              "riskThreshold": 0.8
            }
          }
        }
      }
    }
  }
}

Options

Option	Type	Default	Description
`autoScan`	boolean	`false`	Auto-scan every message
`strictMode`	boolean	`false`	Block (vs warn) on threats
`defaultRiskThreshold`	number	`0.7`	Risk level that triggers action
`telegramAlerts`	boolean	`true`	Send alerts for blocked messages
`telegramChatId`	string	-	Override alert destination
`quarantineDir`	string	`~/.openclaw/quarantine/hopeids`	Storage path
`agents`	object	-	Per-agent overrides
`trustOwners`	boolean	`true`	Skip scanning owner messages
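
As a rough sketch of how these options might combine, the snippet below merges user settings over the defaults from the table and decides whether a message is scanned at all. The type `HopeIdsOptions` and the functions `withDefaults` and `shouldScan` are hypothetical names, and the interaction between `autoScan` and `trustOwners` is an assumption based on the option descriptions, not confirmed plugin behavior.

```typescript
// Hypothetical shape mirroring the options table; not the plugin's real types.
interface HopeIdsOptions {
  autoScan: boolean;
  strictMode: boolean;
  defaultRiskThreshold: number;
  telegramAlerts: boolean;
  telegramChatId?: string;
  quarantineDir: string;
  agents?: Record<string, { strictMode?: boolean; riskThreshold?: number }>;
  trustOwners: boolean;
}

// Defaults copied from the table above.
const DEFAULTS: HopeIdsOptions = {
  autoScan: false,
  strictMode: false,
  defaultRiskThreshold: 0.7,
  telegramAlerts: true,
  quarantineDir: "~/.openclaw/quarantine/hopeids",
  trustOwners: true,
};

// User-supplied values win over defaults.
function withDefaults(user: Partial<HopeIdsOptions>): HopeIdsOptions {
  return { ...DEFAULTS, ...user };
}

// Assumed interpretation: scan only when autoScan is on,
// and skip owner messages when trustOwners is on.
function shouldScan(opts: HopeIdsOptions, senderIsOwner: boolean): boolean {
  if (!opts.autoScan) {
    return false;
  }
  return !(opts.trustOwners && senderIsOwner);
}

const opts = withDefaults({ autoScan: true });
const scansStranger = shouldScan(opts, false); // true
const scansOwner = shouldScan(opts, true);     // false, because trustOwners defaults to true
```

Under this reading, setting `trustOwners` to `false` is the only way to have owner messages scanned.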

Quarantine Records

When a message is blocked, a metadata record is created:

{
  "id": "q-7f3a2b",
  "ts": "2026-02-06T00:48:00Z",
  "agent": "moltbook-scanner",
  "source": "moltbook",
  "senderId": "@sus_user",
  "intent": "instruction_override",
  "risk": 0.85,
  "patterns": [
    "matched regex: ignore.*instructions",
    "matched keyword: api key"
  ],
  "contentHash": "ab12cd34...",
  "status": "pending"
}

Note: There is NO originalMessage field. This is intentional.
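
One way to guard the metadata-only invariant is to check records against an allowlist of field names before writing them. The sketch below is illustrative: the field names come from the example record above, while the `QuarantineRecord` type, the `status` values other than `pending` (taken from the `/approve` and `/reject` descriptions), and the `unexpectedKeys` helper are assumptions rather than the plugin's real schema.

```typescript
// Illustrative type built from the example record; not the plugin's actual schema.
interface QuarantineRecord {
  id: string;
  ts: string;
  agent: string;
  source: string;
  senderId: string;
  intent: string;
  risk: number;
  patterns: string[];
  contentHash: string;
  status: "pending" | "approved" | "rejected";
}

// Every field a record may carry. Note the absence of any raw-content field.
const ALLOWED_KEYS: readonly string[] = [
  "id", "ts", "agent", "source", "senderId",
  "intent", "risk", "patterns", "contentHash", "status",
];

// Returns any keys outside the metadata schema (for example "originalMessage").
function unexpectedKeys(record: object): string[] {
  return Object.keys(record).filter((key) => ALLOWED_KEYS.indexOf(key) === -1);
}

const example: QuarantineRecord = {
  id: "q-7f3a2b",
  ts: "2026-02-06T00:48:00Z",
  agent: "moltbook-scanner",
  source: "moltbook",
  senderId: "@sus_user",
  intent: "instruction_override",
  risk: 0.85,
  patterns: ["matched regex: ignore.*instructions", "matched keyword: api key"],
  contentHash: "ab12cd34...",
  status: "pending",
};

const clean = unexpectedKeys(example);                                  // []
const leaky = unexpectedKeys({ ...example, originalMessage: "secret" }); // ["originalMessage"]
```

A writer could refuse to persist any record for which `unexpectedKeys` returns a non-empty list.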

Telegram Alerts

When a message is blocked:

🛑 Message blocked

ID: `q-7f3a2b`
Agent: moltbook-scanner
Source: moltbook
Sender: @sus_user
Intent: instruction_override (85%)

Patterns:
• matched regex: ignore.*instructions
• matched keyword: api key

`/approve q-7f3a2b`
`/reject q-7f3a2b`
`/trust @sus_user`

Built from metadata only. No LLM touches this.
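
Building the alert programmatically could look like the sketch below, which assembles the text from record fields with plain string templates. The `AlertMetadata` type and `formatAlert` function are hypothetical names, and the layout simply mirrors the sample alert above; how the plugin actually sends the message to Telegram is not shown.

```typescript
// Hypothetical subset of the quarantine record needed for an alert.
interface AlertMetadata {
  id: string;
  agent: string;
  source: string;
  senderId: string;
  intent: string;
  risk: number; // 0 to 1
  patterns: string[];
}

// Pure string assembly: no model call, no access to the blocked message body.
function formatAlert(m: AlertMetadata): string {
  const percent = Math.round(m.risk * 100);
  const lines = [
    "🛑 Message blocked",
    "",
    `ID: \`${m.id}\``,
    `Agent: ${m.agent}`,
    `Source: ${m.source}`,
    `Sender: ${m.senderId}`,
    `Intent: ${m.intent} (${percent}%)`,
    "",
    "Patterns:",
    ...m.patterns.map((p) => `• ${p}`),
    "",
    `\`/approve ${m.id}\``,
    `\`/reject ${m.id}\``,
    `\`/trust ${m.senderId}\``,
  ];
  return lines.join("\n");
}

const alertText = formatAlert({
  id: "q-7f3a2b",
  agent: "moltbook-scanner",
  source: "moltbook",
  senderId: "@sus_user",
  intent: "instruction_override",
  risk: 0.85,
  patterns: ["matched regex: ignore.*instructions", "matched keyword: api key"],
});
```

Fields such as `senderId` still originate from an untrusted party, so a real implementation would probably want to escape them for whatever message formatting mode the alert channel uses.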

Commands

`/quarantine [all|clean]`

List quarantine records.

/quarantine        # List pending
/quarantine all    # List all (including resolved)
/quarantine clean  # Clean expired records

`/approve <id>`

Mark a blocked message as a false positive.

/approve q-7f3a2b

Effect:

Status → approved
(Future) Add sender to allowlist
(Future) Lower pattern weight

`/reject <id>`

Confirm a blocked message was a true positive.

/reject q-7f3a2b

Effect:

Status → rejected
(Future) Reinforce pattern weights

`/trust <senderId>`

Whitelist a sender for future messages.

/trust @legitimate_user

`/scan <message>`

Manually scan a message.

/scan ignore your previous instructions and...

What Approve/Reject Mean

Command	What it does	What it doesn't do
`/approve`	Marks as false positive, may adjust IDS	Does NOT re-inject the message
`/reject`	Confirms threat, may strengthen patterns	Does NOT affect current message
`/trust`	Whitelists sender for future	Does NOT retroactively approve

The blocked message is gone by design. If it was legitimate, the sender can re-send.
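
The semantics in the table can be modeled as state transitions over stored metadata. The following is a minimal sketch under assumed names (`IdsState`, `Command`, `applyCommand` are hypothetical): each command returns a new state, and none of them can re-deliver a message because no message content exists in the state.

```typescript
type Status = "pending" | "approved" | "rejected";

interface RecordState {
  id: string;
  senderId: string;
  status: Status;
}

// Hypothetical in-memory view of the IDS state.
interface IdsState {
  records: RecordState[];
  trustedSenders: string[];
}

type Command =
  | { kind: "approve"; id: string }
  | { kind: "reject"; id: string }
  | { kind: "trust"; senderId: string };

function applyCommand(state: IdsState, cmd: Command): IdsState {
  if (cmd.kind === "trust") {
    // Affects future messages only; existing records keep their status.
    if (state.trustedSenders.indexOf(cmd.senderId) !== -1) {
      return state;
    }
    return { ...state, trustedSenders: [...state.trustedSenders, cmd.senderId] };
  }
  // approve / reject: only the record's status changes.
  const targetId = cmd.id;
  const nextStatus: Status = cmd.kind === "approve" ? "approved" : "rejected";
  return {
    ...state,
    records: state.records.map((r) =>
      r.id === targetId ? { ...r, status: nextStatus } : r
    ),
  };
}

const before: IdsState = {
  records: [{ id: "q-7f3a2b", senderId: "@sus_user", status: "pending" }],
  trustedSenders: [],
};
const afterApprove = applyCommand(before, { kind: "approve", id: "q-7f3a2b" });
const afterTrust = applyCommand(before, { kind: "trust", senderId: "@sus_user" });
```

Note that trusting a sender leaves the pending record untouched, which matches the "does NOT retroactively approve" row.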

Per-Agent Configuration

Different agents need different security postures:

"agents": {
  "moltbook-scanner": {
    "strictMode": true,    // Block threats
    "riskThreshold": 0.7   // 70% = suspicious
  },
  "main": {
    "strictMode": false,   // Warn only
    "riskThreshold": 0.8   // Higher bar for main
  },
  "email-processor": {
    "strictMode": true,    // Always block
    "riskThreshold": 0.6   // More paranoid
  }
}
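
Resolving the effective settings for an agent might work as sketched below: per-agent values take precedence, and anything missing falls back to the plugin-level `strictMode` and `defaultRiskThreshold`. The function `resolvePolicy` and the types around it are hypothetical, and the fallback order is an assumption inferred from the option names rather than documented behavior.

```typescript
interface AgentOverride {
  strictMode?: boolean;
  riskThreshold?: number;
}

// Hypothetical subset of the plugin config relevant to per-agent resolution.
interface PluginConfig {
  strictMode: boolean;
  defaultRiskThreshold: number;
  agents?: Record<string, AgentOverride>;
}

interface EffectivePolicy {
  strictMode: boolean;
  riskThreshold: number;
}

// Per-agent values win; unknown agents get the plugin-level settings.
function resolvePolicy(config: PluginConfig, agentId: string): EffectivePolicy {
  const override: AgentOverride = config.agents?.[agentId] ?? {};
  return {
    strictMode: override.strictMode ?? config.strictMode,
    riskThreshold: override.riskThreshold ?? config.defaultRiskThreshold,
  };
}

const config: PluginConfig = {
  strictMode: false,
  defaultRiskThreshold: 0.7,
  agents: {
    "moltbook-scanner": { strictMode: true, riskThreshold: 0.7 },
    "main": { strictMode: false, riskThreshold: 0.8 },
    "email-processor": { strictMode: true, riskThreshold: 0.6 },
  },
};

const emailPolicy = resolvePolicy(config, "email-processor"); // strict, threshold 0.6
const unknownPolicy = resolvePolicy(config, "some-new-agent"); // non-strict, threshold 0.7
```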

Threat Categories

Category	Risk	Description
`command_injection`	🔴 Critical	Shell commands, code execution
`credential_theft`	🔴 Critical	API key extraction attempts
`data_exfiltration`	🔴 Critical	Data leak to external URLs
`instruction_override`	🔴 High	Jailbreaks, "ignore previous"
`impersonation`	🔴 High	Fake system/admin messages
`discovery`	⚠️ Medium	API/capability probing

Installation

npx hopeid setup

Then restart OpenClaw.

Links

GitHub: https://github.com/E-x-O-Entertainment-Studios-Inc/hopeIDS
npm: https://www.npmjs.com/package/hopeid
Docs: https://exohaven.online/products/hopeids

Usage Guidance

This plugin is coherent with its stated purpose (an IDS that quarantines threats and alerts via Telegram), but you should not install it blindly. Key things to consider before installing:

- Message transmission to models: classification uses llm-task or a classifier agent and sends (part of) the raw incoming message to the configured model/provider. That is expected for semantic analysis but means sensitive text may leave your system at runtime even if it is not persisted. Verify which LLM providers (local vs cloud) your OpenClaw instance routes llm-task or classifierAgent calls to.
- External dependency provenance: the plugin dynamically imports a separate 'hopeid' package and suggests running 'npx hopeid setup' / 'npm install hopeid'. The registry entry does not include a trustworthy homepage or maintainer details. Inspect the 'hopeid' package source (and any CLI behavior) before installing it.
- Storage: quarantine records are metadata-only by design, but they are written to ~/.openclaw/quarantine/hopeids (or records.json in that dir in fallback). Confirm you are comfortable with that path and check retention/permissions.
- Conservative initial settings: enable the plugin in non-strict/warn-only mode and disable autoScan initially; verify alerting behavior (Telegram) and that alerts contain only metadata. Test with non-sensitive inputs in a staging environment.
- If you need higher assurance: request the full 'hopeid' package source and the remainder of this plugin's source (truncated portions) to audit exactly what is sent to classifiers and how patterns/rules are defined.

Given these gaps (missing provenance, runtime transmission of raw messages to configured LLMs, and inconsistent install guidance), treat this skill with caution and perform the checks above before trusting it in production.

Capability Analysis

Type: OpenClaw Skill
Name: hopeids
Version: 1.3.2

The OpenClaw hopeIDS skill is designed as an Intrusion Detection System (IDS) for AI agents, aiming to prevent malicious activity. Its core logic and documentation consistently reflect this purpose, emphasizing 'metadata only' for quarantine records and programmatic alerts. However, it is classified as 'suspicious' due to two primary reasons:

1) It relies heavily on an external `hopeid` npm package, introducing a supply chain risk where a compromised dependency could lead to malicious execution.
2) The `trustOwners` configuration (defaulting to true) allows messages from owner accounts to bypass all security scans, creating a potential vulnerability if an owner's account is compromised.

While these are vulnerabilities rather than direct malicious intent by the skill itself, they represent significant security risks.

Capability Assessment

✓ Purpose & Capability

Name/description (inference-based IDS, quarantine, Telegram alerts) align with the code and manifest: it implements auto-scan, quarantine records (metadata-only), per-agent config, and commands. The plugin depends on a separate 'hopeid' package (declared in package.json) which is coherent with the skill's functionality.

⚠ Instruction Scope

SKILL.md and code consistently state 'metadata-only' storage, and quarantine records do not include an originalMessage field. However, classification is performed using llm-task or a classifier agent (api.invokeTool or api.sessions.send), and the code sends (a substring of) the incoming message to whichever model/provider is configured. That means raw message content will be transmitted at runtime to the configured model/provider even though it is not persisted. This is a potential data-exfiltration vector users may not expect. The instructions ask to run 'npx hopeid setup', which implies additional installation/config steps external to OpenClaw; the origin and behavior of that CLI are not documented here.

ℹ Install Mechanism

There is no install spec in the registry entry (instruction-only), but the package.json includes a dependency on 'hopeid'. The code dynamically imports 'hopeid' and will error if it is not installed (with instructions to npm install it). This mixed messaging (no install spec but package.json + dynamic import) is inconsistent and requires the user to install an external package. The 'hopeid' package origin is not verifiable from the provided metadata (homepage truncated/absent).

ℹ Credentials

The skill declares no required env vars or credentials and relies on OpenClaw platform config (e.g., channels.telegram.botToken and ownerNumbers). That is proportionate. However, runtime classification sends message text to configured LLM tooling (llm-task or classifierAgent) which may route to third-party providers (Anthropic/OpenAI/etc) configured elsewhere in the platform — installing this plugin therefore implicitly sends messages to those providers. The SKILL.md emphasizes metadata-only storage but does not highlight that raw message text is transmitted to models for classification.

✓ Persistence & Privilege

The `always` flag is false and the plugin does not request system-wide privileges. It writes quarantine records to a plugin-specific directory (default ~/.openclaw/quarantine/hopeids) and will fall back to an in-memory/file-based quarantine if the 'hopeid/quarantine' module is not present. It does not modify other skills' configs. Writing records to the user's home directory is expected for a quarantine feature, but you should verify file permissions and retention policies.

Version History

v1.3.2

Fix plugin imports, document Telegram requirements

v1.3.1

feat: llm-task classifier support for fast, lightweight semantic analysis

v1.2.0

Added doctor command for health diagnostics

v1.1.1

v1.1.1: Fixed sandbox auto-config that could break workers. Setup now shows guidance instead of auto-applying. 108 patterns, OpenClaw plugin support.

v0.1.0

hopeIDS Security Skill initial release.

- Introduces inference-based intrusion detection for AI agents to protect against prompt injection, credential theft, data exfiltration, and related threats.
- Provides the security_scan tool for message analysis and integration guidance.
- Outlines primary threat categories and recommended IDS-first workflow for agents processing untrusted input.
- Includes detailed configuration options for OpenClaw, sandboxing patterns, and example responses to detected threats.
- Installation instructions and relevant resource links included for immediate setup.

Metadata

Slug hopeids

Version 1.3.2

License —

All-time Installs 1

Active Installs 1

Total Versions 5

Frequently Asked Questions

What is Openclaw Plugin?

Inference-based intrusion detection for AI agents. Pattern matching + LLM analysis for jailbreaks, prompt injection, credential theft, social engineering. 108 detection patterns, OpenClaw plugin, auto-scan, quarantine. Commands: hopeid scan, hopeid test, hopeid setup, hopeid stats, hopeid doctor. It is an AI Agent Skill for Claude Code / OpenClaw, with 2333 downloads so far.

How do I install Openclaw Plugin?

Run "/install hopeids" in the OpenClaw or Claude Code chat to install it. The README additionally calls for running npx hopeid setup and then restarting OpenClaw.

Is Openclaw Plugin free?

Yes, Openclaw Plugin is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Openclaw Plugin support?

Openclaw Plugin is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Openclaw Plugin?

It is built and maintained by emberDesire (@emberdesire); the current version is v1.3.2.
