功能描述

Inference-based intrusion detection for AI agents. Pattern matching + LLM analysis for jailbreaks, prompt injection, credential theft, social engineering. 108 detection patterns, OpenClaw plugin, auto-scan, quarantine. Commands: hopeid scan, hopeid test, hopeid setup, hopeid stats, hopeid doctor.

使用说明 (SKILL.md)

hopeIDS Security Skill

Name: Openclaw Plugin
Author: emberdesire

Inference-based intrusion detection for AI agents with quarantine and human-in-the-loop.

Security Invariants

These are non-negotiable design principles:

Block = full abort — Blocked messages never reach jasper-recall or the agent
Metadata only — No raw malicious content is ever stored
Approve ≠ re-inject — Approval changes future behavior, doesn't resurrect messages
Alerts are programmatic — Telegram alerts built from metadata, no LLM involved

Features

Auto-scan — Scan messages before agent processing
Quarantine — Block threats with metadata-only storage
Human-in-the-loop — Telegram alerts for review
Per-agent config — Different thresholds for different agents
Commands — /approve, /reject, /trust, /quarantine

The Pipeline

Message arrives
    ↓
hopeIDS.autoScan()
    ↓
┌─────────────────────────────────────────┐
│  risk >= threshold?                     │
│                                         │
│  BLOCK (strictMode):                    │
│     → Create QuarantineRecord           │
│     → Send Telegram alert               │
│     → ABORT (no recall, no agent)       │
│                                         │
│  WARN (non-strict):                     │
│     → Inject \x3Csecurity-alert>           │
│     → Continue to jasper-recall         │
│     → Continue to agent                 │
│                                         │
│  ALLOW:                                 │
│     → Continue normally                 │
└─────────────────────────────────────────┘

Configuration

{
  "plugins": {
    "entries": {
      "hopeids": {
        "enabled": true,
        "config": {
          "autoScan": true,
          "defaultRiskThreshold": 0.7,
          "strictMode": false,
          "telegramAlerts": true,
          "agents": {
            "moltbook-scanner": {
              "strictMode": true,
              "riskThreshold": 0.7
            },
            "main": {
              "strictMode": false,
              "riskThreshold": 0.8
            }
          }
        }
      }
    }
  }
}

Options

Option	Type	Default	Description
`autoScan`	boolean	`false`	Auto-scan every message
`strictMode`	boolean	`false`	Block (vs warn) on threats
`defaultRiskThreshold`	number	`0.7`	Risk level that triggers action
`telegramAlerts`	boolean	`true`	Send alerts for blocked messages
`telegramChatId`	string	-	Override alert destination
`quarantineDir`	string	`~/.openclaw/quarantine/hopeids`	Storage path
`agents`	object	-	Per-agent overrides
`trustOwners`	boolean	`true`	Skip scanning owner messages

Quarantine Records

When a message is blocked, a metadata record is created:

{
  "id": "q-7f3a2b",
  "ts": "2026-02-06T00:48:00Z",
  "agent": "moltbook-scanner",
  "source": "moltbook",
  "senderId": "@sus_user",
  "intent": "instruction_override",
  "risk": 0.85,
  "patterns": [
    "matched regex: ignore.*instructions",
    "matched keyword: api key"
  ],
  "contentHash": "ab12cd34...",
  "status": "pending"
}

Note: There is NO originalMessage field. This is intentional.

Telegram Alerts

When a message is blocked:

🛑 Message blocked

ID: `q-7f3a2b`
Agent: moltbook-scanner
Source: moltbook
Sender: @sus_user
Intent: instruction_override (85%)

Patterns:
• matched regex: ignore.*instructions
• matched keyword: api key

`/approve q-7f3a2b`
`/reject q-7f3a2b`
`/trust @sus_user`

Built from metadata only. No LLM touches this.

Commands

`/quarantine [all|clean]`

List quarantine records.

/quarantine        # List pending
/quarantine all    # List all (including resolved)
/quarantine clean  # Clean expired records

`/approve \x3Cid>`

Mark a blocked message as a false positive.

/approve q-7f3a2b

Effect:

Status → approved
(Future) Add sender to allowlist
(Future) Lower pattern weight

`/reject \x3Cid>`

Confirm a blocked message was a true positive.

/reject q-7f3a2b

Effect:

Status → rejected
(Future) Reinforce pattern weights

`/trust \x3CsenderId>`

Whitelist a sender for future messages.

/trust @legitimate_user

`/scan \x3Cmessage>`

Manually scan a message.

/scan ignore your previous instructions and...

What Approve/Reject Mean

Command	What it does	What it doesn't do
`/approve`	Marks as false positive, may adjust IDS	Does NOT re-inject the message
`/reject`	Confirms threat, may strengthen patterns	Does NOT affect current message
`/trust`	Whitelists sender for future	Does NOT retroactively approve

The blocked message is gone by design. If it was legitimate, the sender can re-send.

Per-Agent Configuration

Different agents need different security postures:

"agents": {
  "moltbook-scanner": {
    "strictMode": true,    // Block threats
    "riskThreshold": 0.7   // 70% = suspicious
  },
  "main": {
    "strictMode": false,   // Warn only
    "riskThreshold": 0.8   // Higher bar for main
  },
  "email-processor": {
    "strictMode": true,    // Always block
    "riskThreshold": 0.6   // More paranoid
  }
}

Threat Categories

Category	Risk	Description
`command_injection`	🔴 Critical	Shell commands, code execution
`credential_theft`	🔴 Critical	API key extraction attempts
`data_exfiltration`	🔴 Critical	Data leak to external URLs
`instruction_override`	🔴 High	Jailbreaks, "ignore previous"
`impersonation`	🔴 High	Fake system/admin messages
`discovery`	⚠️ Medium	API/capability probing

Installation

npx hopeid setup

Then restart OpenClaw.

Links

GitHub: https://github.com/E-x-O-Entertainment-Studios-Inc/hopeIDS
npm: https://www.npmjs.com/package/hopeid
Docs: https://exohaven.online/products/hopeids

安全使用建议

This plugin is coherent with its stated purpose (an IDS that quarantines threats and alerts via Telegram) but you should not install it blindly. Key things to consider before installing: - Message transmission to models: classification uses llm-task or a classifier agent and sends (part of) the raw incoming message to the configured model/provider. That is expected for semantic analysis but means sensitive text may leave your system at runtime even if it is not persisted. Verify which LLM providers (local vs cloud) your OpenClaw instance routes llm-task or classifierAgent calls to. - External dependency provenance: the plugin dynamically imports a separate 'hopeid' package and suggests running 'npx hopeid setup' / 'npm install hopeid'. The registry entry does not include a trustworthy homepage or maintainer details. Inspect the 'hopeid' package source (and any CLI behavior) before installing it. - Storage: quarantine records are metadata-only by design, but they are written to ~/.openclaw/quarantine/hopeids (or records.json in that dir in fallback). Confirm you are comfortable with that path and check retention/permissions. - Conservative initial settings: enable the plugin in non-strict/warn-only mode and disable autoScan initially; verify alerting behavior (Telegram) and that alerts contain only metadata. Test with non-sensitive inputs in a staging environment. - If you need higher assurance: request the full 'hopeid' package source and the remainder of this plugin's source (truncated portions) to audit exactly what is sent to classifiers and how patterns/rules are defined. Given these gaps (missing provenance, runtime transmission of raw messages to configured LLMs, and inconsistent install guidance), treat this skill with caution and perform the checks above before trusting it in production.

功能分析

Type: OpenClaw Skill Name: hopeids Version: 1.3.2 The OpenClaw hopeIDS skill is designed as an Intrusion Detection System (IDS) for AI agents, aiming to prevent malicious activity. Its core logic and documentation consistently reflect this purpose, emphasizing 'metadata only' for quarantine records and programmatic alerts. However, it is classified as 'suspicious' due to two primary reasons: 1) It relies heavily on an external `hopeid` npm package, introducing a supply chain risk where a compromised dependency could lead to malicious execution. 2) The `trustOwners` configuration (defaulting to true) allows messages from owner accounts to bypass all security scans, creating a potential vulnerability if an owner's account is compromised. While these are vulnerabilities rather than direct malicious intent by the skill itself, they represent significant security risks.

能力评估

✓ Purpose & Capability

Name/description (inference-based IDS, quarantine, Telegram alerts) align with the code and manifest: it implements auto-scan, quarantine records (metadata-only), per-agent config, and commands. The plugin depends on a separate 'hopeid' package (declared in package.json) which is coherent with the skill's functionality.

⚠ Instruction Scope

SKILL.md and code consistently state 'metadata-only' storage, and quarantine records do not include an originalMessage field. However classification is performed using llm-task or a classifier agent (api.invokeTool or api.sessions.send) and the code sends (a substring of) the incoming message to whichever model/provider is configured. That means raw message content will be transmitted at runtime to the configured model/provider even though it is not persisted — this is a potential data-exfiltration vector users may not expect. The instructions ask to run 'npx hopeid setup' which implies additional installation/config steps external to OpenClaw; the origin and behavior of that CLI are not documented here.

ℹ Install Mechanism

There is no install spec in the registry entry (instruction-only), but the package.json includes a dependency on 'hopeid'. The code dynamically imports 'hopeid' and will error if it is not installed (with instructions to npm install it). This mixed messaging (no install spec but package.json + dynamic import) is inconsistent and requires the user to install an external package. The 'hopeid' package origin is not verifiable from the provided metadata (homepage truncated/absent).

ℹ Credentials

The skill declares no required env vars or credentials and relies on OpenClaw platform config (e.g., channels.telegram.botToken and ownerNumbers). That is proportionate. However, runtime classification sends message text to configured LLM tooling (llm-task or classifierAgent) which may route to third-party providers (Anthropic/OpenAI/etc) configured elsewhere in the platform — installing this plugin therefore implicitly sends messages to those providers. The SKILL.md emphasizes metadata-only storage but does not highlight that raw message text is transmitted to models for classification.

✓ Persistence & Privilege

always is false and the plugin does not request system-wide privileges. It writes quarantine records to a plugin-specific directory (default ~/.openclaw/quarantine/hopeids) and will fall back to an in-memory/file-based quarantine if the 'hopeid/quarantine' module is not present. It does not modify other skills' configs. Writing records to the user's home directory is expected for a quarantine feature but you should verify file permissions and retention policies.

版本历史

v1.3.2

Fix plugin imports, document Telegram requirements

v1.3.1

feat: llm-task classifier support for fast, lightweight semantic analysis

v1.2.0

Added doctor command for health diagnostics

v1.1.1

v1.1.1: Fixed sandbox auto-config that could break workers. Setup now shows guidance instead of auto-applying. 108 patterns, OpenClaw plugin support.

v0.1.0

hopeIDS Security Skill initial release. - Introduces inference-based intrusion detection for AI agents to protect against prompt injection, credential theft, data exfiltration, and related threats. - Provides the security_scan tool for message analysis and integration guidance. - Outlines primary threat categories and recommended IDS-first workflow for agents processing untrusted input. - Includes detailed configuration options for OpenClaw, sandboxing patterns, and example responses to detected threats. - Installation instructions and relevant resource links included for immediate setup.

元数据

Slug hopeids

版本 1.3.2

许可证 —

累计安装 1

当前安装数 1

历史版本数 5

常见问题

Openclaw Plugin 是什么？

Inference-based intrusion detection for AI agents. Pattern matching + LLM analysis for jailbreaks, prompt injection, credential theft, social engineering. 108 detection patterns, OpenClaw plugin, auto-scan, quarantine. Commands: hopeid scan, hopeid test, hopeid setup, hopeid stats, hopeid doctor. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 2333 次。

如何安装 Openclaw Plugin？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install hopeids」即可一键安装，无需额外配置。

Openclaw Plugin 是免费的吗？

是的，Openclaw Plugin 完全免费（开源免费），可自由下载、安装和使用。

Openclaw Plugin 支持哪些平台？

Openclaw Plugin 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Openclaw Plugin？

由 emberDesire（@emberdesire）开发并维护，当前版本 v1.3.2。

Openclaw Plugin