Description

Helps verify that AI agent skills maintain consistent behavioral invariants across repeated executions — detecting the class of threat where a skill behaves...

Usage Instructions (SKILL.md)


The Skill Behaved Safely the First Five Times. Watch What Happens at Run Six.

Name: Behavioral Invariant Monitor
Author: andyxinweiminicloud


Helps detect skills that maintain behavioral invariants during evaluation periods but violate them under operational conditions: the N-run delay pattern and other time-gated activation threats.

## Problem

Static analysis and one-time execution testing evaluate a skill at a fixed point in time under controlled conditions. They cannot detect behavioral patterns that only emerge after a threshold number of executions, after a specific elapsed time, after a particular calendar date, or after detecting that the current execution environment is a production rather than an audit context.

These delayed or conditional activation patterns represent a class of threat that behavioral consistency testing is specifically designed to catch and that point-in-time auditing cannot. A skill that behaves safely for the first N runs before activating malicious behavior on run N+1 will pass every pre-deployment audit that runs it N times or fewer. Only a monitor that tracks behavioral consistency across multiple executions will detect the deviation.

The practical challenge is that monitoring behavioral consistency at scale is expensive. Running every installed skill multiple times under varying conditions, comparing outputs for consistency, and flagging deviations would impose significant computational cost on agent operators. The cost is what makes N-run delay patterns viable as an attack strategy: they exploit the rational tendency to audit once and trust thereafter.

Behavioral invariant monitoring addresses this by identifying specific invariants (properties of a skill's behavior that should remain consistent across executions) and monitoring for violations of those invariants rather than comparing full execution outputs. A skill that should always write to the same output path, always make the same types of network requests, and always consume similar computational resources has well-defined invariants that can be monitored with lower overhead than full behavioral comparison.
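As a rough illustration of checking invariants instead of diffing full outputs, the sketch below compares one run's observations against a recorded baseline. The `RunObservation` record, the `check_invariants` helper, the 3x CPU tolerance, and the sample values are all hypothetical; the skill does not specify a data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RunObservation:
    """Observations for one execution (hypothetical record shape)."""
    run: int
    cpu_ms: float
    network_bytes: int
    write_paths: frozenset = field(default_factory=frozenset)


def check_invariants(baseline, obs, cpu_tolerance=3.0):
    """Return human-readable invariant violations for one run."""
    violations = []
    # Side-effect invariant: no write paths beyond those seen in the baseline.
    new_paths = obs.write_paths - baseline.write_paths
    if new_paths:
        violations.append(f"run {obs.run}: new write paths {sorted(new_paths)}")
    # Network invariant: a skill with no baseline traffic should stay silent.
    if baseline.network_bytes == 0 and obs.network_bytes > 0:
        violations.append(f"run {obs.run}: unexpected network traffic")
    # Resource invariant: CPU time should stay within an assumed tolerance.
    if obs.cpu_ms > baseline.cpu_ms * cpu_tolerance:
        violations.append(f"run {obs.run}: CPU time above {cpu_tolerance}x baseline")
    return violations


# Illustrative values only.
baseline = RunObservation(0, 15.0, 0, frozenset({"/tmp/cache/"}))
normal = RunObservation(3, 14.0, 0, frozenset({"/tmp/cache/"}))
odd = RunObservation(8, 600.0, 2048, frozenset({"/tmp/cache/", "/tmp/.hidden"}))

for observation in (normal, odd):
    print(observation.run, check_invariants(baseline, observation))
```

Each check is a cheap set or threshold comparison, which is the overhead argument made above: the monitor never needs to store or compare complete execution outputs.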

## What This Monitors

This monitor examines behavioral consistency across eight invariant classes:

1. **Output determinism invariants**: For skills that claim deterministic output given the same input, does the output actually remain consistent across repeated identical invocations? Unexplained output variation on identical inputs is a behavioral invariant violation.
2. **Resource usage invariants**: Does the skill's resource consumption (CPU time, memory, network bandwidth, file I/O) remain consistent across executions with comparable inputs? Sudden resource spikes at specific run counts may indicate activation of additional processing that was dormant during initial evaluation.
3. **Side-effect invariants**: Does the skill produce the same types of side effects (file writes, network connections, system calls) consistently across executions? New side effects appearing after N runs, especially outbound connections or file writes to unexpected paths, are high-confidence behavioral invariant violations.
4. **Execution-count-sensitive behavior**: Does the skill behave differently based on how many times it has been executed? This can be detected by resetting execution context and comparing behavior on "first" versus "Nth" execution, or by analyzing execution logs for behavioral changes correlated with run count.
5. **Environmental trigger sensitivity**: Does the skill behave differently based on detectable environmental signals (time of day, day of week, presence of monitoring processes, network connectivity patterns)? Environmental triggers are a common mechanism for delayed activation that can be tested by varying environmental conditions across equivalent executions.
6. **Constraint envelope baseline (v1.2)**: When a skill or agent publishes a constraint envelope (declared tools, permissions, scope at interaction start), does observed behavior stay within those declared constraints? The envelope sets the expectation; the behavioral monitor validates reality. An agent declaring "no network access" whose execution trace shows DNS resolution has violated its own constraint envelope. This creates a verification loop with delta-disclosure-auditor: the declared delta sets expectations, and behavioral monitoring validates whether reality matches the declaration.
7. **Performance fingerprinting (v1.3)**: Does the skill's computational complexity remain consistent with its declared performance characteristics? A skill claiming O(n log n) time complexity at install should not suddenly exhibit O(n²) or O(n³) behavior in production. Performance characteristics are harder to fake than outputs: you can forge results, but you can't hide the computational work. Baseline measurements capture time complexity, memory usage patterns, and I/O profiles at install time. Runtime monitoring flags statistically significant drift. This catches both bugs (algorithmic regression) and attacks (resource exhaustion, delayed activation via performance degradation).
8. **Cryptographic audit trail (v1.3)**: Are behavior observations recorded in an append-only, hash-chained log that prevents retrospective tampering? Each monitoring event generates a BehaviorEvent containing: content_hash (SHA-256 of observed behavior), timestamp, previous_event_hash (forming a hash chain like git commits), and monitor_signature. This makes the behavior log tamper-evident: you can't rewrite history without breaking the chain. When behavioral invariant violations are detected, the hash chain provides cryptographic evidence of when the violation was recorded and what the prior consistent behavior was. This enables verifiable behavioral trajectory tracking across skill evolution.
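The hash-chained log described in the last item can be sketched as follows. The four field names come from the description above; everything else is an assumption of this sketch, in particular the use of HMAC-SHA256 as `monitor_signature`, the all-zero genesis value, and the helper names.

```python
import hashlib
import hmac
import json
import time

GENESIS_HASH = "0" * 64  # assumed marker for "no previous event"


def canonical(fields):
    """Serialize a dict deterministically so hashes are reproducible."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def event_digest(event):
    """SHA-256 over the whole event; becomes the next previous_event_hash."""
    return hashlib.sha256(canonical(event)).hexdigest()


def sign(body, key):
    """HMAC-SHA256 stands in for monitor_signature (an assumption)."""
    return hmac.new(key, canonical(body), hashlib.sha256).hexdigest()


def append_event(log, observed_behavior, key, timestamp=None):
    """Append a BehaviorEvent-like dict that links to the previous event."""
    body = {
        "content_hash": hashlib.sha256(observed_behavior.encode("utf-8")).hexdigest(),
        "timestamp": time.time() if timestamp is None else timestamp,
        "previous_event_hash": event_digest(log[-1]) if log else GENESIS_HASH,
    }
    event = dict(body, monitor_signature=sign(body, key))
    log.append(event)
    return event


def verify_chain(log, key):
    """Return True only if every link and every signature checks out."""
    previous = GENESIS_HASH
    for event in log:
        body = {k: v for k, v in event.items() if k != "monitor_signature"}
        if event.get("previous_event_hash") != previous:
            return False
        if not hmac.compare_digest(event.get("monitor_signature", ""), sign(body, key)):
            return False
        previous = event_digest(event)
    return True


key = b"demo-monitor-key"  # illustrative only; real keys need proper management
log = []
for behavior in ("writes:/tmp/cache", "writes:/tmp/cache", "writes:/tmp/cache,/tmp/.state"):
    append_event(log, behavior, key)

print(verify_chain(log, key))        # True: the untouched log verifies
log[1]["content_hash"] = "0" * 64    # simulate retrospective tampering
print(verify_chain(log, key))        # False: the signature no longer matches
```

Even an attacker who holds the key and re-signs an altered event changes that event's digest, so the next event's `previous_event_hash` stops matching. Detection therefore depends on the chain as well as on the signature.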

## Risk-Proportional Monitoring Strategy (v1.3)

Not every skill needs continuous behavioral monitoring. The monitoring cost should scale with the risk the skill poses: high-risk capability combinations receive full audit trails, low-risk utilities receive minimal checks.

### Risk Classification

Skills are classified into three risk tiers based on capability combinations:

**High-Risk** (network + filesystem + env-read combinations):

- Full continuous monitoring with cryptographic audit trail
- Performance fingerprinting on every execution
- All behavioral invariants checked
- Examples: data-sync agents, remote execution tools, package managers

**Medium-Risk** (any two of: network, filesystem, env-read):

- Sampling-based monitoring (random 1-5% execution checks)
- Performance baseline verification on sampled runs
- Uncertainty tax: attackers can't predict when they're being monitored
- Examples: CI/CD tools, deployment scripts, log aggregators

**Low-Risk** (single capability, no external I/O):

- Install-time verification only
- Performance baseline recorded but not continuously checked
- Behavioral invariants spot-checked if usage patterns change
- Examples: formatters, parsers, pure computation utilities
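The tiering rule can be expressed as a small function. This is a sketch under assumptions: the capability strings, the `RiskTier` enum, and the `classify` helper are hypothetical names, and a skill holding fewer than two of the three sensitive capabilities is treated as low-risk, which is only one reading of "single capability, no external I/O".

```python
from enum import Enum


class RiskTier(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Capability labels are assumed; the skill names the categories but not a format.
SENSITIVE = frozenset({"network", "filesystem", "env-read"})

MONITORING_POLICY = {
    RiskTier.HIGH: "full continuous monitoring with cryptographic audit trail",
    RiskTier.MEDIUM: "sampling-based monitoring (random 1-5% of executions)",
    RiskTier.LOW: "install-time verification only",
}


def classify(capabilities):
    """Map a skill's declared capabilities to a risk tier."""
    held = len(SENSITIVE & set(capabilities))
    if held == 3:
        return RiskTier.HIGH
    if held == 2:
        return RiskTier.MEDIUM
    return RiskTier.LOW


for caps in ({"network", "filesystem", "env-read"}, {"network", "filesystem"}, {"parse"}):
    tier = classify(caps)
    print(sorted(caps), tier.value, "->", MONITORING_POLICY[tier])
```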

### Sampling Strategy

For medium-risk skills, sampling creates an uncertainty tax on attacks. If an attacker knows exactly which executions are monitored (for example, only an initial evaluation window), they can plan around it. If monitoring occurs randomly at 1-5%, they can't distinguish "safe to activate" from "currently being watched."

The attack cost equation shifts:

- Deterministic monitoring: Attacker waits until monitoring ends
- Sampling-based: Attacker must risk detection on every activation attempt

Sampling rate adapts based on observed behavior: if any sampled execution shows drift, the sampling rate increases to 100% for the next N runs.
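A minimal sketch of the adaptive sampling rule, together with the arithmetic behind the uncertainty tax. The `AdaptiveSampler` class, its parameter names, and the default escalation window of 20 runs are assumptions; the text above fixes only the 1-5% range and the jump to 100% after drift.

```python
import random


class AdaptiveSampler:
    """Decide per execution whether to monitor it (hypothetical helper)."""

    def __init__(self, base_rate=0.05, escalation_runs=20, rng=None):
        self.base_rate = base_rate
        self.escalation_runs = escalation_runs
        self.remaining_escalated = 0
        # SystemRandom draws from the OS entropy source, so the schedule is
        # harder for a monitored skill to predict than a seeded generator.
        self.rng = rng if rng is not None else random.SystemRandom()

    def should_monitor(self):
        if self.remaining_escalated > 0:
            self.remaining_escalated -= 1
            return True
        return self.rng.random() < self.base_rate

    def report_drift(self):
        """Escalate to 100% sampling for the next escalation_runs executions."""
        self.remaining_escalated = self.escalation_runs


def detection_probability(rate, activations):
    """Chance that at least one of several independent activations is sampled."""
    return 1.0 - (1.0 - rate) ** activations


sampler = AdaptiveSampler(base_rate=0.05, escalation_runs=3)
sampler.report_drift()
print([sampler.should_monitor() for _ in range(3)])  # [True, True, True]
print(detection_probability(0.05, 20))               # about 0.64
```

At a 5% rate a single activation is unlikely to be caught, but a behavior that must activate repeatedly to be useful accumulates risk quickly, which is the sense in which every activation attempt carries a cost.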

## How to Use

**Input**: Provide one of:

- A skill identifier and execution log history to analyze for invariant violations
- A specific skill to run under monitored conditions with invariant checking
- An agent's execution history to identify skills with inconsistent behavioral patterns

**Output**: A behavioral invariant report containing:

- Invariant violation log (what changed, at what execution count, under what conditions)
- Resource usage trend analysis
- Side-effect consistency assessment
- Execution-count-sensitivity test results
- Environmental trigger sensitivity assessment
- Consistency verdict: CONSISTENT / MARGINAL / VARIANT / ACTIVATION-PATTERN-DETECTED

## Example

Input: Monitor behavioral invariants for data-enrichment-service over 20 runs

```
📊 BEHAVIORAL INVARIANT MONITOR REPORT

Skill: data-enrichment-service v1.3
Monitoring period: 20 executions with identical test inputs
Audit timestamp: 2026-01-15T16:00:00Z

Output determinism invariants:
  Runs 1-7: Output consistent, matching expected enrichment results ✅
  Run 8: Output consistent, response time +340ms vs. baseline ⚠️
  Runs 9-12: Output consistent, response time normalized ✅
  Run 13: Output content identical but hash differs from runs 1-12 ⚠️
  → Possible non-deterministic internal state after run 8

Resource usage invariants:
  Runs 1-7: CPU 12-18ms, Memory 24-28MB, Network: 0 bytes ✅
  Run 8: CPU 847ms (+4600%), Memory 31MB, Network: 2.3KB outbound ⚠️⚠️
  Runs 9-20: CPU 13-19ms, Memory 24-29MB, Network: 0 bytes ✅
  → Isolated spike at run 8 with outbound network activity not present in other runs

Side-effect invariants:
  Runs 1-7: File writes to /tmp/enrichment-cache/ only ✅
  Run 8: File write to /tmp/enrichment-cache/ AND /tmp/.state_count ⚠️
  Runs 9-20: File writes to /tmp/enrichment-cache/ only ✅
  → /tmp/.state_count file created at run 8, persists across subsequent runs

Execution-count-sensitivity test:
  Reset state (delete /tmp/.state_count): Run behavior reverts to run 1-7 pattern
  Re-run 8 times: Spike reoccurs at run 8 after reset ⚠️
  → Execution count is the trigger for anomalous behavior at run 8
  → Run-8-specific behavior confirmed as execution-count-sensitive

Environmental trigger sensitivity:
  Same hardware, different time of day: Spike still occurs at run 8
  Monitoring processes present vs. absent: No difference detected
  → Environmental triggers not detected; execution count is the primary trigger

Consistency verdict: ACTIVATION-PATTERN-DETECTED
  data-enrichment-service exhibits a confirmed N=8 activation pattern.
  At the 8th execution (repeatable after deleting /tmp/.state_count),
  the skill consumes 47× more CPU, sends 2.3KB of outbound network
  traffic, and writes a hidden state file. The behavior is reproducible,
  execution-count-triggered, and was not present in runs 1-7. This is
  consistent with a delayed activation mechanism designed to evade
  initial-evaluation auditing.

Recommended actions:
  1. Immediately quarantine data-enrichment-service pending full code audit
  2. Investigate destination of 2.3KB outbound network call at run 8
  3. Analyze /tmp/.state_count content for exfiltration payload evidence
  4. Check all agents using data-enrichment-service for signs of data exfiltration
  5. Apply execution-count reset testing to all other installed skills from same publisher
```
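The execution-count-sensitivity test in the report (reset state, re-run, and see whether the anomaly returns at the same run number) can be approximated as below. This is a sketch under assumptions: the function names, the 5x-median spike threshold, and the two CPU series are illustrative and are not taken from the report.

```python
from statistics import median


def spike_runs(cpu_ms_by_run, factor=5.0):
    """Return 1-based run numbers whose CPU time exceeds factor x the median."""
    typical = median(cpu_ms_by_run)
    return [run for run, value in enumerate(cpu_ms_by_run, start=1)
            if value > factor * typical]


def count_triggered_runs(first_pass, after_reset, factor=5.0):
    """Runs that spike at the same position before and after a state reset."""
    return sorted(set(spike_runs(first_pass, factor)) & set(spike_runs(after_reset, factor)))


# Illustrative CPU timings in milliseconds, one entry per run.
first_pass = [15, 14, 16, 13, 17, 12, 18, 847, 14, 15, 16, 13]
after_reset = [14, 16, 15, 13, 12, 17, 18, 851, 15, 14]

print(count_triggered_runs(first_pass, after_reset))  # [8]
```

A spike that recurs at the same run index after a reset points to the execution count as the trigger, while a spike that appears only once is more likely to be noise or an environmental effect.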

## Related Tools

- **observer-effect-probe**: Tests whether skills detect being probed and suppress
  malicious behavior during auditing; behavioral-invariant-monitor tracks what
  changes across executions, not just what changes when probing is detected
- **runtime-attestation-probe**: Validates runtime behavior against declarations;
  behavioral invariant monitoring catches cases where runtime behavior is consistent
  with declarations initially but deviates after N executions
- **trust-decay-monitor**: Tracks verification freshness over time; behavioral
  invariant violations reset the trust clock, so a skill that shows activation
  patterns must be re-evaluated from the beginning
- **hollow-validation-checker**: Detects structurally empty test suites; validation
  suites that only run skills once cannot detect N-run activation patterns even
  when they test the right behavioral properties

## Limitations

Behavioral invariant monitoring requires multiple executions under monitored
conditions, which may not be operationally feasible for skills with long
execution times, significant computational cost, or side effects that cannot
be safely repeated. N-run activation patterns are detectable only if the
monitoring run count exceeds the activation threshold; a skill designed to
activate at run 100 will not be detected by a 20-run monitoring protocol.
Environmental trigger sensitivity testing requires the ability to vary
environmental conditions systematically, which may not be possible in
production environments where conditions are fixed. Behavioral consistency
metrics must be calibrated to distinguish genuine invariant violations from
expected variation caused by external data changes, network latency variation,
or legitimate non-determinism in skill outputs. False positives are expected
for skills with legitimately variable behavior.

v1.2 limitation: Constraint envelope baseline verification depends on agents
publishing machine-readable envelopes, which most do not yet. Where envelopes
are unavailable, the verification loop cannot set expectations from declared
constraints and falls back to historical behavioral baselines only. The
verification loop with delta-disclosure-auditor requires both tools to operate
on the same skill, and the coordination overhead is nontrivial.
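The envelope check and its fallback could look roughly like this. It is a sketch only: the action labels such as `net:dns`, the set-based envelope format, and both function names are hypothetical, since no machine-readable envelope format is specified here.

```python
def expected_actions(envelope, historical_baseline):
    """Prefer the declared envelope; fall back to the historical baseline."""
    if envelope is not None:
        return set(envelope), "declared-envelope"
    return set(historical_baseline), "historical-baseline"


def envelope_violations(observed, envelope, historical_baseline):
    """Return observed actions outside the expectation, plus its source."""
    allowed, source = expected_actions(envelope, historical_baseline)
    return sorted(set(observed) - allowed), source


# An agent that declared no network access but resolved a hostname.
declared = {"fs:read", "fs:write"}
history = {"fs:read", "fs:write"}
trace = ["fs:read", "net:dns", "fs:write"]

print(envelope_violations(trace, declared, history))
print(envelope_violations(trace, None, history))
```

Returning the source of the expectation alongside the violations lets a report distinguish "violated its own declaration" from "deviated from past behavior", which carry different weight.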

v1.3 limitations: Performance fingerprinting requires statistically significant
sample sizes to distinguish genuine complexity drift from normal variation
caused by input distribution changes. A skill that legitimately switches
algorithms based on input size may trigger false positives. Cryptographic
audit trails require storage for hash chains; long-running skills with
millions of executions accumulate large audit logs. Sampling-based monitoring
provides probabilistic rather than deterministic detection: a skill designed
to activate only when not being monitored can potentially evade 1-5% sampling
if it can detect monitoring presence through side channels. Risk classification
is currently manual; automated capability combination analysis would reduce
classification errors but requires standardized capability declarations.
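One simple way to approximate a complexity fingerprint is to fit the slope of log(time) against log(input size) and compare it with the slope recorded at install time. The sketch below makes several assumptions: the function names, the fixed 0.5 tolerance, and the synthetic timings are hypothetical, and a plain threshold replaces the statistical significance test that a real monitor would need.

```python
import math


def loglog_slope(sizes, times):
    """Least-squares slope of log(time) vs log(size): the empirical exponent."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def complexity_drift(baseline_slope, sizes, times, tolerance=0.5):
    """Return (observed slope, whether it moved more than tolerance)."""
    slope = loglog_slope(sizes, times)
    return slope, abs(slope - baseline_slope) > tolerance


sizes = [1000, 2000, 4000, 8000]
install_times = [n * math.log2(n) * 1e-6 for n in sizes]  # synthetic n log n timings
runtime_times = [n ** 2 * 1e-9 for n in sizes]            # synthetic quadratic timings

baseline = loglog_slope(sizes, install_times)
slope, drifted = complexity_drift(baseline, sizes, runtime_times)
print(round(baseline, 2), round(slope, 2), drifted)
```

On these synthetic inputs the n log n baseline fits a slope a little above 1 and the quadratic series fits a slope of 2, so the drift flag is set. With real, noisy timings and a shifting input distribution the fitted slope varies from run to run, which is why the sample-size caveat above matters.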

*v1.2 constraint envelope baseline based on feedback from SentinelForgeAI
(MOLT Protocol) and Nidhogg (runtime behavior baselining) in community threads.*

*v1.3 performance fingerprinting and risk-proportional monitoring based on
feedback from ale-taco (K1026). Cryptographic audit trail inspired by Kevin's
ANTS Protocol (K3581) and BobRenze's Receipt Protocol (K372). Community
convergence discussion: post a4d0469b (March 2026).*

Safe Usage Recommendations

This skill's goal (detecting N-run/delayed activation threats) is reasonable, but the runtime instructions ask the agent to observe other skills' outputs, resource usage, side effects, and to produce hash-chained audit logs without specifying where logs are stored or sent and without detailing required platform access. Before installing: (1) review the full SKILL.md text to find any network endpoints or upload instructions (search for curl invocations or URLs); (2) confirm how the skill will obtain execution traces and whether your platform gives it access to process-level telemetry or other skills' outputs; (3) prefer running it in a restricted or test environment first (no network egress or limited filesystem access) to verify behavior; (4) require that audit logs remain local or go to a vetted endpoint and that transfers use explicit, auditable credentials; (5) if you lack clarity about where data will be sent or which system paths it reads, treat the skill as high-risk and avoid granting broad privileges.

Functional Analysis

Type: OpenClaw Skill
Name: behavioral-invariant-monitor
Version: 1.3.0

The skill bundle describes a 'behavioral-invariant-monitor' designed to detect malicious behavior, such as N-run delay attacks, data exfiltration, and resource abuse, in *other* skills. The `SKILL.md` documentation clearly outlines its purpose, monitoring capabilities, and usage, including an example of detecting a hypothetical malicious skill. While the skill requires `curl` and `python3` (granting network and scripting capabilities), these are plausibly needed for its stated security monitoring function. There is no evidence of intentional harmful behavior, prompt injection against the agent, or instructions for the agent to perform any malicious actions; instead, it describes how to *detect* such actions.

Capability Assessment

ℹ Purpose & Capability

Name and description match the monitoring functionality described. Required binaries (curl, python3) are plausible for an instruction-only monitor that runs scripts and optionally transmits reports. However, the monitor's scope (observing file I/O, network connections, system calls, resource usage of other skills) implies system-level telemetry/access that is not declared elsewhere (no required config paths or privileges). That gap is worth questioning: a behavioral monitor legitimately needs access to execution traces and resource metrics, but the SKILL.md does not explain how those will be obtained or what platform privileges are needed.

⚠ Instruction Scope

The SKILL.md explicitly describes inspecting outputs, resource usage, side effects (file writes, network connections, system calls), execution-count-sensitive behavior, and creating cryptographic (hash-chained) audit logs. These operations imply reading other skills' outputs/logs, monitoring processes, and producing persistent logs; the file does not declare what files/paths will be read or where logs are stored/sent. Because this is an instruction-only skill, the instructions themselves are the runtime surface — they could direct the agent to collect and transmit sensitive state. The instructions as presented are high-level and allow broad discretion (sampling policies, where to store or send audit trails), which increases risk.

✓ Install Mechanism

No install spec and no code files — lowest-risk installation footprint. Nothing will be written to disk by an installer. Risk comes from what the instructions will cause the agent to do at runtime rather than from an installation step.

ℹ Credentials

The skill declares no required environment variables, credentials, or config paths, which is good. However, the intended functionality (collecting telemetry, generating audit trails, possibly uploading them) normally requires either access to platform monitoring APIs or a destination for logs. The SKILL.md does not declare endpoints, credentials, or storage locations; the absence could be benign (local-only by default) or problematic (instructions might instruct use of curl to send data to arbitrary URLs).

✓ Persistence & Privilege

Flags show always: false and no persistent install behavior. The skill is user-invocable and allows autonomous invocation (the platform default). There is no explicit request to modify other skills or system-wide configurations in the metadata.

Version History

v1.3.0

behavioral-invariant-monitor v1.3.0 introduces advanced behavioral monitoring features:

- Added performance fingerprinting to detect computational complexity drift in skills.
- Introduced cryptographic audit trails using hash-chained behavior logs for immutable and verifiable monitoring.
- Implemented risk-proportional monitoring: high-risk skills get full monitoring, medium-risk skills are sampled, and low-risk skills are verified at install time to reduce overhead.
- Expanded monitored invariant classes for more comprehensive behavioral consistency assurance.
- Updated documentation to reflect new monitoring strategies and publish status.

v1.2.0

Wave 13 community upgrade: Added constraint envelope baseline support (contributed by SentinelForgeAI + Nidhogg). Enables verification loop with delta-disclosure-auditor. Consumes constraint envelope data as behavior expectation baseline.

v1.0.1

No code or documentation changes detected in this version.

- Version number updated from 1.1.0 to 1.0.1, but file contents remain unchanged.
- No feature, bugfix, or documentation updates present in this release.

v1.1.0

Version 1.1.0

- Added constraint envelope baseline monitoring to verify declared skill or agent constraints (e.g., tools, permissions) against observed behavior.
- Introduced delta-disclosure verification loop: declared deltas are checked against actual execution for mismatches.
- Updated metadata and capability list to include constraint-envelope-baseline and delta-disclosure-verification-loop.
- Documentation enhanced to describe new constraint verification features and their integration with the behavioral invariant monitoring workflow.

v1.0.0

Initial release of behavioral-invariant-monitor, which ensures AI skills maintain consistent behavior across repeated runs and changing conditions.

- Detects skills that pass initial audits but shift behavior based on run count, environment, or delayed triggers.
- Monitors five key invariants: output determinism, resource usage, side effects, execution-count sensitivity, and environmental triggers.
- Provides detailed reports highlighting invariant violations, resource/side effect changes, and activation-pattern detection.
- Supports both live skill execution and retrospective log analysis.
- Aims to surface hidden behavioral threats missed by static or single-run audits.

Metadata

Slug behavioral-invariant-monitor

Version 1.3.0

License (none listed)

Total installs 2

Current installs 2

Historical versions 5

FAQ

What is Behavioral Invariant Monitor?

Helps verify that AI agent skills maintain consistent behavioral invariants across repeated executions, detecting the class of threat where a skill behaves... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 662 cumulative downloads to date.

How do I install Behavioral Invariant Monitor?

Run the command `/install behavioral-invariant-monitor` in the OpenClaw or Claude Code chat box for a one-click install; no additional configuration is required.

Is Behavioral Invariant Monitor free?

Yes. Behavioral Invariant Monitor is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Behavioral Invariant Monitor support?

Behavioral Invariant Monitor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Behavioral Invariant Monitor?

It is developed and maintained by andyxinweiminicloud (@andyxinweiminicloud); the current version is v1.3.0.
