LLM Regression Monitor

by Swanand33 · GitHub ↗ · v1.0.2 · MIT-0

cross-platform ✓ Security Clean

Downloads 115

Install in OpenClaw

/install llm-regression-monitor

Description

Use this skill when the user wants to monitor LLM behavior over time and get alerted when outputs change unexpectedly. Triggers on requests like "set up LLM...

README (SKILL.md)

LLM Regression Monitor

Overview

Automated behavioral regression monitoring for LLM apps. Captures baseline outputs, detects drift on a schedule, and fires WhatsApp or Slack alerts the moment something regresses.

Workflow Decision Tree

User request
├── "set up monitoring" / first time    → Full Setup (steps 1–6)
├── "run the monitor now"               → Step 4 only
├── "I changed my prompt/model"         → Step 3b (update baseline)
└── "configure alerts"                  → Step 5

Step 1 — Install

pip install llm-behave[semantic] pyyaml requests

Step 2 — Create test_suite.yaml

Create in the project root. Minimal example:

tests:
  - name: support_response
    prompt: "A customer says they never received their order. How do you respond?"
    provider: openai        # openai | anthropic | ollama | custom
    model: gpt-4o-mini
    assertions:
      - type: tone
        expected: "empathetic"
    drift:
      enabled: true
      threshold: 0.80

Set the API key for the chosen provider:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...   # if using anthropic
# ollama needs no key

Read references/test-suite-format.md for the full field spec. Read references/providers.md for env vars and Ollama setup.
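As a rough illustration of the structure shown in this step, the sketch below checks a parsed suite (the dictionary that `yaml.safe_load` would return for the minimal example) for the fields that appear there. The helper name `validate_suite`, the set of required fields, and the assumption that thresholds lie in [0, 1] are inferred from the example only; references/test-suite-format.md remains the authoritative spec.

```python
# Hypothetical helper: not part of llm-behave or this skill's scripts.
KNOWN_PROVIDERS = {"openai", "anthropic", "ollama", "custom"}  # from the YAML comment


def validate_suite(suite: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    tests = suite.get("tests")
    if not isinstance(tests, list) or not tests:
        return ["'tests' must be a non-empty list"]
    problems = []
    for i, test in enumerate(tests):
        label = test.get("name", f"tests[{i}]")
        for field in ("name", "prompt", "provider", "model"):
            if not test.get(field):
                problems.append(f"{label}: missing '{field}'")
        provider = test.get("provider")
        if provider and provider not in KNOWN_PROVIDERS:
            problems.append(f"{label}: unknown provider '{provider}'")
        threshold = test.get("drift", {}).get("threshold")
        # Assumption: thresholds are similarity scores between 0 and 1.
        if threshold is not None and not (
            isinstance(threshold, (int, float)) and 0.0 <= threshold <= 1.0
        ):
            problems.append(f"{label}: drift threshold {threshold!r} outside [0, 1]")
    return problems


# The minimal example from this step, as yaml.safe_load would parse it.
example = {
    "tests": [{
        "name": "support_response",
        "prompt": "A customer says they never received their order. How do you respond?",
        "provider": "openai",
        "model": "gpt-4o-mini",
        "assertions": [{"type": "tone", "expected": "empathetic"}],
        "drift": {"enabled": True, "threshold": 0.80},
    }]
}
print(validate_suite(example))  # prints []
```

Running a check like this before capturing baselines can catch a misspelled provider or a missing field before any API call is made.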

Step 3 — Capture Baselines

python scripts/capture_baseline.py

Saves ground-truth outputs to .llm_behave_baselines/. Run once before monitoring begins.
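To make the role of the baseline and the `threshold` value from Step 2 more concrete, here is a minimal sketch of a similarity-based drift check. It assumes that drift is flagged when the similarity between the baseline output and the current output falls below the threshold, and it uses cosine similarity over toy embedding vectors. How llm-behave actually scores similarity is not shown in this README, so the function names and the rule itself are illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def has_drifted(baseline_vec, current_vec, threshold=0.80) -> bool:
    """Assumed rule: drift means similarity to the baseline is below the threshold."""
    return cosine_similarity(baseline_vec, current_vec) < threshold


# Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
baseline = [1.0, 0.0, 0.0]
close = [0.9, 0.1, 0.0]
far = [0.0, 1.0, 0.0]
print(has_drifted(baseline, close))  # False: similarity is about 0.994
print(has_drifted(baseline, far))    # True: similarity is 0.0
```

Under this reading, a higher threshold makes the monitor stricter, because smaller changes in wording push the similarity below it.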

3b — Update after intentional prompt/model change

# Reset one test
python scripts/capture_baseline.py --update-baseline <test-name>

# Reset all
python scripts/capture_baseline.py --force

Step 4 — Run the Monitor

python scripts/run_monitor.py

Writes monitor_report.json. Exits 0 on all-pass, 1 on any failure (CI-compatible).

Step 5 — Configure Alerts

# WhatsApp (requires wacli installed and logged in)
export ALERT_WHATSAPP_TO="+1234567890"

# Slack
export ALERT_SLACK_WEBHOOK="https://hooks.slack.com/services/..."

Add to .env in project root — scripts load it automatically. Send via:

python scripts/send_alert.py

Silent on green runs. Logs every alert to monitor_alerts.log regardless.
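For readers curious what a Slack alert of this kind involves, the sketch below builds and sends a message to a Slack incoming webhook, which accepts a JSON body with a `text` field. It is not the contents of send_alert.py: the function names and message wording are hypothetical, and it uses the standard library's `urllib` rather than `requests` so that it runs without extra packages.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, failed_tests: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    text = "LLM regression detected in: " + ", ".join(failed_tests)
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(failed_tests: list[str]) -> bool:
    """Send an alert if ALERT_SLACK_WEBHOOK is set and at least one test failed."""
    webhook_url = os.environ.get("ALERT_SLACK_WEBHOOK")
    if not webhook_url or not failed_tests:
        return False  # stay silent on green runs or when Slack is not configured
    request = build_slack_request(webhook_url, failed_tests)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status == 200
```

Separating request construction from sending keeps the message format easy to test without touching the network.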

Step 6 — Schedule with OpenClaw Cron

Confirm the schedule with the user (default: 9am daily), then add:

Schedule: 0 9 * * *
Command: python scripts/run_monitor.py || python scripts/send_alert.py
Directory: project root (where test_suite.yaml lives)

The || send_alert.py fires only when run_monitor.py exits 1 (failures found).
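The same exit-code logic can be sketched in Python, which may be useful on schedulers that do not run commands through a shell. The function name `run_then_alert` is hypothetical; the script paths in the commented example are the ones used in Steps 4 and 5.

```python
import subprocess
import sys


def run_then_alert(monitor_cmd: list[str], alert_cmd: list[str]) -> int:
    """Run the monitor; run the alert command only if the monitor exits non-zero."""
    monitor = subprocess.run(monitor_cmd)
    if monitor.returncode != 0:
        subprocess.run(alert_cmd)
    # Return the monitor's exit code so the scheduler still sees the failure.
    return monitor.returncode


# Example call from the project root:
# exit_code = run_then_alert(
#     [sys.executable, "scripts/run_monitor.py"],
#     [sys.executable, "scripts/send_alert.py"],
# )
# sys.exit(exit_code)
```

One difference from the shell form: with `||`, the overall exit status is that of the alert command when the monitor fails, whereas this sketch always reports the monitor's own exit code.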

Common Errors

| Error | Fix |
| --- | --- |
| `llm-behave is not installed` | `pip install llm-behave[semantic]` |
| `OPENAI_API_KEY is not set` | Export key or add to `.env` |
| `No baseline found` | Run step 3 first |
| `test_suite.yaml not found` | Create it in project root |
| LLM call errors in report | API issue — not a regression |
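Several of these errors can be anticipated before a run. The sketch below is a hypothetical preflight check, not part of the skill's scripts: it assumes that a missing `.llm_behave_baselines/` directory is what "No baseline found" corresponds to, and that the provider key to check is passed in by the caller (Ollama, for instance, needs none).

```python
import os
from pathlib import Path


def preflight(project_root: Path, env: dict, provider_key: str = "OPENAI_API_KEY") -> list[str]:
    """Return the fixes from the Common Errors table that apply to this project."""
    fixes = []
    if not env.get(provider_key):
        fixes.append(f"{provider_key} is not set: export it or add it to .env")
    if not (project_root / "test_suite.yaml").is_file():
        fixes.append("test_suite.yaml not found: create it in the project root")
    # Assumption: baselines live in this directory, as described in Step 3.
    if not (project_root / ".llm_behave_baselines").is_dir():
        fixes.append("No baseline found: run step 3 first")
    return fixes


# Check the current directory against the current environment.
for fix in preflight(Path.cwd(), dict(os.environ)):
    print(fix)
```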

Usage Guidance

This skill appears internally consistent for monitoring LLM outputs. Before installing, vet the llm-behave package (it makes the provider API calls), and only set provider and webhook environment variables you trust. Baselines and alert logs are stored in the project directory (.llm_behave_baselines/, monitor_alerts.log). Slack webhooks and the WhatsApp CLI (wacli) are optional and used only if you supply their configuration. If you run this in CI or on shared infrastructure, store API keys (OpenAI, Anthropic, custom) and internal LLM endpoints securely, and make sure you are comfortable with model outputs being transmitted to the configured providers and webhooks.

Capability Analysis

Type: OpenClaw Skill
Name: llm-regression-monitor
Version: 1.0.2

The llm-regression-monitor skill is a legitimate tool for tracking LLM performance and detecting behavioral drift. The scripts (capture_baseline.py, run_monitor.py, and send_alert.py) perform expected tasks such as calling LLM APIs, comparing semantic similarity, and sending alerts via Slack webhooks or WhatsApp. While the tool handles sensitive API keys and requires network access, these behaviors are transparently documented and necessary for its functionality. No evidence of malicious intent, data exfiltration, or prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description match what the files do: capture baselines, run behavioral/drift checks, and send alerts. Primary credential OPENAI_API_KEY and optional provider keys correspond to the supported providers described in references.

✓ Instruction Scope

SKILL.md steps (install dependencies, create test_suite.yaml, capture baselines, run monitor, configure alerts, schedule) match the scripts. The scripts only read expected files (.env, test_suite.yaml, baselines, monitor_report.json) and environment variables declared in the docs; they don't access unrelated system paths or hidden endpoints.

ℹ Install Mechanism

No registry install spec (instruction-only), but SKILL.md directs users to pip-install third-party packages (llm-behave[semantic], pyyaml, requests). This is reasonable for the task but means you should vet the llm-behave package and its provider adapters before installing.

✓ Credentials

Requested env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY, OLLAMA_BASE_URL, CUSTOM_LLM_BASE_URL/CUSTOM_LLM_API_KEY, ALERT_WHATSAPP_TO, ALERT_SLACK_WEBHOOK) are all justified by the code and listed as optional in SKILL.md. No unrelated secrets or broad system credentials are requested.

✓ Persistence & Privilege

Skill is not always-enabled and does not modify other skills or global agent settings. It writes baselines and logs to project-local files (.llm_behave_baselines/, monitor_alerts.log) which is appropriate for its function.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install llm-regression-monitor
After installation, invoke the skill by name or use /llm-regression-monitor
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.2

All provider keys are now optional — only set the key for the provider you actually use.

v1.0.1

Declared required env vars in metadata to fix registry security warning.

v1.0.0

Monitors LLM outputs for behavioral drift and regressions. Alerts via WhatsApp or Slack.

Metadata

Slug llm-regression-monitor

Version 1.0.2

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is LLM Regression Monitor?

LLM Regression Monitor is an AI Agent Skill for Claude Code / OpenClaw that monitors LLM behavior over time and alerts you when outputs change unexpectedly. It has 115 downloads so far.

How do I install LLM Regression Monitor?

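Run "/install llm-regression-monitor" in the OpenClaw or Claude Code chat to install the skill in one step. Before the first run, follow the README steps to install the Python dependencies, set a provider API key if your provider needs one, and capture baselines.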

Is LLM Regression Monitor free?

Yes, LLM Regression Monitor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does LLM Regression Monitor support?

LLM Regression Monitor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created LLM Regression Monitor?

It is built and maintained by Swanand33 (@swanand33); the current version is v1.0.2.
