/install llm-regression-monitor
LLM Regression Monitor
Overview
Automated behavioral regression monitoring for LLM apps. Captures baseline outputs, checks for drift on a schedule, and sends WhatsApp or Slack alerts when a run detects a regression.
Workflow Decision Tree
User request
├── "set up monitoring" / first time → Full Setup (steps 1–6)
├── "run the monitor now" → Step 4 only
├── "I changed my prompt/model" → Step 3b (update baseline)
└── "configure alerts" → Step 5
Step 1 — Install
pip install "llm-behave[semantic]" pyyaml requests
Step 2 — Create test_suite.yaml
Create in the project root. Minimal example:
tests:
  - name: support_response
    prompt: "A customer says they never received their order. How do you respond?"
    provider: openai  # openai | anthropic | ollama | custom
    model: gpt-4o-mini
    assertions:
      - type: tone
        expected: "empathetic"
    drift:
      enabled: true
      threshold: 0.80
Set the API key for the chosen provider:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-... # if using anthropic
# ollama needs no key
Read references/test-suite-format.md for the full field spec.
Read references/providers.md for env vars and Ollama setup.
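To illustrate how the drift threshold in the example above might be applied, here is a minimal sketch that compares a new output against a stored baseline and flags drift when similarity falls below 0.80. It uses difflib.SequenceMatcher from the standard library as a stand-in for a semantic score; how llm-behave actually computes similarity is not specified in this section, and has_drifted is a hypothetical helper.

```python
from difflib import SequenceMatcher


def similarity(baseline: str, current: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a semantic score."""
    return SequenceMatcher(None, baseline, current).ratio()


def has_drifted(baseline: str, current: str, threshold: float = 0.80) -> bool:
    """Hypothetical helper: flag drift when similarity drops below threshold."""
    return similarity(baseline, current) < threshold


baseline = "I'm sorry your order never arrived. Let me look into it right away."
same = baseline
different = "Orders cannot be refunded."

print(has_drifted(baseline, same))       # identical text has similarity 1.0
print(has_drifted(baseline, different))  # a much shorter, unrelated reply
```

A character-level ratio penalizes harmless rewording far more than a semantic measure would, so the 0.80 value only carries over in spirit.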
Step 3 — Capture Baselines
python scripts/capture_baseline.py
Saves ground-truth outputs to .llm_behave_baselines/. Run once before monitoring begins.
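As a rough picture of what capturing a baseline involves, the sketch below writes one JSON file per test into .llm_behave_baselines/ and refuses to overwrite an existing file unless asked. The file layout, the JSON fields, and save_baseline are assumptions for illustration; the on-disk format used by capture_baseline.py is not documented in this section.

```python
import json
import tempfile
from pathlib import Path


def save_baseline(root: Path, test_name: str, output: str, force: bool = False) -> bool:
    """Hypothetical helper: store one baseline per test; return True if written."""
    directory = root / ".llm_behave_baselines"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{test_name}.json"  # assumed layout: one file per test
    if path.exists() and not force:
        return False  # keep the existing ground truth
    path.write_text(json.dumps({"test": test_name, "output": output}), encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = save_baseline(root, "support_response", "I'm sorry to hear that.")
    second = save_baseline(root, "support_response", "A different answer.")
    forced = save_baseline(root, "support_response", "A different answer.", force=True)
    baseline_file = root / ".llm_behave_baselines" / "support_response.json"
    stored = json.loads(baseline_file.read_text(encoding="utf-8"))
    print(first, second, forced, stored["output"])
```

Refusing to overwrite by default mirrors the idea that a baseline is ground truth and should change only after an intentional prompt or model change.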
3b — Update after intentional prompt/model change
# Reset one test
python scripts/capture_baseline.py --update-baseline <test-name>
# Reset all
python scripts/capture_baseline.py --force
Step 4 — Run the Monitor
python scripts/run_monitor.py
Writes monitor_report.json. Exits 0 on all-pass, 1 on any failure (CI-compatible).
Step 5 — Configure Alerts
# WhatsApp (requires wacli installed and logged in)
export ALERT_WHATSAPP_TO="+1234567890"
# Slack
export ALERT_SLACK_WEBHOOK="https://hooks.slack.com/services/..."
Add to .env in project root — scripts load it automatically. Send via:
python scripts/send_alert.py
Silent on green runs. Logs every alert to monitor_alerts.log regardless.
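For the Slack path, incoming webhooks accept an HTTP POST whose JSON body has a text field. The sketch below builds such a request with the standard library and does not send it; build_slack_request and the message wording are hypothetical, and send_alert.py may be implemented quite differently.

```python
import json
import urllib.request
from typing import Optional


def build_slack_request(
    webhook_url: str, failed_tests: list[str]
) -> Optional[urllib.request.Request]:
    """Hypothetical helper: build (but do not send) a Slack webhook POST."""
    if not failed_tests:
        return None  # green run: stay silent
    text = "LLM regression detected in: " + ", ".join(failed_tests)
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real one would come from ALERT_SLACK_WEBHOOK.
request = build_slack_request(
    "https://hooks.slack.com/services/EXAMPLE", ["support_response"]
)
print(request.get_method(), json.loads(request.data)["text"])
# Sending would be: urllib.request.urlopen(request, timeout=10)
```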
Step 6 — Schedule with OpenClaw Cron
Confirm the schedule with the user (default: 9am daily), then add:
- Schedule: 0 9 * * *
- Command: python run_monitor.py && true || python send_alert.py
- Directory: project root (where test_suite.yaml lives)
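The || python send_alert.py branch runs only when run_monitor.py exits non-zero; per Step 4, exit code 1 means failures were found. The && true in the middle is a no-op on success.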
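The same short-circuit logic can be sketched in Python: run the monitor, and invoke the alert command only when the monitor's exit code is non-zero. The stand-in commands below just set an exit code, and run_then_alert is a hypothetical wrapper, not part of the skill.

```python
import subprocess
import sys


def run_then_alert(monitor_cmd: list[str], alert_cmd: list[str]) -> bool:
    """Run the monitor; run the alert command only if the monitor exits non-zero.

    Returns True when the alert command was invoked.
    """
    monitor = subprocess.run(monitor_cmd)
    if monitor.returncode == 0:
        return False  # green run: nothing to send
    subprocess.run(alert_cmd)
    return True


# Stand-in commands that only set an exit code; in the real job these would be
# the run_monitor.py and send_alert.py invocations from the schedule entry.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
alert = [sys.executable, "-c", "print('alert sent')"]

print(run_then_alert(passing, alert))
print(run_then_alert(failing, alert))
```

Like the shell operator, this treats any non-zero exit as a failure, so a crash in the monitor (for example, a missing dependency) would also trigger the alert command.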
Common Errors
| Error | Fix |
|---|---|
| llm-behave is not installed | pip install llm-behave[semantic] |
| OPENAI_API_KEY is not set | Export key or add to .env |
| No baseline found | Run step 3 first |
| test_suite.yaml not found | Create it in project root |
| LLM call errors in report | API issue, not a regression |
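Several of these errors can be caught before a run. The sketch below is a hypothetical preflight check, assuming the openai provider and the file and directory names used earlier in this document; it is not part of the skill's scripts.

```python
import tempfile
from pathlib import Path


def preflight(root: Path, env: dict[str, str]) -> list[str]:
    """Hypothetical check: return the fixes needed before a monitor run."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set: export it or add it to .env")
    if not (root / "test_suite.yaml").is_file():
        problems.append("test_suite.yaml not found: create it in the project root")
    if not (root / ".llm_behave_baselines").is_dir():
        problems.append("No baseline found: run step 3 first")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = preflight(root, env={})  # empty project, no key
    (root / "test_suite.yaml").write_text("tests: []\n", encoding="utf-8")
    (root / ".llm_behave_baselines").mkdir()
    after = preflight(root, env={"OPENAI_API_KEY": "sk-example"})
    print(len(before), after)
```

In practice the env argument would be os.environ, and the key to check would depend on the provider configured in test_suite.yaml.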
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install llm-regression-monitor
- After installation, invoke the skill by name or use /llm-regression-monitor
- Provide required inputs per the skill's parameter spec and get structured output
What is LLM Regression Monitor?
Use this skill when the user wants to monitor LLM behavior over time and get alerted when outputs change unexpectedly. Triggers on requests like "set up LLM... It is an AI Agent Skill for Claude Code / OpenClaw, with 115 downloads so far.
How do I install LLM Regression Monitor?
Run "/install llm-regression-monitor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is LLM Regression Monitor free?
Yes, LLM Regression Monitor is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does LLM Regression Monitor support?
LLM Regression Monitor is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created LLM Regression Monitor?
It is built and maintained by Swanand33 (@swanand33); the current version is v1.0.2.