/install llm-regression-monitor
LLM Regression Monitor
Overview
Automated behavioral regression monitoring for LLM apps. Captures baseline outputs, detects drift on a schedule, and fires WhatsApp or Slack alerts the moment something regresses.
Workflow Decision Tree
User request
├── "set up monitoring" / first time → Full Setup (steps 1–5)
├── "run the monitor now" → Step 4 only
├── "I changed my prompt/model" → Step 3b (update baseline)
└── "configure alerts" → Step 5
Step 1 — Install
pip install llm-behave[semantic] pyyaml requests
Step 2 — Create test_suite.yaml
Create in the project root. Minimal example:
tests:
  - name: support_response
    prompt: "A customer says they never received their order. How do you respond?"
    provider: openai  # openai | anthropic | ollama | custom
    model: gpt-4o-mini
    assertions:
      - type: tone
        expected: "empathetic"
    drift:
      enabled: true
      threshold: 0.80
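Before spending API calls, the suite can be sanity-checked once it has been parsed. The following is a minimal sketch, not the skill's own validator. It assumes that name, prompt, provider, and model are all required, that drift sits under each test, and that the threshold is a score between 0 and 1. The function name validate_suite is hypothetical.

```python
from typing import Any

# Provider names taken from the comment in the minimal example.
KNOWN_PROVIDERS = {"openai", "anthropic", "ollama", "custom"}


def validate_suite(suite: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the suite looks sane."""
    tests = suite.get("tests")
    if not isinstance(tests, list) or not tests:
        return ["'tests' must be a non-empty list"]

    problems: list[str] = []
    for i, test in enumerate(tests):
        if not isinstance(test, dict):
            problems.append(f"tests[{i}]: expected a mapping")
            continue
        label = test.get("name") or f"tests[{i}]"
        # Assumption: these four fields are mandatory for every test.
        for field in ("name", "prompt", "provider", "model"):
            if not test.get(field):
                problems.append(f"{label}: missing '{field}'")
        provider = test.get("provider")
        if provider and provider not in KNOWN_PROVIDERS:
            problems.append(f"{label}: unknown provider '{provider}'")
        # Assumption: the drift threshold is a score in [0, 1].
        threshold = (test.get("drift") or {}).get("threshold")
        if threshold is not None and not 0.0 <= float(threshold) <= 1.0:
            problems.append(f"{label}: drift threshold {threshold} outside [0, 1]")
    return problems
```

With pyyaml from Step 1 installed, the input would typically come from yaml.safe_load(open("test_suite.yaml")).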
Set the API key for the chosen provider:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-... # if using anthropic
# ollama needs no key
Read references/test-suite-format.md for the full field spec.
Read references/providers.md for env vars and Ollama setup.
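A missing key is one of the errors listed under Common Errors, so it can help to check for it up front. This sketch uses only the two variable names given above and the stated fact that ollama needs no key. The custom provider is left out because its requirements are not described here, and the helper name missing_api_key is hypothetical.

```python
import os
from typing import Mapping, Optional

# Variable names come from the export lines above; None means no key is needed.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": None,
}


def missing_api_key(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the required-but-unset env var, or None if nothing is missing."""
    env = os.environ if env is None else env
    var = PROVIDER_ENV_VARS.get(provider)
    if var is None:
        # Either no key is needed (ollama) or the provider is not covered by this sketch.
        return None
    return None if env.get(var) else var
```

Calling missing_api_key("openai") with no second argument checks the real process environment.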
Step 3 — Capture Baselines
python scripts/capture_baseline.py
Saves ground-truth outputs to .llm_behave_baselines/. Run once before monitoring begins.
3b — Update after intentional prompt/model change
# Reset one test
python scripts/capture_baseline.py --update-baseline <test-name>
# Reset all
python scripts/capture_baseline.py --force
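To make the baseline idea concrete, here is a rough sketch of a drift check against a stored output. It is not how llm-behave scores drift: a lexical ratio from the standard library's difflib stands in for a semantic score. Reading the 0.80 threshold as "similarity below this counts as drift" is also an assumption.

```python
from difflib import SequenceMatcher


def similarity(baseline: str, current: str) -> float:
    """Lexical similarity in [0, 1]; a stand-in for a semantic similarity score."""
    return SequenceMatcher(None, baseline, current).ratio()


def has_drifted(baseline: str, current: str, threshold: float = 0.80) -> bool:
    """Flag drift when similarity falls below the threshold (assumed semantics)."""
    return similarity(baseline, current) < threshold
```

Under this reading, re-capturing the baseline after an intentional prompt or model change resets what the comparison is made against, which is why step 3b exists.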
Step 4 — Run the Monitor
python scripts/run_monitor.py
Writes monitor_report.json. Exits 0 on all-pass, 1 on any failure (CI-compatible).
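Because the result is carried by the exit code, a wrapper script or CI step only needs to look at the return code. A minimal sketch, with the helper name monitor_passed being hypothetical:

```python
import subprocess
import sys


def monitor_passed(cmd: list[str]) -> bool:
    """Run the monitor command and report whether it exited 0 (all tests passed)."""
    result = subprocess.run(cmd)
    return result.returncode == 0


# Example call from the project root (not executed here):
# ok = monitor_passed([sys.executable, "scripts/run_monitor.py"])
```

Any non-zero code is treated as a failure here, so a crash in the script is not mistaken for a green run.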
Step 5 — Configure Alerts
# WhatsApp (requires wacli installed and logged in)
export ALERT_WHATSAPP_TO="+1234567890"
# Slack
export ALERT_SLACK_WEBHOOK="https://hooks.slack.com/services/..."
Add to .env in project root — scripts load it automatically. Send via:
python scripts/send_alert.py
Silent on green runs. Logs every alert to monitor_alerts.log regardless.
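For the Slack path, an incoming webhook accepts a JSON body with a text field sent by HTTP POST. The sketch below builds such a request with the standard library only. The message wording and the function name build_slack_request are illustrative assumptions, not what send_alert.py actually sends.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, failed_tests: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    # Hypothetical message format, chosen for illustration only.
    text = "LLM regression detected in: " + ", ".join(failed_tests)
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is a single call, left commented out so the sketch has no side effects:
# urllib.request.urlopen(build_slack_request(os.environ["ALERT_SLACK_WEBHOOK"], ["support_response"]))
```

Keeping request construction separate from sending makes the payload easy to inspect in a test without hitting the network.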
Step 6 — Schedule with OpenClaw Cron
Confirm the schedule with the user (default: 9am daily), then add:
- Schedule: 0 9 * * *
- Command: python run_monitor.py && true || python send_alert.py
- Directory: project root (where test_suite.yaml lives)
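The || python send_alert.py branch runs only when run_monitor.py exits non-zero, which per Step 4 means exit code 1 (failures found). The && true in between does not change this behavior.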
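The short-circuit behavior of that command line can be modeled in a few lines. This sketch only mirrors standard shell semantics for && and ||; the function name alert_fires is hypothetical.

```python
def alert_fires(monitor_exit_code: int) -> bool:
    """Model `run_monitor && true || send_alert` using shell short-circuit rules."""
    monitor_ok = monitor_exit_code == 0
    # `A && true` succeeds only if A succeeded, so `&& true` changes nothing.
    left_side_ok = monitor_ok and True
    # `|| B` runs B only when everything to its left failed.
    return not left_side_ok
```

One consequence is that any non-zero exit from the monitor triggers the alert script, including a crash, not only the documented exit code 1.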
Common Errors
| Error | Fix |
|---|---|
| llm-behave is not installed | pip install llm-behave[semantic] |
| OPENAI_API_KEY is not set | Export key or add to .env |
| No baseline found | Run step 3 first |
| test_suite.yaml not found | Create it in project root |
| LLM call errors in report | API issue, not a regression |
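- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install llm-regression-monitor
- Once installed, invoke the Skill by name or trigger it with /llm-regression-monitor
- Provide the required inputs as described in the Skill's parameters to get structured output.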
What is LLM Regression Monitor?
Use this skill when the user wants to monitor LLM behavior over time and get alerted when outputs change unexpectedly. Triggers on requests like "set up LLM... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 115 downloads to date.
How do I install LLM Regression Monitor?
Run the command "/install llm-regression-monitor" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.
Is LLM Regression Monitor free?
Yes. LLM Regression Monitor is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does LLM Regression Monitor support?
LLM Regression Monitor is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops LLM Regression Monitor?
It is developed and maintained by Swanand33 (@swanand33); the current version is v1.0.2.