Description

Detect personality drift, sycophancy creep, and capability degradation in AI agents before they become problems. Tracks behavior metrics over time against he...

README (SKILL.md)

Drift Guard: Agent Behavior Monitor

Name: Drift Guard
Author: theshadowrose

Detect personality drift, sycophancy creep, and capability degradation in AI agents before they become problems. Tracks behavior metrics over time against healthy baselines.

Drift Guard tracks agent behavior metrics over time, compares them against healthy baselines, and alerts you when your agent starts drifting from its intended personality or capability level.

The Problem

AI agents evolve during use. Sometimes that evolution is productive learning. Sometimes it's drift into undesirable behaviors:

Personality drift: Agent becomes more verbose, changes tone, loses its edge
Sycophancy creep: Excessive agreement, validation-seeking, compliment inflation
Capability degradation: Hedging language increases, technical depth decreases, confidence drops
Memory pollution: Corrupted context files influence all future responses

You don't notice it happening until your sharp, capable agent has turned into a people-pleasing chatbot.

What Drift Guard Does

1. Baseline Capture (`drift_baseline.py`)

Record "healthy" agent behavior from known-good responses
Analyze multiple samples to create robust baseline metrics
Store baseline for ongoing comparison
Compare baselines over time to track evolution
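The baseline steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `drift_baseline.py`: the function name `build_baseline` and the mean/stdev layout of the result are assumptions made for the example.

```python
from statistics import mean, pstdev

def build_baseline(samples):
    """Combine per-response metric dicts into one baseline.

    `samples` is a list of dicts mapping metric name -> number
    (hypothetical layout; the real baseline.json may differ).
    """
    if not samples:
        raise ValueError("need at least one sample")
    baseline = {}
    for name in samples[0].keys():
        values = [s[name] for s in samples]
        # Keep the spread as well as the mean, so later comparisons
        # can tell normal variation from real drift.
        baseline[name] = {"mean": mean(values), "stdev": pstdev(values)}
    return baseline

samples = [
    {"word_count": 100, "hedging_score": 0.02},
    {"word_count": 120, "hedging_score": 0.04},
    {"word_count": 110, "hedging_score": 0.03},
]
baseline = build_baseline(samples)
print(baseline["word_count"]["mean"])
```

Averaging over several samples is what makes the baseline robust: a single unusually long or short response shifts the mean only slightly.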

2. Continuous Monitoring (`drift_guard.py`)

Analyze each agent response for behavior metrics
Calculate drift score against baseline (0.0 = matches the baseline, 1.0 = complete drift)
Track metrics: response length, vocabulary diversity, sycophancy markers, hedging language, technical depth
Record all measurements with timestamps
Trigger alerts when drift exceeds configured thresholds

3. Trend Analysis (`drift_report.py`)

Generate drift trend reports over time
Detect anomalies (outlier measurements)
Identify which specific metrics are changing
Track whether drift is worsening or improving
Filter by time range (last 24h, last week, all time)
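As an illustration of the anomaly and trend ideas above, here is one possible approach using only the stdlib. It is a sketch under assumptions, not the logic in `drift_report.py`: the z-score threshold, the least-squares slope, and both function names are choices made for this example.

```python
from statistics import mean, pstdev

def find_anomalies(scores, z_threshold=2.0):
    """Return indexes of scores more than z_threshold std devs from the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return []  # all scores identical, nothing stands out
    return [i for i, s in enumerate(scores) if abs(s - mu) / sigma > z_threshold]

def trend_slope(scores):
    """Least-squares slope of drift score vs. measurement index.

    Positive means drift is worsening, negative means it is improving.
    """
    n = len(scores)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(scores)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

history = [0.10, 0.12, 0.11, 0.13, 0.70, 0.14, 0.15]
print(find_anomalies(history))      # index of the outlier measurement
print(trend_slope(history) > 0)     # True when drift is trending upward
```

A single spike such as the 0.70 above is flagged as an anomaly, while the slope summarises the direction of the whole series.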

Quick Start

1. Configure

cp config_example.py config.py
# Edit config.py with your thresholds, patterns, and alert settings

2. Capture Baseline

Collect 10-20 agent responses that represent your agent's "healthy" behavior. Save each to a text file.

python drift_baseline.py capture --files response1.txt response2.txt response3.txt \
  --output baseline.json

3. Monitor

Each time your agent responds, analyze it:

python drift_guard.py agent_response.txt

Or pipe from stdin:

echo "Agent response here..." | python drift_guard.py --stdin

4. Review Trends

# Last 24 hours
python drift_report.py --hours 24

# All time
python drift_report.py

# JSON output for scripting
python drift_report.py --format json

Integration Examples

Integration with Agent Workflow

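from config import CONFIG
from drift_guard import DriftGuard

# Build the monitor from your config
dg = DriftGuard(CONFIG)

# After agent responds
agent_response = "..."
result = dg.monitor(agent_response)

# Assumption: the Emergency level is reported as 'emergency'; adjust if your
# version uses a different value, so the most severe level is not skipped
if result['alert_level'] in ('critical', 'emergency'):
    print(f"ALERT: Agent drift detected ({result['drift_score']:.3f})")
    # Trigger recovery: load checkpoint, reset memory, etc.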

Automatic Drift Checks via Cron

# Check drift every hour
0 * * * * cd /path/to/agent && python drift_guard.py latest_response.txt

# Weekly drift report
0 9 * * 1 cd /path/to/agent && python drift_report.py --hours 168 > weekly_drift.txt

Pairing with CPR (Context Preservation & Restore)

Drift Guard detects the problem. CPR fixes it.

# Monitor drift
python drift_guard.py agent_response.txt
# Drift score: 0.72 (CRITICAL)

# Restore from checkpoint
python cpr.py restore --checkpoint 2024-01-15-healthy

# Verify recovery on a response generated after the restore
# (new_agent_response.txt is a placeholder name)
python drift_guard.py new_agent_response.txt
# Drift score: 0.12 (normal)

How It Works

Metrics Tracked

Metric	What It Measures	Why It Matters
`char_count`	Response length in characters	Verbosity drift
`word_count`	Response length in words	Verbosity drift
`sentence_count`	Number of sentences	Structure changes
`avg_sentence_length`	Words per sentence	Complexity drift
`vocabulary_diversity`	Unique words / total words	Language degradation
`sycophancy_score`	Frequency of agreement/validation language	People-pleasing behavior
`hedging_score`	Frequency of uncertainty language	Confidence degradation
`validation_score`	Frequency of compliments/encouragement	Sycophancy creep
`exclamation_count`	Number of exclamation marks	Enthusiasm drift
`technical_score`	Frequency of technical terminology	Capability tracking
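To make a few of the metrics in the table concrete, the sketch below computes them with regexes from the stdlib. The pattern list, the tokenisation, and the per-word normalisation of the hedging score are all assumptions for illustration; the actual patterns are configured in `config.py` and may be counted differently.

```python
import re

# Hypothetical phrase list; the real patterns live in your config.
HEDGING_PATTERNS = [r"\bmaybe\b", r"\bperhaps\b", r"\bmight\b", r"\bi think\b"]

def pattern_score(text, patterns):
    """Pattern matches per word, so length alone does not inflate the score."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    hits = sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns)
    return hits / len(words)

def basic_metrics(text):
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on ., ! and ?; good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "char_count": len(text),
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "vocabulary_diversity": len(set(words)) / len(words) if words else 0.0,
        "exclamation_count": text.count("!"),
        "hedging_score": pattern_score(text, HEDGING_PATTERNS),
    }

m = basic_metrics("Maybe it works. Maybe it fails!")
print(m["word_count"], m["sentence_count"], m["exclamation_count"])
```

The same `pattern_score` helper could be reused with different phrase lists for the sycophancy, validation, and technical scores.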

Drift Score Calculation

For each metric:

Calculate percentage difference from baseline
Apply configured weight (important metrics count more)
Average weighted differences across all metrics
Result: drift score from 0.0 (perfect baseline match) to 1.0 (completely different)
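The calculation above might look something like this in code. It is a sketch, not the shipped implementation: capping each metric's difference at 1.0 and the handling of a zero baseline are assumptions added so the result stays within the documented 0.0 to 1.0 range.

```python
def drift_score(current, baseline, weights):
    """Weighted average of per-metric relative differences, in [0.0, 1.0]."""
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        base = baseline[name]
        cur = current[name]
        if base == 0:
            # Assumption: any change from a zero baseline counts as full drift.
            diff = 0.0 if cur == 0 else 1.0
        else:
            # Relative difference from baseline, capped so one metric
            # cannot push the score past 1.0.
            diff = min(abs(cur - base) / abs(base), 1.0)
        total += weight * diff
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0

baseline = {"word_count": 100, "hedging_score": 0.02}
current = {"word_count": 150, "hedging_score": 0.05}
weights = {"word_count": 1.0, "hedging_score": 2.0}  # hedging counts double

score = drift_score(current, baseline, weights)
print(round(score, 3))
```

Here word count differs by 50% and hedging by 150% (capped at 100%), so the weighted average is (1.0 × 0.5 + 2.0 × 1.0) / 3.0 ≈ 0.833.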

Alert Levels

Warning (0.3): Minor drift detected. Monitor closely.
Critical (0.6): Significant drift. Intervention recommended.
Emergency (0.9): Severe drift. Immediate action required.
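Mapping a drift score to one of these levels can be sketched as below. The thresholds are the defaults listed above; treating each threshold as inclusive and the lowercase level names are assumptions for this example.

```python
# Highest threshold first, so the most severe matching level wins.
THRESHOLDS = [("emergency", 0.9), ("critical", 0.6), ("warning", 0.3)]

def alert_level(score, thresholds=THRESHOLDS):
    """Return the most severe level whose threshold the score reaches."""
    for name, minimum in thresholds:
        if score >= minimum:
            return name
    return "normal"

for s in (0.12, 0.30, 0.72, 0.95):
    print(s, alert_level(s))
```

Checking from the highest threshold down avoids reporting a 0.95 score as a mere warning.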

Use Cases

Personality preservation: Ensure your agent maintains its configured tone and style
Quality monitoring: Detect when response quality degrades over time
Context corruption detection: Identify when bad memory files are influencing behavior
Fine-tuning validation: Verify fine-tuned models maintain desired characteristics
Multi-agent consistency: Monitor multiple agents to ensure behavioral consistency
Recovery triggers: Automatically restore from checkpoint when drift exceeds threshold

What's Included

File	Purpose
`drift_guard.py`	Main monitoring engine
`drift_baseline.py`	Baseline capture and comparison
`drift_report.py`	Trend analysis and reporting
`config_example.py`	Configuration template
`LIMITATIONS.md`	What Drift Guard doesn't do
`LICENSE`	MIT License

Requirements

Python 3.8+
No external dependencies (stdlib only)
Works with any AI agent that generates text responses

License

MIT — See LICENSE file.

Author: Shadow Rose

⚠️ Disclaimer

This software is provided "AS IS", without warranty of any kind, express or implied.

USE AT YOUR OWN RISK.

The author(s) are NOT liable for any damages, losses, or consequences arising from the use or misuse of this software — including but not limited to financial loss, data loss, security breaches, business interruption, or any indirect/consequential damages.
This software does NOT constitute financial, legal, trading, or professional advice.
Users are solely responsible for evaluating whether this software is suitable for their use case, environment, and risk tolerance.
No guarantee is made regarding accuracy, reliability, completeness, or fitness for any particular purpose.
The author(s) are not responsible for how third parties use, modify, or distribute this software after purchase.

By downloading, installing, or using this software, you acknowledge that you have read this disclaimer and agree to use the software entirely at your own risk.

DATA DISCLAIMER: This software processes and stores data locally on your system. The author(s) are not responsible for data loss, corruption, or unauthorized access resulting from software bugs, system failures, or user error. Always maintain independent backups of important data. This software does not transmit data externally unless explicitly configured by the user.

Support & Links


🐛 Bug Reports	[email protected]
☕ Ko-fi	ko-fi.com/theshadowrose
🛒 Gumroad	shadowyrose.gumroad.com
🐦 Twitter	@TheShadowyRose
🐙 GitHub	github.com/TheShadowRose
🧠 PromptBase	promptbase.com/profile/shadowrose

Built with OpenClaw — thank you for making this possible.

🛠️ Need something custom? Custom OpenClaw agents & skills starting at $500. If you can describe it, I can build it. → Hire me on Fiverr

Usage Guidance

This skill is internally consistent and works locally with the Python stdlib. Before installing or integrating:

(1) Be aware that it stores analyzed responses and metrics on disk (baseline.json, drift_history.json, drift_alerts.log, current_alert.json). Those files can contain sensitive content, so choose storage paths and file permissions carefully.
(2) Test on non-sensitive example responses first.
(3) If you or someone else modifies the code to add webhooks or HTTP clients, audit its network behavior and credential handling at that point. The current repo contains a webhook_url placeholder but no implementation.
(4) Scheduled/cron usage is supported. Review retention and rotation of the history files to avoid unbounded growth of sensitive data.
(5) Drift Guard detects drift but does not remediate it. Pair it with your recovery tooling (CPR) if you want automated restore.

Overall: coherent and reasonable for the stated purpose.
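For the retention point above, a pruning step along these lines could be run from cron next to the monitor. Everything about the file layout here is an assumption: the sketch treats drift_history.json as a JSON list of records that each carry a naive ISO 8601 `timestamp` string, which may not match the real format, so check your own file before using it.

```python
import json
import os
from datetime import datetime, timedelta

def prune_history(path, max_age_days=30, now=None):
    """Drop records older than max_age_days; return how many were removed.

    Assumes `path` holds a JSON list of dicts with a naive ISO 8601
    "timestamp" field (hypothetical layout).
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    with open(path, "r", encoding="utf-8") as fh:
        records = json.load(fh)
    kept = [r for r in records
            if datetime.fromisoformat(r["timestamp"]) >= cutoff]
    # Write to a temp file created with owner-only permissions, then swap
    # it in, so a crash cannot leave a half-written history file.
    tmp_path = path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(kept, fh)
    os.replace(tmp_path, path)
    return len(records) - len(kept)

# Example (not run here): prune_history("drift_history.json", max_age_days=30)
```

The `0o600` mode restricts the rewritten file to its owner on POSIX systems; on Windows the mode is largely ignored and file access needs to be controlled another way.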

Capability Analysis

Type: OpenClaw Skill
Name: drift-guard-sr
Version: 1.0.3

The Drift Guard bundle is a legitimate utility designed to monitor AI agent behavior metrics such as sycophancy, hedging, and technical depth. The Python scripts (drift_guard.py, drift_baseline.py, drift_report.py) use only standard libraries to perform regex-based text analysis and local file I/O for logging and history tracking. There is no evidence of data exfiltration, network activity, or malicious execution; the documentation (LIMITATIONS.md) even explicitly notes that the webhook functionality is a placeholder and not implemented.

Capability Assessment

✓ Purpose & Capability

Name/description match the included code and instructions: the scripts compute text-based metrics, capture baselines, record history, and produce reports. Required capabilities (none) are proportional to the stated function.

ℹ Instruction Scope

Runtime instructions are consistent with the purpose. The tool requires you to save agent responses to files and run analyzers or cron jobs; it does not automatically hook into agent runtimes. Important note: the tool records full metrics and writes history/alert files containing timestamps, metrics, and (indirectly) the analyzed text; this can persist potentially sensitive agent responses on disk.

✓ Install Mechanism

No install spec and no external packages or downloads. Code is stdlib-only Python; nothing in the files pulls remote code or runs installers.

ℹ Credentials

No environment variables, secrets, or external credentials are requested. The config contains an optional webhook_url placeholder but the stdlib-only version does not perform HTTP POSTs; enabling webhooks or modifying the code to add network calls would change the threat model and should be audited. The script writes to local files (baseline, history, alerts) which may contain sensitive data.

ℹ Persistence & Privilege

Skill is not always-enabled and is user-invocable. It does write persistent files (baseline.json, drift_history.json, drift_alerts.log, current_alert.json) in the configured paths and will append/write them on each measurement; scheduled use via cron is documented — consider file permissions and retention. No modifications to other skills or system-wide settings are performed.

Version History

v1.0.3

- Updated skill name from "Drift Guard Agent Behavior Monitor" to "Drift Guard: Agent Behavior Monitor".
- Bumped version to 1.0.3 in the documentation.
- No functionality or code changes; documentation now reflects updated name and version.

v1.0.2

No user-facing changes in this version.
- No file changes detected between versions 1.0.1 and 1.0.2.
- All features, documentation, and behavior remain unchanged.

v1.0.1

- Added a `slug` field to the metadata for improved identification.
- Corrected the skill name from "Drift Guard � Agent Behavior Monitor" to "Drift Guard Agent Behavior Monitor".
- Updated the version number to 1.0.1.
- Fixed character encoding issues in the title.
- No functionality or usage changes.

v1.0.0

Initial upload

Metadata

Slug drift-guard-sr

Version 1.0.3

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 4

Frequently Asked Questions

What is Drift Guard?

Detect personality drift, sycophancy creep, and capability degradation in AI agents before they become problems. Tracks behavior metrics over time against he... It is an AI Agent Skill for Claude Code / OpenClaw, with 300 downloads so far.

How do I install Drift Guard?

Run "/install drift-guard-sr" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Drift Guard free?

Yes, Drift Guard is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Drift Guard support?

Drift Guard is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Drift Guard?

It is built and maintained by Shadow Rose (@theshadowrose); the current version is v1.0.3.
