Description

Monitors real-time LLM API costs, detects runaway loops, enforces budgets, audits code risk, and reports usage across multiple providers and models.

README (SKILL.md)

Cost Watchdog 💰

Name: LLM Cost Watchdog
Author: nimaansari

Real-time cost tracking layer for LLM-based agents. Prices every call live, detects runaway loops in code, enforces budget ceilings mid-execution.

1. Identity

Observes LLM spend without disturbing the agent. Prevents $2,400-overnight-loop disasters by making cost a first-class concern: priced at write time, budgeted at check time, surfaced in reports.

2. Triggers

Activate when:

User mentions cost, budget, tokens, billing.
Code contains LLM API calls (Anthropic, OpenAI, OpenRouter, Google, Groq, ...).
Agent loops or recursive workflows.
Batch / streaming processing with unclear bounds.
/cost-watchdog [command] is invoked.

3. Commands

Run via python3 scripts/cost_watchdog.py \x3Ccmd> (or hook into your own CLI).

Command	What it does
`session`	Spend totals from `usage.jsonl` — calls, tokens, cost, top models.
`report`	24h / 7d / 30d windows with top model per window.
`tail [--once]`	Watch OpenClaw session JSONL and log every assistant turn.
`detect [--json]`	Identify which model the agent is currently using (5 probes).
`audit \x3Cfile.py>`	AST-based code risk scan: unbounded loops, recursion, missing `max_tokens`.
`price \x3Cmodel>`	Live pricing for one model, with source + cache age.
`estimate \x3Cmodel>`	Project cost for `n` iterations of a given call.
`alternatives \x3Cmodel>`	Cheaper same-unit models.
`errors [--limit N]`	Recent swallowed exceptions (silent failures made visible).
`validate-tokens \x3Cmodel>`	Compare our heuristic against provider's authoritative count.
`reset [--all]`	Clear current-day log (`--all` also clears rolled files).

4. Pricing layer

Source chain

openrouter/*            → OpenRouter API (live)           → static fallback
anything else           → LiteLLM JSON (live, cached 24h) → OpenRouter (permissive) → static fallback

2600+ models indexed across chat, completion, embedding, image, audio, video, rerank, OCR, search modes.
30+ providers in the static fallback: Anthropic, OpenAI, Google, Groq, Mistral, Cohere, DeepSeek, Perplexity, xAI, Bedrock, Azure, and more.
Unit-aware: token, image, second, query, page, character, pixel. Alternatives never compare across units.
Circuit breaker opens after 3 consecutive network failures for a host; falls through to cache/static until the cool-down ends (60s).

Tuning

Env var	Default	Effect
`CW_PRICE_TTL_SECONDS`	86400 (24h)	Cache lifetime. `0` = hit network every call.
`CW_OFFLINE`	unset	If `1`, never touch the network.
`CW_STATIC_ONLY`	unset	If `1`, skip live sources entirely. Used by tests.
`CW_LOG_DIR`	`~/.cost-watchdog`	Where usage/errors/cache files live.
`CW_BUDGET_USD`	unset	Ceiling; wrappers raise `BudgetExceeded` when crossed.

Refresh static pricing

python3 scripts/refresh_pricing.py

Regenerates references/pricing.md from the live sources so the offline fallback is fresh. Aborts if fewer than 100 rows came back (protects against clobbering on a network outage).

5. Tracking layer — how we know what was spent

Four independent paths, all write to ~/.cost-watchdog/usage.jsonl:

Path	When to use	Covers streams?
`openclaw_tailer.py --watch`	Running OpenClaw. Zero code changes.	yes (reads completed turns)
`track_openai(client)`	You call OpenAI-compatible SDK (covers OpenRouter, Groq, DeepSeek, Mistral, Together, Fireworks, Cerebras, Anyscale, ...).	yes (tee'd iterator, auto-injects `stream_options={"include_usage": True}`)
`track_anthropic(client)`	Direct Anthropic SDK.	yes (wraps `messages.stream()`)
`track_gemini(model)` / `track_cohere(client)` / `track_bedrock(client)`	Direct provider SDKs.	no (add wrappers if you need streams)
`install_global_capture()` (httpx)	Any modern Python SDK using httpx.	no — streams are flagged into `errors.jsonl` so the gap is visible. Use the SDK wrappers for stream coverage.

Usage log rotates daily: usage.YYYY-MM-DD.jsonl. session_total(since=...) skips files outside the window before scanning.

Aggregation uses canonical_family() so claude-haiku-4-5-20251001, claude-haiku-4-5, and claude-haiku-4.5 are one row in reports.

6. Budget enforcement

Two mechanisms:

Write-time check (race-safe): append_usage(entry, budget_ceiling=X) takes an fcntl.flock on a sidecar, sums the current session, and refuses the write (raises BudgetExceeded) if the call would cross X.
Post-write check: wrappers compare cumulative spend to CW_BUDGET_USD after logging and raise if over. Used when the wrapper doesn't know the ceiling at call time.

Either path stops the agent mid-loop; the LLM call still returns to the caller, but the next one blocks.

7. Code audit (AST)

python3 scripts/cost_watchdog.py audit path/to/agent.py

Walks the AST and reports:

CRITICAL — while True with an LLM call and no max_iterations-style bound.
CRITICAL — function that recurses and calls an LLM API with no depth argument.
HIGH — plain while that calls an API with no retry/iteration counter.
MEDIUM — LLM call missing max_tokens / max_completion_tokens.
MEDIUM — function with ≥5 sequential LLM calls (batching candidate).

Every finding has a file line number. No more count('def ') > 3 and count('self.') > 5 → "recursion detected" false positives.

8. Detection — "what model is the agent using?"

python3 scripts/cost_watchdog.py detect

Five probe layers, ranked by confidence:

Probe	Confidence
OpenClaw session JSONL	high
Claude Code session JSONL	high
Most recent usage-log entry	high
Claude Code `settings.json`	medium
Env vars (`ANTHROPIC_MODEL`, `OPENAI_MODEL`, ...)	medium

Emits a table or --json.

9. Files

Path	Purpose
`scripts/_pricing.py`	Router: picks LiteLLM / OpenRouter / static per query.
`scripts/_sources.py`	Three `PricingSource` classes + disk cache + circuit breaker.
`scripts/tokenizer.py`	Provider-aware token counting (tiktoken for OpenAI; calibrated heuristics for others).
`scripts/model_canon.py`	`canonical_family()` — collapses model variants.
`scripts/code_audit.py`	AST cost-risk walker.
`scripts/usage_log.py`	JSONL writer + rotation + aggregation.
`scripts/tracker.py`	SDK wrappers + streaming + budget enforcement.
`scripts/http_capture.py`	`install_global_capture()` — httpx transport hook.
`scripts/openclaw_tailer.py`	Watches OpenClaw sessions.
`scripts/detect_model.py`	Multi-layer detector.
`scripts/errors.py`	`errors.jsonl` writer + reader.
`scripts/io_utils.py`	`write_json_atomic` / `read_json`.
`scripts/refresh_pricing.py`	Regenerates static `pricing.md` from live sources.
`scripts/cost_watchdog.py`	Unified CLI dispatcher.
`references/pricing.md`	Static fallback (regenerated; ~2600 models).
`tests/test_cost_watchdog.py`	73 tests: router, cache, AST, tokenizer, rotation, cassettes, circuit breaker, canonicalization.

10. Quality checklist

Live pricing from LiteLLM + OpenRouter, 24h-cached, with static fallback.
Exact-match model lookup (no substring conflation).
Multi-modal (token / image / second / query / page / character).
Unit-aware alternatives (never compares tokens to images).
AST-based code audit with line numbers.
Provider-aware tokenization (no more tiktoken-for-Claude).
Variance-based confidence (no += 0.05 theater).
Atomic writes to all shared state files.
fcntl.flock-guarded budget check-and-log (no race).
Circuit breaker on flaky networks (no 5s hang per call).
Streaming capture via SDK wrappers; streams flagged in errors.jsonl via HTTP capture.
Daily log rotation + date-scoped aggregation.
Canonical model families (variants collapse in reports).
errors.jsonl surfaces silent failures; cost_watchdog errors shows them.
Cassette tests for LiteLLM + OpenRouter parse paths (schema-drift safety net).
73 logic tests passing.

11. Known limits (be honest)

Tokenizer heuristics for Claude/Gemini/etc. are calibrated from docs, not measured. Run cost_watchdog validate-tokens \x3Cmodel> to check drift against the provider's authoritative count when you have an API key.
install_global_capture() can't see streaming responses — httpx exposes an empty body until the user reads the stream. Use track_openai / track_anthropic for stream coverage; http_capture logs skipped streams to errors.jsonl so the gap is visible.
Non-httpx SDKs (older Cohere, boto3 with custom transport) need the per-SDK wrappers — HTTP capture won't see them.
LiteLLM community data can lag 24-48h on brand-new models. OpenRouter's API is truly live for anything it routes.

12. Testing

python3 -m unittest tests.test_cost_watchdog     # 73 tests
python3 scripts/code_audit.py test_risky_code.py # sample risks
python3 scripts/cost_watchdog.py report          # current spend summary

Usage Guidance

This skill appears to implement what it claims and does not request unrelated credentials, but it can capture and log full LLM requests/responses and (if enabled) any Python httpx traffic. Before installing: 1) Review the code paths that enable global capture (install_global_capture, openclaw_tailer, and the SDK wrappers) to ensure you understand what is logged and where. 2) Set CW_LOG_DIR to a directory with restricted permissions and consider encrypting or rotating logs if they may contain secrets. 3) Prefer per-SDK wrappers (track_openai, track_anthropic) over the global httpx capture to reduce unintended data capture. 4) Run in an isolated environment (container or dedicated VM) if you will enable live pricing/network access. 5) If you need deterministic tests or to avoid network lookups, set CW_STATIC_ONLY=1 (the test suite already does this). 6) If you aren’t comfortable with the possibility of recording sensitive payloads, do not enable global capture or automatic tailing; otherwise the skill is coherent with its purpose. Additional code review is recommended if you will deploy this in production or on systems with sensitive data.

Capability Analysis

Type: OpenClaw Skill Name: llm-cost-watchdog Version: 1.0.0 The skill bundle provides a comprehensive suite for monitoring LLM costs and enforcing budgets, but it employs several high-risk technical methods. Specifically, `scripts/http_capture.py` monkey-patches the `httpx` library globally to intercept and inspect network traffic for telemetry, and `scripts/detect_model.py` performs broad file system access to read session logs and settings from other applications (e.g., Claude Code in `~/.claude`). While these behaviors are aligned with the tool's stated purpose and no evidence of data exfiltration was found, the use of global library hooks and cross-application data access represents a significant security surface. The tool fetches pricing data from legitimate endpoints at `githubusercontent.com` (LiteLLM) and `openrouter.ai`.

Capability Tags

cryptocan-make-purchasesrequires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name/description match what the package implements: pricing loaders, trackers, SDK wrappers, AST audit, visualizer, and budget enforcement. It does not request unrelated credentials or external OS-level access in the registry metadata. The included scripts and references (pricing.md, trackers, audit) are proportionate to the stated feature set.

⚠ Instruction Scope

SKILL.md and scripts instruct the agent to: tail OpenClaw session JSONL, wrap provider SDK clients (track_openai, track_anthropic, etc.), and optionally install a global httpx capture (install_global_capture()). These mechanisms write to ~/.cost-watchdog/usage.jsonl and errors.jsonl and may record full prompts, completions, and HTTP payloads. Capturing arbitrary httpx traffic can include non-LLM endpoints or secrets in request bodies/headers; the README/tests also reference regenerating pricing via live network calls. Tests expect a LICENSE file that isn't listed in the manifest — a small inconsistency. Overall the actions are explainable for cost-tracking but broaden the data surface (sensitive content may be logged) and grant broad discretion to the skill when enabled.

✓ Install Mechanism

No external install spec or remote downloads are declared; the bundle contains Python scripts and docs. That reduces supply-chain concerns. The README suggests optional symlinking into global skill directories (user action). No use of URL-based extract/install was observed in metadata.

ℹ Credentials

No privileged environment variables or external credentials are required by the skill metadata. The skill uses its own env toggles (CW_PRICE_TTL_SECONDS, CW_OFFLINE, CW_STATIC_ONLY, CW_LOG_DIR, CW_BUDGET_USD) which are proportional to caching, offline/test mode, log location, and budget enforcement. It does not ask for unrelated service keys, which is appropriate.

ℹ Persistence & Privilege

always:false (not force-included). The skill writes logs to ~/.cost-watchdog by default and exposes wrappers that can patch SDK behavior at runtime (inject stream options, tee iterators). Those changes are expected for a monitoring/enforcement layer but increase the blast radius if the global httpx capture or automatic wrapper injection is enabled—review and restrict usage to controlled environments.

Version History

v1.0.0

Cost Watchdog v1.0.0 — Initial Release - Tracks real-time LLM spending across 2600+ models and 30+ providers, enforcing budget ceilings to prevent runaway costs. - Detects runaway agent loops and unbounded API calls with AST-based code auditing. - Supports live model pricing, usage tracking, and budget enforcement via CLI and SDK wrappers. - Aggregates and reports spend by session, time window, and top models; provides alerts for silent errors and budgeting breaches. - Offers multi-layer detection of active models and robust offline/static fallbacks for pricing. - Includes unified CLI with commands for session tracking, auditing, pricing, alternatives, reporting, error review, and more.

Metadata

Slug llm-cost-watchdog

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is LLM Cost Watchdog?

Monitors real-time LLM API costs, detects runaway loops, enforces budgets, audits code risk, and reports usage across multiple providers and models. It is an AI Agent Skill for Claude Code / OpenClaw, with 108 downloads so far.

How do I install LLM Cost Watchdog?

Run "/install llm-cost-watchdog" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is LLM Cost Watchdog free?

Yes, LLM Cost Watchdog is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does LLM Cost Watchdog support?

LLM Cost Watchdog is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created LLM Cost Watchdog?

It is built and maintained by Nima Ansari (@nimaansari); the current version is v1.0.0.

More Skills

LLM Cost Watchdog