← Back to Skills Marketplace
nimaansari

LLM Cost Watchdog

by Nima Ansari · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
108
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install llm-cost-watchdog
Description
Monitors real-time LLM API costs, detects runaway loops, enforces budgets, audits code risk, and reports usage across multiple providers and models.
README (SKILL.md)

Cost Watchdog 💰

Real-time cost tracking layer for LLM-based agents. Prices every call live, detects runaway loops in code, enforces budget ceilings mid-execution.

1. Identity

Observes LLM spend without disturbing the agent. Prevents $2,400-overnight-loop disasters by making cost a first-class concern: priced at write time, budgeted at check time, surfaced in reports.

2. Triggers

Activate when:

  • User mentions cost, budget, tokens, billing.
  • Code contains LLM API calls (Anthropic, OpenAI, OpenRouter, Google, Groq, ...).
  • Agent loops or recursive workflows.
  • Batch / streaming processing with unclear bounds.
  • /cost-watchdog [command] is invoked.

3. Commands

Run via python3 scripts/cost_watchdog.py \x3Ccmd> (or hook into your own CLI).

Command What it does
session Spend totals from usage.jsonl — calls, tokens, cost, top models.
report 24h / 7d / 30d windows with top model per window.
tail [--once] Watch OpenClaw session JSONL and log every assistant turn.
detect [--json] Identify which model the agent is currently using (5 probes).
audit \x3Cfile.py> AST-based code risk scan: unbounded loops, recursion, missing max_tokens.
price \x3Cmodel> Live pricing for one model, with source + cache age.
estimate \x3Cmodel> Project cost for n iterations of a given call.
alternatives \x3Cmodel> Cheaper same-unit models.
errors [--limit N] Recent swallowed exceptions (silent failures made visible).
validate-tokens \x3Cmodel> Compare our heuristic against provider's authoritative count.
reset [--all] Clear current-day log (--all also clears rolled files).

4. Pricing layer

Source chain

openrouter/*            → OpenRouter API (live)           → static fallback
anything else           → LiteLLM JSON (live, cached 24h) → OpenRouter (permissive) → static fallback
  • 2600+ models indexed across chat, completion, embedding, image, audio, video, rerank, OCR, search modes.
  • 30+ providers in the static fallback: Anthropic, OpenAI, Google, Groq, Mistral, Cohere, DeepSeek, Perplexity, xAI, Bedrock, Azure, and more.
  • Unit-aware: token, image, second, query, page, character, pixel. Alternatives never compare across units.
  • Circuit breaker opens after 3 consecutive network failures for a host; falls through to cache/static until the cool-down ends (60s).

Tuning

Env var Default Effect
CW_PRICE_TTL_SECONDS 86400 (24h) Cache lifetime. 0 = hit network every call.
CW_OFFLINE unset If 1, never touch the network.
CW_STATIC_ONLY unset If 1, skip live sources entirely. Used by tests.
CW_LOG_DIR ~/.cost-watchdog Where usage/errors/cache files live.
CW_BUDGET_USD unset Ceiling; wrappers raise BudgetExceeded when crossed.

Refresh static pricing

python3 scripts/refresh_pricing.py

Regenerates references/pricing.md from the live sources so the offline fallback is fresh. Aborts if fewer than 100 rows came back (protects against clobbering on a network outage).

5. Tracking layer — how we know what was spent

Four independent paths, all write to ~/.cost-watchdog/usage.jsonl:

Path When to use Covers streams?
openclaw_tailer.py --watch Running OpenClaw. Zero code changes. yes (reads completed turns)
track_openai(client) You call OpenAI-compatible SDK (covers OpenRouter, Groq, DeepSeek, Mistral, Together, Fireworks, Cerebras, Anyscale, ...). yes (tee'd iterator, auto-injects stream_options={"include_usage": True})
track_anthropic(client) Direct Anthropic SDK. yes (wraps messages.stream())
track_gemini(model) / track_cohere(client) / track_bedrock(client) Direct provider SDKs. no (add wrappers if you need streams)
install_global_capture() (httpx) Any modern Python SDK using httpx. no — streams are flagged into errors.jsonl so the gap is visible. Use the SDK wrappers for stream coverage.

Usage log rotates daily: usage.YYYY-MM-DD.jsonl. session_total(since=...) skips files outside the window before scanning.

Aggregation uses canonical_family() so claude-haiku-4-5-20251001, claude-haiku-4-5, and claude-haiku-4.5 are one row in reports.

6. Budget enforcement

Two mechanisms:

  1. Write-time check (race-safe): append_usage(entry, budget_ceiling=X) takes an fcntl.flock on a sidecar, sums the current session, and refuses the write (raises BudgetExceeded) if the call would cross X.
  2. Post-write check: wrappers compare cumulative spend to CW_BUDGET_USD after logging and raise if over. Used when the wrapper doesn't know the ceiling at call time.

Either path stops the agent mid-loop; the LLM call still returns to the caller, but the next one blocks.

7. Code audit (AST)

python3 scripts/cost_watchdog.py audit path/to/agent.py

Walks the AST and reports:

  • CRITICALwhile True with an LLM call and no max_iterations-style bound.
  • CRITICAL — function that recurses and calls an LLM API with no depth argument.
  • HIGH — plain while that calls an API with no retry/iteration counter.
  • MEDIUM — LLM call missing max_tokens / max_completion_tokens.
  • MEDIUM — function with ≥5 sequential LLM calls (batching candidate).

Every finding has a file line number. No more count('def ') > 3 and count('self.') > 5 → "recursion detected" false positives.

8. Detection — "what model is the agent using?"

python3 scripts/cost_watchdog.py detect

Five probe layers, ranked by confidence:

Probe Confidence
OpenClaw session JSONL high
Claude Code session JSONL high
Most recent usage-log entry high
Claude Code settings.json medium
Env vars (ANTHROPIC_MODEL, OPENAI_MODEL, ...) medium

Emits a table or --json.

9. Files

Path Purpose
scripts/_pricing.py Router: picks LiteLLM / OpenRouter / static per query.
scripts/_sources.py Three PricingSource classes + disk cache + circuit breaker.
scripts/tokenizer.py Provider-aware token counting (tiktoken for OpenAI; calibrated heuristics for others).
scripts/model_canon.py canonical_family() — collapses model variants.
scripts/code_audit.py AST cost-risk walker.
scripts/usage_log.py JSONL writer + rotation + aggregation.
scripts/tracker.py SDK wrappers + streaming + budget enforcement.
scripts/http_capture.py install_global_capture() — httpx transport hook.
scripts/openclaw_tailer.py Watches OpenClaw sessions.
scripts/detect_model.py Multi-layer detector.
scripts/errors.py errors.jsonl writer + reader.
scripts/io_utils.py write_json_atomic / read_json.
scripts/refresh_pricing.py Regenerates static pricing.md from live sources.
scripts/cost_watchdog.py Unified CLI dispatcher.
references/pricing.md Static fallback (regenerated; ~2600 models).
tests/test_cost_watchdog.py 73 tests: router, cache, AST, tokenizer, rotation, cassettes, circuit breaker, canonicalization.

10. Quality checklist

  • Live pricing from LiteLLM + OpenRouter, 24h-cached, with static fallback.
  • Exact-match model lookup (no substring conflation).
  • Multi-modal (token / image / second / query / page / character).
  • Unit-aware alternatives (never compares tokens to images).
  • AST-based code audit with line numbers.
  • Provider-aware tokenization (no more tiktoken-for-Claude).
  • Variance-based confidence (no += 0.05 theater).
  • Atomic writes to all shared state files.
  • fcntl.flock-guarded budget check-and-log (no race).
  • Circuit breaker on flaky networks (no 5s hang per call).
  • Streaming capture via SDK wrappers; streams flagged in errors.jsonl via HTTP capture.
  • Daily log rotation + date-scoped aggregation.
  • Canonical model families (variants collapse in reports).
  • errors.jsonl surfaces silent failures; cost_watchdog errors shows them.
  • Cassette tests for LiteLLM + OpenRouter parse paths (schema-drift safety net).
  • 73 logic tests passing.

11. Known limits (be honest)

  • Tokenizer heuristics for Claude/Gemini/etc. are calibrated from docs, not measured. Run cost_watchdog validate-tokens \x3Cmodel> to check drift against the provider's authoritative count when you have an API key.
  • install_global_capture() can't see streaming responses — httpx exposes an empty body until the user reads the stream. Use track_openai / track_anthropic for stream coverage; http_capture logs skipped streams to errors.jsonl so the gap is visible.
  • Non-httpx SDKs (older Cohere, boto3 with custom transport) need the per-SDK wrappers — HTTP capture won't see them.
  • LiteLLM community data can lag 24-48h on brand-new models. OpenRouter's API is truly live for anything it routes.

12. Testing

python3 -m unittest tests.test_cost_watchdog     # 73 tests
python3 scripts/code_audit.py test_risky_code.py # sample risks
python3 scripts/cost_watchdog.py report          # current spend summary
Usage Guidance
This skill appears to implement what it claims and does not request unrelated credentials, but it can capture and log full LLM requests/responses and (if enabled) any Python httpx traffic. Before installing: 1) Review the code paths that enable global capture (install_global_capture, openclaw_tailer, and the SDK wrappers) to ensure you understand what is logged and where. 2) Set CW_LOG_DIR to a directory with restricted permissions and consider encrypting or rotating logs if they may contain secrets. 3) Prefer per-SDK wrappers (track_openai, track_anthropic) over the global httpx capture to reduce unintended data capture. 4) Run in an isolated environment (container or dedicated VM) if you will enable live pricing/network access. 5) If you need deterministic tests or to avoid network lookups, set CW_STATIC_ONLY=1 (the test suite already does this). 6) If you aren’t comfortable with the possibility of recording sensitive payloads, do not enable global capture or automatic tailing; otherwise the skill is coherent with its purpose. Additional code review is recommended if you will deploy this in production or on systems with sensitive data.
Capability Analysis
Type: OpenClaw Skill Name: llm-cost-watchdog Version: 1.0.0 The skill bundle provides a comprehensive suite for monitoring LLM costs and enforcing budgets, but it employs several high-risk technical methods. Specifically, `scripts/http_capture.py` monkey-patches the `httpx` library globally to intercept and inspect network traffic for telemetry, and `scripts/detect_model.py` performs broad file system access to read session logs and settings from other applications (e.g., Claude Code in `~/.claude`). While these behaviors are aligned with the tool's stated purpose and no evidence of data exfiltration was found, the use of global library hooks and cross-application data access represents a significant security surface. The tool fetches pricing data from legitimate endpoints at `githubusercontent.com` (LiteLLM) and `openrouter.ai`.
Capability Tags
cryptocan-make-purchasesrequires-sensitive-credentials
Capability Assessment
Purpose & Capability
Name/description match what the package implements: pricing loaders, trackers, SDK wrappers, AST audit, visualizer, and budget enforcement. It does not request unrelated credentials or external OS-level access in the registry metadata. The included scripts and references (pricing.md, trackers, audit) are proportionate to the stated feature set.
Instruction Scope
SKILL.md and scripts instruct the agent to: tail OpenClaw session JSONL, wrap provider SDK clients (track_openai, track_anthropic, etc.), and optionally install a global httpx capture (install_global_capture()). These mechanisms write to ~/.cost-watchdog/usage.jsonl and errors.jsonl and may record full prompts, completions, and HTTP payloads. Capturing arbitrary httpx traffic can include non-LLM endpoints or secrets in request bodies/headers; the README/tests also reference regenerating pricing via live network calls. Tests expect a LICENSE file that isn't listed in the manifest — a small inconsistency. Overall the actions are explainable for cost-tracking but broaden the data surface (sensitive content may be logged) and grant broad discretion to the skill when enabled.
Install Mechanism
No external install spec or remote downloads are declared; the bundle contains Python scripts and docs. That reduces supply-chain concerns. The README suggests optional symlinking into global skill directories (user action). No use of URL-based extract/install was observed in metadata.
Credentials
No privileged environment variables or external credentials are required by the skill metadata. The skill uses its own env toggles (CW_PRICE_TTL_SECONDS, CW_OFFLINE, CW_STATIC_ONLY, CW_LOG_DIR, CW_BUDGET_USD) which are proportional to caching, offline/test mode, log location, and budget enforcement. It does not ask for unrelated service keys, which is appropriate.
Persistence & Privilege
always:false (not force-included). The skill writes logs to ~/.cost-watchdog by default and exposes wrappers that can patch SDK behavior at runtime (inject stream options, tee iterators). Those changes are expected for a monitoring/enforcement layer but increase the blast radius if the global httpx capture or automatic wrapper injection is enabled—review and restrict usage to controlled environments.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install llm-cost-watchdog
  3. After installation, invoke the skill by name or use /llm-cost-watchdog
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Cost Watchdog v1.0.0 — Initial Release - Tracks real-time LLM spending across 2600+ models and 30+ providers, enforcing budget ceilings to prevent runaway costs. - Detects runaway agent loops and unbounded API calls with AST-based code auditing. - Supports live model pricing, usage tracking, and budget enforcement via CLI and SDK wrappers. - Aggregates and reports spend by session, time window, and top models; provides alerts for silent errors and budgeting breaches. - Offers multi-layer detection of active models and robust offline/static fallbacks for pricing. - Includes unified CLI with commands for session tracking, auditing, pricing, alternatives, reporting, error review, and more.
Metadata
Slug llm-cost-watchdog
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is LLM Cost Watchdog?

Monitors real-time LLM API costs, detects runaway loops, enforces budgets, audits code risk, and reports usage across multiple providers and models. It is an AI Agent Skill for Claude Code / OpenClaw, with 108 downloads so far.

How do I install LLM Cost Watchdog?

Run "/install llm-cost-watchdog" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is LLM Cost Watchdog free?

Yes, LLM Cost Watchdog is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does LLM Cost Watchdog support?

LLM Cost Watchdog is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created LLM Cost Watchdog?

It is built and maintained by Nima Ansari (@nimaansari); the current version is v1.0.0.

💬 Comments