LLM Cost Watchdog
/install llm-cost-watchdog
Cost Watchdog 💰
Real-time cost tracking layer for LLM-based agents. Prices every call live, detects runaway loops in code, enforces budget ceilings mid-execution.
1. Identity
Observes LLM spend without disturbing the agent. Prevents $2,400-overnight-loop
disasters by making cost a first-class concern: priced at write time,
budgeted at check time, surfaced in reports.
2. Triggers
Activate when:
- User mentions cost, budget, tokens, billing.
- Code contains LLM API calls (Anthropic, OpenAI, OpenRouter, Google, Groq, ...).
- Agent loops or recursive workflows.
- Batch / streaming processing with unclear bounds.
/cost-watchdog [command]is invoked.
3. Commands
Run via python3 scripts/cost_watchdog.py \x3Ccmd> (or hook into your own CLI).
| Command | What it does |
|---|---|
session |
Spend totals from usage.jsonl — calls, tokens, cost, top models. |
report |
24h / 7d / 30d windows with top model per window. |
tail [--once] |
Watch OpenClaw session JSONL and log every assistant turn. |
detect [--json] |
Identify which model the agent is currently using (5 probes). |
audit \x3Cfile.py> |
AST-based code risk scan: unbounded loops, recursion, missing max_tokens. |
price \x3Cmodel> |
Live pricing for one model, with source + cache age. |
estimate \x3Cmodel> |
Project cost for n iterations of a given call. |
alternatives \x3Cmodel> |
Cheaper same-unit models. |
errors [--limit N] |
Recent swallowed exceptions (silent failures made visible). |
validate-tokens \x3Cmodel> |
Compare our heuristic against provider's authoritative count. |
reset [--all] |
Clear current-day log (--all also clears rolled files). |
4. Pricing layer
Source chain
openrouter/* → OpenRouter API (live) → static fallback
anything else → LiteLLM JSON (live, cached 24h) → OpenRouter (permissive) → static fallback
- 2600+ models indexed across chat, completion, embedding, image, audio, video, rerank, OCR, search modes.
- 30+ providers in the static fallback: Anthropic, OpenAI, Google, Groq, Mistral, Cohere, DeepSeek, Perplexity, xAI, Bedrock, Azure, and more.
- Unit-aware:
token,image,second,query,page,character,pixel. Alternatives never compare across units. - Circuit breaker opens after 3 consecutive network failures for a host; falls through to cache/static until the cool-down ends (60s).
Tuning
| Env var | Default | Effect |
|---|---|---|
CW_PRICE_TTL_SECONDS |
86400 (24h) | Cache lifetime. 0 = hit network every call. |
CW_OFFLINE |
unset | If 1, never touch the network. |
CW_STATIC_ONLY |
unset | If 1, skip live sources entirely. Used by tests. |
CW_LOG_DIR |
~/.cost-watchdog |
Where usage/errors/cache files live. |
CW_BUDGET_USD |
unset | Ceiling; wrappers raise BudgetExceeded when crossed. |
Refresh static pricing
python3 scripts/refresh_pricing.py
Regenerates references/pricing.md from the live sources so the offline
fallback is fresh. Aborts if fewer than 100 rows came back (protects
against clobbering on a network outage).
5. Tracking layer — how we know what was spent
Four independent paths, all write to ~/.cost-watchdog/usage.jsonl:
| Path | When to use | Covers streams? |
|---|---|---|
openclaw_tailer.py --watch |
Running OpenClaw. Zero code changes. | yes (reads completed turns) |
track_openai(client) |
You call OpenAI-compatible SDK (covers OpenRouter, Groq, DeepSeek, Mistral, Together, Fireworks, Cerebras, Anyscale, ...). | yes (tee'd iterator, auto-injects stream_options={"include_usage": True}) |
track_anthropic(client) |
Direct Anthropic SDK. | yes (wraps messages.stream()) |
track_gemini(model) / track_cohere(client) / track_bedrock(client) |
Direct provider SDKs. | no (add wrappers if you need streams) |
install_global_capture() (httpx) |
Any modern Python SDK using httpx. | no — streams are flagged into errors.jsonl so the gap is visible. Use the SDK wrappers for stream coverage. |
Usage log rotates daily: usage.YYYY-MM-DD.jsonl. session_total(since=...)
skips files outside the window before scanning.
Aggregation uses canonical_family() so
claude-haiku-4-5-20251001, claude-haiku-4-5, and claude-haiku-4.5
are one row in reports.
6. Budget enforcement
Two mechanisms:
- Write-time check (race-safe):
append_usage(entry, budget_ceiling=X)takes anfcntl.flockon a sidecar, sums the current session, and refuses the write (raisesBudgetExceeded) if the call would crossX. - Post-write check: wrappers compare cumulative spend to
CW_BUDGET_USDafter logging and raise if over. Used when the wrapper doesn't know the ceiling at call time.
Either path stops the agent mid-loop; the LLM call still returns to the caller, but the next one blocks.
7. Code audit (AST)
python3 scripts/cost_watchdog.py audit path/to/agent.py
Walks the AST and reports:
- CRITICAL —
while Truewith an LLM call and nomax_iterations-style bound. - CRITICAL — function that recurses and calls an LLM API with no depth argument.
- HIGH — plain
whilethat calls an API with no retry/iteration counter. - MEDIUM — LLM call missing
max_tokens/max_completion_tokens. - MEDIUM — function with ≥5 sequential LLM calls (batching candidate).
Every finding has a file line number. No more count('def ') > 3 and count('self.') > 5 → "recursion detected" false positives.
8. Detection — "what model is the agent using?"
python3 scripts/cost_watchdog.py detect
Five probe layers, ranked by confidence:
| Probe | Confidence |
|---|---|
| OpenClaw session JSONL | high |
| Claude Code session JSONL | high |
| Most recent usage-log entry | high |
Claude Code settings.json |
medium |
Env vars (ANTHROPIC_MODEL, OPENAI_MODEL, ...) |
medium |
Emits a table or --json.
9. Files
| Path | Purpose |
|---|---|
scripts/_pricing.py |
Router: picks LiteLLM / OpenRouter / static per query. |
scripts/_sources.py |
Three PricingSource classes + disk cache + circuit breaker. |
scripts/tokenizer.py |
Provider-aware token counting (tiktoken for OpenAI; calibrated heuristics for others). |
scripts/model_canon.py |
canonical_family() — collapses model variants. |
scripts/code_audit.py |
AST cost-risk walker. |
scripts/usage_log.py |
JSONL writer + rotation + aggregation. |
scripts/tracker.py |
SDK wrappers + streaming + budget enforcement. |
scripts/http_capture.py |
install_global_capture() — httpx transport hook. |
scripts/openclaw_tailer.py |
Watches OpenClaw sessions. |
scripts/detect_model.py |
Multi-layer detector. |
scripts/errors.py |
errors.jsonl writer + reader. |
scripts/io_utils.py |
write_json_atomic / read_json. |
scripts/refresh_pricing.py |
Regenerates static pricing.md from live sources. |
scripts/cost_watchdog.py |
Unified CLI dispatcher. |
references/pricing.md |
Static fallback (regenerated; ~2600 models). |
tests/test_cost_watchdog.py |
73 tests: router, cache, AST, tokenizer, rotation, cassettes, circuit breaker, canonicalization. |
10. Quality checklist
- Live pricing from LiteLLM + OpenRouter, 24h-cached, with static fallback.
- Exact-match model lookup (no substring conflation).
- Multi-modal (token / image / second / query / page / character).
- Unit-aware alternatives (never compares tokens to images).
- AST-based code audit with line numbers.
- Provider-aware tokenization (no more tiktoken-for-Claude).
- Variance-based confidence (no
+= 0.05theater). - Atomic writes to all shared state files.
-
fcntl.flock-guarded budget check-and-log (no race). - Circuit breaker on flaky networks (no 5s hang per call).
- Streaming capture via SDK wrappers; streams flagged in
errors.jsonlvia HTTP capture. - Daily log rotation + date-scoped aggregation.
- Canonical model families (variants collapse in reports).
-
errors.jsonlsurfaces silent failures;cost_watchdog errorsshows them. - Cassette tests for LiteLLM + OpenRouter parse paths (schema-drift safety net).
- 73 logic tests passing.
11. Known limits (be honest)
- Tokenizer heuristics for Claude/Gemini/etc. are calibrated from docs,
not measured. Run
cost_watchdog validate-tokens \x3Cmodel>to check drift against the provider's authoritative count when you have an API key. install_global_capture()can't see streaming responses — httpx exposes an empty body until the user reads the stream. Usetrack_openai/track_anthropicfor stream coverage;http_capturelogs skipped streams toerrors.jsonlso the gap is visible.- Non-httpx SDKs (older Cohere, boto3 with custom transport) need the per-SDK wrappers — HTTP capture won't see them.
- LiteLLM community data can lag 24-48h on brand-new models. OpenRouter's API is truly live for anything it routes.
12. Testing
python3 -m unittest tests.test_cost_watchdog # 73 tests
python3 scripts/code_audit.py test_risky_code.py # sample risks
python3 scripts/cost_watchdog.py report # current spend summary
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install llm-cost-watchdog - After installation, invoke the skill by name or use
/llm-cost-watchdog - Provide required inputs per the skill's parameter spec and get structured output
What is LLM Cost Watchdog?
Monitors real-time LLM API costs, detects runaway loops, enforces budgets, audits code risk, and reports usage across multiple providers and models. It is an AI Agent Skill for Claude Code / OpenClaw, with 108 downloads so far.
How do I install LLM Cost Watchdog?
Run "/install llm-cost-watchdog" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is LLM Cost Watchdog free?
Yes, LLM Cost Watchdog is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does LLM Cost Watchdog support?
LLM Cost Watchdog is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created LLM Cost Watchdog?
It is built and maintained by Nima Ansari (@nimaansari); the current version is v1.0.0.