/install anthropic-token-optimizer
Anthropic Token Optimizer
Minimize Anthropic API costs without sacrificing quality. Focus on cache read reduction — the #1 cost driver for long sessions with Opus.
Anthropic Pricing Reference
| Model | Input | Cache Write | Cache Read | Output |
|---|---|---|---|---|
| Haiku | $0.80 | $1.00 | $0.08 | $4 |
| Sonnet | $3 | $3.75 | $0.30 | $15 |
| Opus | $15 | $18.75 | $1.50 | $75 |
(USD per MTok, i.e. per million tokens)
Key insight: cache read cost ≈ context_size × num_turns × cache read price. A 200k context × 50 turns on Opus is 10M cached tokens, or ~$15 in cache reads alone.
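The arithmetic behind this estimate can be checked with a few lines of Python. This is a minimal sketch: the function name `cache_read_cost` is hypothetical, and it assumes the entire context is served from cache on every turn at the table's per-MTok rate.

```python
def cache_read_cost(context_tokens: int, turns: int, price_per_mtok: float) -> float:
    """Estimate cumulative cache-read cost in USD.

    Assumes the whole context is re-read from cache on every turn.
    """
    total_tokens = context_tokens * turns
    return total_tokens / 1_000_000 * price_per_mtok


# 200k context, 50 turns, Opus cache-read rate from the table ($1.50/MTok)
opus_cost = cache_read_cost(200_000, 50, 1.50)
print(f"${opus_cost:.2f}")  # prints $15.00
```

Swapping in the Sonnet rate (0.30) for the same session gives one fifth of that figure, which is why context size matters more on Opus.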
Part 1: Config Optimizations (openclaw.json)
1. Compaction model — use cheaper model (highest ROI ✅)
"compaction": {
  "mode": "safeguard",
  "model": "anthropic/claude-sonnet-4-20250514"
}
Sonnet summarizes at one-fifth the cost of Opus (both input and output rates are 5x lower). Quality is comparable for summaries.
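To make the 5x figure concrete, the sketch below prices a single compaction pass on both models using the rates from the pricing table. The workload (150k tokens in, 5k-token summary out) is a made-up example, and the function assumes the history is billed as uncached input, which may not match how a given runtime performs compaction.

```python
# Per-MTok prices copied from the pricing table above.
PRICES = {
    "opus": {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00, "output": 15.00},
}


def compaction_cost(model: str, input_tokens: int, summary_tokens: int) -> float:
    """Cost in USD of one compaction pass, assuming uncached input."""
    p = PRICES[model]
    return (input_tokens * p["input"] + summary_tokens * p["output"]) / 1_000_000


# Hypothetical workload: summarize 150k tokens of history into a 5k-token summary.
opus = compaction_cost("opus", 150_000, 5_000)
sonnet = compaction_cost("sonnet", 150_000, 5_000)
print(f"Opus ${opus:.3f} vs Sonnet ${sonnet:.3f} ({opus / sonnet:.0f}x)")
```

Because both the input and output rates differ by the same factor, the ratio stays at 5x regardless of the input/output mix.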
2. Context pruning — trim old tool results
"contextPruning": { "mode": "cache-ttl", "ttl": "1h" }
| TTL | Behavior | Verdict |
|---|---|---|
| 1h | Trims after 1 hour | ✅ Best balance |
| 30m | Moderate | OK for active sessions |
| 5m | Aggressive | ⚠️ Loses tool results |
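The idea behind TTL-based pruning can be sketched as follows. This is an illustration only, not OpenClaw's implementation: the `Message` type, the `prune_tool_results` function, and the placeholder text are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str         # e.g. "user", "assistant", "tool"
    content: str
    timestamp: float  # seconds since epoch


def prune_tool_results(messages, now, ttl_seconds, placeholder="[tool result pruned]"):
    """Replace the content of tool results older than the TTL; keep everything else.

    Returns a new list and leaves the input messages untouched.
    """
    pruned = []
    for m in messages:
        if m.role == "tool" and now - m.timestamp > ttl_seconds:
            pruned.append(Message(m.role, placeholder, m.timestamp))
        else:
            pruned.append(m)
    return pruned
```

A shorter TTL shrinks the context sooner, at the risk of dropping tool output the agent still needs, which is the trade-off the table describes.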
3. Cache retention — keep "long"
"params": { "cacheRetention": "long", "context1m": true }
"long" = fewer cache writes ($18.75/MTok on Opus!). Don't switch to "default".
4. Cache TTL heartbeat alignment
Anthropic cache expires after ~1h idle. Heartbeat at 55min keeps it warm:
"heartbeat": { "every": "55m" }
Prevents expensive cache re-writes when agent resumes after idle.
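A rough way to see when the heartbeat pays off is to compare the cache reads it triggers against one full cache re-write. The sketch below uses the Opus rates from the pricing table. It assumes each heartbeat re-reads the full cached context, that an expired cache costs exactly one full write on resume, and it ignores the small output cost of each heartbeat reply. The function name is hypothetical.

```python
def heartbeat_vs_rewrite(context_tokens, idle_minutes, every_minutes=55,
                         read_price=1.50, write_price=18.75):
    """Compare keeping the cache warm against re-writing it after an idle gap.

    Returns (keep_warm_cost, rewrite_cost) in USD.
    """
    heartbeats = idle_minutes // every_minutes
    keep_warm = heartbeats * context_tokens / 1_000_000 * read_price
    rewrite = context_tokens / 1_000_000 * write_price
    return keep_warm, rewrite


# 200k context, agent idle for 3 hours
keep_warm, rewrite = heartbeat_vs_rewrite(200_000, 180)
print(f"keep warm: ${keep_warm:.2f}, re-write: ${rewrite:.2f}")
```

Under these assumptions one re-write costs as much as 12.5 heartbeats (18.75 / 1.50), so the heartbeat stops saving money once an idle gap exceeds roughly 11 hours.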
5. Bootstrap size limits — cap workspace injection
"bootstrapMaxChars": 8000,
"bootstrapTotalMaxChars": 30000
Check current injection: /context list. Prevents oversized files from inflating every turn.
Part 2: Workspace Hygiene (biggest long-term win)
File budgets
| File | Target | Review cycle |
|---|---|---|
| AGENTS.md | <500 tokens (~2KB) | Monthly |
| SOUL.md | <250 tokens (~1KB) | Rarely |
| TOOLS.md | <500 tokens (~2KB) | When tools change |
| MEMORY.md | <400 tokens (~1.5KB) | Weekly prune |
| HEARTBEAT.md | <400 tokens (~1.5KB) | When done |
| memory/YYYY-MM-DD.md | <30 lines | Collapse at EOD |
| Total injected | <2,800 tokens | |
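A quick budget check can flag files that have drifted past their targets. The sketch below is an assumption-laden estimate: it uses the roughly 4 characters per token implied by the table (500 tokens ≈ 2KB) rather than a real tokenizer, and the `over_budget` helper is hypothetical.

```python
# Budgets in tokens, copied from the table above.
BUDGETS = {
    "AGENTS.md": 500,
    "SOUL.md": 250,
    "TOOLS.md": 500,
    "MEMORY.md": 400,
    "HEARTBEAT.md": 400,
}
CHARS_PER_TOKEN = 4  # rough heuristic implied by the table, not a tokenizer


def over_budget(files: dict) -> dict:
    """Return {filename: estimated_tokens} for files that exceed their budget.

    `files` maps a filename to its text content.
    """
    report = {}
    for name, text in files.items():
        estimated = len(text) // CHARS_PER_TOKEN
        if name in BUDGETS and estimated > BUDGETS[name]:
            report[name] = estimated
    return report
```

In practice the input could be built with `{p.name: p.read_text() for p in Path(".").glob("*.md")}`. For exact counts, prefer the `/context list` diagnostic.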
Reduction tactics
- Merge redundant files (IDENTITY.md + USER.md → SOUL.md)
- Move non-essential docs to subfolders (docs/, notes/), which are not auto-injected
- Collapse exploration into decisions: keep "what we decided", delete "how we got here"
- Prune ghost context: references to old paths, removed tools, fixed bugs
- Deduplicate: info in SOUL.md shouldn't repeat in MEMORY.md or AGENTS.md
Daily memory rules
- Write: decisions + why (1 line), new tools/config, lessons, user preferences.
- Skip: exploration steps, command outputs, things already in MEMORY.md, delivered content.
- Format: bullets, not paragraphs. One fact per line.
Part 3: Behavioral Patterns
6. /compact after each topic (most effective manual action)
/compact Focus on [topic summary]
7. /new when switching topics entirely
Context resets to 0. Don't carry 200k into unrelated work.
8. Subagents for tool-heavy work
Spawn subagents (on a cheaper model) for: codebase grep, reading 5+ files, research/web fetch. Tool results stay isolated in the subagent's context instead of inflating the main session.
9. Tool output discipline
- Truncate: `| head -20`, `| jq '.key'`
- Request only needed fields from APIs
- Never paste full JSON when you need one value
- Output >50 lines → summarize, don't quote
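The same discipline can be applied programmatically before a tool result enters the context. In this sketch both helper names are hypothetical, and clipping to the first lines is only a stand-in for real summarization. The 50-line threshold comes from the rule above and the 20-line cut mirrors `head -20`.

```python
import json


def clip_output(text: str, max_lines: int = 50, keep: int = 20) -> str:
    """Return text unchanged if short; otherwise keep the first `keep` lines plus a note."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - keep
    return "\n".join(lines[:keep] + [f"... [{omitted} more lines omitted]"])


def pick_field(raw_json: str, key: str):
    """Extract a single top-level value instead of pasting the whole document."""
    return json.loads(raw_json)[key]
```

For example, `pick_field('{"status": "ok", "items": [1, 2, 3]}', "status")` returns only `"ok"`, so the array never reaches the conversation.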
10. File loading discipline
- Startup: only today + yesterday memory files
- Read SKILL.md only when task needs that skill
- Don't re-read files already in context
Part 4: Context Budgeting
Information partitioning
| Budget | Content |
|---|---|
| 10% | Task instructions + constraints |
| 40% | Recent 5-10 turns of dialogue |
| 20% | Decision logs ("tried X, failed because Y") |
| 20% | High-relevance MEMORY.md snippets |
| 10% | Tool schemas + system prompt |
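The percentages translate directly into token allotments for a given context budget. This is a minimal sketch with hypothetical category keys. It uses integer division, so totals may fall a few tokens short for budgets that are not multiples of 100.

```python
# Shares (in percent) copied from the table above; they must sum to 100.
SHARES = {
    "task_instructions": 10,
    "recent_dialogue": 40,
    "decision_logs": 20,
    "memory_snippets": 20,
    "tool_schemas_and_system": 10,
}


def partition_budget(total_tokens: int) -> dict:
    """Split a context budget into per-category token allotments."""
    if sum(SHARES.values()) != 100:
        raise ValueError("shares must sum to 100")
    return {name: total_tokens * pct // 100 for name, pct in SHARES.items()}


print(partition_budget(100_000))
```

With a 100k-token budget, recent dialogue gets 40,000 tokens and each 10% category gets 10,000.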
Compaction survival
Before compaction hits, critical state must be captured:
- WAL Protocol: On corrections, decisions, specific values → write to SESSION-STATE.md before responding
- Working buffer: At 60%+ context → append exchange summaries to memory/working-buffer.md
- Recovery: After compaction, read buffer + session state first. Never ask "where were we?"
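The write-ahead ordering in the WAL protocol can be sketched as below: the state file is updated first, and only then is the reply returned. The function names and the timestamped line format are assumptions made for illustration. Only the SESSION-STATE.md file name comes from the text above.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def log_state(entry: str, path: Path = Path("SESSION-STATE.md")) -> None:
    """Append one timestamped bullet to the session-state file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {stamp} {entry}\n")


def respond(reply: str, decision: Optional[str] = None,
            state_path: Path = Path("SESSION-STATE.md")) -> str:
    """Write-ahead: persist the decision first, then hand back the reply."""
    if decision is not None:
        log_state(decision, state_path)
    return reply
```

Because the write happens before the reply is produced, a compaction that lands between turns cannot lose the recorded decision.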
Session lifespan
After 85% context or 3+ compactions → start fresh with /new. Good MEMORY.md means minimal context loss.
Part 5: Codebase Map Caching (Atris Pattern)
Problem: Every code review or exploration session re-reads the same files, burning tokens on repeated grep/read calls. A 500-file project can cost 50k+ tokens just for navigation.
Solution: Generate a persistent codebase map (atris/MAP.md) once, reuse across sessions.
Setup (one-time per project)
# Install atris skill
npx clawhub@latest install atris
# Or manually: create atris/ folder in project root, then scan
rg "^(export|function|class|const|def |async def |router\.|app\.)" \
--line-number -g "!node_modules" -g "!.git" -g "!dist" -g "!.env*"
MAP.md structure
# MAP.md — [Project] Navigation Guide
> Last updated: YYYY-MM-DD
## Quick Reference
- `src/index.ts:1` — App entry point
- `src/routes/auth.ts:15` — POST /login handler
- `src/models/user.ts:8` — User schema
### Feature: Authentication
- **Entry:** `src/auth/login.ts:45-89` (handleLogin)
- **Validation:** `src/auth/validate.ts:12` (validateToken)
- **Routes:** `src/routes/auth.ts:5-28`
### Feature: Billing
- **Controller:** `src/controllers/billing.ts:20`
- **Service:** `src/services/stripe.ts:1-45`
MAP-first rule
Before searching codebase:
- Read atris/MAP.md. Found? Go directly to file:line
- Not found? Search with rg, then add the result to MAP.md
Map gets smarter every session. Never let a discovery go unrecorded.
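The MAP-first rule amounts to a keyword lookup over the map before falling back to a search. The sketch below assumes map entries follow the backticked path:line format shown in the MAP.md structure. The `lookup` function and its matching strategy (case-insensitive substring on the whole line) are illustrative choices, not part of the atris skill.

```python
import re

# Matches a backticked location such as `src/auth/login.ts:45-89`
LOCATION = re.compile(r"`([^`\s]+):(\d+)(?:-\d+)?`")


def lookup(map_text: str, keyword: str) -> list:
    """Return (path, first_line) for every map entry whose line mentions keyword."""
    hits = []
    for line in map_text.splitlines():
        if keyword.lower() in line.lower():
            m = LOCATION.search(line)
            if m:
                hits.append((m.group(1), int(m.group(2))))
    return hits
```

An empty result is the signal to search with rg and then record the discovery in the map.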
Keeping fresh
- New file → add to relevant section
- Deleted file → remove from map
- Major refactor → regenerate affected sections only
- Prefer small updates over full regeneration
Token savings
| Codebase | Without map | With map | Savings |
|---|---|---|---|
| Small (50 files) | ~5k tokens/explore | ~1k | 80% |
| Medium (200 files) | ~20k tokens/explore | ~3k | 85% |
| Large (500+ files) | ~50k tokens/explore | ~5k | 90% |
Diagnostics
/context list → token count per injected file
/context detail → full breakdown (tools, skills, system)
/usage tokens → append token count to every reply
/usage cost → cumulative cost summary
/status → model, context %, cost estimate
Decision Matrix
| Situation | Action |
|---|---|
| Session >100k context | /compact immediately |
| Switching topics | /new or /compact |
| Reading 5+ files | Spawn subagent |
| Compaction cost high | Set compaction model to Sonnet |
| Daily cost >$10 | Audit session count, compact more |
| Cache writes spiking | Heartbeat ≤55min, keep cacheRetention: long |
| Workspace injection >20KB | Merge/move files, set bootstrapMaxChars |
| Context >85% | /new — start fresh |
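The context-related rows of the matrix can be expressed as a small rule function. This is a sketch covering only those rows. The function name, its parameters, and the precedence (a context above 85% wins over the 100k rule) are assumptions, not behavior built into OpenClaw.

```python
def recommend(context_tokens: int, context_pct: float,
              files_to_read: int = 0, switching_topics: bool = False) -> list:
    """Map the session state to the actions suggested by the decision matrix."""
    actions = []
    if context_pct > 85:
        actions.append("/new")
    elif context_tokens > 100_000:
        actions.append("/compact")
    if switching_topics and "/new" not in actions:
        actions.append("/new or /compact")
    if files_to_read >= 5:
        actions.append("spawn subagent")
    return actions


print(recommend(context_tokens=120_000, context_pct=60))
```

The cost-based rows (daily cost, cache write spikes, compaction cost) need billing data and are left out of this example.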
Impact Summary
| Technique | Savings | UX Impact |
|---|---|---|
| Compaction model = Sonnet | ~80% compaction cost | None |
| Workspace file budgets | ~30-50% base cost | None |
| /compact after topics | ~40-60% cache read | Manual step |
| Cache TTL heartbeat (55m) | ~20-30% cache writes | None |
| Bootstrap size limits | ~20-30% base cost | None |
| Subagent delegation | ~30% cache read | Better (parallel) |
| Tool output discipline | ~10-20% per turn | Requires habit |
| /new for new topics | ~100% (reset) | Lose old context |
| Codebase map (Atris) | ~80-90% code exploration | One-time setup |
Credits
Incorporates ideas from: openclaw-token-optimizer, context-slimmer, context-budgeting, compaction-survival, context-hygiene, atris (codebase map caching).
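- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install anthropic-token-optimizer
- Once installed, call the Skill by name or trigger it with /anthropic-token-optimizer
- Provide the required inputs according to the Skill's parameter description to get structured output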
What is Anthropic Token Optimizer?
Reduce Anthropic API costs (cache read, compaction, context bloat) for OpenClaw agents. Use when users ask about token optimization, reducing API costs, cach... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 112 downloads to date.
How do I install Anthropic Token Optimizer?
Run the command "/install anthropic-token-optimizer" in the OpenClaw or Claude Code chat box for a one-step install. No additional configuration is needed.
Is Anthropic Token Optimizer free?
Yes. Anthropic Token Optimizer is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does Anthropic Token Optimizer support?
Anthropic Token Optimizer is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Anthropic Token Optimizer?
It is developed and maintained by ngocgd (@ngocgd). The current version is v1.3.0.