claude-api-cost-optimizer
/install claude-api-cost-optimizer
Claude API Cost Optimizer
Cut Claude API costs by 70–90% using intelligent model selection, caching, and batching.
Quick Start
- Audit your current API calls — identify which tasks use Opus or Sonnet that could use Haiku. Model selection alone saves 10–18x on simple tasks.
- Pick the cheapest model tier for each task: Haiku (cheapest) → Sonnet (mid) → Opus (most expensive, use sparingly). See
references/pricing.mdfor current rates. - Enable prompt caching for repeated context (system prompts, codebases) by adding
"cache_control": {"type": "ephemeral"}to message blocks - Implement cost reporting — track
input_tokens,output_tokens, and cache metrics from API responses
Key Concepts
- Model selection — Haiku for simple tasks (formatting, comments) — cheapest tier. Sonnet for medium (refactoring, debugging) — mid tier. Opus for complex only (architecture, security) — most expensive, use sparingly. See
references/pricing.mdfor current rates. - Prompt caching — Cache large static content (system prompts, codebase context). Cache reads cost 90% less; writes pay off after 1–2 reuses.
- Batching — Combine multiple requests into one API call to eliminate per-request overhead. 80% fewer calls ≈ 80% lower cost.
- Local caching — Cache identical responses locally to skip redundant API calls entirely.
- Context extraction — Send only relevant snippets, not whole files. Smaller inputs = lower costs.
- max_tokens discipline — Set realistic limits; unused token budget is wasted money.
Common Usage
Code examples are in Python but concepts apply to any language or SDK.
Model selection pattern:
def select_model(task_type: str) -> str:
simple_tasks = ["formatting", "comments", "explanation", "rename"]
complex_tasks = ["architecture", "algorithm", "security_audit"]
return ("claude-haiku-4-5-20251001" if task_type in simple_tasks else
"claude-opus-4-6" if task_type in complex_tasks else
"claude-sonnet-4-6")
Prompt caching:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system=[{
"type": "text",
"text": system_prompt,
"cache_control": {"type": "ephemeral"}
}],
messages=[{
"role": "user",
"content": [
{"type": "text", "text": f"Code:\
{source_code}",
"cache_control": {"type": "ephemeral"}},
{"type": "text", "text": query}
]
}]
)
Cost tracking:
usage = response.usage
cost = (usage.input_tokens * INPUT_RATE +
usage.cache_creation_input_tokens * CACHE_WRITE_RATE +
usage.cache_read_input_tokens * CACHE_READ_RATE +
usage.output_tokens * OUTPUT_RATE)
References
references/implementation.md— Full implementation patterns, model routing, caching setup, batching, retry logic, and anti-patternsreferences/pricing.md— Current pricing, cache cost math, savings calculations, and batch API details
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install claude-api-cost-optimizer - After installation, invoke the skill by name or use
/claude-api-cost-optimizer - Provide required inputs per the skill's parameter spec and get structured output
What is claude-api-cost-optimizer?
Minimize Anthropic Claude API costs through model selection, prompt caching, batching, and cost tracking. Trigger phrases: reduce API costs, optimize Claude... It is an AI Agent Skill for Claude Code / OpenClaw, with 183 downloads so far.
How do I install claude-api-cost-optimizer?
Run "/install claude-api-cost-optimizer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is claude-api-cost-optimizer free?
Yes, claude-api-cost-optimizer is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does claude-api-cost-optimizer support?
claude-api-cost-optimizer is cross-platform and runs anywhere OpenClaw / Claude Code is available (linux, darwin, win32).
Who created claude-api-cost-optimizer?
It is built and maintained by Deonte Cooper (@djc00p); the current version is v1.0.3.