/install inception-token-optimizer
Inception Token Optimizer
Reduce Inception API token consumption through prompt engineering, context management, and budget enforcement.
Free-Tier Limits (Inception Labs)
| Metric | Cap |
|---|---|
| Requests/min | 100 |
| Input tokens/min | 100,000 |
| Output tokens/min | 10,000 |
Core Strategies
1. Prompt Compression
- Remove redundant instructions, filler words, and repeated context.
- Use short system prompts: "Concise answers. French." beats a 200-word persona block.
- Avoid re-sending unchanged context — only send deltas.
- Ask for short replies: "Réponds en \x3C 100 mots."
2. Context Pruning
- Before sending, estimate tokens:
len(text) // 4(rough heuristic). - If total context > target budget, drop oldest messages and replace with a 1-2 sentence summary.
- Use
references/pruning-strategies.mdfor detailed patterns.
3. Caching
- Identical prompts → reuse previous response. Do not re-call.
- Hash the prompt; if seen recently (within session), return cached reply.
scripts/lru_cache.pyprovides a drop-in LRU cache (256 items default).
4. Model Selection
- Use cheaper/faster models for simple tasks (summarisation, classification).
- Reserve Mercury (or flagship) for complex reasoning only.
- Batch trivial queries into a single prompt instead of multiple calls.
5. Output Budgeting
- Set
max_tokensexplicitly — never leave it open-ended. - Target 150-200 output tokens for conversational replies.
- Use
temperature=0.7to reduce verbose wandering.
Token Budget Guard
scripts/token_bucket.py enforces per-minute caps using a sliding window:
from scripts.token_bucket import TokenBucket
bucket = TokenBucket(req_per_min=100, in_tok_per_min=100_000, out_tok_per_min=10_000)
bucket.wait_for_slot(in_tokens=500, out_tokens=200)
# proceed with API call
Blocks until a slot is available. Use before every Inception API call.
When to Use This Skill
- Before sending a prompt to Inception → compress & prune first.
- When monitoring costs → check token estimates.
- When near free-tier limits → activate budget guard.
- When building automation → integrate caching + bucket guard.
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install inception-token-optimizer - After installation, invoke the skill by name or use
/inception-token-optimizer - Provide required inputs per the skill's parameter spec and get structured output
What is Inception Token Optimizer?
Optimize Inception Labs token usage to minimize costs. Use when choosing Inception models (Mercury, etc.), crafting prompts for Inception, analyzing token co... It is an AI Agent Skill for Claude Code / OpenClaw, with 147 downloads so far.
How do I install Inception Token Optimizer?
Run "/install inception-token-optimizer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Inception Token Optimizer free?
Yes, Inception Token Optimizer is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Inception Token Optimizer support?
Inception Token Optimizer is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Inception Token Optimizer?
It is built and maintained by nelmaz (@nelmaz); the current version is v1.0.0.