← 返回 Skills 市场
certainlogicai

Token Reduction Engine

作者 CertainLogicAI · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
66
总下载
0
收藏
0
当前安装
2
版本数
在 OpenClaw 中安装
/install certainlogic-tre
功能描述
Cut AI token costs by 60-85% with deterministic query validation and intelligent caching. Standalone or integrates with Company Brain Core OS. Free, open-sou...
使用说明 (SKILL.md)

Token Reduction Engine (TRE)

Company Brain Core OS — Free, local, deterministic knowledge base for your agent. Start here if you need persistent facts before optimizing token spend.

Cut AI costs. Not quality. Not accuracy.

v1.0.0

Built and dogfooded by CertainLogicAI — Saves us 79%+ on every cached query. Measurement beats hope.


Part of the CertainLogic Stack

This skill works standalone. You don't need anything else to start cutting token costs today.

Works even better with Company Brain Core OS (clawhub install company-brain-os):

  • Brain stores validated facts — TRE caches deterministic responses
  • AgentPathfinder — audits which cache policies led to good outcomes
  • Smart Router — routes non-cached queries to cheaper model tiers

All four are independent. Pick the one that solves your problem. Add others when you need them.


How It Works — Three Layers

TRE sits between your agent and the LLM. It intercepts queries and decides: cache hit, deterministic lookup, or expensive LLM call.

User Query → TRE Decision Layer → Result (cache | brain | LLM)

Layer 1: Query Cache (Answer Cache)

Stores LLM responses so repeated queries return instantly — zero tokens.

  • SHA-256 hash of exact query as cache key
  • LRU eviction — keeps hot answers, drops cold ones
  • TTL-based expiry — stale data auto-purges (default: 3600s)
  • Hallucination Guard — hedged answers ("I think", "maybe", "not sure") are shown to user but NOT cached
# After first LLM call:
tre.cache_answer("What is our refund policy?", "30 days, no questions.")

# Every subsequent call:
answer = tre.get_cached_answer("What is our refund policy?")
# → Instant. Zero tokens. No LLM hit.

Layer 2: Deterministic Lookup

For questions with factual answers, skip the LLM entirely.

  • Intent classification — matches query to known domain (strategy, product, security, etc.)
  • Facts DB overlay — structured key-value pairs with source attribution
  • Zero LLM cost — pre-validated facts served direct from SQLite
"What products do we offer?" → Intent: product → Facts DB → Instant answer
"How does the brain work?" → Intent: strategy → Facts DB → Instant answer
"Who wrote the security policy?" → Intent: security → Facts DB → Instant answer

Layer 3: Intent Filtering (Sanity Gate)

Prevents expensive LLM calls that can't possibly have good answers.

  • Forbidden intents — blocks "brain.delete_brain", "brain.purge", malicious commands
  • Required fields gate — rejects malformed queries before they hit the LLM
  • Domain alignment — refuses answers outside declared intent scope (prevents hallucinated guesses)

Major Benefits

1. 60-85% Token Cost Reduction

  • Cache hit rate: up to 79% (measured on Brain workloads — your results depend on query repetition)
  • Deterministic path: \x3C10ms vs 500-2000ms LLM round-trip
  • Cost comparison: $0 per cached answer vs $0.01-0.15 per LLM call
Workload Without TRE With TRE Savings
Repetitive queries (support, FAQ) $10-15/day $1-3/day 85%
Mixed unique + repeat $300/mo $45-90/mo 40-70%

2. Zero Hallucination on Cached Facts

  • Hallucination Guard detects hedged language: "I think", "maybe", "probably", "not sure"
  • Flagged responses are shown but not cached — prevents poisoned cache
  • SH A-256 verification on every write — tamper detection for audit trails
  • Source attribution — every cached fact traces back to who loaded it and when

3. Instant Sub-10ms Responses

  • Instant responses — SQLite-backed local cache serves answers from RAM (no network round-trip)
  • Cold → hot acceleration — cache warmup brings first-query latency from ~700ms to sub-50ms

4. Agent Won't Run Amok

  • Forbidden command list — "brain.delete_brain", "brain.purge" are blocked before execution
  • Intent scope enforcement — agent refuses queries outside declared domain
  • SHA-256 write verification — any tampered cache entry is rejected on read
  • AgentPathfinder integration — HMAC-signed audit trail on every decision

5. Works With Any LLM Stack

  • OpenAI (GPT-4o, GPT-4o-mini)
  • Anthropic (Claude 3.5 Sonnet, Claude 3 Opus)
  • Local models (Ollama, llama.cpp, vLLM)
  • Multi-provider setups — TRE sits in front of all of them
# Works with any backend
llm_provider: openai  # or anthropic, local, azure, etc.
tre:
  cache_ttl: 3600
  max_cache_size: 10000
  hallucination_guard: true

6. Persisted Across Restarts

  • Cache auto-saves to disk on every write
  • Load persisted cache on startup — no warm-up period
  • Configurable persistence path — store anywhere (local disk, shared volume, etc.)

7. Standalone — No Lock-in

  • Runs on localhost — no external API, no SaaS dependency
  • MIT License — free to fork, modify, commercialize
  • No telemetry — your queries never leave your machine
  • 50-line integration — drop into existing agent in minutes

Standalone Usage

clawhub install certainlogic-tre
from tre import cache_answer, get_cached_answer, get_metrics

# After getting an LLM response
tre.cache_answer(query, answer)

# Next time — zero tokens
cached = tre.get_cached_answer(query)
if cached:
    answer, token_count = cached
    print(f"Instant! Saved {token_count} tokens.")

# Check your savings
print(tre.get_metrics())
# {'cache_hits': 853, 'cache_misses': 223, 'cache_hit_rate_percent': 79.3}

See references/CONFIGURATION.md for full config options (TTL, cache size, guard sensitivity, forbidden commands).

See references/API.md for Python API reference.


Integration with Company Brain Core OS

When Brain is installed, TRE gets smarter:

  • Facts DB pre-populates cache — Brain's 52+ facts are available instantly
  • Intent classification improves — Brain's domain model refines routing
  • Query classification — Brain categorizes new queries, expanding cache coverage
# Install both
clawhub install company-brain-os certainlogic-tre

# Brain loads facts → TRE caches them → instant responses for common questions

Metrics (Measured Ourselves)

Metric Value Measurement Period
Cache hit rate 79.3% 2026-05-09, 615 queries
Avg latency (cache hit) \x3C10ms Local SQLite
Avg latency (LLM miss) 757ms OpenRouter fallback
Hallucination guard flags 0.2% Hedge language detection
Token cost savings 85% Cached queries vs LLM calls
Forbidden command blocks 100% All delete_brain, purge blocked

Free vs Pro

Feature Free Pro (Planned)
Query answer cache
Deterministic lookup
Hallucination guard
Intent filtering
Cache persistence
Brain integration
Distributed cache (Redis)
Cache analytics dashboard
A/B testing policies
Team policy sharing
Auto policy optimization

Pro pricing: TBD — join the waitlist at certainlogic.ai/tre


License

MIT — free to use, modify, distribute.

Built with brutal honesty by CertainLogic

安全使用建议
Install only if you are comfortable with a local plaintext answer cache. Do not rely on this skill as a complete hallucination, tamper, or command-safety guard unless the publisher adds and documents those controls. Protect or clear the cache for sensitive work, and use the optional Brain integration only with a trusted local service.
功能分析
Type: OpenClaw Skill Name: certainlogic-tre Version: 1.0.1 The Token Reduction Engine (TRE) is a utility designed to cache LLM responses locally to reduce API costs and latency. The bundle includes a 'Hallucination Guard' (hallucination_detector.py) that uses regular expressions to identify uncertain or speculative language, preventing low-confidence answers from being cached. While there are discrepancies between the documentation and the implementation (e.g., the 'configure' function and intent filtering mentioned in SKILL.md and API.md are missing from tre.py), these appear to be unintentional omissions or development artifacts rather than malicious intent. The code performs local file operations for persistence in a dedicated directory (~/.tre) and includes an optional integration with a local service at 127.0.0.1:8000, which is consistent with the stated purpose.
能力标签
cryptocan-make-purchases
能力评估
Purpose & Capability
The local caching purpose is coherent, but the documentation claims stronger safety features than the included code supports, including intent enforcement, command blocking, tamper rejection, and zero-hallucination cached facts.
Instruction Scope
The examples and integration instructions are mostly user-directed and purpose-aligned, but users should not treat the advertised safety gates as active controls based on the provided implementation.
Install Mechanism
There is no registry install spec that auto-runs code, but an included install.sh can be run manually and writes local Python files under TRE_HOME or ~/.tre.
Credentials
No credentials or external SaaS calls are required. The code includes an optional localhost Brain API helper that can write query and answer snippets to a local service if called.
Persistence & Privilege
The cache persists answers in a local JSON file and loads them on import. This is expected for the stated cache purpose, but it can retain sensitive or stale content across sessions.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install certainlogic-tre
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /certainlogic-tre 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.1
Fixed all known issues: bundled real Python modules (tre.py, hallucination_detector.py), install.sh now does real installation with example.py, health validation test.py, SKILL.md claims separated Brain metrics vs TRE-specific, removed speculative 9/mo pricing.
v1.0.0
Initial release: 60-85% token cost reduction via intelligent caching and deterministic query validation. Hallucination Guard. Intent filtering. 7 major benefits documented. 79.3% production cache hit rate.
元数据
Slug certainlogic-tre
版本 1.0.1
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 2
常见问题

Token Reduction Engine 是什么?

Cut AI token costs by 60-85% with deterministic query validation and intelligent caching. Standalone or integrates with Company Brain Core OS. Free, open-sou... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 66 次。

如何安装 Token Reduction Engine?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install certainlogic-tre」即可一键安装,无需额外配置。

Token Reduction Engine 是免费的吗?

是的,Token Reduction Engine 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Token Reduction Engine 支持哪些平台?

Token Reduction Engine 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Token Reduction Engine?

由 CertainLogicAI(@certainlogicai)开发并维护,当前版本 v1.0.1。

💬 留言讨论