cascadeflow: Cost + Latency Reduction
/install cascadeflow
CascadeFlow: Cost + Latency Reduction | 17+ Domain-Aware Models + OpenClaw-Native Events
Use CascadeFlow as an OpenClaw provider to lower cost and latency via cascading. Assign up to 17 domain-specific models (for coding, web search, reasoning, and more), including OpenClaw-native event handling, and cascade between them (small model first, verifier when needed). Keep setup minimal, then verify with one health check and one chat call.
Why Use It
- Reduce spend with drafter/verifier cascading.
- Run 17+ domain-aware model assignments (code, reasoning, web-search, and more).
- Support cascading with streaming and multi-step agent loops.
- Handle OpenClaw-native event/domain signals for smarter model selection.
Security Defaults
- Install from PyPI and verify package artifact before first run.
- Keep the server bound to localhost by default.
- Use explicit auth tokens for chat and stats endpoints (recommended for production).
- Expose remote access only behind TLS/reverse proxy with strong tokens.
- Use least-privilege provider keys (separate test keys from production keys).
How It Works
- OpenClaw sends requests to CascadeFlow through OpenAI-compatible
/v1/chat/completions. - CascadeFlow reads prompt context plus OpenClaw-native event/domain metadata (for example
metadata.method,metadata.event, and channel/category hints). - CascadeFlow selects a domain-aware drafter/verifier pair (small model first).
- If quality passes threshold, drafter answer is returned (cost/latency advantage).
- If quality fails threshold, verifier runs and final answer is upgraded.
- The same cascading behavior is supported for streaming and multi-step agent loops.
Advantages
- Lower average cost by avoiding verifier calls when not needed.
- Lower average latency for simple and medium tasks.
- Better quality on hard tasks through verifier fallback.
- Better operational handling through OpenClaw-native event/domain understanding.
Quick Start
Or ask your OpenClaw agent to set it up for you as an OpenClaw custom provider with OpenClaw-native events and domain understanding.
- Install and verify package source:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade "cascadeflow[openclaw]>=0.7,\x3C0.8"
python -m pip show cascadeflow
python -m pip download --no-deps "cascadeflow[openclaw]>=0.7,\x3C0.8" -d /tmp/cascadeflow_pkg
python -m pip hash /tmp/cascadeflow_pkg/cascadeflow-*.whl
Optional variants:
python -m pip install --upgrade "cascadeflow[openclaw,anthropic]>=0.7,\x3C0.8" # Anthropic-only preset
python -m pip install --upgrade "cascadeflow[openclaw,openai]>=0.7,\x3C0.8" # OpenAI-only preset
python -m pip install --upgrade "cascadeflow[openclaw,providers]>=0.7,\x3C0.8" # Mixed preset
- Pick preset + credentials:
- Presets:
examples/configs/anthropic-only.yaml,examples/configs/openai-only.yaml,examples/configs/mixed-anthropic-openai.yaml - Provider key(s):
ANTHROPIC_API_KEY=...and/orOPENAI_API_KEY=...(required based on selected preset) - Service tokens:
--auth-token ...and--stats-auth-token ...(recommended for production; use long random values)
- Start server (safe local default):
set -a; source .env; set +a
python3 -m cascadeflow.integrations.openclaw.openai_server \
--host 127.0.0.1 --port 8084 \
--config examples/configs/anthropic-only.yaml \
--auth-token local-openclaw-token \
--stats-auth-token local-stats-token
Optional harness activation (runtime in-loop policy controls):
# Observe first (recommended): log decisions, no blocking
python3 -m cascadeflow.integrations.openclaw.openai_server \
--host 127.0.0.1 --port 8084 \
--config examples/configs/anthropic-only.yaml \
--harness-mode observe
# Enforce mode with limits
python3 -m cascadeflow.integrations.openclaw.openai_server \
--host 127.0.0.1 --port 8084 \
--config examples/configs/anthropic-only.yaml \
--harness-mode enforce \
--harness-budget 1.0 \
--harness-max-tool-calls 12 \
--harness-max-latency-ms 3500 \
--harness-compliance strict
- Configure OpenClaw provider:
baseUrl:http://\x3Ccascadeflow-host>:8084/v1(local default:http://127.0.0.1:8084/v1)- If remote:
http://\x3Cserver-ip>:8084/v1orhttps://\x3Cdomain>/v1(TLS/reverse proxy) api:openai-completionsmodel:cascadeflowapiKey: same value as your--auth-token
Commands
/model cflow: default OpenClaw model switch using aliascflow./cascade: optional custom command (if configured in OpenClaw)./cascade savings: optional custom subcommand for cost stats./cascade health: optional custom subcommand for service status.
Links
- Full setup + configs:
references/clawhub_publish_pack.md - Listing strategy:
references/market_positioning.md - Official docs:
https://github.com/lemony-ai/cascadeflow/blob/main/docs/guides/openclaw_provider.md - GitHub repository:
https://github.com/lemony-ai/cascadeflow
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install cascadeflow - 安装完成后,直接呼叫该 Skill 的名称或使用
/cascadeflow触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
cascadeflow: Cost + Latency Reduction 是什么?
OpenClaw-native domain cascading. Use when users need cost/latency reduction via cascading, domain-aware model assignment, OpenClaw-native event handling, an... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 694 次。
如何安装 cascadeflow: Cost + Latency Reduction?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install cascadeflow」即可一键安装,无需额外配置。
cascadeflow: Cost + Latency Reduction 是免费的吗?
是的,cascadeflow: Cost + Latency Reduction 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
cascadeflow: Cost + Latency Reduction 支持哪些平台?
cascadeflow: Cost + Latency Reduction 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 cascadeflow: Cost + Latency Reduction?
由 Sascha Buehrle(@saschabuehrle)开发并维护,当前版本 v1.1.1。