Box-KVCache
Author: heijiaziopenclaw · v1.1.0 · MIT-0
85 total downloads · 0 favorites · 0 current installs · 2 versions
Install in OpenClaw
/install box-kvcache
Description
Local KV Cache compression for LLMs using low-rank decomposition and INT8 quantization to reduce GPU memory by 2-4x during inference.
Usage (SKILL.md)
box-kvcache
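Description
A KV Cache compression toolkit for local LLMs. It combines low-rank decomposition with INT8 quantization so that the same amount of VRAM can hold longer contexts and serve more concurrent requests.
Intended for local LLM inference frameworks such as Ollama, LocalAI, and Text Generation WebUI.
⚠️ System requirements: Windows 10+ | Linux/macOS (requires Ollama) | Python 3.8+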
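Core features
- Detect the local LLM environment: automatically identifies Ollama/llama.cpp
- Estimate KV Cache usage: quantifies the size of the current context
- Low-rank compression: uses SVD/PCA to reduce the KV dimension
- INT8 quantization: lossy compression to 8 bits, saving 2-4x VRAM
- One-step compressed launch: adjusts the Ollama launch parameters to enable cache compression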
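The "estimate KV Cache usage" feature can be illustrated with the usual back-of-the-envelope formula: keys and values are stored for every layer, KV head, and token. The sketch below is not taken from the bundled scripts; the function name, the model shape, and the 2-byte (fp16) element size are assumptions chosen for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Approximate KV cache size: keys + values for every layer and token."""
    # The leading factor 2 accounts for storing both K and V.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical model shape (not read from any specific checkpoint).
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)
print(f"{size / 2**30:.2f} GiB")  # 1.00 GiB
```

Passing `bytes_per_elem=4` models a float32 cache and `bytes_per_elem=1` an INT8 one, which is where a 2-4x reduction from quantization alone would come from.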
System requirements
| Requirement | Details |
|---|---|
| Runtime | Ollama ≥ 0.1.0 or llama.cpp |
| Python | 3.8+ |
| Dependencies | numpy, scipy |
| System tools | PowerShell (Windows), bash (Linux/macOS) |
| Optional | nvidia-smi (for checking GPU VRAM) |
Install dependencies
pip install numpy scipy
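After installing, the requirements can be checked before running any script. This is a minimal sketch, not part of the package: `check_requirements` is a hypothetical helper that only tests the Python version and whether the two listed modules can be found.

```python
import importlib.util
import sys


def check_requirements(modules=("numpy", "scipy"), min_python=(3, 8)):
    """Return a dict mapping each requirement to True (met) or False (missing)."""
    report = {"python": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_requirements().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```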
Install Ollama
# Windows/macOS/Linux
# See https://ollama.com/download
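How it works
Original KV Cache (float32) → low-rank decomposition → compressed representation → INT8 quantization
        ↓                                                                              ↓
  16 GB VRAM usage                                                            ~4-6 GB VRAM usage
        ↓                                                                              ↓
        └──────────────────────── restored after inference ends ──────────────────────┘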
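The low-rank stage of the pipeline above can be sketched with a truncated SVD. This is an illustration under stated assumptions, not the code in `lowrank_compress.py`: the function names are hypothetical, and the input is a synthetic matrix built to be nearly rank 16, so the error it reports says nothing about real KV caches.

```python
import numpy as np


def lowrank_compress(x, rank):
    """Truncated SVD: keep the top-`rank` singular triplets of a 2-D matrix."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Fold the singular values into the left factor: shapes (tokens, rank), (rank, dim).
    return u[:, :rank] * s[:rank], vt[:rank]


def lowrank_restore(a, b):
    """Approximate reconstruction of the original matrix."""
    return a @ b


rng = np.random.default_rng(0)
# Synthetic stand-in for one head's keys, shape (tokens, head_dim).
x = (rng.standard_normal((1024, 16)) @ rng.standard_normal((16, 128))).astype(np.float32)
x += 0.01 * rng.standard_normal(x.shape).astype(np.float32)

a, b = lowrank_compress(x, rank=16)
x_hat = lowrank_restore(a, b)
ratio = x.nbytes / (a.nbytes + b.nbytes)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"compression ratio: {ratio:.1f}x, relative error: {rel_err:.4f}")
```

The storage cost of the two factors is `rank * (tokens + dim)` elements instead of `tokens * dim`, so the achievable ratio depends entirely on how small a rank the data tolerates.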
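Scripts
| Script | Purpose |
|---|---|
| check_env.py | Detect the local LLM environment (Ollama, llama.cpp) |
| quantize_kv.py | KV Cache INT8 quantization tool |
| lowrank_compress.py | Low-rank decomposition compression tool |
| launch_compressed.py | Launch Ollama with compression parameters |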
Usage
Step 1: Detect the environment
python scripts/check_env.py
Step 2: Check current VRAM usage
python scripts/check_env.py --verbose
Step 3: Launch in compressed mode
python scripts/launch_compressed.py --model llama3 --context 8192 --compress
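For the VRAM check in step 2, the optional `nvidia-smi` tool can be queried directly. The sketch below is an assumption about how such a check might look, not the behavior of `check_env.py`; `parse_gpu_memory` and `query_gpu_memory` are hypothetical names, and the parser expects plain integer MiB values.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB rows as printed by nvidia-smi's CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(v) for v in line.split(","))
        rows.append({"used_mib": used, "total_mib": total})
    return rows


def query_gpu_memory():
    """Return per-GPU memory usage, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)


# Parsing a sample line, so the example also runs on machines without a GPU.
print(parse_gpu_memory("4096, 16384\n"))  # [{'used_mib': 4096, 'total_mib': 16384}]
```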
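Technical details
- Low-rank decomposition: SVD with truncated singular values, keeping the dominant dimensions
- INT8 quantization: symmetric quantization (scale factor)
- Compression ratio: about 2-4x (lossy, but with accuracy loss <2%)
- Typical use cases: long-context chat, batch inference, limited VRAM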
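Symmetric INT8 quantization with a scale factor can be sketched as follows. The source does not say whether the scale is per tensor, per channel, or per token, so per-tensor scaling is an assumption here, and the function names are illustrative rather than those in `quantize_kv.py`.

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor quantization: x ≈ scale * q, with q in [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to float32 (lossy)."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 128)).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize_int8(q, scale)

print(x.nbytes / q.nbytes)  # 4.0 (float32 -> int8)
print(f"max abs error: {float(np.max(np.abs(x - x_hat))):.5f}, scale/2: {scale / 2:.5f}")
```

Rounding to the nearest code bounds the per-element error by about half the scale, so a single outlier that inflates `max_abs` degrades every other value. That is one reason finer-grained scales are often preferred in practice.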
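Limitations
- Pure software approach; results vary by model
- Not Google TurboQuant (that is a separate implementation)
- The scripts have been tested mainly on Windows; on Linux/macOS they use bash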
Environment variables
| Variable | Description |
|---|---|
| OLLAMA_HOST | Ollama service address (default 127.0.0.1:11434) |
| OLLAMA_MODELS | Model storage path |
| OLLAMA_KEEP_ALIVE | How long the model stays loaded |
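A helper script that wants to honor these variables could read them as sketched below. `ollama_settings` is a hypothetical function, not part of the package; it uses the default host from the table and adds an `http://` scheme when none is given.

```python
import os


def ollama_settings(env=None):
    """Collect Ollama-related settings from the environment, with defaults."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "127.0.0.1:11434")
    if "://" not in host:
        host = "http://" + host
    return {
        "base_url": host,
        "models_dir": env.get("OLLAMA_MODELS"),      # None: leave Ollama's default
        "keep_alive": env.get("OLLAMA_KEEP_ALIVE"),  # None: leave Ollama's default
    }


print(ollama_settings({"OLLAMA_HOST": "192.168.1.10:11434"})["base_url"])
# http://192.168.1.10:11434
```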
Author
黑匣子 @ 主人项目
Last updated: 2026-04-06
Security recommendations
This package appears to implement the described KV-cache compression algorithms and helper scripts, but review before running:
- Inspect scripts locally (they are included) and run them in a sandbox or non-production environment first.
- Note platform bias: many checks use Windows commands; Linux/macOS behavior may be limited. Test on your target OS.
- Be aware scripts invoke shell commands (subprocess with shell=True in run_cmd). While current commands are internal, avoid running with elevated privileges and avoid passing untrusted input into those helpers.
- The README/SKILL.md mention OLLAMA_* env vars but the scripts do not read them — if you depend on custom Ollama host/settings verify the tools actually honor them.
- The tool will start/launch local Ollama processes; confirm your Ollama installation and model binaries are from trusted sources and you are comfortable running local services.
If you want higher confidence, ask the author for: (1) explicit support matrix for Linux/macOS, (2) clarification whether OLLAMA_* env vars are used and how, and (3) a non-Windows command-path implementation for environment detection.
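One way to reduce the shell-injection risk noted above is to pass commands as argument lists and never go through a shell. The sketch below is a hypothetical replacement for a `run_cmd`-style helper, not the package's code; the return convention of `(returncode, output)` is an assumption.

```python
import subprocess
import sys


def run_cmd(args, timeout=10):
    """Run a command given as a list of arguments, without invoking a shell."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing executable, permission problem, or timeout.
        return None, str(exc)
    return result.returncode, result.stdout.strip()


# Shell metacharacters in an argument are passed through literally.
code, out = run_cmd([sys.executable, "-c", "print('hello; echo injected')"])
print(code, out)  # 0 hello; echo injected
```

Because no shell parses the arguments, characters such as `;`, `|`, or `&&` in user-supplied values cannot start additional commands.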
Functional analysis
Type: OpenClaw Skill
Name: box-kvcache
Version: 1.1.0
The skill bundle contains several security vulnerabilities that could be exploited, although no clear malicious intent was found. Specifically, 'scripts/check_env.py' uses 'subprocess.run(shell=True)' to execute system commands and PowerShell scripts, which is susceptible to shell injection. Additionally, 'scripts/lowrank_compress.py' utilizes 'np.load' with 'allow_pickle=True', a known high-risk practice that can lead to arbitrary code execution if a user is tricked into loading a crafted malicious data file. While these functions are used for environment detection and data persistence as described, they represent significant security flaws.
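The `np.load` concern can be avoided when only numeric arrays are stored, because such files load with pickling disabled. A minimal sketch, assuming the compressed cache consists of two plain float arrays; the function names and the file name are illustrative.

```python
import os
import tempfile

import numpy as np


def save_compressed(path, a, b):
    """Store plain numeric arrays in an .npz archive (no Python objects)."""
    np.savez_compressed(path, a=a, b=b)


def load_compressed(path):
    """Load arrays with pickling disabled, so object arrays are rejected."""
    with np.load(path, allow_pickle=False) as data:
        return data["a"], data["b"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kv_lowrank.npz")
    a = np.arange(6, dtype=np.float32).reshape(3, 2)
    b = np.ones((2, 4), dtype=np.float32)
    save_compressed(path, a, b)
    a2, b2 = load_compressed(path)
    print(np.array_equal(a, a2), np.array_equal(b, b2))  # True True
```

Metadata such as the rank or a quantization scale can be saved as zero-dimensional numeric arrays in the same archive, so no pickled objects are needed.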
Capability assessment
Purpose & Capability
Name/description match the included code: the scripts implement low-rank SVD compression and INT8 quantization for KV caches and helpers to detect/run Ollama/llama.cpp. However, the SKILL.md claims cross-platform support (Windows, Linux, macOS) while the scripts are largely Windows-biased (use 'tasklist | findstr', 'where', PowerShell fallbacks). The SKILL.md also documents OLLAMA_* environment variables as useful, but none are required in the registry metadata and the scripts do not actually read OLLAMA_HOST / OLLAMA_MODELS / OLLAMA_KEEP_ALIVE — this is an internal inconsistency.
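Environment detection does not need platform-specific commands such as `where` or `tasklist`; the standard library can locate executables on all three operating systems. The sketch below is one possible approach, not the package's implementation. The helper names are hypothetical, and llama.cpp is left out because its binary names vary between builds.

```python
import platform
import shutil


def find_tool(name):
    """Locate an executable on PATH on Windows, Linux, or macOS."""
    return shutil.which(name)


def detect_environment(tools=("ollama", "nvidia-smi")):
    """Report the OS name and the resolved path (or None) of each tool."""
    report = {"os": platform.system()}
    for tool in tools:
        report[tool] = find_tool(tool)
    return report


print(detect_environment())
```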
Instruction Scope
Runtime instructions and scripts stay within the stated purpose (environment detection, compression, quantization, and launching Ollama). They run local subprocess commands (ollama, nvidia-smi, pip, systeminfo/tasklist/where) and perform on-disk saves/loads of compressed arrays. A few minor issues: several commands use shell=True in run_cmd (which can be risky if later passed untrusted input), and some Windows-only commands are used despite cross-platform claims. There is no evidence the scripts attempt to read unrelated credentials or exfiltrate data to remote endpoints.
Install Mechanism
No install specification is provided (instruction-only in registry), and all code is included in the bundle. Nothing is downloaded from external URLs during installation. This limits supply-chain risk compared with remote downloads.
Credentials
The skill declares no required environment variables or credentials in registry metadata (good). SKILL.md documents optional OLLAMA_* variables but they are informational only — the scripts do not read those variables. No secrets or unrelated credentials are requested. This mismatch (documented env vars vs actual usage) is inconsistent but not directly dangerous.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide configurations. It can start/launch an Ollama local service (calls 'ollama serve' and runs 'ollama run'), which is expected for this functionality but means it will start local processes if you run it.
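How to use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install box-kvcache
- Once installed, invoke the Skill by name or trigger it with /box-kvcache
- Provide the inputs described in the Skill's parameter documentation to get structured output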
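Version history
v1.1.0
box-kvcache 1.1.0
- Added system requirements, including operating system, Ollama version, and Python dependencies.
- Added detailed "Install dependencies", "Install Ollama", and "System requirements" sections.
- Added script and tooling requirements for Windows/Linux/macOS.
- Added a table of common environment variables.
- Clarified Windows and Linux/macOS support status and caveats.
v1.0.0
box-kvcache 1.0.0
- Initial release of the KV Cache compression toolkit for local LLMs, supporting Ollama, LocalAI, Text Generation WebUI, and similar frameworks
- Environment auto-detection, KV Cache usage estimation, low-rank decomposition, and INT8 quantization
- One-step compressed launch, reducing VRAM usage by 2-4x
- Bundled scripts for environment detection, quantization, and low-rank decomposition
- Requirements: Python 3.8+, numpy, scipy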
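FAQ
What is Box-KVCache?
Local KV Cache compression for LLMs using low-rank decomposition and INT8 quantization to reduce GPU memory by 2-4x during inference. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 85 total downloads to date.
How do I install Box-KVCache?
Run the command "/install box-kvcache" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is required.
Is Box-KVCache free?
Yes. Box-KVCache is completely free and released under the MIT-0 license; you can download, install, and use it freely.
Which platforms does Box-KVCache support?
Box-KVCache is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops Box-KVCache?
It is developed and maintained by heijiaziopenclaw (@heijiaziopenclaw); the current version is v1.1.0.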