heijiaziopenclaw

Box-KVCache

Author: heijiaziopenclaw · GitHub · v1.1.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 85
Favorites: 0
Current installs: 0
Versions: 2
Install in OpenClaw
/install box-kvcache
Description
Local KV Cache compression for LLMs using low-rank decomposition and INT8 quantization to reduce GPU memory by 2-4x during inference.
Usage instructions (SKILL.md)

box-kvcache

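Description

A KV Cache compression toolbox for local LLMs. Built on low-rank decomposition plus INT8 quantization, it helps you run longer contexts and higher concurrency within the same amount of VRAM.

Works with local LLM inference frameworks such as Ollama, LocalAI, and Text Generation WebUI.

⚠️ System requirements: Windows 10+ | Linux/macOS (requires Ollama) | Python 3.8+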

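Core features

  1. Detect the local LLM environment: automatically identifies Ollama/llama.cpp
  2. Estimate KV Cache usage: quantifies the size of the current context
  3. Low-rank decomposition compression: uses SVD/PCA to reduce the KV dimensionality
  4. INT8 quantization: lossy compression to 8 bits, saving 2-4x VRAM
  5. One-click compression mode: modifies the Ollama launch parameters to enable cache compression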
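The usage estimate in feature 2 can be approximated with the standard KV cache sizing formula. The following is a minimal sketch, not the code from the bundled scripts; the function name `kv_cache_bytes` and the model shape used in the example are hypothetical.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a transformer decoder.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem: 4 for float32, 2 for float16, 1 for INT8.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape: 32 layers, 8 KV heads of dimension 128, 8192 tokens.
fp16_size = kv_cache_bytes(32, 8, 128, 8192, bytes_per_elem=2)
int8_size = kv_cache_bytes(32, 8, 128, 8192, bytes_per_elem=1)
print(f"float16: {fp16_size / 2**30:.2f} GiB, INT8: {int8_size / 2**30:.2f} GiB")
```

Because the size grows linearly with both context length and batch size, the same saving can be spent on either longer contexts or more concurrent requests.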

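System requirements

| Requirement | Details |
| --- | --- |
| Runtime | Ollama ≥ 0.1.0 or llama.cpp |
| Python | 3.8+ |
| Dependencies | numpy, scipy |
| System tools | PowerShell (Windows), bash (Linux/macOS) |
| Optional | nvidia-smi (for checking GPU VRAM) |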

Install dependencies

pip install numpy scipy

Install Ollama

# Windows/macOS/Linux
# See https://ollama.com/download

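How it works

Original KV Cache (float32) → low-rank decomposition → compressed representation → INT8 quantization
     ↓                                                                                   ↓
16 GB VRAM usage                                                                 ~4-6 GB VRAM usage
     ↓                                                                                   ↓
     └───────────────────────────── restored after inference ────────────────────────────┘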
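The low-rank stage of this pipeline can be illustrated with a truncated SVD in NumPy. This is a sketch under stated assumptions rather than the contents of `lowrank_compress.py`: the helper names are hypothetical, and the input is a synthetic, nearly rank-16 matrix standing in for one attention head's keys. Real KV tensors may be far less compressible.

```python
import numpy as np


def lowrank_compress(x, rank):
    """Truncated SVD: keep only the top `rank` singular values of a 2-D matrix."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    left = u[:, :rank] * s[:rank]   # shape (tokens, rank), singular values folded in
    right = vt[:rank, :]            # shape (rank, dim)
    return left, right


def lowrank_restore(left, right):
    """Rebuild the rank-limited approximation of the original matrix."""
    return left @ right


rng = np.random.default_rng(0)
tokens, dim, true_rank = 256, 128, 16
# Synthetic data: an exact rank-16 product plus a small amount of noise.
x = (rng.standard_normal((tokens, true_rank)) @ rng.standard_normal((true_rank, dim))
     + 0.01 * rng.standard_normal((tokens, dim))).astype(np.float32)

left, right = lowrank_compress(x, rank=16)
x_hat = lowrank_restore(left, right)

rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
ratio = x.size / (left.size + right.size)  # 32768 / 6144, about 5.33x
print(f"relative error: {rel_err:.4f}, element-count ratio: {ratio:.2f}x")
```

The ratio depends only on the chosen rank, while the reconstruction error depends on how quickly the singular values of the actual data decay, so the rank has to be tuned per model.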

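Scripts

| Script | Purpose |
| --- | --- |
| check_env.py | Detects the local LLM environment (Ollama, llama.cpp) |
| quantize_kv.py | KV Cache INT8 quantization tool |
| lowrank_compress.py | Low-rank decomposition compression tool |
| launch_compressed.py | Launches Ollama with compression parameters |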

Usage

Step 1: Detect the environment

python scripts/check_env.py
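For readers who want to see what environment detection can look like in a portable form, here is a minimal sketch. It is not the logic of `check_env.py`; the helper names `find_tool` and `tool_version` are hypothetical. It relies on `shutil.which` instead of OS-specific commands and passes arguments as a list so that no shell is involved.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def tool_version(name, flag="--version", timeout=5):
    """Run `<name> <flag>` without a shell and return the first output line, or None."""
    path = find_tool(name)
    if path is None:
        return None
    try:
        # Passing a list (no shell=True) avoids shell interpretation of arguments.
        result = subprocess.run([path, flag], capture_output=True,
                                text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


for tool in ("ollama", "nvidia-smi"):
    print(tool, "->", find_tool(tool) or "not found")
```

The same two helpers work on Windows, Linux, and macOS, since `shutil.which` honors `PATHEXT` on Windows.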

Step 2: Check current VRAM usage

python scripts/check_env.py --verbose

Step 3: Launch compression mode

python scripts/launch_compressed.py --model llama3 --context 8192 --compress

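Technical details

  • Low-rank decomposition: SVD with truncated singular values, keeping the core dimensions
  • INT8 quantization: symmetric quantization (scale factor)
  • Compression ratio: about 2-4x (lossy, but with less than 2% accuracy loss)
  • Use cases: long-context chat, batch inference, insufficient VRAM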
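Symmetric INT8 quantization with a single scale factor can be sketched as follows. This is an illustrative per-tensor variant with hypothetical function names, not necessarily what `quantize_kv.py` does; per-channel or per-token scales are common alternatives.

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor quantization: x is approximated by scale * q, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to float32."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
x = rng.standard_normal((256, 128)).astype(np.float32)  # synthetic stand-in data

q, scale = quantize_int8(x)
x_hat = dequantize_int8(q, scale)
max_err = float(np.max(np.abs(x - x_hat)))

print(f"bytes: {x.nbytes} -> {q.nbytes}, max abs error: {max_err:.5f}")
```

Storing INT8 instead of float32 is a 4x reduction in bytes; relative to float16 it is 2x. The rounding error per element is bounded by half the scale factor, so a few large outliers in the tensor can noticeably reduce precision for the remaining values.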

Limitations

  • Pure software approach; results vary by model
  • Not Google TurboQuant (that is a separate implementation)
  • Mainly tested with the Windows scripts; Linux/macOS use bash

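Environment variables

| Variable | Description |
| --- | --- |
| OLLAMA_HOST | Ollama service address (default 127.0.0.1:11434) |
| OLLAMA_MODELS | Model storage path |
| OLLAMA_KEEP_ALIVE | How long a model stays loaded |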
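A helper script that wants to honor `OLLAMA_HOST` could resolve it along these lines. This is a simplified sketch with a hypothetical function name, `ollama_base_url`; it only handles the `host:port` and full-URL forms and falls back to the default listed in the table.

```python
import os


def ollama_base_url(env=None):
    """Build a base URL from OLLAMA_HOST, falling back to 127.0.0.1:11434."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or "127.0.0.1:11434"
    if "://" not in host:
        host = "http://" + host  # assume plain HTTP when no scheme is given
    return host.rstrip("/")


print(ollama_base_url())
```

Accepting the environment as a parameter keeps the function easy to test without modifying the real process environment.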

Author

黑匣子 @ 主人项目


Last updated: 2026-04-06

Security recommendations
This package appears to implement the described KV-cache compression algorithms and helper scripts, but review before running:
  • Inspect scripts locally (they are included) and run them in a sandbox or non-production environment first.
  • Note platform bias: many checks use Windows commands; Linux/macOS behavior may be limited. Test on your target OS.
  • Be aware scripts invoke shell commands (subprocess with shell=True in run_cmd). While current commands are internal, avoid running with elevated privileges and avoid passing untrusted input into those helpers.
  • The README/SKILL.md mention OLLAMA_* env vars but the scripts do not read them. If you depend on custom Ollama host/settings, verify the tools actually honor them.
  • The tool will start/launch local Ollama processes; confirm your Ollama installation and model binaries are from trusted sources and you are comfortable running local services.
If you want higher confidence, ask the author for: (1) an explicit support matrix for Linux/macOS, (2) clarification whether OLLAMA_* env vars are used and how, and (3) a non-Windows command-path implementation for environment detection.
Functional analysis
Type: OpenClaw Skill · Name: box-kvcache · Version: 1.1.0
The skill bundle contains several security vulnerabilities that could be exploited, although no clear malicious intent was found. Specifically, 'scripts/check_env.py' uses 'subprocess.run(shell=True)' to execute system commands and PowerShell scripts, which is susceptible to shell injection. Additionally, 'scripts/lowrank_compress.py' utilizes 'np.load' with 'allow_pickle=True', a known high-risk practice that can lead to arbitrary code execution if a user is tricked into loading a crafted malicious data file. While these functions are used for environment detection and data persistence as described, they represent significant security flaws.
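The `np.load` concern can be avoided when the saved data consists only of plain numeric arrays. The sketch below uses hypothetical function names and array keys and is not the bundle's actual persistence code. It stores arrays in an `.npz` archive and loads them with `allow_pickle=False`, so a file containing pickled objects is rejected instead of executed.

```python
import io

import numpy as np


def save_compressed(file, left, right, scale):
    """Store plain numeric arrays in an .npz archive (no Python objects)."""
    np.savez_compressed(file, left=left, right=right, scale=np.float32(scale))


def load_compressed(file):
    """Load with allow_pickle=False so object arrays raise an error instead of unpickling."""
    with np.load(file, allow_pickle=False) as data:
        return data["left"], data["right"], float(data["scale"])


# Round-trip through an in-memory buffer; a file path works the same way.
buf = io.BytesIO()
save_compressed(buf, np.ones((4, 2), dtype=np.int8),
                np.ones((2, 3), dtype=np.int8), 0.5)
buf.seek(0)
left, right, scale = load_compressed(buf)
print(left.shape, right.shape, scale)
```

`allow_pickle=False` has been the NumPy default since version 1.16.3, so the risk arises only when code opts back in explicitly.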
Capability assessment
Purpose & Capability
Name/description match the included code: the scripts implement low-rank SVD compression and INT8 quantization for KV caches and helpers to detect/run Ollama/llama.cpp. However, the SKILL.md claims cross-platform support (Windows, Linux, macOS) while the scripts are largely Windows-biased (use 'tasklist | findstr', 'where', PowerShell fallbacks). The SKILL.md also documents OLLAMA_* environment variables as useful, but none are required in the registry metadata and the scripts do not actually read OLLAMA_HOST / OLLAMA_MODELS / OLLAMA_KEEP_ALIVE — this is an internal inconsistency.
Instruction Scope
Runtime instructions and scripts stay within the stated purpose (environment detection, compression, quantization, and launching Ollama). They run local subprocess commands (ollama, nvidia-smi, pip, systeminfo/tasklist/where) and perform on-disk saves/loads of compressed arrays. A few minor issues: several commands use shell=True in run_cmd (which can be risky if later passed untrusted input), and some Windows-only commands are used despite cross-platform claims. There is no evidence the scripts attempt to read unrelated credentials or exfiltrate data to remote endpoints.
Install Mechanism
No install specification is provided (instruction-only in registry), and all code is included in the bundle. Nothing is downloaded from external URLs during installation. This limits supply-chain risk compared with remote downloads.
Credentials
The skill declares no required environment variables or credentials in registry metadata (good). SKILL.md documents optional OLLAMA_* variables but they are informational only — the scripts do not read those variables. No secrets or unrelated credentials are requested. This mismatch (documented env vars vs actual usage) is inconsistent but not directly dangerous.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide configurations. It can start/launch an Ollama local service (calls 'ollama serve' and runs 'ollama run'), which is expected for this functionality but means it will start local processes if you run it.
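How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install box-kvcache
  3. Once installed, invoke the Skill by name or trigger it with /box-kvcache
  4. Provide the required inputs according to the Skill's parameter description to get structured output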
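Version history
v1.1.0
box-kvcache 1.1.0
  • Added system requirements notes covering operating system, Ollama version, Python dependencies, and more.
  • Added detailed "Install dependencies", "Install Ollama", and "System requirements" sections.
  • Supplemented script and tooling requirements for Windows/Linux/macOS.
  • Added a table of commonly used environment variables.
  • Clarified Windows and Linux/macOS support status and caveats.
v1.0.0
box-kvcache 1.0.0
  • Initial release of the KV Cache compression toolbox for local LLMs, supporting frameworks such as Ollama, LocalAI, and Text Generation WebUI
  • Provides automatic environment detection, KV Cache usage estimation, low-rank decomposition, and INT8 quantization compression
  • Supports one-click compression mode, reducing VRAM usage by 2-4x
  • Includes scripts for environment detection, quantization, low-rank decomposition, and related operations
  • Requirements: Python 3.8+, numpy, scipy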
Metadata
Slug: box-kvcache
Version: 1.1.0
License: MIT-0
Total installs: 0
Current installs: 0
Historical versions: 2
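FAQ

What is Box-KVCache?

Local KV Cache compression for LLMs using low-rank decomposition and INT8 quantization to reduce GPU memory by 2-4x during inference. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 85 total downloads to date.

How do I install Box-KVCache?

Run the command "/install box-kvcache" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.

Is Box-KVCache free?

Yes. Box-KVCache is completely free, released under the MIT-0 license, and may be freely downloaded, installed, and used.

Which platforms does Box-KVCache support?

Box-KVCache is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed (cross-platform).

Who developed Box-KVCache?

It is developed and maintained by heijiaziopenclaw (@heijiaziopenclaw); the current version is v1.1.0.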
