heijiaziopenclaw

Box-KVCache

by heijiaziopenclaw · GitHub ↗ · v1.1.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 85 · Stars: 0 · Active Installs: 0 · Versions: 2
Install in OpenClaw
/install box-kvcache
Description
Local KV Cache compression for LLMs using low-rank decomposition and INT8 quantization to reduce GPU memory by 2-4x during inference.
README (SKILL.md)

box-kvcache

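Description

A KV Cache compression toolbox for local LLMs, based on low-rank decomposition plus INT8 quantization, that helps you run longer contexts and higher concurrency within the same GPU memory.

Works with local LLM inference frameworks such as Ollama, LocalAI, and Text Generation WebUI.

⚠️ System requirements: Windows 10+ | Linux/macOS (requires Ollama) | Python 3.8+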

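Core features

  1. Detect the local LLM environment: automatically identifies Ollama/llama.cpp
  2. Estimate KV Cache usage: quantifies the current context size
  3. Low-rank decomposition compression: reduces KV dimensionality using SVD/PCA
  4. INT8 quantization: lossy compression to 8 bits, saving 2-4x GPU memory
  5. One-click compressed mode: modifies the Ollama launch parameters to enable cache compression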
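
As a rough illustration of feature 2, the KV Cache size of a standard transformer decoder can be estimated from the model shape: one K and one V tensor per layer, each holding `n_kv_heads * head_dim` values per token. The function name and the example shape below (32 layers, 8 KV heads, head dimension 128) are assumptions for illustration, not values read from the package's scripts.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Estimate KV Cache size: one K and one V tensor per layer."""
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = K and V
    return elems_per_token * seq_len * batch * bytes_per_elem


# Hypothetical model shape with an 8192-token context.
fp16 = kv_cache_bytes(32, 8, 128, 8192, bytes_per_elem=2)
int8 = kv_cache_bytes(32, 8, 128, 8192, bytes_per_elem=1)
print(f"fp16: {fp16 / 2**30:.2f} GiB, int8: {int8 / 2**30:.2f} GiB")
# prints "fp16: 1.00 GiB, int8: 0.50 GiB"
```

The estimate grows linearly with context length and batch size, which is why long contexts and high concurrency are where cache compression matters most.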

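System requirements

| Requirement | Details |
|---|---|
| Runtime | Ollama ≥ 0.1.0 or llama.cpp |
| Python | 3.8+ |
| Dependencies | numpy, scipy |
| System tools | PowerShell (Windows), bash (Linux/macOS) |
| Optional | nvidia-smi (for viewing GPU memory) |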

Install dependencies

pip install numpy scipy

Install Ollama

# Windows/macOS/Linux
# See https://ollama.com/download

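How it works

Original KV Cache (float32) → low-rank decomposition → compressed representation → INT8 quantization
     ↓                                                                                   ↓
16 GB GPU memory                                                              ~4-6 GB GPU memory
     ↓                                                                                   ↓
     └──────────────────────── restored after inference ends ────────────────────────────┘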
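
The low-rank decomposition stage of the diagram can be sketched with a truncated SVD in NumPy. This is a minimal sketch, not the code in `lowrank_compress.py`: the function names are hypothetical, and the input is a synthetic matrix built to have rank 16, whereas real KV tensors are only approximately low-rank and will show a nonzero reconstruction error.

```python
import numpy as np


def lowrank_compress(x, rank):
    """Truncated SVD: keep the `rank` largest singular values of a 2-D array."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Fold the singular values into the left factor so only two arrays are stored.
    return u[:, :rank] * s[:rank], vt[:rank, :]


def lowrank_restore(left, right):
    """Rebuild the approximation of the original matrix."""
    return left @ right


rng = np.random.default_rng(0)
# Synthetic stand-in for one head's keys: 512 tokens x 128 dims, true rank 16.
keys = (rng.standard_normal((512, 16)) @ rng.standard_normal((16, 128))).astype(np.float32)

left, right = lowrank_compress(keys, rank=16)
restored = lowrank_restore(left, right)

rel_err = np.linalg.norm(keys - restored) / np.linalg.norm(keys)
ratio = keys.nbytes / (left.nbytes + right.nbytes)
print(f"relative error: {rel_err:.2e}, storage ratio: {ratio:.1f}x")
```

For these shapes the two factors take 512×16 + 16×128 values instead of 512×128, a 6.4x reduction before any quantization. Lowering `rank` trades accuracy for memory.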

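Scripts

| Script | Purpose |
|---|---|
| check_env.py | Detect the local LLM environment (Ollama, llama.cpp) |
| quantize_kv.py | KV Cache INT8 quantization tool |
| lowrank_compress.py | Low-rank decomposition compression tool |
| launch_compressed.py | Launch Ollama with compression parameters |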

Usage

Step 1: Detect the environment

python scripts/check_env.py

Step 2: View current GPU memory usage

python scripts/check_env.py --verbose
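
For readers who want to see what a GPU memory check can look like, here is a minimal sketch that queries `nvidia-smi` directly. It is not the implementation in `check_env.py`; the helper names are hypothetical. It passes the command as an argument list rather than through a shell, and returns `None` when `nvidia-smi` is missing or fails.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs, one GPU per line."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return [(used_mib, total_mib), ...] or None when nvidia-smi is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=10, check=True,
        )
        return parse_gpu_memory(result.stdout)
    except (OSError, ValueError, subprocess.SubprocessError):
        # Covers launch failures, non-numeric fields, timeouts, and non-zero exits.
        return None


print(query_gpu_memory())
```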

Step 3: Launch compressed mode

python scripts/launch_compressed.py --model llama3 --context 8192 --compress

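Technical details

  • Low-rank decomposition: truncated SVD keeps the largest singular values and drops the rest, preserving the core dimensions
  • INT8 quantization: symmetric quantization (scale factor)
  • Compression ratio: about 2-4x (lossy, with accuracy loss <2%)
  • Use cases: long-context chat, batch inference, insufficient GPU memory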
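
Symmetric INT8 quantization with a single scale factor can be sketched as follows. This is an illustrative per-tensor version with hypothetical function names, not the code in `quantize_kv.py`, which may use a different granularity (for example per channel).

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor quantization: x is approximated by scale * q, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q, scale):
    """Map the INT8 codes back to float32."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
x = rng.standard_normal((512, 128)).astype(np.float32)

q, scale = quantize_int8(x)
x_hat = dequantize_int8(q, scale)

max_err = float(np.max(np.abs(x - x_hat)))
ratio_fp32 = x.nbytes / q.nbytes
ratio_fp16 = x.astype(np.float16).nbytes / q.nbytes
print(f"scale={scale:.5f}, max abs error={max_err:.5f}")
print(f"float32 -> int8: {ratio_fp32:.0f}x, float16 -> int8: {ratio_fp16:.0f}x")
```

The rounding error per element is bounded by half the scale. Quantization alone gives 4x relative to float32 and 2x relative to float16, which matches the 2-4x range quoted above.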

Limitations

  • Pure software approach; results vary by model
  • Not Google TurboQuant (that is a separate implementation)
  • Mainly tested with the Windows scripts; Linux/macOS use bash

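Environment variables

| Variable | Description |
|---|---|
| OLLAMA_HOST | Ollama service address (default 127.0.0.1:11434) |
| OLLAMA_MODELS | Model storage path |
| OLLAMA_KEEP_ALIVE | How long a model stays loaded |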
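
A script that wants to honor `OLLAMA_HOST` could resolve it as in the sketch below. The helper name is hypothetical and the handling is deliberately simple (it only adds a scheme and falls back to the documented default); it is not taken from the package.

```python
import os


def ollama_base_url(env=None):
    """Build a base URL from OLLAMA_HOST, falling back to the documented default."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or "127.0.0.1:11434"
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


print(ollama_base_url({}))
# prints "http://127.0.0.1:11434"
print(ollama_base_url({"OLLAMA_HOST": "192.168.1.20:11434"}))
# prints "http://192.168.1.20:11434"
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.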

Author

黑匣子 @ 主人项目


Last updated: 2026-04-06

Usage Guidance
This package appears to implement the described KV-cache compression algorithms and helper scripts, but review before running:

  • Inspect the scripts locally (they are included) and run them in a sandbox or non-production environment first.
  • Note the platform bias: many checks use Windows commands, so Linux/macOS behavior may be limited. Test on your target OS.
  • Be aware that the scripts invoke shell commands (subprocess with shell=True in run_cmd). While the current commands are internal, avoid running with elevated privileges and avoid passing untrusted input into those helpers.
  • The README/SKILL.md mentions OLLAMA_* env vars, but the scripts do not read them. If you depend on a custom Ollama host or settings, verify that the tools actually honor them.
  • The tool will start local Ollama processes. Confirm that your Ollama installation and model binaries are from trusted sources and that you are comfortable running local services.

If you want higher confidence, ask the author for:

  1. An explicit support matrix for Linux/macOS
  2. Clarification of whether OLLAMA_* env vars are used, and how
  3. A non-Windows command-path implementation for environment detection
Capability Analysis
Type: OpenClaw Skill
Name: box-kvcache
Version: 1.1.0

The skill bundle contains several security vulnerabilities that could be exploited, although no clear malicious intent was found. Specifically, 'scripts/check_env.py' uses 'subprocess.run(shell=True)' to execute system commands and PowerShell scripts, which is susceptible to shell injection. Additionally, 'scripts/lowrank_compress.py' utilizes 'np.load' with 'allow_pickle=True', a known high-risk practice that can lead to arbitrary code execution if a user is tricked into loading a crafted malicious data file. While these functions are used for environment detection and data persistence as described, they represent significant security flaws.
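
One way to avoid the `allow_pickle=True` risk described above is to persist only plain numeric arrays in an `.npz` archive and load them with `allow_pickle=False`. The sketch below uses hypothetical helper names and array keys; how `lowrank_compress.py` actually lays out its files is not shown in this listing.

```python
import os
import tempfile

import numpy as np


def save_compressed(path, left, right):
    """Store plain numeric arrays; no Python objects, so no pickle is needed."""
    np.savez_compressed(path, left=left, right=right)


def load_compressed(path):
    """Refuse pickled payloads: allow_pickle=False raises ValueError on object arrays."""
    with np.load(path, allow_pickle=False) as data:
        return data["left"], data["right"]


left = np.arange(12, dtype=np.float32).reshape(4, 3)
right = np.ones((3, 5), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kv_block.npz")
    save_compressed(path, left, right)
    loaded_left, loaded_right = load_compressed(path)

print(np.array_equal(left, loaded_left), np.array_equal(right, loaded_right))
# prints "True True"
```

Scalar metadata such as a quantization scale can be stored as a zero-dimensional numeric array in the same archive, so pickling is not needed for it either.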
Capability Assessment
Purpose & Capability
Name/description match the included code: the scripts implement low-rank SVD compression and INT8 quantization for KV caches and helpers to detect/run Ollama/llama.cpp. However, the SKILL.md claims cross-platform support (Windows, Linux, macOS) while the scripts are largely Windows-biased (use 'tasklist | findstr', 'where', PowerShell fallbacks). The SKILL.md also documents OLLAMA_* environment variables as useful, but none are required in the registry metadata and the scripts do not actually read OLLAMA_HOST / OLLAMA_MODELS / OLLAMA_KEEP_ALIVE — this is an internal inconsistency.
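
A platform-neutral alternative to 'where' for locating executables is `shutil.which` from the Python standard library. The sketch below is an assumption about how detection could be written, not the package's code; the llama.cpp binary names `llama-server` and `llama-cli` are included as likely candidates and may differ on a given install.

```python
import platform
import shutil


def detect_runtimes(names=("ollama", "llama-server", "llama-cli")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


print(platform.system(), detect_runtimes())
```

This only checks whether a binary is installed. Detecting whether a server process is already running (the role of 'tasklist | findstr') needs a separate check.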
Instruction Scope
Runtime instructions and scripts stay within the stated purpose (environment detection, compression, quantization, and launching Ollama). They run local subprocess commands (ollama, nvidia-smi, pip, systeminfo/tasklist/where) and perform on-disk saves/loads of compressed arrays. A few minor issues: several commands use shell=True in run_cmd (which can be risky if later passed untrusted input), and some Windows-only commands are used despite cross-platform claims. There is no evidence the scripts attempt to read unrelated credentials or exfiltrate data to remote endpoints.
Install Mechanism
No install specification is provided (instruction-only in registry), and all code is included in the bundle. Nothing is downloaded from external URLs during installation. This limits supply-chain risk compared with remote downloads.
Credentials
The skill declares no required environment variables or credentials in registry metadata (good). SKILL.md documents optional OLLAMA_* variables but they are informational only — the scripts do not read those variables. No secrets or unrelated credentials are requested. This mismatch (documented env vars vs actual usage) is inconsistent but not directly dangerous.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide configurations. It can start/launch an Ollama local service (calls 'ollama serve' and runs 'ollama run'), which is expected for this functionality but means it will start local processes if you run it.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install box-kvcache
  3. After installation, invoke the skill by name or use /box-kvcache
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
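box-kvcache 1.1.0
  • Added system requirements notes covering operating system, Ollama version, and Python dependencies.
  • Added detailed "Install dependencies", "Install Ollama", and "System requirements" sections.
  • Added requirements for the Windows/Linux/macOS scripts and supporting tools.
  • Added a table of common environment variables.
  • Clarified Windows and Linux/macOS support status and caveats.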
v1.0.0
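box-kvcache 1.0.0
  • Initial release of the KV Cache compression toolbox for local LLMs, supporting frameworks such as Ollama, LocalAI, and Text Generation WebUI
  • Provides automatic environment detection, KV Cache usage estimation, low-rank decomposition, and INT8 quantization compression
  • Supports one-click compressed mode, reducing GPU memory usage by 2-4x
  • Includes scripts for environment detection, quantization, low-rank decomposition, and related operations
  • Requirements: Python 3.8+, numpy, scipy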
Metadata
Slug: box-kvcache
Version: 1.1.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 2
Frequently Asked Questions

What is Box-KVCache?

Local KV Cache compression for LLMs using low-rank decomposition and INT8 quantization to reduce GPU memory by 2-4x during inference. It is an AI Agent Skill for Claude Code / OpenClaw, with 85 downloads so far.

How do I install Box-KVCache?

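Run "/install box-kvcache" in the OpenClaw or Claude Code chat to install the skill in one step. The scripts themselves also need Python 3.8+, numpy, scipy, and a local Ollama or llama.cpp runtime, as listed in the README.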

Is Box-KVCache free?

Yes, Box-KVCache is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Box-KVCache support?

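The listing marks Box-KVCache as cross-platform, so the skill installs anywhere OpenClaw / Claude Code is available. The capability assessment notes, however, that the scripts rely mainly on Windows commands, so Linux/macOS behavior may be limited.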

Who created Box-KVCache?

It is built and maintained by heijiaziopenclaw (@heijiaziopenclaw); the current version is v1.1.0.
