
Box-KVCache

Name: Box-KVCache
Author: heijiaziopenclaw

by heijiaziopenclaw · GitHub ↗ · v1.1.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install box-kvcache

Description

Local KV Cache compression for LLMs using low-rank decomposition and INT8 quantization to reduce GPU memory by 2-4x during inference.

README (SKILL.md)

box-kvcache

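Description

A KV cache compression toolkit for local LLMs, based on low-rank decomposition and INT8 quantization. It helps you run longer contexts and higher concurrency within the same GPU memory.

Works with local LLM inference frameworks such as Ollama, LocalAI, and Text Generation WebUI.

⚠️ System requirements: Windows 10+ | Linux/macOS (requires Ollama) | Python 3.8+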

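Core features

Detect the local LLM environment: automatically identifies Ollama/llama.cpp
Estimate KV cache usage: quantifies the memory taken by the current context
Low-rank compression: reduces the KV dimension with SVD/PCA
INT8 quantization: lossy compression to 8 bits, saving 2-4x GPU memory
One-step compressed launch: adjusts the Ollama startup parameters to enable cache compression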
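
The KV cache estimate can be reproduced by hand. The sketch below uses the standard size formula (one K and one V tensor per layer) and is not taken from `check_env.py`; the function name and the model shape are assumptions chosen for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV cache size: one K and one V tensor per layer."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_len * batch_size


# Hypothetical 32-layer model with 8 KV heads of dimension 128 (assumed shape).
fp16 = kv_cache_bytes(32, 8, 128, context_len=8192, bytes_per_value=2)
int8 = kv_cache_bytes(32, 8, 128, context_len=8192, bytes_per_value=1)
print(f"FP16: {fp16 / 2**30:.2f} GiB, INT8: {int8 / 2**30:.2f} GiB")
# prints "FP16: 1.00 GiB, INT8: 0.50 GiB"
```

The cache grows linearly with context length and batch size, which is why long contexts and high concurrency are the cases where compression matters most.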

System requirements

Requirement	Details
Runtime	Ollama ≥ 0.1.0 or llama.cpp
Python	3.8+
Dependencies	numpy, scipy
System tools	PowerShell (Windows), bash (Linux/macOS)
Optional	nvidia-smi (to inspect GPU memory)

Install dependencies

pip install numpy scipy

Install Ollama

# Windows/macOS/Linux
# See https://ollama.com/download

How it works

Original KV cache (float32) → low-rank decomposition → compressed representation → INT8 quantization
     ↓                                        ↓
16 GB GPU memory used                  ~4-6 GB GPU memory used
     ↓                                        ↓
     └──────── restored after inference ────────┘
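
The pipeline in the diagram can be sketched end to end with NumPy. This is a minimal illustration under stated assumptions, not the code in `lowrank_compress.py` or `quantize_kv.py`: `compress` and `restore` are hypothetical names, each factor gets a single per-tensor scale, and the input is a synthetic matrix built to be nearly rank 16.

```python
import numpy as np


def compress(kv, rank):
    """Truncated SVD followed by symmetric INT8 quantization of both factors."""
    u, s, vt = np.linalg.svd(kv, full_matrices=False)
    left = u[:, :rank] * s[:rank]      # (tokens, rank), singular values folded in
    right = vt[:rank]                  # (rank, dim)
    packed = []
    for factor in (left, right):
        scale = max(float(np.abs(factor).max()) / 127.0, 1e-12)
        q = np.clip(np.round(factor / scale), -127, 127).astype(np.int8)
        packed.append((q, scale))
    return packed


def restore(packed):
    """Dequantize both factors and multiply them back together."""
    (ql, sl), (qr, sr) = packed
    return (ql.astype(np.float32) * sl) @ (qr.astype(np.float32) * sr)


rng = np.random.default_rng(0)
# Synthetic stand-in for one head's cache: 512 tokens x 128 dims, near rank 16.
kv = (rng.standard_normal((512, 16)) @ rng.standard_normal((16, 128))).astype(np.float32)
kv += 0.01 * rng.standard_normal(kv.shape).astype(np.float32)

packed = compress(kv, rank=16)
approx = restore(packed)
stored = sum(q.nbytes for q, _ in packed)
ratio = kv.nbytes / stored
rel_err = np.linalg.norm(kv - approx) / np.linalg.norm(kv)
print(f"compression {ratio:.1f}x, relative error {rel_err:.3f}")
```

Because the synthetic matrix is deliberately low-rank, the ratio printed here is far better than the 2-4x quoted for real caches; real KV tensors need a higher rank to keep the error acceptable.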

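Scripts

Script	Purpose
`check_env.py`	Detect the local LLM environment (Ollama, llama.cpp)
`quantize_kv.py`	KV cache INT8 quantization tool
`lowrank_compress.py`	Low-rank decomposition compression tool
`launch_compressed.py`	Launch Ollama with compression parameters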

Usage

Step 1: Detect the environment

python scripts/check_env.py

Step 2: Check current GPU memory usage

python scripts/check_env.py --verbose

Step 3: Launch in compressed mode

python scripts/launch_compressed.py --model llama3 --context 8192 --compress

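Technical details

Low-rank decomposition: SVD with truncated singular values, keeping the core dimensions
INT8 quantization: symmetric quantization (scale factor)
Compression ratio: about 2-4x (lossy, but with accuracy loss <2%)
Use cases: long-context chat, batch inference, insufficient GPU memory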
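
How many singular values to keep is the main tuning knob of the truncation step. One common heuristic, shown here as an assumption rather than the rule used by `lowrank_compress.py`, is to keep the smallest rank whose squared singular values cover a target share of the total energy. `rank_for_energy` is a hypothetical helper name.

```python
import numpy as np


def rank_for_energy(singular_values, keep=0.99):
    """Smallest rank whose squared singular values reach `keep` of the total."""
    energy = np.cumsum(np.square(singular_values, dtype=np.float64))
    return int(np.searchsorted(energy, keep * energy[-1]) + 1)


s = np.array([10.0, 5.0, 1.0, 0.5, 0.1])
print(rank_for_energy(s, keep=0.99))  # prints 2
```

In this toy spectrum the first two singular values carry 125 of 126.26 units of energy, so rank 2 already clears the 99% threshold.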

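Limitations

Pure software approach; results vary by model
This is not Google TurboQuant (that is a separate implementation)
Mainly tested with the Windows scripts; Linux/macOS use bash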

Environment variables

Variable	Description
`OLLAMA_HOST`	Ollama service address (default 127.0.0.1:11434)
`OLLAMA_MODELS`	Model storage path
`OLLAMA_KEEP_ALIVE`	How long a model stays loaded
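
A script that wanted to honor these variables could resolve them as sketched below. `ollama_settings` is a hypothetical helper, not part of the bundle; only the `OLLAMA_HOST` default comes from the table above, and the sketch assumes the value is a plain `host:port` or a full URL.

```python
import os


def ollama_settings(env=None):
    """Resolve Ollama settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST") or "127.0.0.1:11434"
    # OLLAMA_HOST may omit the scheme; add one so the value works as a base URL.
    base_url = host if "://" in host else f"http://{host}"
    return {
        "base_url": base_url,
        "models_dir": env.get("OLLAMA_MODELS"),      # None -> leave to Ollama
        "keep_alive": env.get("OLLAMA_KEEP_ALIVE"),  # None -> leave to Ollama
    }


print(ollama_settings({})["base_url"])  # prints http://127.0.0.1:11434
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.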

Author

黑匣子 @ 主人项目

Last updated: 2026-04-06

Usage Guidance

This package appears to implement the described KV-cache compression algorithms and helper scripts, but review before running:
- Inspect scripts locally (they are included) and run them in a sandbox or non-production environment first.
- Note platform bias: many checks use Windows commands; Linux/macOS behavior may be limited. Test on your target OS.
- Be aware scripts invoke shell commands (subprocess with shell=True in run_cmd). While current commands are internal, avoid running with elevated privileges and avoid passing untrusted input into those helpers.
- The README/SKILL.md mention OLLAMA_* env vars but the scripts do not read them. If you depend on custom Ollama host/settings, verify the tools actually honor them.
- The tool will start/launch local Ollama processes; confirm your Ollama installation and model binaries are from trusted sources and you are comfortable running local services.

If you want higher confidence, ask the author for: (1) an explicit support matrix for Linux/macOS, (2) clarification of whether OLLAMA_* env vars are used and how, and (3) a non-Windows command-path implementation for environment detection.

Capability Analysis

Type: OpenClaw Skill
Name: box-kvcache
Version: 1.1.0

The skill bundle contains several security vulnerabilities that could be exploited, although no clear malicious intent was found. Specifically, 'scripts/check_env.py' uses 'subprocess.run(shell=True)' to execute system commands and PowerShell scripts, which is susceptible to shell injection. Additionally, 'scripts/lowrank_compress.py' utilizes 'np.load' with 'allow_pickle=True', a known high-risk practice that can lead to arbitrary code execution if a user is tricked into loading a crafted malicious data file. While these functions are used for environment detection and data persistence as described, they represent significant security flaws.
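
Both findings have conventional mitigations, sketched here under assumptions: this `run_cmd` is a hypothetical replacement that takes an argument list instead of a shell string, and `save_factors` / `load_factors` are illustrative names for persisting plain numeric arrays with pickle disabled. None of this is the bundle's actual code.

```python
import io
import subprocess
import sys

import numpy as np


def run_cmd(args, timeout=10):
    """Run a command given as a list of arguments, without a shell."""
    result = subprocess.run(args, capture_output=True, text=True,
                            timeout=timeout, check=False)
    return result.returncode, result.stdout.strip()


def save_factors(fileobj, left, right):
    """Store plain numeric arrays; nothing here needs pickle."""
    np.savez_compressed(fileobj, left=left, right=right)


def load_factors(fileobj):
    """Load with pickle disabled so a crafted file cannot execute code."""
    with np.load(fileobj, allow_pickle=False) as data:
        return data["left"], data["right"]


code, out = run_cmd([sys.executable, "-c", "print('ok')"])
buf = io.BytesIO()
save_factors(buf, np.ones((2, 3), dtype=np.int8), np.zeros((3, 4), dtype=np.int8))
buf.seek(0)
left, right = load_factors(buf)
print(code, out, left.shape, right.shape)
```

With an argument list, shell metacharacters in an argument are passed through literally rather than interpreted, and `allow_pickle=False` makes `np.load` raise an error on object arrays instead of unpickling them.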

Capability Assessment

ℹ Purpose & Capability

Name/description match the included code: the scripts implement low-rank SVD compression and INT8 quantization for KV caches and helpers to detect/run Ollama/llama.cpp. However, the SKILL.md claims cross-platform support (Windows, Linux, macOS) while the scripts are largely Windows-biased (use 'tasklist | findstr', 'where', PowerShell fallbacks). The SKILL.md also documents OLLAMA_* environment variables as useful, but none are required in the registry metadata and the scripts do not actually read OLLAMA_HOST / OLLAMA_MODELS / OLLAMA_KEEP_ALIVE — this is an internal inconsistency.
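
A platform-neutral way to check whether the runtimes are installed is to look them up on PATH instead of calling 'where' or 'tasklist'. The sketch below is an assumption about how such a check could look, not the logic in `check_env.py`; `detect_runtimes` is a hypothetical name and the llama.cpp executable names are assumed. It reports installation only and says nothing about whether a server process is running.

```python
import platform
import shutil


def detect_runtimes(candidates=("ollama", "llama-server", "llama-cli")):
    """Map each candidate executable name to its path on PATH, or None."""
    return {name: shutil.which(name) for name in candidates}


info = {"os": platform.system(), "runtimes": detect_runtimes()}
print(info)
```

`shutil.which` behaves the same on Windows, Linux, and macOS, so no per-OS command branches are needed for this part of the detection.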

ℹ Instruction Scope

Runtime instructions and scripts stay within the stated purpose (environment detection, compression, quantization, and launching Ollama). They run local subprocess commands (ollama, nvidia-smi, pip, systeminfo/tasklist/where) and perform on-disk saves/loads of compressed arrays. A few minor issues: several commands use shell=True in run_cmd (which can be risky if later passed untrusted input), and some Windows-only commands are used despite cross-platform claims. There is no evidence the scripts attempt to read unrelated credentials or exfiltrate data to remote endpoints.

✓ Install Mechanism

No install specification is provided (instruction-only in registry), and all code is included in the bundle. Nothing is downloaded from external URLs during installation. This limits supply-chain risk compared with remote downloads.

ℹ Credentials

The skill declares no required environment variables or credentials in registry metadata (good). SKILL.md documents optional OLLAMA_* variables but they are informational only — the scripts do not read those variables. No secrets or unrelated credentials are requested. This mismatch (documented env vars vs actual usage) is inconsistent but not directly dangerous.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or system-wide configurations. It can start/launch an Ollama local service (calls 'ollama serve' and runs 'ollama run'), which is expected for this functionality but means it will start local processes if you run it.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install box-kvcache
After installation, invoke the skill by name or use /box-kvcache
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.1.0

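box-kvcache 1.1.0
- Added system requirements notes covering operating system, Ollama version, and Python dependencies.
- Added detailed "Install dependencies", "Install Ollama", and "System requirements" sections.
- Added script and tooling requirements for Windows/Linux/macOS.
- Added a table of common environment variables.
- Clarified Windows and Linux/macOS support status and caveats.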

v1.0.0

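box-kvcache 1.0.0
- Initial release of the KV cache compression toolkit for local LLMs, supporting frameworks such as Ollama, LocalAI, and Text Generation WebUI
- Provides automatic environment detection, KV cache usage estimation, low-rank decomposition, and INT8 quantization
- Supports one-step launch in compressed mode, reducing GPU memory usage by 2-4x
- Ships scripts for environment detection, quantization, and low-rank decomposition
- Requirements: Python 3.8+, numpy, scipy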

Metadata

Slug box-kvcache

Version 1.1.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Box-KVCache?

Local KV Cache compression for LLMs using low-rank decomposition and INT8 quantization to reduce GPU memory by 2-4x during inference. It is an AI Agent Skill for Claude Code / OpenClaw, with 85 downloads so far.

How do I install Box-KVCache?

Run "/install box-kvcache" in the OpenClaw or Claude Code chat to install the skill in one step. The bundled scripts additionally need Python 3.8+, numpy, scipy, and a local Ollama or llama.cpp runtime.

Is Box-KVCache free?

Yes, Box-KVCache is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Box-KVCache support?

Box-KVCache is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Box-KVCache?

It is built and maintained by heijiaziopenclaw (@heijiaziopenclaw); the current version is v1.1.0.
