TurboQuant+ KV Cache Compression

by wukai8289 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads 122

Install in OpenClaw

/install turboquant-plus

Description

TurboQuant+ compresses llama.cpp KV caches on Apple Silicon up to 6.4x with minimal quality loss, enabling larger models and longer contexts efficiently.
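To put the 6.4x figure in perspective, the following is a rough sizing sketch, not part of the skill. It uses the standard KV cache size formula (keys plus values, per layer, per KV head, per token). The model shape is hypothetical, and the 16-bit baseline is an assumption, since the listing does not say what the ratio is measured against.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_value=16.0):
    """Bytes needed to hold keys and values for n_ctx tokens across all layers."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # factor 2 = K and V
    return values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration.
N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX = 32, 8, 128, 32768

# Assumption: the listing's "up to 6.4x" is relative to a 16-bit cache.
RATIO = 6.4

baseline = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX)
compressed = baseline / RATIO

print(f"16-bit KV cache: {baseline / 2**30:.2f} GiB")
print(f"at {RATIO}x:         {compressed / 2**30:.2f} GiB")
print(f"effective bits per value: {16 / RATIO:.2f}")
```

Under these assumptions the 16-bit cache is 4 GiB at a 32K context, and a 6.4x ratio corresponds to roughly 2.5 bits per cached value.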

Usage Guidance

This skill appears coherent for configuring TurboQuant+ with llama.cpp, but take these precautions before proceeding:
1) Verify that the external GitHub fork (TheTom/llama-cpp-turboquant) is the intended project, and review its source and commit history before building.
2) Build and run the code in an isolated or trusted environment (container, dedicated machine) if possible.
3) Be cautious with the suggested sudo sysctl change (iogpu.wired_limit_mb): it requires elevated privileges and changes system GPU memory limits until reboot. Back up important state and understand the impact first.
4) Prefer official releases or tags over an unknown commit or branch.
5) Verify checksums or signatures for any downloaded model files.
If you are uncomfortable reviewing or building third-party native code, treat this as an operational risk and avoid running the build on production systems.
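The checksum precaution can be sketched as follows. This is a minimal example that assumes the model publisher provides a SHA-256 hex digest; the function names and the file path in the usage note are hypothetical and do not come from the skill.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A call such as `matches_published("model.gguf", published_digest)` returns `True` only when the downloaded file matches the digest; reading in chunks keeps memory use flat for multi-gigabyte model files.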

Capability Analysis

Type: OpenClaw Skill
Name: turboquant-plus
Version: 1.0.0

The skill bundle instructs the agent to perform high-risk operations, including cloning and compiling an external GitHub repository (TheTom/llama-cpp-turboquant) and executing a system-level command with elevated privileges (sudo sysctl iogpu.wired_limit_mb) to modify GPU memory limits. Furthermore, the documentation contains likely fabricated references, such as an 'ICLR 2026' paper and a 2026 timestamp in _meta.json. While these actions are contextually related to LLM optimization, the combination of sudo requirements, unverified third-party code execution, and hallucinated references constitutes a significant security risk.

Capability Assessment

✓ Purpose & Capability

Name/description claim KV cache compression for llama.cpp on Apple Silicon; the SKILL.md and README exclusively describe using a TurboQuant llama.cpp fork, relevant CLI flags, and platform-specific tuning. No unrelated credentials, binaries, or services are requested.

ℹ Instruction Scope

Instructions stay on-topic (clone/build the turboquant fork, run llama-server with cache-type flags). They also recommend a system-level change (sudo sysctl iogpu.wired_limit_mb) to raise GPU memory caps for large contexts. This is relevant to the stated goal but requires elevated privileges and modifies system state. No instructions collect or transmit user data to unexpected endpoints.
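As an illustration of the cache-type flags mentioned above, here is a small sketch that only assembles a llama-server command line without running it. The flags `-m`, `-c`, `--cache-type-k`, and `--cache-type-v` exist in upstream llama.cpp. The binary path and model file are hypothetical, and `q8_0` is an upstream cache type used as a stand-in, because this listing does not name the fork's TurboQuant cache types.

```python
import shlex


def llama_server_argv(model_path, cache_type_k, cache_type_v, ctx_size,
                      binary="./llama-server"):
    """Build (but do not run) a llama-server command with KV cache type flags."""
    return [
        binary,
        "-m", model_path,                 # model file to load
        "-c", str(ctx_size),              # context length in tokens
        "--cache-type-k", cache_type_k,   # storage type for cached keys
        "--cache-type-v", cache_type_v,   # storage type for cached values
    ]


# Placeholder values: substitute the cache types documented by the fork.
argv = llama_server_argv("model.gguf", "q8_0", "q8_0", 32768)
print(shlex.join(argv))
# prints: ./llama-server -m model.gguf -c 32768 --cache-type-k q8_0 --cache-type-v q8_0
```

Printing the command before running it makes it easy to review exactly what will be executed.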

ℹ Install Mechanism

The skill is instruction-only (no install spec), but its README instructs cloning and building a GitHub repository (TheTom/llama-cpp-turboquant). Downloading and compiling third-party code from GitHub is common for this domain but is a moderate operational risk if the repository is untrusted or has malicious contents. The skill itself does not provide an automated installer or opaque download URLs.

✓ Credentials

No environment variables, credentials, or config paths are requested. The requested actions (build/run a local server, sysctl) are proportionate to compressing KV caches for local inference.

✓ Persistence & Privilege

Skill does not request persistent inclusion (always: false) and does not attempt to modify other skills or agent-wide configs. It does recommend a one-off privileged sysctl change (requires sudo) which alters system GPU memory limits until reboot; this is a legitimate but privileged action and not an automatic persistent installation by the skill.
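Because the sysctl change is privileged, it helps to record the current value and review the exact command before running anything. The sketch below only builds the commands; it executes nothing. The helper names and the 24576 MB figure are hypothetical, and a suitable limit depends on the machine's total memory.

```python
def read_wired_limit_command():
    """Command that prints the current limit (no sudo needed to read)."""
    return ["sysctl", "-n", "iogpu.wired_limit_mb"]


def wired_limit_command(limit_mb):
    """Command that sets the GPU wired-memory limit, in MB, until reboot."""
    if not isinstance(limit_mb, int) or limit_mb <= 0:
        raise ValueError("limit_mb must be a positive integer")
    return ["sudo", "sysctl", f"iogpu.wired_limit_mb={limit_mb}"]


# Hypothetical value for illustration only.
print(" ".join(read_wired_limit_command()))
print(" ".join(wired_limit_command(24576)))
```

Noting the value reported by the read command first gives you a number to restore manually if you do not want to wait for a reboot.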

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install turboquant-plus
After installation, invoke the skill by name or use /turboquant-plus
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

v1.0: TurboQuant+ KV cache compression guide, supporting local LLM inference on Apple Silicon

Metadata

Slug turboquant-plus

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is TurboQuant+ KV Cache Compression?

TurboQuant+ compresses llama.cpp KV caches on Apple Silicon up to 6.4x with minimal quality loss, enabling larger models and longer contexts efficiently. It is an AI Agent Skill for Claude Code / OpenClaw, with 122 downloads so far.

How do I install TurboQuant+ KV Cache Compression?

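Run "/install turboquant-plus" in the OpenClaw or Claude Code chat to install the skill in one step. The skill itself needs no extra setup, but following its README still means cloning and building the TheTom/llama-cpp-turboquant fork.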

Is TurboQuant+ KV Cache Compression free?

Yes, TurboQuant+ KV Cache Compression is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does TurboQuant+ KV Cache Compression support?

TurboQuant+ KV Cache Compression is listed as cross-platform and can be installed anywhere OpenClaw / Claude Code is available. The compression it describes, however, targets llama.cpp on Apple Silicon.

Who created TurboQuant+ KV Cache Compression?

It is built and maintained by wukai8289 (@wukai8289); the current version is v1.0.0.
