Downloads: 122
Stars: 0
Active Installs: 0
Versions: 1
Install in OpenClaw
/install turboquant-plus
Description
TurboQuant+ compresses llama.cpp KV caches on Apple Silicon up to 6.4x with minimal quality loss, enabling larger models and longer contexts efficiently.
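To put the 6.4x figure in concrete terms, the sketch below estimates KV cache size with the usual formula (one K and one V tensor per layer, per token). The model dimensions are hypothetical placeholders, and the fp16 baseline is an assumption, since the listing does not say what the ratio is measured against.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_value):
    """Approximate KV cache size: K and V values for every layer and token."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return values * bits_per_value / 8

# Hypothetical model shape, for illustration only.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)

# Assumed baseline: 16-bit values. A 6.4x ratio then implies about 2.5 bits per value.
baseline = kv_cache_bytes(**shape, bits_per_value=16)
compressed = kv_cache_bytes(**shape, bits_per_value=16 / 6.4)

print(f"fp16 cache:      {baseline / 2**30:.2f} GiB")
print(f"6.4x compressed: {compressed / 2**30:.2f} GiB")
```

Under these assumptions the cache scales linearly with context length, so the same memory budget would hold roughly 6.4 times as many tokens.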
Usage Guidance
This skill appears coherent for configuring TurboQuant+ with llama.cpp, but take these precautions before proceeding:
1) Verify that the external GitHub fork (TheTom/llama-cpp-turboquant) is the intended project, and review its source and commit history before building.
2) Build and run the code in an isolated or trusted environment (container, dedicated machine) if possible.
3) Be cautious with the suggested sudo sysctl change (iogpu.wired_limit_mb): it requires elevated privileges and changes system GPU memory limits until reboot. Back up important state and understand the impact first.
4) Prefer official releases or tags over an unknown commit or branch.
5) Check checksums or signatures for any downloaded model files.
If you are uncomfortable reviewing or building third-party native code, treat this as an operational risk and avoid running the build on production systems.
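The checksum precaution can be sketched as follows. This is a minimal example assuming the model's distributor publishes a SHA-256 digest; the function names are illustrative and not part of the skill.

```python
import hashlib
import hmac

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_checksum(path, expected_hex):
    """Return True only if the file's SHA-256 equals the published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

A model file would be loaded only when `matches_published_checksum` returns `True` for the digest obtained from a source you trust.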
Capability Analysis
Type: OpenClaw Skill
Name: turboquant-plus
Version: 1.0.0
The skill bundle instructs the agent to perform high-risk operations, including cloning and compiling an external GitHub repository (TheTom/llama-cpp-turboquant) and executing a system-level command with elevated privileges (sudo sysctl iogpu.wired_limit_mb) to modify GPU memory limits. Furthermore, the documentation contains likely fabricated references, such as an 'ICLR 2026' paper and a 2026 timestamp in _meta.json. While these actions are contextually related to LLM optimization, the combination of sudo requirements, unverified third-party code execution, and hallucinated references constitutes a significant security risk.
Capability Assessment
Purpose & Capability
Name/description claim KV cache compression for llama.cpp on Apple Silicon; the SKILL.md and README exclusively describe using a TurboQuant llama.cpp fork, relevant CLI flags, and platform-specific tuning. No unrelated credentials, binaries, or services are requested.
Instruction Scope
Instructions stay on-topic (clone/build the turboquant fork, run llama-server with cache-type flags). They also recommend a system-level change (sudo sysctl iogpu.wired_limit_mb) to raise GPU memory caps for large contexts — this is relevant to the stated goal but requires elevated privileges and modifies system state. No instructions collect or transmit user data to unexpected endpoints.
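As a rough illustration of the cache-type flags mentioned above, the sketch below builds (but does not run) a llama-server command line. It assumes the fork keeps upstream llama.cpp's `--cache-type-k`, `--cache-type-v`, and `--ctx-size` options. The cache type name is fork-specific and not given in this listing, so it is left as a placeholder, and the model path is hypothetical.

```python
import shlex

def llama_server_command(model_path, cache_type, ctx_size):
    """Build a llama-server invocation that uses the same cache type for K and V."""
    return [
        "llama-server",
        "-m", model_path,
        "--ctx-size", str(ctx_size),
        "--cache-type-k", cache_type,
        "--cache-type-v", cache_type,
    ]

# Placeholder values: substitute the type name documented by the fork itself.
cmd = llama_server_command("models/example.gguf", "<turboquant-cache-type>", 32768)
print(shlex.join(cmd))
```

Printing the command instead of executing it leaves room to review it before anything runs.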
Install Mechanism
The skill is instruction-only (no install spec), but its README instructs cloning and building a GitHub repository (TheTom/llama-cpp-turboquant). Downloading and compiling third-party code from GitHub is common for this domain but is a moderate operational risk if the repository is untrusted or has malicious contents. The skill itself does not provide an automated installer or opaque download URLs.
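One way to reduce that risk is to build only from a commit you have reviewed. The sketch below composes the git commands for a pinned checkout without executing them. The repository URL is inferred from the fork name above, and the commit placeholder and destination directory are hypothetical.

```python
def pinned_clone_commands(repo_url, ref, dest):
    """Return git commands that clone a repo and pin it to one reviewed ref."""
    return [
        ["git", "clone", repo_url, dest],
        # Detach HEAD at the reviewed ref so later branch updates are not picked up.
        ["git", "-C", dest, "checkout", "--detach", ref],
        # Print the resolved commit hash so it can be compared with the reviewed one.
        ["git", "-C", dest, "rev-parse", "HEAD"],
    ]

commands = pinned_clone_commands(
    "https://github.com/TheTom/llama-cpp-turboquant",  # inferred from the fork name
    "<reviewed-commit-sha>",                            # placeholder, not a real hash
    "llama-cpp-turboquant",
)
for command in commands:
    print(" ".join(command))
```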
Credentials
No environment variables, credentials, or config paths are requested. The requested actions (build/run a local server, sysctl) are proportionate to compressing KV caches for local inference.
Persistence & Privilege
Skill does not request persistent inclusion (always: false) and does not attempt to modify other skills or agent-wide configs. It does recommend a one-off privileged sysctl change (requires sudo) which alters system GPU memory limits until reboot; this is a legitimate but privileged action and not an automatic persistent installation by the skill.
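Because the sysctl change alters system state, it helps to record the current value before raising it. The sketch below only composes the relevant commands for review; it does not run them. The helper name and the example limits are hypothetical, and appropriate values depend on the machine's memory.

```python
SYSCTL_KEY = "iogpu.wired_limit_mb"

def wired_limit_commands(new_limit_mb, previous_limit_mb):
    """Compose commands to read, raise, and later restore the GPU wired limit."""
    if new_limit_mb <= 0:
        raise ValueError("new_limit_mb must be a positive number of megabytes")
    return {
        # Run this first and note the output; it is the value to restore.
        "read": ["sysctl", "-n", SYSCTL_KEY],
        "apply": ["sudo", "sysctl", f"{SYSCTL_KEY}={new_limit_mb}"],
        "restore": ["sudo", "sysctl", f"{SYSCTL_KEY}={previous_limit_mb}"],
    }

# Hypothetical values for illustration only.
plan = wired_limit_commands(new_limit_mb=24576, previous_limit_mb=0)
for step, command in plan.items():
    print(f"{step}: {' '.join(command)}")
```

Keeping the restore command next to the apply command makes it possible to undo the change without waiting for a reboot.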
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install turboquant-plus
- After installation, invoke the skill by name or use /turboquant-plus
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
v1.0: TurboQuant+ KV cache compression guide, supporting local LLM inference on Apple Silicon
Frequently Asked Questions
What is TurboQuant+ KV Cache Compression?
TurboQuant+ compresses llama.cpp KV caches on Apple Silicon up to 6.4x with minimal quality loss, enabling larger models and longer contexts efficiently. It is an AI Agent Skill for Claude Code / OpenClaw, with 122 downloads so far.
How do I install TurboQuant+ KV Cache Compression?
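Run "/install turboquant-plus" in the OpenClaw or Claude Code chat to install the skill in one step. Installing the skill needs no extra setup; cloning and building the TurboQuant llama.cpp fork that its instructions describe is a separate, manual process.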
Is TurboQuant+ KV Cache Compression free?
Yes, TurboQuant+ KV Cache Compression is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does TurboQuant+ KV Cache Compression support?
The skill itself runs anywhere OpenClaw / Claude Code is available. The KV cache compression workflow it describes targets llama.cpp on Apple Silicon.
Who created TurboQuant+ KV Cache Compression?
It is built and maintained by wukai8289 (@wukai8289); the current version is v1.0.0.