llamacpp-bench
Run standardized benchmarks on GGUF models using llama.cpp's llama-bench tool.
Quick Start
# Basic benchmark
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99
# With specific backend
LLAMA_BACKEND=vulkan llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99
Benchmark Parameters
| Parameter | Description | Default |
|---|---|---|
| -m | Model path (GGUF file) | required |
| -p | Prompt sizes to test | 512 |
| -n | Generation lengths to test | 128 |
| -ngl | GPU layers to offload | 99 |
| -t | CPU threads | auto |
| -dev | Device selection | auto |
Standard Test Suite
For consistent comparisons across models, use:
-p 512,1024,2048 -n 128,256 -ngl 99
This tests:
- Prompt processing: 512, 1024, 2048 tokens
- Token generation: 128, 256 tokens
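As an illustration of how the standard suite maps to the result labels used in the next section, here is a minimal sketch. The helper name `suite_labels` is hypothetical and not part of llama-bench; it only expands the comma-separated `-p` and `-n` lists into the corresponding `pp` and `tg` labels.

```shell
# Hypothetical helper: expand -p and -n lists into result labels.
suite_labels() {
    # $1: prompt sizes (e.g. "512,1024,2048")
    # $2: generation lengths (e.g. "128,256")
    for prompt_size in $(printf '%s' "$1" | tr ',' ' '); do
        printf 'pp%s\n' "$prompt_size"
    done
    for gen_len in $(printf '%s' "$2" | tr ',' ' '); do
        printf 'tg%s\n' "$gen_len"
    done
}

# One label per line for the standard suite
suite_labels "512,1024,2048" "128,256"
```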
Interpreting Results
| Metric | Meaning | Good Performance |
|---|---|---|
| pp512 | Prompt processing speed at 512 tokens | >1000 t/s |
| pp1024 | Prompt processing speed at 1024 tokens | >1000 t/s |
| pp2048 | Prompt processing speed at 2048 tokens | >1000 t/s |
| tg128 | Token generation speed (128 tokens) | >50 t/s |
| tg256 | Token generation speed (256 tokens) | >50 t/s |
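The thresholds in the table can be applied mechanically. The sketch below assumes you have already extracted a label and a tokens-per-second value from the output; `rate_result` is a hypothetical helper, and the thresholds are simply the ones listed above, which will not suit every model size or hardware class.

```shell
# Hypothetical helper: compare a measured speed against the table's thresholds.
rate_result() {
    # $1: test label (pp512, tg128, ...), $2: measured tokens per second
    case "$1" in
        pp*) threshold=1000 ;;   # prompt processing target from the table
        tg*) threshold=50 ;;     # token generation target from the table
        *)   echo "unknown"; return 1 ;;
    esac
    # awk handles the floating-point comparison
    awk -v v="$2" -v t="$threshold" \
        'BEGIN { if (v + 0 > t + 0) print "good"; else print "below target" }'
}

rate_result pp512 1432.7
rate_result tg128 31.2
```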
Backend Selection
llama-bench auto-detects available backends. Priority order:
- CUDA (NVIDIA GPUs)
- ROCm (AMD GPUs)
- Vulkan (cross-platform GPU)
- CPU (fallback)
To force a backend, set an environment variable, or check which backends your build includes:
# Check available backends
llama-bench --help | grep -i "backend\|cuda\|rocm\|vulkan"
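The priority order above can be sketched as a simple first-match lookup. This is an illustration only: `pick_backend` is a hypothetical helper, and the list of available backends is passed in by hand rather than read from llama-bench.

```shell
# Hypothetical helper: pick the highest-priority backend from those available.
pick_backend() {
    # $1: space-separated list of backends the build supports
    for candidate in cuda rocm vulkan cpu; do
        case " $1 " in
            *" $candidate "*) echo "$candidate"; return 0 ;;
        esac
    done
    return 1   # none of the known backends is available
}

# A build with Vulkan and CPU support selects Vulkan
pick_backend "cpu vulkan"
```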
Batch Benchmarking
Use the provided script for benchmarking multiple models:
./scripts/benchmark_models.sh /path/to/models/*.gguf
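The contents of `benchmark_models.sh` are not shown here, so the following is only a sketch of what a batch loop over several models might look like, using the standard test suite. The function name `bench_all` and the `BENCH_CMD` override are assumptions introduced for illustration; the override lets you dry-run the loop with `echo` before running real benchmarks.

```shell
# Hypothetical batch loop; not the actual benchmark_models.sh.
bench_all() {
    # BENCH_CMD is an assumed override, defaulting to llama-bench on PATH
    bench_cmd="${BENCH_CMD:-llama-bench}"
    for model in "$@"; do
        # Skip arguments that are not existing files (e.g. an unmatched glob)
        [ -f "$model" ] || { echo "skip: $model (not a file)" >&2; continue; }
        $bench_cmd -m "$model" -p 512,1024,2048 -n 128,256 -ngl 99
    done
}

# Dry run that only prints the arguments:
#   BENCH_CMD=echo bench_all /path/to/models/*.gguf
```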
Saving Results
Output can be redirected to a file:
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > results.txt
Alternatively, use the benchmark script, which saves results to timestamped files automatically.
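If you redirect output by hand, a timestamped file name keeps successive runs from overwriting each other. The naming scheme below (`bench_<model>_<timestamp>.txt`) is an assumption for illustration and may differ from what the benchmark script produces; `results_file` is a hypothetical helper.

```shell
# Hypothetical helper: build a timestamped results file name for a model.
results_file() {
    # $1: model path, $2: optional timestamp (defaults to the current time)
    stamp="${2:-$(date +%Y%m%d-%H%M%S)}"
    name="$(basename "$1" .gguf)"
    printf 'bench_%s_%s.txt\n' "$name" "$stamp"
}

# Fixed timestamp shown for reproducibility
results_file /path/to/models/model.gguf 20250101-120000

# Typical use:
#   llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > "$(results_file model.gguf)"
```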
Common Issues
- Out of memory: Reduce -ngl (GPU layers) or test smaller prompt sizes
- Slow CPU performance: Ensure -t matches CPU core count
- Backend not found: Check llama.cpp was built with the desired backend
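For the thread-count item, one way to obtain a value for -t is sketched below. Note the assumption: `nproc` and `getconf _NPROCESSORS_ONLN` report logical CPUs, which can exceed the physical core count on machines with SMT or hyper-threading, so you may want to lower the result. `cpu_count` is a hypothetical helper.

```shell
# Hypothetical helper: report the number of logical CPUs.
cpu_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc                           # GNU coreutils
    elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN       # common on macOS and BSD
    else
        echo 1                          # conservative fallback
    fi
}

echo "-t $(cpu_count)"

# Typical use:
#   llama-bench -m model.gguf -t "$(cpu_count)"
```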
Building / Updating llama.cpp
Check Current Version
./scripts/build_llamacpp.sh -v
Shows:
- Current Git commit and branch
- Build date
- Whether behind upstream
- Available backends
Build or Update
# Interactive mode (prompts for backend selection)
./scripts/build_llamacpp.sh -u
# Specify backend directly
./scripts/build_llamacpp.sh -u -b vulkan # Vulkan (AMD/Intel GPUs)
./scripts/build_llamacpp.sh -u -b cuda # CUDA (NVIDIA GPUs)
./scripts/build_llamacpp.sh -u -b rocm # ROCm (AMD GPUs)
./scripts/build_llamacpp.sh -u -b cpu # CPU only
# Clean rebuild
./scripts/build_llamacpp.sh -c -b vulkan
# Custom build directory
./scripts/build_llamacpp.sh -u -b cuda -d /custom/path
Build Options
| Flag | Description |
|---|---|
| -v | Show version info and exit |
| -u | Update to latest from GitHub |
| -c | Clean build (remove existing) |
| -b | Backend: vulkan, cuda, rocm, cpu |
| -d | Build directory path |
| -j | Parallel jobs (default: CPU count) |
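How `build_llamacpp.sh` translates the -b value into a build configuration is not shown here. As a rough sketch, the mapping below uses CMake option names from recent llama.cpp build instructions (`GGML_CUDA`, `GGML_HIP`, `GGML_VULKAN`). Treat these as assumptions: older llama.cpp releases used different option names, and the script may do something else entirely. `backend_cmake_flag` is a hypothetical helper.

```shell
# Hypothetical helper: map a -b backend name to a CMake option.
backend_cmake_flag() {
    case "$1" in
        cuda)   echo "-DGGML_CUDA=ON" ;;
        rocm)   echo "-DGGML_HIP=ON" ;;      # assumed name; older trees differ
        vulkan) echo "-DGGML_VULKAN=ON" ;;
        cpu)    echo "" ;;                   # CPU-only build needs no extra option
        *)      echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

backend_cmake_flag vulkan

# Typical use (not run here):
#   cmake -B build $(backend_cmake_flag vulkan) && cmake --build build -j
```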
Finding llama-bench
The benchmark script auto-detects llama-bench in these locations:
- /DATA/Benchmark/llama.cpp/build/bin/llama-bench
- ~/Repo/llama.cpp/build/bin/llama-bench
- ~/lab/build/bin/llama-bench
If llama-bench is not found in any of these locations, the script searches your home directory. Alternatively, build it with the build script described above.
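The lookup described above can be sketched as a first-match search over candidate paths. This is not the script's actual code: `find_llama_bench` is a hypothetical helper, and the home-directory fallback search is omitted.

```shell
# Hypothetical helper: print the first candidate that is an executable file.
find_llama_bench() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Check the documented locations in order
find_llama_bench \
    /DATA/Benchmark/llama.cpp/build/bin/llama-bench \
    "$HOME/Repo/llama.cpp/build/bin/llama-bench" \
    "$HOME/lab/build/bin/llama-bench" \
    || echo "llama-bench not found in the default locations"
```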
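- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install llamacpp-bench
- After installation, invoke the Skill by name or trigger it with /llamacpp-bench
- Provide the required inputs as described in the Skill's parameter documentation to get structured output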
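What is llama.cpp Benchmark?
Run llama.cpp benchmarks on GGUF models to measure prompt processing (pp) and token generation (tg) performance. Use when the user wants to benchmark LLM mod... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 119 downloads to date.
How do I install llama.cpp Benchmark?
Run the command "/install llamacpp-bench" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.
Is llama.cpp Benchmark free?
Yes. llama.cpp Benchmark is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.
Which platforms does llama.cpp Benchmark support?
llama.cpp Benchmark is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed llama.cpp Benchmark?
It is developed and maintained by alexhegit (@alexhegit); the current version is v1.0.0.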