llamacpp-bench
Run standardized benchmarks on GGUF models using llama.cpp's llama-bench tool.
Quick Start
# Basic benchmark
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99
# With specific backend
LLAMA_BACKEND=vulkan llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99
Benchmark Parameters
| Parameter | Description | Default |
|---|---|---|
| `-m` | Model path (GGUF file) | required |
| `-p` | Prompt sizes to test | 512 |
| `-n` | Generation lengths to test | 128 |
| `-ngl` | GPU layers to offload | 99 |
| `-t` | CPU threads | auto |
| `-dev` | Device selection | auto |
Standard Test Suite
For consistent comparisons across models, use:
-p 512,1024,2048 -n 128,256 -ngl 99
This tests:
- Prompt processing: 512, 1024, 2048 tokens
- Token generation: 128, 256 tokens
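To avoid retyping the suite for every model, the flags can be wrapped in a small shell function. This is a minimal sketch: `run_standard_suite` and the `LLAMA_BENCH` override variable are hypothetical names introduced here for illustration, not part of llama.cpp or of this skill's scripts.

```shell
# Hypothetical helper: run the standard test suite against one model.
# LLAMA_BENCH is an assumed override for the binary location; when unset,
# whatever "llama-bench" resolves to on PATH is used.
run_standard_suite() {
    model=$1
    if [ -z "$model" ]; then
        echo "usage: run_standard_suite MODEL.gguf" >&2
        return 2
    fi
    "${LLAMA_BENCH:-llama-bench}" -m "$model" -p 512,1024,2048 -n 128,256 -ngl 99
}

# Example call (requires llama-bench to be installed):
#   run_standard_suite model.gguf
```

Keeping the flags in one place means every model is measured with identical settings, which is the point of a standard suite.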
Interpreting Results
| Metric | Meaning | Good Performance |
|---|---|---|
| `pp512` | Prompt processing speed at 512 tokens | >1000 t/s |
| `pp1024` | Prompt processing speed at 1024 tokens | >1000 t/s |
| `pp2048` | Prompt processing speed at 2048 tokens | >1000 t/s |
| `tg128` | Token generation speed (128 tokens) | >50 t/s |
| `tg256` | Token generation speed (256 tokens) | >50 t/s |
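The thresholds in the table can be applied mechanically. The sketch below classifies a single measurement by its test name. `rate_result` is a hypothetical helper, and its cutoffs are simply the "Good Performance" values listed above.

```shell
# Hypothetical helper: compare one measurement against the "Good Performance"
# thresholds from the table above.
#   $1 = test name (pp512, tg128, ...)   $2 = measured tokens per second
rate_result() {
    test_name=$1
    tps=$2
    case $test_name in
        pp*) threshold=1000 ;;  # prompt processing
        tg*) threshold=50 ;;    # token generation
        *)   echo "unknown test: $test_name" >&2; return 1 ;;
    esac
    # awk does the comparison because plain sh arithmetic is integer-only.
    awk -v v="$tps" -v t="$threshold" \
        'BEGIN { if (v + 0 > t + 0) print "good"; else print "below target" }'
}
```

For example, `rate_result pp512 1432.7` prints `good`, while `rate_result tg128 42` prints `below target`.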
Backend Selection
llama-bench auto-detects available backends. Priority order:
- CUDA (NVIDIA GPUs)
- ROCm (AMD GPUs)
- Vulkan (cross-platform GPU)
- CPU (fallback)
To force a backend, set an environment variable, or check which backends your build includes:
# Check available backends
llama-bench --help | grep -i "backend\|cuda\|rocm\|vulkan"
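The priority order above amounts to "take the first backend in the list that is available". The sketch below expresses that rule in shell. `pick_backend` is a hypothetical function written for illustration; it is not how llama-bench performs detection internally.

```shell
# Hypothetical helper: given the backends a build supports (as arguments),
# print the one that comes first in the documented priority order.
pick_backend() {
    for preferred in cuda rocm vulkan cpu; do
        for available in "$@"; do
            if [ "$available" = "$preferred" ]; then
                echo "$preferred"
                return 0
            fi
        done
    done
    echo "no known backend in: $*" >&2
    return 1
}
```

With this rule, `pick_backend cpu vulkan` prints `vulkan`, because Vulkan ranks above the CPU fallback regardless of argument order.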
Batch Benchmarking
Use the provided script for benchmarking multiple models:
./scripts/benchmark_models.sh /path/to/models/*.gguf
Saving Results
Output can be redirected to a file:
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > results.txt
Or use the benchmark script which auto-saves to timestamped files.
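For readers who want to see the idea behind batch runs with timestamped output, here is a minimal sketch. It is not the contents of `scripts/benchmark_models.sh`; `bench_all`, `LLAMA_BENCH`, and `RESULTS_DIR` are hypothetical names, and the file naming scheme is an assumption.

```shell
# Hypothetical helper: benchmark each model passed as an argument and save
# the output to RESULTS_DIR/<model-name>_<timestamp>.txt.
# LLAMA_BENCH and RESULTS_DIR are assumed overrides, not llama.cpp settings.
bench_all() {
    results_dir=${RESULTS_DIR:-./results}
    mkdir -p "$results_dir" || return 1
    stamp=$(date +%Y%m%d_%H%M%S)   # one timestamp shared by the whole batch
    for model in "$@"; do
        name=$(basename "$model" .gguf)
        out="$results_dir/${name}_${stamp}.txt"
        echo "Benchmarking $model -> $out" >&2
        "${LLAMA_BENCH:-llama-bench}" -m "$model" \
            -p 512,1024,2048 -n 128,256 -ngl 99 > "$out" \
            || echo "benchmark failed for $model" >&2
    done
}

# Example call (requires llama-bench to be installed):
#   bench_all /path/to/models/*.gguf
```

A failed model is reported on stderr and the loop continues, so one bad file does not abort the rest of the batch.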
Common Issues
- Out of memory: Reduce `-ngl` (GPU layers) or test smaller prompt sizes
- Slow CPU performance: Ensure `-t` matches CPU core count
- Backend not found: Check llama.cpp was built with the desired backend
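The out-of-memory advice can be automated by retrying with progressively fewer offloaded layers. The sketch below assumes llama-bench exits with a non-zero status when it cannot load the model, and it treats any failure as a reason to retry. `bench_with_ngl_fallback` and `LLAMA_BENCH` are hypothetical names, and the step values are arbitrary.

```shell
# Hypothetical helper: retry a benchmark with fewer GPU layers until it succeeds.
# Assumption: a failed run (for example, out of memory) returns non-zero.
bench_with_ngl_fallback() {
    model=$1
    for ngl in 99 32 16 8 0; do    # illustrative steps, not a recommendation
        echo "Trying -ngl $ngl" >&2
        if "${LLAMA_BENCH:-llama-bench}" -m "$model" -p 512 -n 128 -ngl "$ngl"; then
            return 0
        fi
    done
    echo "all -ngl values failed for $model" >&2
    return 1
}
```

Results obtained at a reduced `-ngl` are not comparable with full-offload runs, so record which value succeeded alongside the numbers.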
Building / Updating llama.cpp
Check Current Version
./scripts/build_llamacpp.sh -v
Shows:
- Current Git commit and branch
- Build date
- Whether behind upstream
- Available backends
Build or Update
# Interactive mode (prompts for backend selection)
./scripts/build_llamacpp.sh -u
# Specify backend directly
./scripts/build_llamacpp.sh -u -b vulkan # Vulkan (AMD/Intel GPUs)
./scripts/build_llamacpp.sh -u -b cuda # CUDA (NVIDIA GPUs)
./scripts/build_llamacpp.sh -u -b rocm # ROCm (AMD GPUs)
./scripts/build_llamacpp.sh -u -b cpu # CPU only
# Clean rebuild
./scripts/build_llamacpp.sh -c -b vulkan
# Custom build directory
./scripts/build_llamacpp.sh -u -b cuda -d /custom/path
Build Options
| Flag | Description |
|---|---|
| `-v` | Show version info and exit |
| `-u` | Update to latest from GitHub |
| `-c` | Clean build (remove existing) |
| `-b` | Backend: vulkan, cuda, rocm, cpu |
| `-d` | Build directory path |
| `-j` | Parallel jobs (default: CPU count) |
Finding llama-bench
The benchmark script auto-detects llama-bench in these locations:
- /DATA/Benchmark/llama.cpp/build/bin/llama-bench
- ~/Repo/llama.cpp/build/bin/llama-bench
- ~/lab/build/bin/llama-bench
If llama-bench is not found in any of these locations, the script searches your home directory. You can also build it using the build script above.
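The lookup described here can be sketched as "return the first candidate path that is an executable file". The function below is an illustration under that assumption, not the benchmark script's actual code. `find_llama_bench` is a hypothetical name, and the fallback to `PATH` is an addition of this sketch.

```shell
# Hypothetical helper: print the first executable llama-bench among the given
# candidate paths, falling back to whatever is on PATH.
find_llama_bench() {
    for candidate in "$@"; do
        # -x is also true for searchable directories, so exclude those.
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    command -v llama-bench && return 0
    echo "llama-bench not found" >&2
    return 1
}

# Example call using the locations listed above ($HOME instead of ~ so the
# paths expand inside quotes):
#   find_llama_bench /DATA/Benchmark/llama.cpp/build/bin/llama-bench \
#       "$HOME/Repo/llama.cpp/build/bin/llama-bench" \
#       "$HOME/lab/build/bin/llama-bench"
```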
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install llamacpp-bench
- After installation, invoke the skill by name or use /llamacpp-bench
- Provide required inputs per the skill's parameter spec and get structured output
What is llama.cpp Benchmark?
Run llama.cpp benchmarks on GGUF models to measure prompt processing (pp) and token generation (tg) performance. Use when the user wants to benchmark LLM mod... It is an AI Agent Skill for Claude Code / OpenClaw, with 119 downloads so far.
How do I install llama.cpp Benchmark?
Run "/install llamacpp-bench" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is llama.cpp Benchmark free?
Yes, llama.cpp Benchmark is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does llama.cpp Benchmark support?
llama.cpp Benchmark is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created llama.cpp Benchmark?
It is built and maintained by alexhegit (@alexhegit); the current version is v1.0.0.