Description

Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpret...

README (SKILL.md)

Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run — statistical rigor and controlled conditions are prerequisites before any optimization decision.

Thinking mode: Use ultrathink for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.

Go Benchmarking & Performance Measurement

Name: Golang Benchmark
Author: samber

Performance improvement does not exist without measurement: if you can measure it, you can improve it.

This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, → See samber/cc-skills-golang@golang-performance skill. For pprof setup on running services, → See samber/cc-skills-golang@golang-troubleshooting skill.

Writing Benchmarks

`b.Loop()` (Go 1.24+) — preferred

b.Loop() prevents the compiler from optimizing away the code under test — without it, the compiler can detect dead results and eliminate them, producing misleadingly fast numbers. It also excludes setup code before the loop from timing automatically.

func BenchmarkParse(b *testing.B) {
    data := loadFixture("large.json") // setup — excluded from timing
    for b.Loop() {
        Parse(data)  // compiler cannot eliminate this call
    }
}

Existing for range b.N benchmarks still work but should migrate to b.Loop() — the old pattern requires manual b.ResetTimer() and a package-level sink variable to prevent dead code elimination.
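For comparison, here is a minimal sketch of the legacy pattern described above. `countWords` and the input string are hypothetical stand-ins for real code under test, and the classic three-clause loop is used so the sketch also compiles on toolchains older than Go 1.22 (`for range b.N` is equivalent on newer ones). The `main` function relies on `testing.Benchmark`, which runs a benchmark function outside `go test`.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is a package-level variable. Storing results here keeps the
// compiler from discarding the call under test as dead code.
var sink int

// countWords is a hypothetical function under test.
func countWords(s string) int {
	return len(strings.Fields(s))
}

// BenchmarkCountWordsLegacy shows the pre-1.24 style: a b.N loop,
// a manual timer reset after setup, and a sink for the result.
func BenchmarkCountWordsLegacy(b *testing.B) {
	input := strings.Repeat("lorem ipsum ", 1000) // setup
	b.ResetTimer()                                // exclude setup from timing
	for i := 0; i < b.N; i++ {
		sink = countWords(input)
	}
}

func main() {
	// Run the benchmark without "go test" and print its summary line.
	fmt.Println(testing.Benchmark(BenchmarkCountWordsLegacy))
}
```

Migrating this benchmark to `b.Loop()` would mean dropping both `b.ResetTimer()` and `sink` and replacing the loop header with `for b.Loop()`.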

Memory tracking

func BenchmarkAlloc(b *testing.B) {
    b.ReportAllocs() // or run with -benchmem flag
    for b.Loop() {
        _ = make([]byte, 1024)
    }
}

b.ReportMetric() adds custom metrics (e.g., throughput):

b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
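The one-liner above needs a surrounding benchmark to make sense. The following is a sketch under stated assumptions: the 64 KiB payload and the `copy` workload are hypothetical, and `totalBytes` is simply accumulated inside the loop. It requires Go 1.24+ because of `b.Loop()`.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// BenchmarkCopy reports throughput as a custom "bytes/s" metric.
func BenchmarkCopy(b *testing.B) {
	src := bytes.Repeat([]byte("x"), 64*1024) // hypothetical 64 KiB payload
	dst := make([]byte, len(src))
	totalBytes := 0
	for b.Loop() {
		totalBytes += copy(dst, src) // copy returns the number of bytes copied
	}
	// Reported once, after the loop, so Elapsed covers the whole timed run.
	b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
}

func main() {
	res := testing.Benchmark(BenchmarkCopy)
	fmt.Println(res)                  // summary line for the run
	fmt.Println(res.Extra["bytes/s"]) // raw value recorded by ReportMetric
}
```

Custom metrics recorded this way end up in the `Extra` map of the `testing.BenchmarkResult`, keyed by the unit string.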

Sub-benchmarks and table-driven

func BenchmarkEncode(b *testing.B) {
    for _, size := range []int{64, 256, 4096} {
        b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
            data := make([]byte, size)
            for b.Loop() {
                Encode(data)
            }
        })
    }
}

Running Benchmarks

go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt

Flag	Purpose
`-bench=.`	Run all benchmarks (regexp filter)
`-benchmem`	Report allocations (B/op, allocs/op)
`-count=10`	Run 10 times for statistical significance
`-benchtime=3s`	Minimum time per benchmark (default 1s)
`-cpu=1,2,4`	Run with different GOMAXPROCS values
`-cpuprofile=cpu.prof`	Write CPU profile
`-memprofile=mem.prof`	Write memory profile
`-trace=trace.out`	Write execution trace

Output format: BenchmarkEncode/size=64-8 5000000 230.5 ns/op 128 B/op 2 allocs/op — the -8 suffix is GOMAXPROCS, ns/op is time per operation, B/op is bytes allocated per op, allocs/op is heap allocation count per op.
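To make the field layout concrete, here is a small illustrative parser for a single result line. `Result` and `parseLine` are hypothetical names introduced for this sketch, not part of any Go tooling. It assumes the `-benchmem` layout shown above and treats a trailing `-N` on the name as the GOMAXPROCS suffix, which would misread a sub-benchmark name ending in `-<digits>` when GOMAXPROCS is 1 and no suffix is printed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Result holds the fields of one benchmark output line.
type Result struct {
	Name        string // e.g. "BenchmarkEncode/size=64"
	Procs       int    // GOMAXPROCS suffix, 1 if absent
	Iterations  int
	NsPerOp     float64
	BytesPerOp  int
	AllocsPerOp int
}

// parseLine splits a result line into its name, iteration count,
// and the "value unit" pairs that follow.
func parseLine(line string) (Result, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
		return Result{}, fmt.Errorf("not a benchmark result line: %q", line)
	}
	r := Result{Name: fields[0], Procs: 1}
	if i := strings.LastIndex(fields[0], "-"); i >= 0 {
		if p, err := strconv.Atoi(fields[0][i+1:]); err == nil {
			r.Name, r.Procs = fields[0][:i], p
		}
	}
	n, err := strconv.Atoi(fields[1])
	if err != nil {
		return Result{}, fmt.Errorf("iteration count: %w", err)
	}
	r.Iterations = n
	for i := 2; i+1 < len(fields); i += 2 {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return Result{}, fmt.Errorf("value for %s: %w", fields[i+1], err)
		}
		switch fields[i+1] {
		case "ns/op":
			r.NsPerOp = v
		case "B/op":
			r.BytesPerOp = int(v)
		case "allocs/op":
			r.AllocsPerOp = int(v)
		}
	}
	return r, nil
}

func main() {
	r, err := parseLine("BenchmarkEncode/size=64-8 5000000 230.5 ns/op 128 B/op 2 allocs/op")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", r)
}
```

For real comparisons, prefer benchstat over hand-rolled parsing; the sketch only shows which token maps to which field.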

Documenting Results in Commits

Paste benchstat output in the commit body when the change has a measurable performance impact. This documents why an optimization was made, prevents future readers from reverting it, and lets reviewers verify the claim without re-running benchmarks.

Commit format:

perf(parser): reduce Parse allocations 50% with sync.Pool

Replace per-call []byte allocation with a pooled buffer.

goos: linux / goarch: amd64 / cpu: AMD Ryzen 9 5950X
          │    old     │              new               │
          │  sec/op    │  sec/op     vs base            │
Parse-32    4.592µ ± 2%  3.041µ ± 1%  -33.78% (p=0.000 n=10)

          │   old    │             new              │
          │   B/op   │   B/op     vs base           │
Parse-32   1.024Ki ± 0%  0.512Ki ± 0%  -50.00% (p=0.000 n=10)

          │    old    │             new              │
          │ allocs/op │ allocs/op   vs base          │
Parse-32   12.00 ± 0%   6.000 ± 0%  -50.00% (p=0.000 n=10)

Rules:

Only include benchmarks directly affected by the change — strip unrelated rows
Never paste results with ~ (no statistical significance) — the improvement cannot be claimed
Include the hardware context line (goos/goarch/cpu) so results are reproducible
Use perf(scope): commit type for performance-only changes

Profiling from Benchmarks

Generate profiles directly from benchmark runs — no HTTP server needed:

# CPU profile
go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser
go tool pprof cpu.prof

# Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)
go test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser
go tool pprof -alloc_objects mem.prof

# Execution trace
go test -bench=BenchmarkParse -trace=trace.out ./pkg/parser
go tool trace trace.out

For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see pprof Reference. For execution trace interpretation, see Trace Reference. For statistical comparison, see benchstat Reference.

Reference Files

pprof Reference: Interactive and non-interactive analysis of CPU, memory, and goroutine profiles. Full CLI commands, profile types (CPU vs alloc_objects vs inuse_space), web UI navigation, and interpretation patterns. Use this to dive deep into where time and memory are being spent in your code.
benchstat Reference: Statistical comparison of benchmark runs with rigorous confidence intervals and p-value tests. Covers output reading, filtering old benchmarks, interleaving results for visual clarity, and regression detection. Use this when you need to prove a change made a meaningful performance difference, not just a lucky run.
Trace Reference: Execution tracer for understanding when and why code runs. Visualizes goroutine scheduling, garbage collection phases, network blocking, and custom span annotations. Use this when pprof (which shows where CPU goes) isn't enough and you need to see the timeline of what happened.
Diagnostic Tools: Quick reference for ancillary tools, including fieldalignment (struct padding waste), GODEBUG (runtime logging flags), fgprof (wall-clock profiles covering both on-CPU and off-CPU time), race detector (concurrency bugs), and others. Use this when you have a specific symptom and need a focused diagnostic; don't reach for pprof if a simpler tool already answers your question.
Compiler Analysis: Low-level compiler optimization insights, covering escape analysis (when values move to the heap), inlining decisions (which function calls are eliminated), SSA dump (intermediate representation), and assembly output. Use this when benchmarks show allocations you didn't expect, or when you want to verify the compiler did what you intended.
CI Regression Detection: Automated performance regression gating in CI pipelines. Covers three tools (benchdiff for quick PR comparisons, cob for strict threshold-based gating, gobenchdata for long-term trend dashboards), noisy neighbor mitigation strategies (why cloud CI benchmarks vary 5-10% even on quiet machines), and self-hosted runner tuning to make benchmarks reproducible. Use this when you want to ensure pull requests don't silently slow down your codebase; detecting regressions early prevents shipping performance debt.
Investigation Session: Production performance troubleshooting workflow combining Prometheus runtime metrics (heap size, GC frequency, goroutine counts), PromQL queries to correlate metrics with code changes, runtime configuration flags (GODEBUG env vars to enable GC logging), and cost warnings (when you're hitting performance tax). Use this when production benchmarks look good but real traffic behaves differently.
Prometheus Go Metrics Reference: Complete listing of Go runtime metrics actually exposed as Prometheus metrics by prometheus/client_golang. Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between runtime/metrics (Go internal data) and Prometheus metrics (what you scrape from /metrics). Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.

Cross-References

→ See samber/cc-skills-golang@golang-performance skill for optimization patterns to apply after measuring ("if X bottleneck, apply Y")
→ See samber/cc-skills-golang@golang-troubleshooting skill for pprof setup on running services (enable, secure, capture), Delve debugger, GODEBUG flags, root cause methodology
→ See samber/cc-skills-golang@golang-observability skill for everyday always-on monitoring, continuous profiling (Pyroscope), distributed tracing (OpenTelemetry)
→ See samber/cc-skills-golang@golang-testing skill for general testing practices
→ See samber/cc-skills@promql-cli skill for querying Prometheus runtime metrics in production to validate benchmark findings

Usage Guidance

This skill appears to be what it claims: a comprehensive Go benchmarking and profiling guide that installs benchstat. Before using it, consider:

(1) The SKILL.md contains high-impact system commands (sudo writes to /sys, disabling turbo, pinning CPUs). Run those only on dedicated CI runners, not on your laptop or shared servers.
(2) Enabling pprof, continuous profiling, or trace collection in production can expose internal stacks and data and add overhead. Secure endpoints and limit scope.
(3) Some referenced CI tools (cob, benchdiff) use git operations like git reset. Commit or stash local work before running.
(4) The install uses `go install` from an official golang.org path (benchstat). Verify you trust the upstream repo if you require reproducible binaries.

If you plan to run the setup commands, ensure you have proper access (sudo, kubectl) and a safe target environment.

Capability Analysis

Type: OpenClaw Skill
Name: golang-benchmark
Version: 1.1.1

The golang-benchmark skill bundle is a highly technical and well-documented resource for Go performance engineering. It provides comprehensive instructions for writing benchmarks (including Go 1.24+ b.Loop patterns), running profiles (pprof, trace), and performing statistical analysis with benchstat. The bundle includes extensive reference materials and a robust evaluation suite (evals.json) covering 80 specific performance scenarios. While it requests powerful tools like Bash and WebFetch, their use is strictly aligned with the stated purpose of executing Go tests and fetching profiling data from local endpoints. No evidence of malicious intent, data exfiltration, or harmful prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description, required binaries (go, benchstat), references, and install step (go install golang.org/x/perf/cmd/benchstat) all match the declared purpose of Go benchmarking and analysis.

ℹ Instruction Scope

SKILL.md is large and focused on benchmark writing, pprof, benchstat and CI workflows — all within purpose. It also includes system-tuning commands (writing to /sys/devices/... to change CPU governors, disabling turbo boost, pinning CPUs), kubectl env changes to enable pprof/continuous profiling, and guidance to enable profiling endpoints. Those are relevant to large-scale benchmarking/investigation but are high-impact operations; the file includes warnings to apply them only to dedicated CI/diagnostic hosts.

✓ Install Mechanism

Install uses `go install` from golang.org/x/perf to produce the benchstat binary — an expected and standard install mechanism for Go tooling. No arbitrary URL downloads, extract steps, or unknown hosts were used.

✓ Credentials

The skill requests no environment variables, no credentials, and no config paths. All commands in the instructions assume developer/CI access (kubectl, sudo) but do not demand unrelated secrets or tokens in the skill metadata.

✓ Persistence & Privilege

Skill is user-invocable, not always-enabled, and does not request persistent elevated privileges or modify other skills. Allowed tools include networking/search utilities appropriate for an instruction skill.

Version History

v1.1.1

- Bumped version from 1.1.0 to 1.1.1 in metadata.
- No functional or content changes outside of the version update.

v1.1.0

golang-benchmark 1.1.0

- Expanded and clarified documentation on Go benchmarking, profiling, and performance measurement.
- Added detailed usage examples for writing and running benchmarks, including the new b.Loop() pattern for Go 1.24+.
- Consolidated profile collection and analysis instructions, covering CPU, memory, and execution tracing.
- Descriptions of reference materials now include recommended tools for advanced diagnosis and CI regression detection.
- Improved guidance on statistical rigor, analysis methodology, and integration points with related skills.

Metadata

Slug golang-benchmark

Version 1.1.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Golang Benchmark?

Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpret... It is an AI Agent Skill for Claude Code / OpenClaw, with 178 downloads so far.

How do I install Golang Benchmark?

Run "/install golang-benchmark" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Golang Benchmark free?

Yes, Golang Benchmark is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Golang Benchmark support?

Golang Benchmark is cross-platform: it runs anywhere OpenClaw or Claude Code is available.

Who created Golang Benchmark?

It is built and maintained by Samuel Berthe (@samber); the current version is v1.1.1.
