Chapter 14: Linux Performance Analysis
Performance analysis is a systematic discipline, not guesswork. This chapter builds a complete Linux performance troubleshooting toolkit: the USE and RED methodologies, the top/vmstat/iostat basics, strace syscall tracing, perf hardware counters, eBPF/bpftrace kernel-level observability, and FlameGraph visualization, anchored by a full "CPU 100%" investigation case study.
1. Performance Analysis Methodology
USE Model
Proposed by Brendan Gregg: for every resource (CPU, memory, disk, network, bus) check three dimensions:
- U (Utilization): percentage of time the resource is busy (CPU 80% = busy 80% of the time)
- S (Saturation): degree of overload, meaning work that is queued because the resource cannot service it yet (run queue length, swap usage, disk await time)
- E (Errors): count of error events (NIC drops, disk errors, ECC memory corrections)
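As a rough illustration of the saturation dimension for CPU, the sketch below compares a load average against the core count. The function name `cpu_saturation` is hypothetical, and the load average is only an approximate signal on Linux because it also counts tasks in uninterruptible sleep.

```bash
# cpu_saturation LOAD NCPUS
# Prints "saturated" when the load average exceeds the core count, else "ok".
# (Hypothetical helper for illustration; not part of any standard tool.)
cpu_saturation() {
  awk -v load="$1" -v n="$2" \
    'BEGIN { print ((load + 0 > n + 0) ? "saturated" : "ok") }'
}

# Fixed sample values:
cpu_saturation 15.23 8   # prints: saturated
cpu_saturation 3.10 8    # prints: ok

# On a live Linux system the inputs could come from /proc/loadavg and nproc:
#   cpu_saturation "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```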
RED Model
Designed for request-oriented microservices, proposed by Tom Wilkie. For each service check:
- R (Rate): requests per second (QPS/RPS)
- E (Errors): failed requests per second (for example, 5xx responses)
- D (Duration): distribution of request processing time (p50/p95/p99)
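The three RED numbers can be derived from any request log. The sketch below assumes a hypothetical input format of one `STATUS DURATION_MS` pair per line collected over a known time window, treats status 500 and above as an error, and uses the nearest-rank method for percentiles. The function name `red_summary` is made up for this example.

```bash
# red_summary WINDOW_SECONDS < lines of "STATUS DURATION_MS"
# Prints request rate, error rate, and p50/p95/p99 latency.
red_summary() {
  sort -k2,2n | awk -v window="$1" '
    { n++; dur[n] = $2; if ($1 >= 500) errors++ }

    # Nearest-rank percentile over the already sorted durations:
    # index = ceil(p * n / 100)
    function pct(p,   i) {
      i = int((p * n + 99) / 100)
      if (i < 1) i = 1
      return dur[i]
    }

    END {
      if (n == 0) { print "no requests"; exit }
      printf "rate=%.1f/s errors=%.1f/s p50=%sms p95=%sms p99=%sms\n",
             n / window, errors / window, pct(50), pct(95), pct(99)
    }'
}

# Usage, assuming access.txt holds 60 seconds of "STATUS DURATION_MS" lines:
#   red_summary 60 < access.txt
```

Sorting before the awk pass keeps the percentile lookup to a single array index, at the cost of holding every duration in memory.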
60-Second Troubleshooting Checklist
uptime               # load averages; read the trend across 1/5/15 minutes
dmesg | tail -20     # most recent kernel error messages
vmstat 1 5           # sample every second, 5 times: CPU/memory/IO
mpstat -P ALL 1 3    # per-CPU utilization
pidstat 1 5          # per-process CPU usage
iostat -xz 1 3       # disk IO utilization and wait time
free -h              # memory and swap usage
sar -n DEV 1 3       # network interface traffic
sar -n TCP,ETCP 1 3  # TCP connection statistics
top                  # overall view, sorted by CPU/memory
2. Core Performance Tools
top / htop
top header CPU line field meanings:
| Field | Meaning | Warning threshold |
|---|---|---|
| %us | User-space CPU (app code) | >70% |
| %sy | Kernel-space CPU (syscalls) | >20% |
| %ni | Niced process CPU | - |
| %id | Idle CPU | 5% |
| %hi | Hardware interrupt handling (driver layer) | >5% |
| %si | Software interrupt (network RX, timers) | >5% |
| %st | CPU stolen by hypervisor (steal time) | >5% |
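These percentages are derived from tick counters in the aggregate `cpu` line of /proc/stat, compared across two samples. The sketch below shows the arithmetic; the function name `cpu_breakdown` is hypothetical, and it only uses the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal).

```bash
# cpu_breakdown "LINE1" "LINE2"
# LINE1 and LINE2 are two aggregate "cpu ..." lines from /proc/stat,
# taken some time apart. Prints each field as a share of elapsed ticks.
cpu_breakdown() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
    {
      for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
      if (total == 0) { print "no ticks elapsed"; exit }
      printf "us=%.1f ni=%.1f sy=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
        100 * d[2] / total, 100 * d[3] / total, 100 * d[4] / total,
        100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
        100 * d[8] / total, 100 * d[9] / total
    }'
}

# Live usage on Linux:
#   a=$(head -n1 /proc/stat); sleep 1; b=$(head -n1 /proc/stat)
#   cpu_breakdown "$a" "$b"
```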
vmstat
vmstat 1 10 # sample every second, 10 times
# Output columns:
# procs:  r (runnable processes: running or waiting for a CPU), b (processes blocked in uninterruptible sleep)
# memory: swpd free buff cache (in KB)
# swap:   si (swap in, KB/s), so (swap out, KB/s)
# io:     bi (disk reads, blocks/s), bo (disk writes, blocks/s)
# system: in (interrupts/s), cs (context switches/s)
# cpu:    us sy id wa st
# Key signals:
# r persistently > number of CPU cores → CPU saturation
# so persistently > 0 → memory shortage, pages are being swapped out
# cs very high → too many thread switches or syscalls
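The first two signals can be checked mechanically. The sketch below is a hypothetical filter, `vmstat_signals`, that reads vmstat output on stdin and counts the samples where `r` exceeds the core count or `so` is non-zero. It assumes the default column order (`r` first, `so` eighth), and note that the first data row of a real vmstat run reports averages since boot.

```bash
# vmstat_signals NCPUS < vmstat output
# Counts samples showing CPU saturation (r > cores) and swap-out (so > 0).
vmstat_signals() {
  awk -v ncpu="$1" '
    $1 ~ /^[0-9]+$/ {                  # data rows only; skips both header lines
      samples++
      if ($1 + 0 > ncpu + 0) cpu_sat++
      if ($8 + 0 > 0)        swapping++
    }
    END {
      printf "samples=%d r>cores=%d so>0=%d\n", samples, cpu_sat, swapping
    }'
}

# Usage:
#   vmstat 1 10 | vmstat_signals "$(nproc)"
```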
iostat -dx
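iostat -dx 1 5 # extended disk statistics, sampled every second
# Key metrics:
# r/s w/s        read/write requests per second
# rkB/s wkB/s    read/write throughput in KB/s
# await          average IO wait time (ms); normal for HDD: ...
# ... 0 and growing persistently → memory shortage; performance drops sharply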
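sar: Historical Data Replay
# sar data is collected by the sysstat service every 10 minutes and stored in /var/log/sa/
# (Debian/Ubuntu store it in /var/log/sysstat/ instead)
# Yesterday's CPU utilization
sar -u -f /var/log/sa/sa$(date -d yesterday +%d)
# Disk IO, sampled live (every second, 5 times)
sar -d 1 5
# Network interface traffic
sar -n DEV 1 5
# Run queue and load
sar -q 1 5
# Memory paging in and out
sar -B 1 5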
3. strace: Syscall Tracing
strace uses ptrace to intercept and record every syscall made by a target process, which makes it ideal for debugging "stuck processes", "unexpected file access", and "permission errors". Warning: strace significantly slows the target process, so use it with caution in production.
# Trace a newly started process
strace ls /tmp
# Attach to a running process
strace -p 1234
# With timestamps and durations (-tt: microsecond timestamps, -T: time spent in each call)
strace -tt -T -p 1234
# Trace only specific syscalls (file access)
strace -e trace=openat,read,write,close -p 1234
# Summary mode: call count and total time per syscall (the most useful)
strace -c -p 1234
# Sample output:
# % time     seconds  usecs/call     calls    errors syscall
#  45.23    0.001234          12       100         0 epoll_wait
#  23.10    0.000631           6       100         0 read
# Follow child processes and threads
strace -f -p 1234
# Diagnosing common problems:
# Process stuck in futex(...) → lock contention
# Process stuck in epoll_wait → normal, waiting for IO events
# Large numbers of stat() calls → slow path lookups; check the inode cache
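When `strace -c` is too coarse, the per-call timings from `strace -tt -T` can be aggregated by hand. The sketch below is a hypothetical filter, `slowest_syscalls`, that assumes each line has the shape `TIMESTAMP syscall(args) = ret <seconds>` (the single-process form, without the PID column that `-f` adds) and prints name, call count, and total seconds, slowest first.

```bash
# slowest_syscalls < output of `strace -tt -T`
# Prints "SYSCALL CALLS TOTAL_SECONDS", sorted by total time descending.
slowest_syscalls() {
  awk '
    match($0, /<[0-9]+\.[0-9]+>$/) {
      secs = substr($0, RSTART + 1, RLENGTH - 2)   # strip the < and >
      name = $2; sub(/\(.*/, "", name)             # field 2 is "syscall(args..."
      total[name] += secs; calls[name]++
    }
    END {
      for (n in total) printf "%s %d %.6f\n", n, calls[n], total[n]
    }' | sort -k3,3nr
}

# Usage (strace writes to stderr, so capture it with -o):
#   strace -tt -T -o /tmp/trace.txt -p 1234
#   slowest_syscalls < /tmp/trace.txt
```

Lines that do not end in a `<seconds>` marker, such as unfinished calls or signal notices, are skipped.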
4. ltrace: Library Call Tracing
# Trace dynamic library function calls (user space, without entering the kernel)
ltrace ./myapp
# Attach to a running process
ltrace -p 1234
# Trace only malloc/free memory allocation
ltrace -e malloc,free,realloc -p 1234
# Count calls
ltrace -c ./myapp
# Also show syscalls (-S)
ltrace -S ./myapp
# Inspect dynamic link dependencies
ldd /usr/bin/nginx
objdump -p /usr/bin/nginx | grep NEEDED
5. perf: Hardware Performance Counters
perf directly accesses the CPU's PMU (Performance Monitoring Unit) with extremely low overhead.
perf top: Live Hotspots
# Live view of the hottest functions (like top, but at the CPU instruction level)
perf top
# Watch a single process only
perf top -p 1234
# perf trace: like strace but with lower overhead (based on eBPF/tracepoints)
perf trace -p 1234
perf trace -e openat,read -p 1234
Prerequisites: using perf requires (1) installing linux-tools-$(uname -r), (2) setting kernel.perf_event_paranoid=1 (allows non-root sampling), and (3) binaries with debug symbols (Go includes them by default; C/C++ needs -g or the matching -dbg packages).
6. FlameGraph Visualization
Flame graphs were invented by Brendan Gregg. The X-axis shows the share of sampled CPU time (wider = more time; the left-to-right order is alphabetical, not chronological), the Y-axis shows call stack depth (each frame is called by the frame beneath it), and the colors are random and carry no meaning. A flame graph reveals CPU hotspot functions at a glance.
#!/usr/bin/env bash
# flamegraph.sh: end-to-end flame graph generation script
set -euo pipefail
PID="${1:?Usage: flamegraph.sh PID [duration_seconds] [output_dir]}"
DURATION="${2:-30}"
OUTPUT_DIR="${3:-/tmp/flamegraph-$(date +%Y%m%d-%H%M%S)}"
FLAMEGRAPH_DIR="/opt/FlameGraph"
# Install the FlameGraph tools (if not already installed)
if [ ! -d "$FLAMEGRAPH_DIR" ]; then
git clone --depth=1 https://github.com/brendangregg/FlameGraph.git "$FLAMEGRAPH_DIR"
fi
mkdir -p "$OUTPUT_DIR"
echo "[1/4] Sampling PID $PID for ${DURATION}s ..."
perf record -g --call-graph dwarf \
-o "$OUTPUT_DIR/perf.data" \
-p "$PID" \
sleep "$DURATION"
echo "[2/4] Generating perf script ..."
perf script -i "$OUTPUT_DIR/perf.data" > "$OUTPUT_DIR/perf.out"
echo "[3/4] Collapsing stacks ..."
"$FLAMEGRAPH_DIR/stackcollapse-perf.pl" \
"$OUTPUT_DIR/perf.out" > "$OUTPUT_DIR/folded.out"
echo "[4/4] Rendering flame graph ..."
"$FLAMEGRAPH_DIR/flamegraph.pl" \
--title "CPU Flame Graph: PID $PID (${DURATION}s)" \
--width 1400 \
"$OUTPUT_DIR/folded.out" > "$OUTPUT_DIR/flamegraph.svg"
echo "Done: $OUTPUT_DIR/flamegraph.svg"
echo "Open with: xdg-open $OUTPUT_DIR/flamegraph.svg"
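The folded.out file written by stackcollapse-perf.pl holds one line per unique stack, in the form `frame1;frame2;...;frameN COUNT`, and the width of a frame in the SVG is its share of the total count. As a sketch of that arithmetic, the hypothetical helper `frame_share` below reports the percentage of samples whose stack contains a given function, without rendering anything.

```bash
# frame_share FUNCTION < folded stacks ("frame1;frame2;... COUNT" per line)
# Prints the percentage of samples whose stack contains FUNCTION.
frame_share() {
  awk -v fn="$1" '
    {
      count = $NF                              # last field is the sample count
      stack = $0; sub(/ [0-9]+$/, "", stack)   # the rest is the stack
      total += count
      n = split(stack, frames, ";")
      for (i = 1; i <= n; i++)
        if (frames[i] == fn) { hit += count; break }   # count each stack once
    }
    END { if (total > 0) printf "%.1f%%\n", 100 * hit / total }'
}

# Usage against the script output:
#   frame_share json.Marshal < "$OUTPUT_DIR/folded.out"
```

The match is on the exact frame name, so a symbol that perf prints with a package or module prefix has to be passed in that same form.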
Off-CPU Flame Graph (Wait Time)
# An off-CPU flame graph analyzes the time threads spend not running (IO wait, lock wait)
# Requires eBPF support
git clone https://github.com/brendangregg/FlameGraph.git /opt/FlameGraph
# Collect off-CPU data with bpftrace (latency histograms are printed when it exits, e.g. on Ctrl-C)
bpftrace -e '
tracepoint:sched:sched_switch {
  // prev_state: 1 = TASK_INTERRUPTIBLE, 2 = TASK_UNINTERRUPTIBLE
  if (args->prev_state == 1 || args->prev_state == 2) {
    @start[args->prev_pid] = nsecs;
  }
}
tracepoint:sched:sched_switch {
  if (@start[args->next_pid]) {
    @offcpu[args->next_comm, args->next_pid] =
      hist(nsecs - @start[args->next_pid]);
    delete(@start[args->next_pid]);
  }
}' > /tmp/offcpu.txt
7. eBPF and bpftrace
eBPF (extended Berkeley Packet Filter) allows safely running user-written programs inside the kernel without modifying kernel source or loading kernel modules. eBPF programs pass through a static verifier guaranteeing they cannot crash the kernel. bpftrace is a high-level eBPF scripting language with awk-like syntax.
bpftrace Common One-liners
| Goal | bpftrace command |
|---|---|
| Trace all open() calls | bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }' |
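| Histogram of disk IO latency in µs (slow IO > 1 ms lands in the 1K+ buckets) | bpftrace -e 'tracepoint:block:block_rq_issue { @start[args->dev, args->sector] = nsecs; } tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ { @usecs = hist((nsecs - @start[args->dev, args->sector]) / 1000); delete(@start[args->dev, args->sector]); }' |
| Trace new TCP connections | bpftrace -e 'kprobe:tcp_connect { printf("%s -> %d\n", comm, pid); }' |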
| Count syscalls per process | bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' |
| Histogram of function latency | bpftrace -e 'uprobe:/usr/bin/myapp:main { @start[tid] = nsecs; } uretprobe:/usr/bin/myapp:main /@start[tid]/ { @lat = hist(nsecs - @start[tid]); delete(@start[tid]); }' |
| Trace OOM kill events | bpftrace -e 'kprobe:oom_kill_process { printf("OOM kill triggered by: %s (pid=%d)\n", comm, pid); }' |
| Count kernel function call rate | bpftrace -e 'kprobe:vfs_read { @[kstack] = count(); } interval:s:5 { print(@); clear(@); }' |
BCC Tool Collection
# Install BCC (Ubuntu/Debian)
apt install bpfcc-tools linux-headers-$(uname -r)
# opensnoop: trace all open calls
opensnoop-bpfcc
# execsnoop: trace new process execution
execsnoop-bpfcc
# tcptop: TCP traffic by process
tcptop-bpfcc
# biolatency: disk IO latency distribution (histogram), one 10-second summary
biolatency-bpfcc 10 1
# runqlat: CPU run queue wait time distribution
runqlat-bpfcc 10 1
# profile: sample CPU call stacks (for flame graphs)
profile-bpfcc -F 99 -f 30 > /tmp/out.stacks
/opt/FlameGraph/flamegraph.pl /tmp/out.stacks > profile.svg
8. Network and IO Monitoring Tools
# iotop: live per-process disk IO (requires root)
iotop -ao # -a accumulated mode, -o only processes doing IO
# nethogs: per-process network bandwidth (requires root)
nethogs eth0
# iftop: bandwidth per connection pair (requires root)
iftop -i eth0
# nload: live interface-level bandwidth graph
nload eth0
# ss: socket statistics (a faster replacement for netstat)
ss -tunap # all TCP/UDP connections with their processes
ss -s # connection count summary
ss -o state ESTABLISHED '( dport = :80 or sport = :80 )' # filter on port 80
# NIC packet drop statistics
ip -s link show eth0
ethtool -S eth0 | grep -i drop
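The interface drop counters are also exposed in /proc/net/dev, which is handy on hosts without ethtool. The sketch below is a hypothetical filter, `nic_drops`, that assumes the usual layout of that file: two header lines, then one row per interface with eight receive counters followed by eight transmit counters.

```bash
# nic_drops < /proc/net/dev
# Prints receive and transmit drop counters per interface.
nic_drops() {
  awk 'NR > 2 {
    sub(/:/, " ")        # "eth0:123" or "eth0: 123" -> "eth0 123"; re-splits $0
    # After the name: $2-$9 are rx counters (drop is the 4th, $5),
    # $10-$17 are tx counters (drop is the 4th, $13).
    printf "%s rx_drop=%s tx_drop=%s\n", $1, $5, $13
  }'
}

# Usage on Linux:
#   nic_drops < /proc/net/dev
```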
9. Memory Deep Dive
# Key fields in /proc/meminfo
cat /proc/meminfo
# MemTotal:        total physical memory
# MemFree:         completely free (excludes caches)
# MemAvailable:    actually available (includes reclaimable caches)
# Buffers:         block device buffers
# Cached:          page cache (cached file contents)
# SwapCached:      pages swapped back into memory that still have a copy in swap
# Active/Inactive: page activity state (affects the reclaim policy)
# Dirty:           dirty pages waiting to be flushed (a high value indicates heavy write-back IO pressure)
# Writeback:       pages currently being written back to disk
# Slab:            kernel slab allocator usage
# VmallocUsed:     vmalloc allocation usage
# slabtop: kernel slab cache usage (dentry/inode caches are the usual big consumers)
slabtop
# smem: accurate per-process memory usage (PSS is more accurate than RSS)
# PSS = private memory + a proportional share of shared memory
smem -r -k -s pss | head -20
smem -P nginx -k # filter by process name
# valgrind memory leak detection (test environments only)
valgrind --tool=memcheck --leak-check=full ./myapp
# valgrind heap profiling (memory allocation hotspots)
valgrind --tool=massif --pages-as-heap=yes ./myapp
ms_print massif.out.* | head -100
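Where smem is not installed, the PSS figure mentioned above can be approximated directly from the kernel: /proc/PID/smaps lists a `Pss:` value in kB for every mapping, and their sum is the PSS of the process. The helper name `pss_kb` below is hypothetical.

```bash
# pss_kb < smaps-formatted text
# Sums every "Pss:" line (values are in kB) and prints the total.
pss_kb() {
  awk '$1 == "Pss:" { sum += $2 } END { print sum + 0 }'
}

# Usage on Linux (reading another user's process requires root):
#   pss_kb < /proc/1234/smaps
```

The exact match on `Pss:` matters because newer kernels also emit breakdown lines such as `Pss_Anon:`, which would otherwise be counted twice.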
10. Practice: Full "CPU 100%" Investigation
The following is a complete CPU 100% investigation walkthrough, from alert receipt to root cause identification:
## Step 1: Confirm the symptom
uptime
# Output: load average: 15.23, 14.98, 13.01
# → sustained high load; the 15-minute average is high, so this is not a momentary spike
## Step 2: Locate the process
top -b -n 1 | head -30
# PID 2341 (myapp) is using 790% CPU (8-core machine)
## Step 3: Determine whether the CPU time is user-space or kernel-space
pidstat -u -p 2341 1 5
# %user=785 %system=5 → user-space hotspot, in application code
## Step 4: Look at per-thread CPU (find the hot thread)
top -H -p 2341
# Thread TID 2345 is using 99% CPU
ps -Lp 2341 -o pid,tid,pcpu,comm
# Hot thread found: tid=2345
## Step 5: Sample with perf (30 seconds)
perf record -g --call-graph dwarf -p 2341 sleep 30
perf report --stdio | head -60
# Hottest function in the output: json.Marshal at 74.3%
# → JSON serialization consumes the vast majority of the CPU
## Step 6: Confirm with a flame graph
perf script > /tmp/perf.out
/opt/FlameGraph/stackcollapse-perf.pl /tmp/perf.out > /tmp/folded.out
/opt/FlameGraph/flamegraph.pl /tmp/folded.out > /tmp/flame.svg
# The flame graph shows the json.Marshal call chain spanning 70%+ of the full width
## Step 7: Measure the call frequency with bpftrace
# (Go symbol names contain "/" and ".", so the symbol is quoted)
bpftrace -e '
uprobe:/opt/myapp/bin/myapp:"encoding/json.Marshal" {
  @calls = count();
}
interval:s:1 {
  print(@calls);
  clear(@calls);
}' &
# Output: 50,000 calls per second
## Step 8: Confirm in the code
# Code review shows that the hot HTTP handler serializes the complete object
# on every request, even though most fields do not change between requests
# → Root cause: serialization results are not cached
## Step 9: Fix and verify
# Add a sync.Map cache of serialized results, TTL=1s
# After redeploying:
perf stat -p $(pgrep myapp) sleep 10
# IPC rose from 0.3 to 2.1; CPU usage dropped from 790% to 45%
Investigation summary: the full investigation path is uptime → top → pidstat → perf record → perf report → FlameGraph → bpftrace verification → code review → fix. Each step narrows the search: machine → process → thread → function → code line.
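The IPC figure used to verify the fix is instructions retired divided by CPU cycles, both of which appear in the default `perf stat` output. A minimal sketch of the arithmetic follows; the helper name `ipc` and the sample counter values are made up for illustration.

```bash
# ipc INSTRUCTIONS CYCLES
# Prints instructions per cycle with two decimals.
ipc() {
  awk -v ins="$1" -v cyc="$2" \
    'BEGIN { if (cyc + 0 > 0) printf "%.2f\n", ins / cyc }'
}

# Illustrative counter values, not measurements:
ipc 21000000000 10000000000   # prints: 2.10
ipc 3000000000 10000000000    # prints: 0.30
```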