第 12 章

慢查询分析器

慢查询分析与优化

慢查询是数据库性能问题的直接表现。一套系统化的分析方法：开启慢日志 → 聚合分析 → EXPLAIN 解读 → 优化执行计划 → 验证效果，可以解决 90% 的 MySQL 性能问题。

开启慢查询日志

-- 查看当前配置
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

-- 动态开启（无需重启）
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;         -- 超过 1 秒记录
SET GLOBAL log_queries_not_using_indexes = ON;  -- 不使用索引的查询也记录
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

-- my.cnf 持久化配置
[mysqld]
slow_query_log            = 1
slow_query_log_file       = /var/log/mysql/mysql-slow.log
long_query_time           = 1
log_queries_not_using_indexes = 1
log_throttle_queries_not_using_indexes = 10  -- 每分钟最多记录 10 条未使用索引的查询

慢日志记录格式

# Time: 2024-01-15T02:35:47.123456Z
# User@Host: app[app] @ 10.0.1.100 [10.0.1.100]  Id: 12345
# Query_time: 3.214507  Lock_time: 0.000123
# Rows_sent: 1  Rows_examined: 1048576    ← 扫描 100 万行！
# Bytes_sent: 145
SET timestamp=1705284947;
SELECT * FROM orders WHERE YEAR(created_at) = 2024 AND user_id = 100;

关键字段：Query_time（执行时间）、Lock_time（等锁时间）、Rows_examined（扫描行数）。扫描行数远大于发送行数，说明索引效率低下。

pt-query-digest 分析

手动分析慢日志效率极低，pt-query-digest 自动聚合同类 SQL，按总耗时排序，快速定位最值得优化的查询。

# 基础分析（分析最近 1 小时慢日志）
pt-query-digest /var/log/mysql/mysql-slow.log

# 按总耗时排序（默认），只显示 Top 20
pt-query-digest --limit 20 /var/log/mysql/mysql-slow.log

# 只分析特定数据库
pt-query-digest --filter '$event->{db} eq "mydb"' mysql-slow.log

# 分析特定时间段
pt-query-digest --since '2024-01-15 00:00:00' \
                --until '2024-01-15 06:00:00' mysql-slow.log

# 只看执行时间超过 5 秒的
pt-query-digest --filter '$event->{Query_time} > 5' mysql-slow.log

# 实时分析 MySQL processlist（无需慢日志）
pt-query-digest --processlist h=localhost,u=root,p=pass --interval 1

# 分析 binlog
mysqlbinlog mysql-bin.000001 | pt-query-digest --type binlog

pt-query-digest 输出解读

# Profile
# Rank Query ID           Response time  Calls  R/Call V/M   Item
# ==== ================== =============  =====  ====== =====  ====
#    1 0xABC...            245.0000 68.1%   1234  0.1985  0.02  SELECT orders

# 最重要的字段:
# Response time: 该查询的总耗时 (占比)
# Calls: 执行次数
# R/Call: 平均每次耗时

# 单个查询详情
# Query 1: 1234 QPS, 245s total, 0.20s avg, 0.45s 95th...
#   Attribute    pct   total     min     max     avg     95th  stddev  median
#   ============ === ======= ======= ======= ======= ======= =======  ======
#   Count         68    1234
#   Exec time     68    245s     50ms      2s    198ms    450ms   180ms   180ms
#   Lock time      2     5.2s      1ms    100ms     4ms    10ms     8ms     3ms
#   Rows sent     10  12.34k       1       100      10      50      15      8
#   Rows examine  90  890.0M   1.00k  100.00k  720.00k  90.95k  ...

EXPLAIN 深度解读

EXPLAIN SELECT o.id, o.order_no, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 20
  AND o.created_at > '2024-01-01'
ORDER BY o.created_at DESC
LIMIT 20\G

字段	关键值	含义
type	system > const > eq_ref > ref > range > index > ALL	连接类型，ALL 最差（全表扫描）
key	索引名或 NULL	实际使用的索引；NULL 表示未用索引
key_len	字节数	索引使用长度，越大代表用了更多列
rows	预估行数	优化器估算需要扫描的行数
filtered	百分比	经 WHERE 过滤后剩余的行比例
Extra	见下方	额外信息，最重要的字段

Extra 字段解读（最重要）

Extra 值	含义	优化方向
Using index	覆盖索引，无需回表	好，无需优化
Using where	WHERE 过滤在 Server 层进行	通常正常
Using filesort	需要额外排序（在内存/磁盘）	添加排序字段到索引
Using temporary	使用临时表（GROUP BY/DISTINCT）	改写 SQL 或优化索引
Using index condition	ICP（索引条件下推）	好，5.6+ 默认开启
Using MRR	多范围读优化	好
Select tables optimized away	只访问索引即可（如 MIN/MAX）	好

type 字段详解

-- const: 主键或唯一索引等值查询（最快）
EXPLAIN SELECT * FROM users WHERE id = 1;
-- type=const, rows=1

-- eq_ref: 连接时被驱动表使用主键/唯一索引
EXPLAIN SELECT * FROM orders o JOIN users u ON u.id = o.user_id;
-- users 表 type=eq_ref

-- ref: 非唯一索引等值查询
EXPLAIN SELECT * FROM orders WHERE user_id = 100;
-- type=ref

-- range: 索引范围扫描（BETWEEN, >,  '2024-01-01';
-- type=range

-- index: 全索引扫描（比 ALL 好一点，但仍需警惕）
-- ALL: 全表扫描（必须优化！）

EXPLAIN ANALYZE（MySQL 8.0.18+）

EXPLAIN ANALYZE 真正执行查询并返回实际运行时指标，揭示优化器估算与实际执行的差异。

EXPLAIN ANALYZE
SELECT user_id, COUNT(*) as order_count
FROM orders
WHERE status = 20 AND created_at > '2024-01-01'
GROUP BY user_id
ORDER BY order_count DESC
LIMIT 10\G

-- 输出示例（树状格式）:
-- -> Limit: 10 row(s)  (actual time=125.3..125.3 rows=10 loops=1)
--   -> Sort: order_count DESC  (actual time=125.3..125.3 rows=10 loops=1)
--     -> Table scan on   (actual time=125.2..125.2 rows=8234 loops=1)
--       -> Aggregate using temporary table  (actual time=120.1..125.0 rows=8234 loops=1)
--         -> Index range scan on orders using idx_status_created
--              (cost=15234 rows=89234)
--              (actual time=0.234..85.2 rows=91023 loops=1)  ← 扫描 91023 行

-- 重点看: estimated rows vs actual rows 差异大 → 统计信息不准，需要 ANALYZE TABLE

Performance Schema 实时诊断

-- 查看当前正在执行的慢 SQL
SELECT * FROM performance_schema.events_statements_current
WHERE timer_wait > 5000000000  -- 超过 5 秒（单位纳秒）
ORDER BY timer_wait DESC\G

-- Top 10 最耗时的 SQL（历史聚合）
SELECT digest_text,
       count_star         AS exec_count,
       ROUND(avg_timer_wait/1e9, 3) AS avg_sec,
       ROUND(sum_timer_wait/1e9, 1) AS total_sec,
       sum_rows_examined  AS total_rows_examined
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10\G

-- 查看表 I/O 热点
SELECT object_schema, object_name,
       count_read, count_write,
       ROUND(sum_timer_read/1e12, 3)  AS read_seconds,
       ROUND(sum_timer_write/1e12, 3) AS write_seconds
FROM performance_schema.table_io_waits_summary_by_table
WHERE object_schema NOT IN ('mysql','performance_schema','information_schema')
ORDER BY sum_timer_read + sum_timer_write DESC
LIMIT 10;

常见慢查询模式

模式 1：索引列上使用函数/计算

-- 慢：函数导致索引失效
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
SELECT * FROM users WHERE LEFT(name, 3) = '张三';

-- 快：改写为范围查询
SELECT * FROM orders WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31 23:59:59';
SELECT * FROM users WHERE name LIKE '张三%';

模式 2：深度分页（OFFSET 过大）

-- 慢：OFFSET 100000 需要读取 100010 行
SELECT * FROM orders ORDER BY id LIMIT 100000, 20;

-- 快：游标分页（记录上次最后一个 ID）
SELECT * FROM orders WHERE id > :last_id ORDER BY id LIMIT 20;

模式 3：SELECT * 引发回表

-- 慢：回表读取所有列
SELECT * FROM orders WHERE user_id = 100 AND status = 20;

-- 快：覆盖索引（只取索引中有的列）
SELECT id, order_no, total_amount FROM orders
WHERE user_id = 100 AND status = 20;
-- 需要索引: (user_id, status, id, order_no, total_amount)

模式 4：OR 导致索引失效

-- 慢：OR 两边列不同，可能全表扫描
SELECT * FROM users WHERE phone = '13800000001' OR email = '[email protected]';

-- 快：改写为 UNION ALL
SELECT * FROM users WHERE phone = '13800000001'
UNION ALL
SELECT * FROM users WHERE email = '[email protected]';

优化套路总结

问题现象	优化方向
type=ALL, rows 很大	添加索引或改写查询条件
Extra: Using filesort	把 ORDER BY 列加入复合索引
Extra: Using temporary	GROUP BY 列加索引；或考虑分层汇总
Rows_examined >> Rows_sent	索引效率低，添加覆盖索引
Lock_time 占比高	缩短事务，检查锁争用
执行计划正常但仍慢	ANALYZE TABLE 更新统计信息

持续监控方案

# Prometheus + Grafana 告警规则示例
# 慢查询数量每分钟增长超过 10 次
- alert: MySQLSlowQuerySpike
  expr: rate(mysql_global_status_slow_queries[5m]) > 10
  for: 2m
  annotations:
    summary: "MySQL slow query spike detected"

# 使用 PMM (Percona Monitoring and Management) — 最专业的 MySQL 监控
docker run -d -p 80:80 percona/pmm-server:2
pmm-admin add mysql --username=pmm --password=pass \
  --host=mysql-host --port=3306 --service-name=prod-mysql

黄金规则：优化前先 EXPLAIN，优化后再 EXPLAIN 验证。不要凭感觉优化，也不要在没有基准数据的情况下优化。pt-query-digest + EXPLAIN ANALYZE 是最有效的两件工具组合。

本章评分

4.6 / 5 (31 评分)