第 44 章

窗口函数实战

MySQL 窗口函数完全指南

窗口函数(Window Functions)是 MySQL 8.0 引入的最重要特性之一。与普通聚合函数不同,窗口函数在计算时不折叠行 -- 每行都保留在结果集中,同时能访问与当前行相关的一组行(即"窗口")进行计算。这使得排名、累计求和、移动平均、同比环比等以往需要复杂子查询才能实现的分析任务变得简洁高效。

**版本要求:**窗口函数需要 MySQL 8.0 或更高版本。MariaDB 10.2+ 也支持窗口函数,但语法和行为细节略有差异。本章所有示例基于 MySQL 8.0/8.4。

本章将系统讲解所有窗口函数的语法与语义,通过 20+ 个真实业务场景演示如何使用窗口函数解决实际问题,并深入分析其执行原理和性能优化策略。

窗口函数语法结构

窗口函数的完整语法由三部分组成:函数调用OVER 子句、以及可选的命名窗口

function_name([args]) OVER ( [PARTITION BY partition_expression, ...] [ORDER BY sort_expression [ASC|DESC], ...] [frame_clause] )

-- frame_clause: {ROWS | RANGE | GROUPS} BETWEEN frame_start AND frame_end

-- frame_start / frame_end: UNBOUNDED PRECEDING | N PRECEDING | CURRENT ROW | N FOLLOWING | UNBOUNDED FOLLOWING

核心概念

命名窗口(WINDOW 子句)

当多个窗口函数使用相同的窗口定义时,可以用 WINDOW 子句命名并复用:

SELECT
  employee_id,
  department,
  salary,
  ROW_NUMBER() OVER w AS rn,
  RANK()       OVER w AS rnk,
  DENSE_RANK() OVER w AS dense_rnk
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);

排名函数

ROW_NUMBER()

为分区内的每一行分配唯一连续编号(1, 2, 3, ...),即使值相同也不会产生并列。

语法

ROW_NUMBER() OVER (
  [PARTITION BY col, ...]
  ORDER BY col [ASC|DESC], ...
)

示例数据

id department name salary
id department name salary
1 Engineering Alice 12000
2 Engineering Bob 11000
3 Engineering Charlie 12000
4 Sales Diana 9000
5 Sales Eve 9500

查询与结果

SELECT
  name, department, salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees;
name department salary rn
name department salary rn
Alice Engineering 12000 1
Charlie Engineering 12000 2
Bob Engineering 11000 3
Eve Sales 9500 1
Diana Sales 9000 2

注意:Alice 和 Charlie 的 salary 相同,但 ROW_NUMBER() 仍然分配了不同的编号(1 和 2)。具体顺序在相同值时不确定,取决于存储引擎返回行的顺序。如果需要确定性排序,应在 ORDER BY 中增加额外列(如主键)。

RANK()

与 ROW_NUMBER() 类似,但允许并列。值相同的行获得相同排名,下一个不同值的排名跳过并列数。

SELECT
  name, department, salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;
name department salary rnk
name department salary rnk
Alice Engineering 12000 1
Charlie Engineering 12000 1
Bob Engineering 11000 3
Eve Sales 9500 1
Diana Sales 9000 2

注意 Bob 的排名是 3(不是 2),因为有两个人并列第 1 名。

DENSE_RANK()

与 RANK() 相同允许并列,但下一个不同值的排名不跳过,保持连续。

SELECT
  name, department, salary,
  DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees;
name department salary dense_rnk
name department salary dense_rnk
Alice Engineering 12000 1
Charlie Engineering 12000 1
Bob Engineering 11000 2
Eve Sales 9500 1
Diana Sales 9000 2

Bob 现在是 2(紧跟并列第 1 名之后)。

ROW_NUMBER vs RANK vs DENSE_RANK 对比

函数 并列 跳号 适用场景
函数 并列 跳号 适用场景
ROW_NUMBER() 不允许 分页、去重、唯一编号
RANK() 允许 跳过 竞赛排名、考试排名
DENSE_RANK() 允许 不跳过 工资等级、连续排名

NTILE(n)

将分区内的行均匀分成 n 个桶,并返回桶编号(1 到 n)。如果行数不能被 n 整除,前面的桶多分一行。

SELECT
  name, salary,
  NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
name salary quartile
name salary quartile
Alice 12000 1
Charlie 12000 1
Bob 11000 2
Eve 9500 3
Diana 9000 4

**典型用途:**计算百分位(NTILE(100))、四分位(NTILE(4))、将用户分成 A/B 测试组。

取值函数

LAG(expr, offset, default)

访问当前行之前offset 行的值。offset 默认 1,default 默认 NULL。

语法

LAG(expression [, offset [, default_value]])
  OVER (
    [PARTITION BY col, ...]
    ORDER BY col [ASC|DESC], ...
  )

示例:月度销售额环比

SELECT
  month,
  revenue,
  LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month,
  ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
        / LAG(revenue, 1) OVER (ORDER BY month) * 100, 1) AS mom_pct
FROM monthly_sales;
month revenue prev_month mom_pct
month revenue prev_month mom_pct
2024-01 100000 0 NULL
2024-02 120000 100000 20.0
2024-03 115000 120000 -4.2
2024-04 130000 115000 13.0

LEAD(expr, offset, default)

访问当前行之后offset 行的值。与 LAG 方向相反。

SELECT
  order_id,
  customer_id,
  order_date,
  LEAD(order_date, 1) OVER (
    PARTITION BY customer_id ORDER BY order_date
  ) AS next_order_date,
  DATEDIFF(
    LEAD(order_date, 1) OVER (PARTITION BY customer_id ORDER BY order_date),
    order_date
  ) AS days_to_next
FROM orders;
order_id customer_id order_date next_order_date days_to_next
order_id customer_id order_date next_order_date days_to_next
101 1 2024-01-05 2024-01-20 15
105 1 2024-01-20 2024-03-10 50
112 1 2024-03-10 NULL NULL

FIRST_VALUE(expr)

返回窗口帧中第一行的值。

SELECT
  name, department, salary,
  FIRST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS top_earner
FROM employees;
name department salary top_earner
name department salary top_earner
Alice Engineering 12000 Alice
Charlie Engineering 12000 Alice
Bob Engineering 11000 Alice
Eve Sales 9500 Eve
Diana Sales 9000 Eve

LAST_VALUE(expr)

返回窗口帧中最后一行的值。

**常见陷阱:**默认帧是 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,这意味着 LAST_VALUE() 只会返回到当前行为止的最后一个值(即当前行自身)。要获得分区内真正的最后一行,必须显式指定帧为 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

-- 错误用法(常见错误):
SELECT name, salary,
  LAST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
  ) AS lowest_earner   -- 永远返回当前行的 name!
FROM employees;

-- 正确用法:
SELECT name, salary,
  LAST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS lowest_earner   -- 返回分区内 salary 最低的人
FROM employees;

NTH_VALUE(expr, n)

返回窗口帧中第 n 行的值。n 从 1 开始。

SELECT
  name, department, salary,
  NTH_VALUE(name, 2) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS second_highest
FROM employees;
name department salary second_highest
name department salary second_highest
Alice Engineering 12000 Charlie
Charlie Engineering 12000 Charlie
Bob Engineering 11000 Charlie
Eve Sales 9500 Diana
Diana Sales 9000 Diana

取值函数对比

函数 方向 用途 注意事项
函数 方向 用途 注意事项
LAG 向前看(前一行) 环比、差值计算 分区首行返回 default
LEAD 向后看(后一行) 预测下一次、间隔计算 分区末行返回 default
FIRST_VALUE 帧第一行 分组内最大/最小 通常不需改帧
LAST_VALUE 帧最后一行 分组内最大/最小 必须显式设帧!
NTH_VALUE 帧第 n 行 取第 2 名、第 3 名 需要显式设帧

聚合窗口函数

所有标准聚合函数都可以配合 OVER() 作为窗口函数使用。与普通聚合不同,窗口聚合不折叠行

SUM() OVER()

计算窗口帧内的累计总和。

SELECT
  order_date,
  amount,
  SUM(amount) OVER (ORDER BY order_date) AS running_total,
  SUM(amount) OVER () AS grand_total
FROM orders;
order_date amount running_total grand_total
order_date amount running_total grand_total
2024-01-01 100 100 850
2024-01-02 200 300 850
2024-01-03 150 450 850
2024-01-04 250 700 850
2024-01-05 150 850 850

AVG() OVER()

计算窗口帧内的平均值。

-- 每个员工的薪资与部门平均薪资对比
SELECT
  name, department, salary,
  ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
  salary - ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS diff_from_avg
FROM employees;
name department salary dept_avg diff_from_avg
name department salary dept_avg diff_from_avg
Alice Engineering 12000 11667 333
Charlie Engineering 12000 11667 333
Bob Engineering 11000 11667 -667
Eve Sales 9500 9250 250
Diana Sales 9000 9250 -250

COUNT() OVER()

-- 每个订单占部门订单总数的比例
SELECT
  order_id, department,
  COUNT(*) OVER (PARTITION BY department) AS dept_orders,
  ROUND(1.0 / COUNT(*) OVER (PARTITION BY department) * 100, 1) AS pct
FROM orders;

MIN() / MAX() OVER()

-- 与分区内极值对比
SELECT
  name, department, salary,
  MIN(salary) OVER (PARTITION BY department) AS dept_min,
  MAX(salary) OVER (PARTITION BY department) AS dept_max,
  salary - MIN(salary) OVER (PARTITION BY department) AS above_min
FROM employees;

PARTITION BY 详解

PARTITION BY 子句将结果集分成多个分区,窗口函数在每个分区内独立计算。它类似 GROUP BY,但不折叠行。

多列分区

-- 按年份和部门分区,计算排名
SELECT
  YEAR(hire_date) AS hire_year,
  department,
  name,
  salary,
  RANK() OVER (
    PARTITION BY YEAR(hire_date), department
    ORDER BY salary DESC
  ) AS rank_in_year_dept
FROM employees;

PARTITION BY 表达式

-- 按薪资区间分区
SELECT
  name, salary,
  CASE
    WHEN salary >= 10000 THEN 'High'
    WHEN salary >= 7000  THEN 'Mid'
    ELSE 'Low'
  END AS tier,
  ROW_NUMBER() OVER (
    PARTITION BY CASE
      WHEN salary >= 10000 THEN 'High'
      WHEN salary >= 7000  THEN 'Mid'
      ELSE 'Low'
    END
    ORDER BY salary DESC
  ) AS rank_in_tier
FROM employees;

无 PARTITION BY(全局窗口)

省略 PARTITION BY 时,整个结果集被视为一个分区:

-- 全局排名
SELECT name, salary,
  ROW_NUMBER() OVER (ORDER BY salary DESC) AS global_rank
FROM employees;

帧子句(Frame Clause)

帧子句定义了窗口函数在分区内能"看到"的行范围。只有聚合窗口函数(SUM, AVG, COUNT, MIN, MAX)和取值函数(FIRST_VALUE, LAST_VALUE, NTH_VALUE)会受帧子句影响。排名函数(ROW_NUMBER, RANK, DENSE_RANK, NTILE)和 LAG/LEAD 忽略帧子句。

ROWS 模式

基于物理行数定义帧范围。

-- 3 日移动平均
SELECT
  order_date,
  amount,
  ROUND(AVG(amount) OVER (
    ORDER BY order_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ), 2) AS moving_avg_3
FROM daily_sales;
order_date amount moving_avg_3
order_date amount moving_avg_3
2024-01-01 100 100.00
2024-01-02 200 150.00
2024-01-03 150 150.00
2024-01-04 300 216.67
2024-01-05 250 233.33

RANGE 模式

基于值范围定义帧。与 ROWS 不同,RANGE 考虑值的大小关系。

-- 累计到当前值的总和(值相同的行全部包含)
SELECT
  salary,
  name,
  SUM(salary) OVER (
    ORDER BY salary
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_salary
FROM employees;

**ROWS vs RANGE 的区别:**当 ORDER BY 列有重复值时,RANGE 会将值相同的行视为同一组全部包含,而 ROWS 严格按物理行数计算。大多数情况下建议使用 ROWS,因为行为更可预测且性能更好。

GROUPS 模式(MySQL 8.0.2+)

基于**peer group(同值组)**定义帧。每个同值组包含 ORDER BY 值相同的所有行。

-- 当前组 + 前后各 1 组
SELECT
  department, salary,
  SUM(salary) OVER (
    ORDER BY salary
    GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) AS group_sum
FROM employees;

常用帧子句速查

帧定义 含义 典型用途
帧定义 含义 典型用途
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 从分区开头到当前行 累计求和
ROWS BETWEEN N PRECEDING AND CURRENT ROW 前 N 行到当前行 移动平均
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING 当前行到分区末尾 剩余求和
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 整个分区 LAST_VALUE、NTH_VALUE
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING 前后各 1 行 + 当前行 平滑处理
RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW 当前日期前 7 天内 7 日滚动聚合

默认帧行为

当显式省略帧子句时,MySQL 使用以下默认值:

20+ 实战案例

1 连续登录天数

找出每个用户的最大连续登录天数。核心思路:用 ROW_NUMBER() 生成序号,然后用日期减去序号得到"组标识"。

WITH login_groups AS (
  SELECT
    user_id,
    login_date,
    login_date - INTERVAL ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY login_date
    ) DAY AS grp
  FROM (
    SELECT DISTINCT user_id, DATE(login_time) AS login_date
    FROM user_logins
  ) t
)
SELECT
  user_id,
  MIN(login_date) AS streak_start,
  MAX(login_date) AS streak_end,
  COUNT(*)         AS consecutive_days
FROM login_groups
GROUP BY user_id, grp
ORDER BY consecutive_days DESC;

**原理:**对于连续日期序列,date - ROW_NUMBER() 的结果是一个常数。例如 1月1日(rn=1)、1月2日(rn=2)、1月3日(rn=3) 减去后都得到 12月31日。一旦有中断,差值就不同了,形成新的分组。

2 累计求和(Running Total)

SELECT
  transaction_date,
  amount,
  SUM(amount) OVER (
    ORDER BY transaction_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM transactions
WHERE account_id = 1001
ORDER BY transaction_date;

3 7 日移动平均

SELECT
  sale_date,
  daily_revenue,
  ROUND(AVG(daily_revenue) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ), 2) AS ma_7
FROM daily_sales;

注意用 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW 表示包含当前行在内的 7 行。

4 分组 Top-N(每部门薪资前 3)

SELECT * FROM (
  SELECT
    department,
    name,
    salary,
    ROW_NUMBER() OVER (
      PARTITION BY department ORDER BY salary DESC
    ) AS rn
  FROM employees
) ranked
WHERE rn <= 3;

如果允许并列,用 DENSE_RANK() 替代 ROW_NUMBER()

5 同比(Year-over-Year)

SELECT
  YEAR(order_date) AS yr,
  MONTH(order_date) AS mo,
  SUM(amount) AS monthly_revenue,
  LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)) AS same_month_last_year,
  ROUND(
    (SUM(amount) - LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)))
    / LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)) * 100, 1
  ) AS yoy_pct
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY yr, mo;

6 环比(Month-over-Month)

WITH monthly AS (
  SELECT
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    SUM(amount) AS revenue
  FROM orders
  GROUP BY DATE_FORMAT(order_date, '%Y-%m')
)
SELECT
  month,
  revenue,
  LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
  ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
        / LAG(revenue, 1) OVER (ORDER BY month) * 100, 1) AS mom_pct
FROM monthly;

7 百分位排名(Percentile Rank)

SELECT
  name,
  salary,
  ROUND(PERCENT_RANK() OVER (ORDER BY salary) * 100, 1) AS percentile,
  CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;

PERCENT_RANK():(rank - 1) / (total_rows - 1),范围 [0, 1]。CUME_DIST():rank / total_rows,范围 (0, 1]。

8 间隙检测(Gap Detection)

查找 ID 序列中的间隙。

SELECT
  id AS gap_start,
  next_id - 1 AS gap_end,
  next_id - id - 1 AS gap_size
FROM (
  SELECT
    id,
    LEAD(id, 1) OVER (ORDER BY id) AS next_id
  FROM orders
) t
WHERE next_id - id > 1
ORDER BY gap_start;

9 岛屿检测(Island Detection)

在状态序列中找出连续相同状态的"岛屿"。

WITH numbered AS (
  SELECT
    server_id,
    check_time,
    status,
    ROW_NUMBER() OVER (PARTITION BY server_id ORDER BY check_time) -
    ROW_NUMBER() OVER (PARTITION BY server_id, status ORDER BY check_time) AS island_id
  FROM server_status
)
SELECT
  server_id,
  status,
  MIN(check_time) AS island_start,
  MAX(check_time) AS island_end,
  COUNT(*)         AS duration_checks
FROM numbered
GROUP BY server_id, status, island_id
ORDER BY server_id, island_start;

**原理:**对全体行编号与按状态分组编号的差值,在连续相同状态的行中保持恒定,一旦状态切换差值就变化。

10 去重保留最新记录

DELETE t1 FROM user_addresses t1
INNER JOIN (
  SELECT id,
    ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY updated_at DESC
    ) AS rn
  FROM user_addresses
) t2 ON t1.id = t2.id
WHERE t2.rn > 1;

每个 user_id 只保留 updated_at 最新的一条地址记录。

11 累计占比(Pareto 分析)

SELECT
  product_name,
  revenue,
  SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
  ROUND(SUM(revenue) OVER (ORDER BY revenue DESC)
        / SUM(revenue) OVER () * 100, 1) AS cumulative_pct
FROM product_sales
ORDER BY revenue DESC;

快速识别贡献了 80% 收入的前 20% 产品。

12 分组内占比

SELECT
  department,
  name,
  salary,
  ROUND(salary / SUM(salary) OVER (PARTITION BY department) * 100, 1) AS pct_of_dept
FROM employees;

13 会话化(Sessionization)

将用户行为日志按 30 分钟超时切分为会话。

WITH events_with_gap AS (
  SELECT
    user_id,
    event_time,
    TIMESTAMPDIFF(MINUTE,
      LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time),
      event_time
    ) AS gap_minutes
  FROM user_events
),
sessions AS (
  SELECT
    user_id,
    event_time,
    SUM(CASE WHEN gap_minutes > 30 OR gap_minutes IS NULL THEN 1 ELSE 0 END)
      OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
  FROM events_with_gap
)
SELECT
  user_id,
  session_id,
  MIN(event_time) AS session_start,
  MAX(event_time) AS session_end,
  COUNT(*)         AS events_count,
  TIMESTAMPDIFF(MINUTE, MIN(event_time), MAX(event_time)) AS duration_min
FROM sessions
GROUP BY user_id, session_id;

14 漏斗分析

WITH funnel AS (
  SELECT
    user_id,
    event_name,
    event_time,
    ROW_NUMBER() OVER (
      PARTITION BY user_id, event_name ORDER BY event_time
    ) AS event_seq
  FROM user_events
  WHERE event_name IN ('page_view', 'add_to_cart', 'checkout', 'payment')
)
SELECT
  event_name,
  COUNT(DISTINCT user_id) AS users,
  ROUND(COUNT(DISTINCT user_id) /
    FIRST_VALUE(COUNT(DISTINCT user_id)) OVER (ORDER BY
      FIELD(event_name, 'page_view', 'add_to_cart', 'checkout', 'payment')
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) * 100, 1) AS conversion_pct
FROM funnel
WHERE event_seq = 1
GROUP BY event_name
ORDER BY FIELD(event_name, 'page_view', 'add_to_cart', 'checkout', 'payment');

15 中位数计算

WITH ordered AS (
  SELECT
    salary,
    ROW_NUMBER() OVER (ORDER BY salary) AS rn,
    COUNT(*) OVER () AS total
  FROM employees
)
SELECT AVG(salary) AS median_salary
FROM ordered
WHERE rn IN (FLOOR((total + 1) / 2), CEIL((total + 1) / 2));

16 前后行差值(Delta)

SELECT
  reading_time,
  meter_value,
  meter_value - LAG(meter_value, 1, meter_value) OVER (ORDER BY reading_time) AS delta
FROM electricity_readings
WHERE meter_id = 'M001'
ORDER BY reading_time;

17 排行榜变动追踪

WITH weekly_rank AS (
  SELECT
    week_num,
    player_name,
    score,
    RANK() OVER (PARTITION BY week_num ORDER BY score DESC) AS rnk
  FROM weekly_scores
)
SELECT
  w.week_num,
  w.player_name,
  w.score,
  w.rnk AS current_rank,
  LAG(w.rnk) OVER (PARTITION BY w.player_name ORDER BY w.week_num) AS prev_rank,
  LAG(w.rnk) OVER (PARTITION BY w.player_name ORDER BY w.week_num) - w.rnk AS rank_change
FROM weekly_rank w
ORDER BY w.week_num, w.rnk;

18 首次购买 vs 复购标记

SELECT
  order_id,
  customer_id,
  order_date,
  amount,
  CASE
    WHEN ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) = 1
    THEN 'First Purchase'
    ELSE 'Repeat Purchase'
  END AS purchase_type
FROM orders;

19 库存变动与余额

SELECT
  transaction_time,
  product_id,
  CASE type WHEN 'IN' THEN qty ELSE -qty END AS change,
  SUM(CASE type WHEN 'IN' THEN qty ELSE -qty END) OVER (
    PARTITION BY product_id
    ORDER BY transaction_time
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS balance
FROM inventory_transactions
ORDER BY product_id, transaction_time;

20 员工薪资与同级对比

SELECT
  name,
  department,
  job_level,
  salary,
  AVG(salary) OVER (PARTITION BY job_level) AS level_avg,
  MIN(salary) OVER (PARTITION BY job_level) AS level_min,
  MAX(salary) OVER (PARTITION BY job_level) AS level_max,
  NTILE(4) OVER (PARTITION BY job_level ORDER BY salary) AS salary_quartile,
  ROUND(PERCENT_RANK() OVER (PARTITION BY job_level ORDER BY salary) * 100, 0) AS percentile
FROM employees;

21 页面停留时间计算

SELECT
  user_id,
  page_url,
  view_time,
  LEAD(view_time) OVER (PARTITION BY user_id ORDER BY view_time) AS next_view_time,
  TIMESTAMPDIFF(SECOND, view_time,
    LEAD(view_time) OVER (PARTITION BY user_id ORDER BY view_time)
  ) AS stay_seconds
FROM page_views
WHERE session_id = 'abc123'
ORDER BY view_time;

22 分区内行间移动标准差

SELECT
  trade_date,
  close_price,
  ROUND(AVG(close_price) OVER w, 2) AS ma_20,
  ROUND(STDDEV_POP(close_price) OVER w, 2) AS std_20,
  ROUND(AVG(close_price) OVER w + 2 * STDDEV_POP(close_price) OVER w, 2) AS upper_band,
  ROUND(AVG(close_price) OVER w - 2 * STDDEV_POP(close_price) OVER w, 2) AS lower_band
FROM stock_prices
WHERE ticker = 'AAPL'
WINDOW w AS (ORDER BY trade_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW)
ORDER BY trade_date;

布林带(Bollinger Bands)计算:20 日均线 +/- 2 倍标准差。

性能分析

窗口函数 vs 子查询 vs 临时表

窗口函数
相关子查询

性能对比实测

-- 场景:100 万行订单表,计算每用户累计订单金额

-- 方案 1:窗口函数(~0.8s)
SELECT order_id, user_id, amount,
  SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS running_total
FROM orders;

-- 方案 2:相关子查询(~45s)
SELECT o1.order_id, o1.user_id, o1.amount,
  (SELECT SUM(o2.amount) FROM orders o2
   WHERE o2.user_id = o1.user_id AND o2.order_date <= o1.order_date) AS running_total
FROM orders o1;

-- 方案 3:临时表 + 变量(MySQL 5.7 兼容,~1.2s,但行为不可靠)
SET @running := 0, @prev_user := '';
SELECT order_id, user_id, amount,
  @running := IF(@prev_user = user_id, @running + amount, amount) AS running_total,
  @prev_user := user_id
FROM orders ORDER BY user_id, order_date;

**MySQL 8.0 中用户变量赋值的行为未定义:**MySQL 优化器可能以任意顺序求值 SELECT 列表中的表达式。方案 3 在 MySQL 8.0 中不可靠,应使用窗口函数替代。

EXPLAIN 分析窗口函数

EXPLAIN FORMAT=TREE
SELECT department, name, salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;

/* 输出示例:
-> Window aggregate with buffering: rank() OVER (
      PARTITION BY department ORDER BY salary DESC)
  -> Sort: employees.department, employees.salary DESC
    -> Table scan on employees  (cost=10.25 rows=100)
*/

关键观察点:

优化建议

  1. 为 PARTITION BY + ORDER BY 创建索引 -- 如果窗口函数用于子查询/CTE 且可下推,合适的复合索引可避免排序
  2. 减少参与窗口计算的行数 -- 先用 WHERE 过滤,再应用窗口函数
  3. 合并窗口定义 -- 使用 WINDOW 子句复用相同的 PARTITION BY + ORDER BY
  4. 避免不必要的 ORDER BY -- 如果只需要全局聚合(如 SUM() OVER()),省略 ORDER BY
  5. 控制帧大小 -- ROWS 帧通常比 RANGE 帧更高效
  6. 注意内存使用 -- sort_buffer_sizetmp_table_size 会影响窗口函数性能
-- 创建索引优化窗口函数
ALTER TABLE orders ADD INDEX idx_user_date (user_id, order_date);

-- 先过滤再排名
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn
  FROM orders
  WHERE order_date >= '2024-01-01'  -- 先过滤减少数据量
) t WHERE rn = 1;

MySQL 8.0 特性与限制

MySQL 8.0 支持的窗口函数一览

类别 函数 最低版本
类别 函数 最低版本
排名 ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE() 8.0.2
取值 LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE() 8.0.2
分布 PERCENT_RANK(), CUME_DIST() 8.0.2
聚合窗口 SUM(), AVG(), COUNT(), MIN(), MAX(), BIT_AND(), BIT_OR(), BIT_XOR(), STDDEV_POP(), STDDEV_SAMP(), VAR_POP(), VAR_SAMP(), JSON_ARRAYAGG(), JSON_OBJECTAGG() 8.0.2
帧模式 ROWS, RANGE, GROUPS 8.0.2(GROUPS 8.0.2)
帧排除 EXCLUDE CURRENT ROW, EXCLUDE GROUP, EXCLUDE TIES, EXCLUDE NO OTHERS 不支持
命名窗口 WINDOW w AS (...) 8.0.2

已知限制

FILTER 子句的替代方案

-- 标准 SQL(PostgreSQL 支持,MySQL 不支持):
SELECT COUNT(*) FILTER (WHERE status = 'active') OVER (PARTITION BY dept)
FROM employees;

-- MySQL 替代方案:
SELECT
  SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) OVER (PARTITION BY dept) AS active_count
FROM employees;

COUNT(DISTINCT) OVER() 的替代方案

-- 不支持:
-- SELECT COUNT(DISTINCT category) OVER (PARTITION BY dept) FROM products;

-- 替代方案 1:子查询
SELECT p.*,
  (SELECT COUNT(DISTINCT p2.category) FROM products p2 WHERE p2.dept = p.dept) AS distinct_cats
FROM products p;

-- 替代方案 2:DENSE_RANK 技巧
SELECT *,
  MAX(dr) OVER (PARTITION BY dept) AS distinct_cats
FROM (
  SELECT *, DENSE_RANK() OVER (PARTITION BY dept ORDER BY category) AS dr
  FROM products
) t;

执行顺序

窗口函数在 SQL 逻辑执行顺序中位于SELECT 阶段

  1. FROM / JOIN
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. 窗口函数(WINDOW)
  6. SELECT
  7. DISTINCT
  8. ORDER BY
  9. LIMIT

这意味着窗口函数可以引用 SELECT 列表中的别名聚合结果,但不能在 WHERE 或 HAVING 中使用窗口函数的结果。

常见问题(FAQ)

Q1: 窗口函数和 GROUP BY 有什么区别?

GROUP BY 将多行折叠为一行,每组只输出一条结果。窗口函数保留所有原始行,在每行上附加聚合/排名计算。例如,GROUP BY department 只返回各部门一行,而 SUM(salary) OVER (PARTITION BY department) 保留每个员工行,同时每行都附带部门总薪资。

Q2: 为什么 LAST_VALUE() 总是返回当前行的值?

因为默认帧是 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW。帧的末尾就是当前行,所以 LAST_VALUE 返回当前行的值。要获取分区内真正的最后一个值,必须显式设置帧为 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Q3: 能否在 WHERE 子句中使用窗口函数?

不能。窗口函数在 WHERE 之后执行。要过滤窗口函数的结果,需要将查询包装在子查询或 CTE 中:SELECT * FROM (SELECT *, ROW_NUMBER() OVER (...) AS rn FROM t) sub WHERE rn = 1;

Q4: 窗口函数会导致排序吗?如何优化?

几乎总是会。PARTITION BY 和 ORDER BY 都需要排序。优化方法:(1) 为 PARTITION BY + ORDER BY 列创建复合索引;(2) 使用 WINDOW 子句复用相同的窗口定义,让优化器只排序一次;(3) 先用 WHERE 过滤减少数据量;(4) 如果只需聚合不需要排序,省略 ORDER BY。

Q5: ROW_NUMBER() 结果值相同时顺序不确定怎么办?

ORDER BY 中增加一个唯一列(通常是主键)作为 tiebreaker:ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, id ASC)。这样即使 salary 相同,排序也是确定性的。

Q6: MySQL 5.7 如何实现窗口函数的等效功能?

MySQL 5.7 不支持窗口函数。常见替代方案:(1) 排名用用户变量 @rn := @rn + 1(MySQL 8.0 中行为不可靠);(2) 累计求和用相关子查询;(3) LAG/LEAD 用 self-join。但这些方案性能通常差很多,建议升级到 MySQL 8.0。

Q7: 多个窗口函数是否会多次排序?

取决于 PARTITION BY 和 ORDER BY 是否相同。相同窗口定义的多个函数只排序一次。不同窗口定义则需要分别排序。使用 WINDOW 子句复用窗口定义,让优化器合并排序。通过 EXPLAIN FORMAT=TREE 可以查看实际排序次数。

Q8: ROWS 和 RANGE 帧模式该如何选择?

ROWS:按物理行数,行为更可预测,性能更好。大多数情况优先使用。RANGE:按值范围,当 ORDER BY 列有重复值且需要将相同值视为一组时使用。如果需要基于时间间隔(如 RANGE BETWEEN INTERVAL 7 DAY PRECEDING)则必须用 RANGE。GROUPS:按同值组,介于 ROWS 和 RANGE 之间,适合同值组粒度的计算。

Q9: 窗口函数能用于 UPDATE 语句吗?

不能直接在 UPDATE SET 子句中使用窗口函数。但可以通过 JOIN 子查询实现:UPDATE orders o JOIN (SELECT id, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn FROM orders) t ON o.id = t.id SET o.seq = t.rn;

Q10: 窗口函数和 LIMIT 配合使用有什么注意事项?

LIMIT 在窗口函数之后执行,所以窗口函数会先计算所有行再截断。如果你只需要 Top-N,窗口函数仍然要处理全部数据。对于大表,可以先用 WHERE 缩小范围,或者考虑 ORDER BY + LIMIT 这种不需要窗口函数的写法是否能满足需求。

本章评分
4.8  / 5  (3 评分)

💬 留言讨论