Chapter 44

Window Functions in Practice

MySQL Window Functions — Complete Guide

Window functions are one of the most important features introduced in MySQL 8.0. Unlike regular aggregate functions, window functions do not collapse rows -- every row remains in the result set while simultaneously having access to a set of related rows (the "window") for computation. This makes ranking, running totals, moving averages, year-over-year comparisons, and other analytical tasks that previously required complex subqueries both concise and efficient.

Version requirement: Window functions require MySQL 8.0 or higher. MariaDB 10.2+ also supports window functions, but with minor syntax and behavior differences. All examples in this chapter are based on MySQL 8.0/8.4.

This chapter systematically covers the syntax and semantics of all window functions, demonstrates 20+ real-world business scenarios, and provides in-depth analysis of execution internals and performance optimization strategies.

Window Function Syntax

The complete syntax of a window function consists of three parts: the function call, the OVER clause, and an optional named window.

function_name([args]) OVER ( [PARTITION BY partition_expression, ...] [ORDER BY sort_expression [ASC|DESC], ...] [frame_clause] )

-- frame_clause: {ROWS | RANGE | GROUPS} BETWEEN frame_start AND frame_end

-- frame_start / frame_end: UNBOUNDED PRECEDING | N PRECEDING | CURRENT ROW | N FOLLOWING | UNBOUNDED FOLLOWING

Core Concepts

PARTITION BY -- Divides the result set into partitions; the window function computes independently within each partition, similar to GROUP BY but without collapsing rows
ORDER BY -- Defines the ordering of rows within a partition; affects ranking results and frame boundaries
Frame Clause -- Defines the range of rows the current row can "see", controlling the computation boundary of aggregate windows

Named Windows (WINDOW Clause)

When multiple window functions share the same window definition, use the WINDOW clause to name and reuse it:

SELECT
  employee_id,
  department,
  salary,
  ROW_NUMBER() OVER w AS rn,
  RANK()       OVER w AS rnk,
  DENSE_RANK() OVER w AS dense_rnk
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);

Ranking Functions

ROW_NUMBER()

Assigns a unique sequential number (1, 2, 3, ...) to each row within a partition, even when values are identical -- no ties.

Syntax

ROW_NUMBER() OVER (
  [PARTITION BY col, ...]
  ORDER BY col [ASC|DESC], ...
)

Sample Data

id	department	name	salary
id	department	name	salary
1	Engineering	Alice	12000
2	Engineering	Bob	11000
3	Engineering	Charlie	12000
4	Sales	Diana	9000
5	Sales	Eve	9500

Query and Result

SELECT
  name, department, salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees;

name	department	salary	rn
name	department	salary	rn
Alice	Engineering	12000	1
Charlie	Engineering	12000	2
Bob	Engineering	11000	3
Eve	Sales	9500	1
Diana	Sales	9000	2

Note: Alice and Charlie have the same salary, but ROW_NUMBER() still assigns different numbers (1 and 2). The specific ordering among equal values is non-deterministic and depends on the order in which the storage engine returns rows. For deterministic ordering, add an additional column (such as the primary key) to the ORDER BY clause.

RANK()

Similar to ROW_NUMBER(), but allows ties. Rows with the same value receive the same rank, and the next different value's rank skips past the tied rows.

SELECT
  name, department, salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;

name	department	salary	rnk
name	department	salary	rnk
Alice	Engineering	12000	1
Charlie	Engineering	12000	1
Bob	Engineering	11000	3
Eve	Sales	9500	1
Diana	Sales	9000	2

Note that Bob's rank is 3 (not 2), because two people are tied at rank 1.

DENSE_RANK()

Like RANK(), allows ties, but the next different value's rank does not skip -- it stays consecutive.

SELECT
  name, department, salary,
  DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees;

name	department	salary	dense_rnk
name	department	salary	dense_rnk
Alice	Engineering	12000	1
Charlie	Engineering	12000	1
Bob	Engineering	11000	2
Eve	Sales	9500	1
Diana	Sales	9000	2

Bob is now rank 2 (immediately following the tied rank 1).

ROW_NUMBER vs RANK vs DENSE_RANK Comparison

Function	Ties	Gaps	Use Case
Function	Ties	Gaps	Use Case
ROW_NUMBER()	No ties	None	Pagination, deduplication, unique numbering
RANK()	Allows ties	Gaps after ties	Competition ranking, exam scores
DENSE_RANK()	Allows ties	No gaps	Salary grades, continuous ranking

NTILE(n)

Divides rows within a partition into n roughly equal buckets and returns the bucket number (1 to n). If the row count is not evenly divisible by n, earlier buckets receive one extra row.

SELECT
  name, salary,
  NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;

name	salary	quartile
name	salary	quartile
Alice	12000	1
Charlie	12000	1
Bob	11000	2
Eve	9500	3
Diana	9000	4

Typical uses: Computing percentiles (NTILE(100)), quartiles (NTILE(4)), splitting users into A/B test groups.

Value Functions

LAG(expr, offset, default)

Accesses the value from a row before the current row by offset positions. offset defaults to 1, default defaults to NULL.

Syntax

LAG(expression [, offset [, default_value]])
  OVER (
    [PARTITION BY col, ...]
    ORDER BY col [ASC|DESC], ...
  )

Example: Month-over-Month Revenue

SELECT
  month,
  revenue,
  LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month,
  ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
        / LAG(revenue, 1) OVER (ORDER BY month) * 100, 1) AS mom_pct
FROM monthly_sales;

month	revenue	prev_month	mom_pct
month	revenue	prev_month	mom_pct
2024-01	100000	0	NULL
2024-02	120000	100000	20.0
2024-03	115000	120000	-4.2
2024-04	130000	115000	13.0

LEAD(expr, offset, default)

Accesses the value from a row after the current row by offset positions. The opposite direction of LAG.

SELECT
  order_id,
  customer_id,
  order_date,
  LEAD(order_date, 1) OVER (
    PARTITION BY customer_id ORDER BY order_date
  ) AS next_order_date,
  DATEDIFF(
    LEAD(order_date, 1) OVER (PARTITION BY customer_id ORDER BY order_date),
    order_date
  ) AS days_to_next
FROM orders;

order_id	customer_id	order_date	next_order_date	days_to_next
order_id	customer_id	order_date	next_order_date	days_to_next
101	1	2024-01-05	2024-01-20	15
105	1	2024-01-20	2024-03-10	50
112	1	2024-03-10	NULL	NULL

FIRST_VALUE(expr)

Returns the value from the first row in the window frame.

SELECT
  name, department, salary,
  FIRST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS top_earner
FROM employees;

name	department	salary	top_earner
name	department	salary	top_earner
Alice	Engineering	12000	Alice
Charlie	Engineering	12000	Alice
Bob	Engineering	11000	Alice
Eve	Sales	9500	Eve
Diana	Sales	9000	Eve

LAST_VALUE(expr)

Returns the value from the last row in the window frame.

Common pitfall: The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means LAST_VALUE() will only return the last value up to the current row (i.e., the current row itself). To get the true last row of the partition, you must explicitly specify the frame as ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

-- Wrong usage (common mistake):
SELECT name, salary,
  LAST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
  ) AS lowest_earner   -- Always returns the current row's name!
FROM employees;

-- Correct usage:
SELECT name, salary,
  LAST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS lowest_earner   -- Returns the person with lowest salary in partition
FROM employees;

NTH_VALUE(expr, n)

Returns the value from the n-th row in the window frame. n starts at 1.

SELECT
  name, department, salary,
  NTH_VALUE(name, 2) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS second_highest
FROM employees;

name	department	salary	second_highest
name	department	salary	second_highest
Alice	Engineering	12000	Charlie
Charlie	Engineering	12000	Charlie
Bob	Engineering	11000	Charlie
Eve	Sales	9500	Diana
Diana	Sales	9000	Diana

Value Functions Comparison

Function	Direction	Use Case	Notes
Function	Direction	Use Case	Notes
LAG	Look back (previous row)	MoM, delta calculations	Returns default at partition start
LEAD	Look ahead (next row)	Predict next event, gap calculation	Returns default at partition end
FIRST_VALUE	First row in frame	Group min/max	Usually no frame change needed
LAST_VALUE	Last row in frame	Group min/max	Must set frame explicitly!
NTH_VALUE	N-th row in frame	2nd place, 3rd place	Needs explicit frame

Aggregate Window Functions

All standard aggregate functions can be used as window functions with OVER(). Unlike regular aggregates, window aggregates do not collapse rows.

SUM() OVER()

Computes a cumulative sum within the window frame.

SELECT
  order_date,
  amount,
  SUM(amount) OVER (ORDER BY order_date) AS running_total,
  SUM(amount) OVER () AS grand_total
FROM orders;

order_date	amount	running_total	grand_total
order_date	amount	running_total	grand_total
2024-01-01	100	100	850
2024-01-02	200	300	850
2024-01-03	150	450	850
2024-01-04	250	700	850
2024-01-05	150	850	850

AVG() OVER()

Computes the average within the window frame.

-- Compare each employee's salary against department average
SELECT
  name, department, salary,
  ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
  salary - ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS diff_from_avg
FROM employees;

name	department	salary	dept_avg	diff_from_avg
name	department	salary	dept_avg	diff_from_avg
Alice	Engineering	12000	11667	333
Charlie	Engineering	12000	11667	333
Bob	Engineering	11000	11667	-667
Eve	Sales	9500	9250	250
Diana	Sales	9000	9250	-250

COUNT() OVER()

-- Each order's share of department total
SELECT
  order_id, department,
  COUNT(*) OVER (PARTITION BY department) AS dept_orders,
  ROUND(1.0 / COUNT(*) OVER (PARTITION BY department) * 100, 1) AS pct
FROM orders;

MIN() / MAX() OVER()

-- Compare against partition extremes
SELECT
  name, department, salary,
  MIN(salary) OVER (PARTITION BY department) AS dept_min,
  MAX(salary) OVER (PARTITION BY department) AS dept_max,
  salary - MIN(salary) OVER (PARTITION BY department) AS above_min
FROM employees;

PARTITION BY Deep Dive

PARTITION BY divides the result set into multiple partitions where the window function computes independently. It is similar to GROUP BY but does not collapse rows.

Multi-Column Partition

-- Partition by year and department for ranking
SELECT
  YEAR(hire_date) AS hire_year,
  department,
  name,
  salary,
  RANK() OVER (
    PARTITION BY YEAR(hire_date), department
    ORDER BY salary DESC
  ) AS rank_in_year_dept
FROM employees;

PARTITION BY Expressions

-- Partition by salary tier
SELECT
  name, salary,
  CASE
    WHEN salary >= 10000 THEN 'High'
    WHEN salary >= 7000  THEN 'Mid'
    ELSE 'Low'
  END AS tier,
  ROW_NUMBER() OVER (
    PARTITION BY CASE
      WHEN salary >= 10000 THEN 'High'
      WHEN salary >= 7000  THEN 'Mid'
      ELSE 'Low'
    END
    ORDER BY salary DESC
  ) AS rank_in_tier
FROM employees;

No PARTITION BY (Global Window)

When PARTITION BY is omitted, the entire result set is treated as a single partition:

-- Global ranking
SELECT name, salary,
  ROW_NUMBER() OVER (ORDER BY salary DESC) AS global_rank
FROM employees;

Frame Clauses

Frame clauses define the range of rows within a partition that the window function can "see". Only aggregate window functions (SUM, AVG, COUNT, MIN, MAX) and value functions (FIRST_VALUE, LAST_VALUE, NTH_VALUE) are affected by frame clauses. Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) and LAG/LEAD ignore frame clauses.

ROWS Mode

Defines the frame based on physical row count.

-- 3-day moving average
SELECT
  order_date,
  amount,
  ROUND(AVG(amount) OVER (
    ORDER BY order_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ), 2) AS moving_avg_3
FROM daily_sales;

order_date	amount	moving_avg_3
order_date	amount	moving_avg_3
2024-01-01	100	100.00
2024-01-02	200	150.00
2024-01-03	150	150.00
2024-01-04	300	216.67
2024-01-05	250	233.33

RANGE Mode

Defines the frame based on value range. Unlike ROWS, RANGE considers value magnitude.

-- Cumulative sum up to current value (includes all rows with same value)
SELECT
  salary,
  name,
  SUM(salary) OVER (
    ORDER BY salary
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_salary
FROM employees;

ROWS vs RANGE: When the ORDER BY column has duplicate values, RANGE treats all rows with the same value as a group and includes them all, while ROWS strictly counts physical rows. In most cases, prefer ROWS because its behavior is more predictable and performance is better.

GROUPS Mode (MySQL 8.0.2+)

Defines the frame based on peer groups (groups of rows with the same ORDER BY value).

-- Current group + 1 group before and after
SELECT
  department, salary,
  SUM(salary) OVER (
    ORDER BY salary
    GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) AS group_sum
FROM employees;

Common Frame Clauses Quick Reference

Frame Definition	Meaning	Typical Use
Frame Definition	Meaning	Typical Use
`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`	From partition start to current row	Running total
`ROWS BETWEEN N PRECEDING AND CURRENT ROW`	Previous N rows to current row	Moving average
`ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING`	Current row to partition end	Remaining sum
`ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`	Entire partition	LAST_VALUE, NTH_VALUE
`ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`	1 row before + current + 1 row after	Smoothing
`RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW`	Within 7 days before current date	7-day rolling aggregate

Default Frame Behavior

When the frame clause is explicitly omitted, MySQL uses these defaults:

With ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Without ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (entire partition)

20+ Real-World Examples

Find each user's maximum consecutive login streak. Key idea: use ROW_NUMBER() to generate a sequence, then subtract it from the date to create a "group identifier".

WITH login_groups AS (
  SELECT
    user_id,
    login_date,
    login_date - INTERVAL ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY login_date
    ) DAY AS grp
  FROM (
    SELECT DISTINCT user_id, DATE(login_time) AS login_date
    FROM user_logins
  ) t
)
SELECT
  user_id,
  MIN(login_date) AS streak_start,
  MAX(login_date) AS streak_end,
  COUNT(*)         AS consecutive_days
FROM login_groups
GROUP BY user_id, grp
ORDER BY consecutive_days DESC;

How it works: For a consecutive date sequence, date - ROW_NUMBER() yields a constant. For example, Jan 1 (rn=1), Jan 2 (rn=2), Jan 3 (rn=3) all produce Dec 31 after subtraction. When there is a gap, the difference changes, forming a new group.

2 Running Total

SELECT
  transaction_date,
  amount,
  SUM(amount) OVER (
    ORDER BY transaction_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM transactions
WHERE account_id = 1001
ORDER BY transaction_date;

3 7-Day Moving Average

SELECT
  sale_date,
  daily_revenue,
  ROUND(AVG(daily_revenue) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ), 2) AS ma_7
FROM daily_sales;

Note: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means 7 rows including the current one.

4 Top-N Per Group (Top 3 Salaries Per Department)

SELECT * FROM (
  SELECT
    department,
    name,
    salary,
    ROW_NUMBER() OVER (
      PARTITION BY department ORDER BY salary DESC
    ) AS rn
  FROM employees
) ranked
WHERE rn <= 3;

If ties are allowed, replace ROW_NUMBER() with DENSE_RANK().

5 Year-over-Year (YoY) Comparison

SELECT
  YEAR(order_date) AS yr,
  MONTH(order_date) AS mo,
  SUM(amount) AS monthly_revenue,
  LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)) AS same_month_last_year,
  ROUND(
    (SUM(amount) - LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)))
    / LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)) * 100, 1
  ) AS yoy_pct
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY yr, mo;

6 Month-over-Month (MoM) Comparison

WITH monthly AS (
  SELECT
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    SUM(amount) AS revenue
  FROM orders
  GROUP BY DATE_FORMAT(order_date, '%Y-%m')
)
SELECT
  month,
  revenue,
  LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
  ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
        / LAG(revenue, 1) OVER (ORDER BY month) * 100, 1) AS mom_pct
FROM monthly;

7 Percentile Rank

SELECT
  name,
  salary,
  ROUND(PERCENT_RANK() OVER (ORDER BY salary) * 100, 1) AS percentile,
  CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;

PERCENT_RANK(): (rank - 1) / (total_rows - 1), range [0, 1]. CUME_DIST(): rank / total_rows, range (0, 1].

8 Gap Detection

Find gaps in an ID sequence.

SELECT
  id AS gap_start,
  next_id - 1 AS gap_end,
  next_id - id - 1 AS gap_size
FROM (
  SELECT
    id,
    LEAD(id, 1) OVER (ORDER BY id) AS next_id
  FROM orders
) t
WHERE next_id - id > 1
ORDER BY gap_start;

9 Island Detection

Find consecutive "islands" of the same status in a status sequence.

WITH numbered AS (
  SELECT
    server_id,
    check_time,
    status,
    ROW_NUMBER() OVER (PARTITION BY server_id ORDER BY check_time) -
    ROW_NUMBER() OVER (PARTITION BY server_id, status ORDER BY check_time) AS island_id
  FROM server_status
)
SELECT
  server_id,
  status,
  MIN(check_time) AS island_start,
  MAX(check_time) AS island_end,
  COUNT(*)         AS duration_checks
FROM numbered
GROUP BY server_id, status, island_id
ORDER BY server_id, island_start;

How it works: The difference between a global row number and a per-status row number stays constant within consecutive rows of the same status. When the status changes, the difference changes, forming a new group.

10 Deduplicate Keeping Latest Record

DELETE t1 FROM user_addresses t1
INNER JOIN (
  SELECT id,
    ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY updated_at DESC
    ) AS rn
  FROM user_addresses
) t2 ON t1.id = t2.id
WHERE t2.rn > 1;

Keeps only the most recently updated address record per user_id.

11 Cumulative Percentage (Pareto Analysis)

SELECT
  product_name,
  revenue,
  SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
  ROUND(SUM(revenue) OVER (ORDER BY revenue DESC)
        / SUM(revenue) OVER () * 100, 1) AS cumulative_pct
FROM product_sales
ORDER BY revenue DESC;

Quickly identify the top 20% of products contributing 80% of revenue.

12 Percentage Within Group

SELECT
  department,
  name,
  salary,
  ROUND(salary / SUM(salary) OVER (PARTITION BY department) * 100, 1) AS pct_of_dept
FROM employees;

13 Sessionization

Split user behavior logs into sessions using a 30-minute timeout.

WITH events_with_gap AS (
  SELECT
    user_id,
    event_time,
    TIMESTAMPDIFF(MINUTE,
      LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time),
      event_time
    ) AS gap_minutes
  FROM user_events
),
sessions AS (
  SELECT
    user_id,
    event_time,
    SUM(CASE WHEN gap_minutes > 30 OR gap_minutes IS NULL THEN 1 ELSE 0 END)
      OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
  FROM events_with_gap
)
SELECT
  user_id,
  session_id,
  MIN(event_time) AS session_start,
  MAX(event_time) AS session_end,
  COUNT(*)         AS events_count,
  TIMESTAMPDIFF(MINUTE, MIN(event_time), MAX(event_time)) AS duration_min
FROM sessions
GROUP BY user_id, session_id;

14 Funnel Analysis

WITH funnel AS (
  SELECT
    user_id,
    event_name,
    event_time,
    ROW_NUMBER() OVER (
      PARTITION BY user_id, event_name ORDER BY event_time
    ) AS event_seq
  FROM user_events
  WHERE event_name IN ('page_view', 'add_to_cart', 'checkout', 'payment')
)
SELECT
  event_name,
  COUNT(DISTINCT user_id) AS users,
  ROUND(COUNT(DISTINCT user_id) /
    FIRST_VALUE(COUNT(DISTINCT user_id)) OVER (ORDER BY
      FIELD(event_name, 'page_view', 'add_to_cart', 'checkout', 'payment')
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) * 100, 1) AS conversion_pct
FROM funnel
WHERE event_seq = 1
GROUP BY event_name
ORDER BY FIELD(event_name, 'page_view', 'add_to_cart', 'checkout', 'payment');

15 Median Calculation

WITH ordered AS (
  SELECT
    salary,
    ROW_NUMBER() OVER (ORDER BY salary) AS rn,
    COUNT(*) OVER () AS total
  FROM employees
)
SELECT AVG(salary) AS median_salary
FROM ordered
WHERE rn IN (FLOOR((total + 1) / 2), CEIL((total + 1) / 2));

16 Row-to-Row Delta

SELECT
  reading_time,
  meter_value,
  meter_value - LAG(meter_value, 1, meter_value) OVER (ORDER BY reading_time) AS delta
FROM electricity_readings
WHERE meter_id = 'M001'
ORDER BY reading_time;

17 Leaderboard Rank Change Tracking

WITH weekly_rank AS (
  SELECT
    week_num,
    player_name,
    score,
    RANK() OVER (PARTITION BY week_num ORDER BY score DESC) AS rnk
  FROM weekly_scores
)
SELECT
  w.week_num,
  w.player_name,
  w.score,
  w.rnk AS current_rank,
  LAG(w.rnk) OVER (PARTITION BY w.player_name ORDER BY w.week_num) AS prev_rank,
  LAG(w.rnk) OVER (PARTITION BY w.player_name ORDER BY w.week_num) - w.rnk AS rank_change
FROM weekly_rank w
ORDER BY w.week_num, w.rnk;

18 First Purchase vs Repeat Purchase

SELECT
  order_id,
  customer_id,
  order_date,
  amount,
  CASE
    WHEN ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) = 1
    THEN 'First Purchase'
    ELSE 'Repeat Purchase'
  END AS purchase_type
FROM orders;

19 Inventory Movement and Balance

SELECT
  transaction_time,
  product_id,
  CASE type WHEN 'IN' THEN qty ELSE -qty END AS change,
  SUM(CASE type WHEN 'IN' THEN qty ELSE -qty END) OVER (
    PARTITION BY product_id
    ORDER BY transaction_time
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS balance
FROM inventory_transactions
ORDER BY product_id, transaction_time;

20 Employee Salary vs Peer Level Comparison

SELECT
  name,
  department,
  job_level,
  salary,
  AVG(salary) OVER (PARTITION BY job_level) AS level_avg,
  MIN(salary) OVER (PARTITION BY job_level) AS level_min,
  MAX(salary) OVER (PARTITION BY job_level) AS level_max,
  NTILE(4) OVER (PARTITION BY job_level ORDER BY salary) AS salary_quartile,
  ROUND(PERCENT_RANK() OVER (PARTITION BY job_level ORDER BY salary) * 100, 0) AS percentile
FROM employees;

21 Page Dwell Time Calculation

SELECT
  user_id,
  page_url,
  view_time,
  LEAD(view_time) OVER (PARTITION BY user_id ORDER BY view_time) AS next_view_time,
  TIMESTAMPDIFF(SECOND, view_time,
    LEAD(view_time) OVER (PARTITION BY user_id ORDER BY view_time)
  ) AS stay_seconds
FROM page_views
WHERE session_id = 'abc123'
ORDER BY view_time;

22 Bollinger Bands (Moving Standard Deviation)

SELECT
  trade_date,
  close_price,
  ROUND(AVG(close_price) OVER w, 2) AS ma_20,
  ROUND(STDDEV_POP(close_price) OVER w, 2) AS std_20,
  ROUND(AVG(close_price) OVER w + 2 * STDDEV_POP(close_price) OVER w, 2) AS upper_band,
  ROUND(AVG(close_price) OVER w - 2 * STDDEV_POP(close_price) OVER w, 2) AS lower_band
FROM stock_prices
WHERE ticker = 'AAPL'
WINDOW w AS (ORDER BY trade_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW)
ORDER BY trade_date;

Bollinger Bands: 20-day moving average +/- 2 standard deviations.

Performance Analysis

Window Functions vs Subqueries vs Temp Tables

Window Functions

Single scan for multiple computations
Optimizer can merge multiple OVER clauses
Clean code, high maintainability
Sorting is the primary cost

Correlated Subqueries

Executes subquery per row (O(n^2))
Acceptable for small datasets, extremely slow for large tables
Can leverage indexes
Compatible with MySQL 5.7

Performance Benchmark

-- Scenario: 1 million row orders table, compute running total per user

-- Approach 1: Window function (~0.8s)
SELECT order_id, user_id, amount,
  SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS running_total
FROM orders;

-- Approach 2: Correlated subquery (~45s)
SELECT o1.order_id, o1.user_id, o1.amount,
  (SELECT SUM(o2.amount) FROM orders o2
   WHERE o2.user_id = o1.user_id AND o2.order_date <= o1.order_date) AS running_total
FROM orders o1;

-- Approach 3: Temp table + variables (MySQL 5.7 compatible, ~1.2s, unreliable)
SET @running := 0, @prev_user := '';
SELECT order_id, user_id, amount,
  @running := IF(@prev_user = user_id, @running + amount, amount) AS running_total,
  @prev_user := user_id
FROM orders ORDER BY user_id, order_date;

User variable assignment behavior is undefined in MySQL 8.0: The MySQL optimizer may evaluate expressions in the SELECT list in any order. Approach 3 is unreliable in MySQL 8.0 and should be replaced with window functions.

EXPLAIN Analysis for Window Functions

EXPLAIN FORMAT=TREE
SELECT department, name, salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;

/* Example output:
-> Window aggregate with buffering: rank() OVER (
      PARTITION BY department ORDER BY salary DESC)
  -> Sort: employees.department, employees.salary DESC
    -> Table scan on employees  (cost=10.25 rows=100)
*/

Key observations:

Sort node: Window functions almost always require sorting. A suitable index may avoid the sort
Window aggregate with buffering: MySQL needs to buffer rows for window computation. For large datasets, this means more memory usage
Multiple windows: If PARTITION BY and ORDER BY differ across windows, multiple sorts may be needed

Optimization Tips

Create indexes for PARTITION BY + ORDER BY -- If the window function is used in a subquery/CTE that can be pushed down, a suitable composite index can avoid sorting
Reduce the number of rows participating in window computation -- Filter with WHERE first, then apply window functions
Consolidate window definitions -- Use the WINDOW clause to reuse the same PARTITION BY + ORDER BY so the optimizer sorts only once
Avoid unnecessary ORDER BY -- If you only need global aggregation (e.g., SUM() OVER()), omit ORDER BY
Control frame size -- ROWS frames are typically more efficient than RANGE frames
Mind memory usage -- sort_buffer_size and tmp_table_size affect window function performance

-- Create an index to optimize window functions
ALTER TABLE orders ADD INDEX idx_user_date (user_id, order_date);

-- Filter first, then rank
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn
  FROM orders
  WHERE order_date >= '2024-01-01'  -- Filter first to reduce data volume
) t WHERE rn = 1;

MySQL 8.0 Features and Limitations

MySQL 8.0 Supported Window Functions

Category	Functions	Minimum Version
Category	Functions	Minimum Version
Ranking	ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()	8.0.2
Value	LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE()	8.0.2
Distribution	PERCENT_RANK(), CUME_DIST()	8.0.2
Aggregate Windows	SUM(), AVG(), COUNT(), MIN(), MAX(), BIT_AND(), BIT_OR(), BIT_XOR(), STDDEV_POP(), STDDEV_SAMP(), VAR_POP(), VAR_SAMP(), JSON_ARRAYAGG(), JSON_OBJECTAGG()	8.0.2
Frame Modes	ROWS, RANGE, GROUPS	8.0.2 (GROUPS 8.0.2)
Frame Exclusion	EXCLUDE CURRENT ROW, EXCLUDE GROUP, EXCLUDE TIES, EXCLUDE NO OTHERS	Not supported
Named Windows	WINDOW w AS (...)	8.0.2

Known Limitations

No EXCLUDE clause: The SQL standard's EXCLUDE CURRENT ROW, EXCLUDE GROUP, and other frame exclusion clauses are not implemented in MySQL
No FILTER clause: The SQL standard's COUNT(*) FILTER (WHERE ...) is unavailable; use COUNT(CASE WHEN ... THEN 1 END) instead
No DISTINCT window aggregate: COUNT(DISTINCT col) OVER () raises an error; requires a subquery workaround
Window functions cannot be nested: You cannot call a window function inside another window function
Cannot be used in WHERE/HAVING: Window functions can only appear in SELECT and ORDER BY clauses. To filter on results, wrap in a subquery
GROUP BY + window functions: Window functions execute after GROUP BY, so they operate on the aggregated result rows

FILTER Clause Workaround

-- Standard SQL (PostgreSQL supports, MySQL does not):
SELECT COUNT(*) FILTER (WHERE status = 'active') OVER (PARTITION BY dept)
FROM employees;

-- MySQL workaround:
SELECT
  SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) OVER (PARTITION BY dept) AS active_count
FROM employees;

COUNT(DISTINCT) OVER() Workaround

-- Not supported:
-- SELECT COUNT(DISTINCT category) OVER (PARTITION BY dept) FROM products;

-- Workaround 1: Subquery
SELECT p.*,
  (SELECT COUNT(DISTINCT p2.category) FROM products p2 WHERE p2.dept = p.dept) AS distinct_cats
FROM products p;

-- Workaround 2: DENSE_RANK trick
SELECT *,
  MAX(dr) OVER (PARTITION BY dept) AS distinct_cats
FROM (
  SELECT *, DENSE_RANK() OVER (PARTITION BY dept ORDER BY category) AS dr
  FROM products
) t;

Execution Order

Window functions sit at the SELECT stage in SQL's logical execution order:

FROM / JOIN
WHERE
GROUP BY
HAVING
Window Functions (WINDOW)
SELECT
DISTINCT
ORDER BY
LIMIT

This means window functions can reference aliased aggregate results in the SELECT list, but you cannot use window function results in WHERE or HAVING.

Frequently Asked Questions (FAQ)

Q1: What is the difference between window functions and GROUP BY?

GROUP BY collapses multiple rows into one, outputting a single result per group. Window functions preserve all original rows and attach aggregate/ranking computations to each row. For example, GROUP BY department returns one row per department, while SUM(salary) OVER (PARTITION BY department) preserves every employee row with the department total attached to each.

Q2: Why does LAST_VALUE() always return the current row's value?

Because the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The end of the frame is the current row, so LAST_VALUE returns the current row's value. To get the true last value in the partition, you must explicitly set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

Q3: Can I use window functions in the WHERE clause?

No. Window functions execute after WHERE. To filter on window function results, wrap the query in a subquery or CTE: SELECT * FROM (SELECT *, ROW_NUMBER() OVER (...) AS rn FROM t) sub WHERE rn = 1;

Q4: Do window functions cause sorting? How to optimize?

Almost always. Both PARTITION BY and ORDER BY require sorting. Optimization strategies: (1) Create composite indexes for PARTITION BY + ORDER BY columns; (2) Use the WINDOW clause to reuse the same window definition so the optimizer sorts only once; (3) Filter with WHERE first to reduce data volume; (4) If you only need aggregation without ordering, omit ORDER BY.

Q5: How to handle non-deterministic ROW_NUMBER() ordering for equal values?

Add a unique column (typically the primary key) as a tiebreaker in ORDER BY: ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, id ASC). This ensures deterministic ordering even when salary values are identical.

Q6: How to achieve window function equivalents in MySQL 5.7?

MySQL 5.7 does not support window functions. Common workarounds: (1) Ranking via user variables @rn := @rn + 1 (unreliable in MySQL 8.0); (2) Running totals via correlated subqueries; (3) LAG/LEAD via self-joins. These alternatives typically perform much worse. We recommend upgrading to MySQL 8.0.

Q7: Do multiple window functions cause multiple sorts?

It depends on whether PARTITION BY and ORDER BY are the same. Multiple functions with the same window definition only sort once. Different window definitions require separate sorts. Use the WINDOW clause to reuse window definitions and let the optimizer merge sorts. Check actual sort counts via EXPLAIN FORMAT=TREE.

Q8: How to choose between ROWS and RANGE frame modes?

ROWS: Based on physical row count; more predictable behavior and better performance. Preferred in most cases. RANGE: Based on value range; use when the ORDER BY column has duplicates and you need to treat equal values as a group. Required for time-interval-based frames (e.g., RANGE BETWEEN INTERVAL 7 DAY PRECEDING). GROUPS: Based on peer groups; between ROWS and RANGE, suitable for peer-group-level computation.

Q9: Can window functions be used in UPDATE statements?

Not directly in the UPDATE SET clause. But you can use a subquery JOIN: UPDATE orders o JOIN (SELECT id, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn FROM orders) t ON o.id = t.id SET o.seq = t.rn;

Q10: Any caveats when combining window functions with LIMIT?

LIMIT executes after window functions, so window functions compute on all rows first before truncation. If you only need Top-N, the window function still processes all data. For large tables, filter with WHERE first, or consider whether a simple ORDER BY + LIMIT approach (without window functions) meets your needs.

Rate this chapter

4.8 / 5 (3 ratings)

Window Functions in Practice

MySQL Window Functions — Complete Guide

Window Function Syntax

Core Concepts

Named Windows (WINDOW Clause)

Ranking Functions

ROW_NUMBER()

Syntax

Sample Data

Query and Result

RANK()

DENSE_RANK()

ROW_NUMBER vs RANK vs DENSE_RANK Comparison

NTILE(n)

Value Functions

LAG(expr, offset, default)

Syntax

Example: Month-over-Month Revenue

LEAD(expr, offset, default)

FIRST_VALUE(expr)

LAST_VALUE(expr)

NTH_VALUE(expr, n)

Value Functions Comparison

Aggregate Window Functions

SUM() OVER()

AVG() OVER()

COUNT() OVER()

MIN() / MAX() OVER()

PARTITION BY Deep Dive

Multi-Column Partition

PARTITION BY Expressions

No PARTITION BY (Global Window)

Frame Clauses

ROWS Mode

RANGE Mode

GROUPS Mode (MySQL 8.0.2+)

Common Frame Clauses Quick Reference

Default Frame Behavior

20+ Real-World Examples

1 Consecutive Login Days

2 Running Total

3 7-Day Moving Average

4 Top-N Per Group (Top 3 Salaries Per Department)

5 Year-over-Year (YoY) Comparison

6 Month-over-Month (MoM) Comparison

7 Percentile Rank

8 Gap Detection

9 Island Detection

10 Deduplicate Keeping Latest Record

11 Cumulative Percentage (Pareto Analysis)

12 Percentage Within Group

13 Sessionization

14 Funnel Analysis

15 Median Calculation

16 Row-to-Row Delta

17 Leaderboard Rank Change Tracking

18 First Purchase vs Repeat Purchase

19 Inventory Movement and Balance

20 Employee Salary vs Peer Level Comparison

21 Page Dwell Time Calculation

22 Bollinger Bands (Moving Standard Deviation)

Performance Analysis

Window Functions vs Subqueries vs Temp Tables

Window Functions

Correlated Subqueries

Performance Benchmark

EXPLAIN Analysis for Window Functions

Optimization Tips

MySQL 8.0 Features and Limitations

MySQL 8.0 Supported Window Functions

Known Limitations

FILTER Clause Workaround

COUNT(DISTINCT) OVER() Workaround

Execution Order

Frequently Asked Questions (FAQ)

💬 Comments