Window Functions in Practice
MySQL Window Functions — Complete Guide
Window functions are one of the most important features introduced in MySQL 8.0. Unlike regular aggregate functions, window functions do not collapse rows -- every row remains in the result set while simultaneously having access to a set of related rows (the "window") for computation. This makes ranking, running totals, moving averages, year-over-year comparisons, and other analytical tasks that previously required complex subqueries both concise and efficient.
Version requirement: Window functions require MySQL 8.0 or higher. MariaDB 10.2+ also supports window functions, but with minor syntax and behavior differences. All examples in this chapter are based on MySQL 8.0/8.4.
This chapter systematically covers the syntax and semantics of all window functions, demonstrates 20+ real-world business scenarios, and provides in-depth analysis of execution internals and performance optimization strategies.
Window Function Syntax
The complete syntax of a window function consists of three parts: the function call, the OVER clause, and an optional named window.
function_name([args]) OVER ( [PARTITION BY partition_expression, ...] [ORDER BY sort_expression [ASC|DESC], ...] [frame_clause] )
-- frame_clause: {ROWS | RANGE | GROUPS} BETWEEN frame_start AND frame_end
-- frame_start / frame_end: UNBOUNDED PRECEDING | N PRECEDING | CURRENT ROW | N FOLLOWING | UNBOUNDED FOLLOWING
Core Concepts
- PARTITION BY -- Divides the result set into partitions; the window function computes independently within each partition, similar to GROUP BY but without collapsing rows
- ORDER BY -- Defines the ordering of rows within a partition; affects ranking results and frame boundaries
- Frame Clause -- Defines the range of rows the current row can "see", controlling the computation boundary of aggregate windows
Named Windows (WINDOW Clause)
When multiple window functions share the same window definition, use the WINDOW clause to name and reuse it:
SELECT
employee_id,
department,
salary,
ROW_NUMBER() OVER w AS rn,
RANK() OVER w AS rnk,
DENSE_RANK() OVER w AS dense_rnk
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);
Ranking Functions
ROW_NUMBER()
Assigns a unique sequential number (1, 2, 3, ...) to each row within a partition, even when values are identical -- no ties.
Syntax
ROW_NUMBER() OVER (
[PARTITION BY col, ...]
ORDER BY col [ASC|DESC], ...
)
Sample Data
| id | department | name | salary |
|---|---|---|---|
| id | department | name | salary |
| 1 | Engineering | Alice | 12000 |
| 2 | Engineering | Bob | 11000 |
| 3 | Engineering | Charlie | 12000 |
| 4 | Sales | Diana | 9000 |
| 5 | Sales | Eve | 9500 |
Query and Result
SELECT
name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees;
| name | department | salary | rn |
|---|---|---|---|
| name | department | salary | rn |
| Alice | Engineering | 12000 | 1 |
| Charlie | Engineering | 12000 | 2 |
| Bob | Engineering | 11000 | 3 |
| Eve | Sales | 9500 | 1 |
| Diana | Sales | 9000 | 2 |
Note: Alice and Charlie have the same salary, but ROW_NUMBER() still assigns different numbers (1 and 2). The specific ordering among equal values is non-deterministic and depends on the order in which the storage engine returns rows. For deterministic ordering, add an additional column (such as the primary key) to the ORDER BY clause.
RANK()
Similar to ROW_NUMBER(), but allows ties. Rows with the same value receive the same rank, and the next different value's rank skips past the tied rows.
SELECT
name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;
| name | department | salary | rnk |
|---|---|---|---|
| name | department | salary | rnk |
| Alice | Engineering | 12000 | 1 |
| Charlie | Engineering | 12000 | 1 |
| Bob | Engineering | 11000 | 3 |
| Eve | Sales | 9500 | 1 |
| Diana | Sales | 9000 | 2 |
Note that Bob's rank is 3 (not 2), because two people are tied at rank 1.
DENSE_RANK()
Like RANK(), allows ties, but the next different value's rank does not skip -- it stays consecutive.
SELECT
name, department, salary,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees;
| name | department | salary | dense_rnk |
|---|---|---|---|
| name | department | salary | dense_rnk |
| Alice | Engineering | 12000 | 1 |
| Charlie | Engineering | 12000 | 1 |
| Bob | Engineering | 11000 | 2 |
| Eve | Sales | 9500 | 1 |
| Diana | Sales | 9000 | 2 |
Bob is now rank 2 (immediately following the tied rank 1).
ROW_NUMBER vs RANK vs DENSE_RANK Comparison
| Function | Ties | Gaps | Use Case |
|---|---|---|---|
| Function | Ties | Gaps | Use Case |
| ROW_NUMBER() | No ties | None | Pagination, deduplication, unique numbering |
| RANK() | Allows ties | Gaps after ties | Competition ranking, exam scores |
| DENSE_RANK() | Allows ties | No gaps | Salary grades, continuous ranking |
NTILE(n)
Divides rows within a partition into n roughly equal buckets and returns the bucket number (1 to n). If the row count is not evenly divisible by n, earlier buckets receive one extra row.
SELECT
name, salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
| name | salary | quartile |
|---|---|---|
| name | salary | quartile |
| Alice | 12000 | 1 |
| Charlie | 12000 | 1 |
| Bob | 11000 | 2 |
| Eve | 9500 | 3 |
| Diana | 9000 | 4 |
Typical uses: Computing percentiles (NTILE(100)), quartiles (NTILE(4)), splitting users into A/B test groups.
Value Functions
LAG(expr, offset, default)
Accesses the value from a row before the current row by offset positions. offset defaults to 1, default defaults to NULL.
Syntax
LAG(expression [, offset [, default_value]])
OVER (
[PARTITION BY col, ...]
ORDER BY col [ASC|DESC], ...
)
Example: Month-over-Month Revenue
SELECT
month,
revenue,
LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month,
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
/ LAG(revenue, 1) OVER (ORDER BY month) * 100, 1) AS mom_pct
FROM monthly_sales;
| month | revenue | prev_month | mom_pct |
|---|---|---|---|
| month | revenue | prev_month | mom_pct |
| 2024-01 | 100000 | 0 | NULL |
| 2024-02 | 120000 | 100000 | 20.0 |
| 2024-03 | 115000 | 120000 | -4.2 |
| 2024-04 | 130000 | 115000 | 13.0 |
LEAD(expr, offset, default)
Accesses the value from a row after the current row by offset positions. The opposite direction of LAG.
SELECT
order_id,
customer_id,
order_date,
LEAD(order_date, 1) OVER (
PARTITION BY customer_id ORDER BY order_date
) AS next_order_date,
DATEDIFF(
LEAD(order_date, 1) OVER (PARTITION BY customer_id ORDER BY order_date),
order_date
) AS days_to_next
FROM orders;
| order_id | customer_id | order_date | next_order_date | days_to_next |
|---|---|---|---|---|
| order_id | customer_id | order_date | next_order_date | days_to_next |
| 101 | 1 | 2024-01-05 | 2024-01-20 | 15 |
| 105 | 1 | 2024-01-20 | 2024-03-10 | 50 |
| 112 | 1 | 2024-03-10 | NULL | NULL |
FIRST_VALUE(expr)
Returns the value from the first row in the window frame.
SELECT
name, department, salary,
FIRST_VALUE(name) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS top_earner
FROM employees;
| name | department | salary | top_earner |
|---|---|---|---|
| name | department | salary | top_earner |
| Alice | Engineering | 12000 | Alice |
| Charlie | Engineering | 12000 | Alice |
| Bob | Engineering | 11000 | Alice |
| Eve | Sales | 9500 | Eve |
| Diana | Sales | 9000 | Eve |
LAST_VALUE(expr)
Returns the value from the last row in the window frame.
Common pitfall: The default frame is
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means LAST_VALUE() will only return the last value up to the current row (i.e., the current row itself). To get the true last row of the partition, you must explicitly specify the frame asROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
-- Wrong usage (common mistake):
SELECT name, salary,
LAST_VALUE(name) OVER (
PARTITION BY department ORDER BY salary DESC
) AS lowest_earner -- Always returns the current row's name!
FROM employees;
-- Correct usage:
SELECT name, salary,
LAST_VALUE(name) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS lowest_earner -- Returns the person with lowest salary in partition
FROM employees;
NTH_VALUE(expr, n)
Returns the value from the n-th row in the window frame. n starts at 1.
SELECT
name, department, salary,
NTH_VALUE(name, 2) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS second_highest
FROM employees;
| name | department | salary | second_highest |
|---|---|---|---|
| name | department | salary | second_highest |
| Alice | Engineering | 12000 | Charlie |
| Charlie | Engineering | 12000 | Charlie |
| Bob | Engineering | 11000 | Charlie |
| Eve | Sales | 9500 | Diana |
| Diana | Sales | 9000 | Diana |
Value Functions Comparison
| Function | Direction | Use Case | Notes |
|---|---|---|---|
| Function | Direction | Use Case | Notes |
| LAG | Look back (previous row) | MoM, delta calculations | Returns default at partition start |
| LEAD | Look ahead (next row) | Predict next event, gap calculation | Returns default at partition end |
| FIRST_VALUE | First row in frame | Group min/max | Usually no frame change needed |
| LAST_VALUE | Last row in frame | Group min/max | Must set frame explicitly! |
| NTH_VALUE | N-th row in frame | 2nd place, 3rd place | Needs explicit frame |
Aggregate Window Functions
All standard aggregate functions can be used as window functions with OVER(). Unlike regular aggregates, window aggregates do not collapse rows.
SUM() OVER()
Computes a cumulative sum within the window frame.
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total,
SUM(amount) OVER () AS grand_total
FROM orders;
| order_date | amount | running_total | grand_total |
|---|---|---|---|
| order_date | amount | running_total | grand_total |
| 2024-01-01 | 100 | 100 | 850 |
| 2024-01-02 | 200 | 300 | 850 |
| 2024-01-03 | 150 | 450 | 850 |
| 2024-01-04 | 250 | 700 | 850 |
| 2024-01-05 | 150 | 850 | 850 |
AVG() OVER()
Computes the average within the window frame.
-- Compare each employee's salary against department average
SELECT
name, department, salary,
ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
salary - ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS diff_from_avg
FROM employees;
| name | department | salary | dept_avg | diff_from_avg |
|---|---|---|---|---|
| name | department | salary | dept_avg | diff_from_avg |
| Alice | Engineering | 12000 | 11667 | 333 |
| Charlie | Engineering | 12000 | 11667 | 333 |
| Bob | Engineering | 11000 | 11667 | -667 |
| Eve | Sales | 9500 | 9250 | 250 |
| Diana | Sales | 9000 | 9250 | -250 |
COUNT() OVER()
-- Each order's share of department total
SELECT
order_id, department,
COUNT(*) OVER (PARTITION BY department) AS dept_orders,
ROUND(1.0 / COUNT(*) OVER (PARTITION BY department) * 100, 1) AS pct
FROM orders;
MIN() / MAX() OVER()
-- Compare against partition extremes
SELECT
name, department, salary,
MIN(salary) OVER (PARTITION BY department) AS dept_min,
MAX(salary) OVER (PARTITION BY department) AS dept_max,
salary - MIN(salary) OVER (PARTITION BY department) AS above_min
FROM employees;
PARTITION BY Deep Dive
PARTITION BY divides the result set into multiple partitions where the window function computes independently. It is similar to GROUP BY but does not collapse rows.
Multi-Column Partition
-- Partition by year and department for ranking
SELECT
YEAR(hire_date) AS hire_year,
department,
name,
salary,
RANK() OVER (
PARTITION BY YEAR(hire_date), department
ORDER BY salary DESC
) AS rank_in_year_dept
FROM employees;
PARTITION BY Expressions
-- Partition by salary tier
SELECT
name, salary,
CASE
WHEN salary >= 10000 THEN 'High'
WHEN salary >= 7000 THEN 'Mid'
ELSE 'Low'
END AS tier,
ROW_NUMBER() OVER (
PARTITION BY CASE
WHEN salary >= 10000 THEN 'High'
WHEN salary >= 7000 THEN 'Mid'
ELSE 'Low'
END
ORDER BY salary DESC
) AS rank_in_tier
FROM employees;
No PARTITION BY (Global Window)
When PARTITION BY is omitted, the entire result set is treated as a single partition:
-- Global ranking
SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS global_rank
FROM employees;
Frame Clauses
Frame clauses define the range of rows within a partition that the window function can "see". Only aggregate window functions (SUM, AVG, COUNT, MIN, MAX) and value functions (FIRST_VALUE, LAST_VALUE, NTH_VALUE) are affected by frame clauses. Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) and LAG/LEAD ignore frame clauses.
ROWS Mode
Defines the frame based on physical row count.
-- 3-day moving average
SELECT
order_date,
amount,
ROUND(AVG(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_3
FROM daily_sales;
| order_date | amount | moving_avg_3 |
|---|---|---|
| order_date | amount | moving_avg_3 |
| 2024-01-01 | 100 | 100.00 |
| 2024-01-02 | 200 | 150.00 |
| 2024-01-03 | 150 | 150.00 |
| 2024-01-04 | 300 | 216.67 |
| 2024-01-05 | 250 | 233.33 |
RANGE Mode
Defines the frame based on value range. Unlike ROWS, RANGE considers value magnitude.
-- Cumulative sum up to current value (includes all rows with same value)
SELECT
salary,
name,
SUM(salary) OVER (
ORDER BY salary
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_salary
FROM employees;
ROWS vs RANGE: When the ORDER BY column has duplicate values, RANGE treats all rows with the same value as a group and includes them all, while ROWS strictly counts physical rows. In most cases, prefer ROWS because its behavior is more predictable and performance is better.
GROUPS Mode (MySQL 8.0.2+)
Defines the frame based on peer groups (groups of rows with the same ORDER BY value).
-- Current group + 1 group before and after
SELECT
department, salary,
SUM(salary) OVER (
ORDER BY salary
GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS group_sum
FROM employees;
Common Frame Clauses Quick Reference
| Frame Definition | Meaning | Typical Use |
|---|---|---|
| Frame Definition | Meaning | Typical Use |
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
From partition start to current row | Running total |
ROWS BETWEEN N PRECEDING AND CURRENT ROW |
Previous N rows to current row | Moving average |
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING |
Current row to partition end | Remaining sum |
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
Entire partition | LAST_VALUE, NTH_VALUE |
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING |
1 row before + current + 1 row after | Smoothing |
RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW |
Within 7 days before current date | 7-day rolling aggregate |
Default Frame Behavior
When the frame clause is explicitly omitted, MySQL uses these defaults:
- With ORDER BY:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW - Without ORDER BY:
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING(entire partition)
20+ Real-World Examples
1 Consecutive Login Days
Find each user's maximum consecutive login streak. Key idea: use ROW_NUMBER() to generate a sequence, then subtract it from the date to create a "group identifier".
WITH login_groups AS (
SELECT
user_id,
login_date,
login_date - INTERVAL ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY login_date
) DAY AS grp
FROM (
SELECT DISTINCT user_id, DATE(login_time) AS login_date
FROM user_logins
) t
)
SELECT
user_id,
MIN(login_date) AS streak_start,
MAX(login_date) AS streak_end,
COUNT(*) AS consecutive_days
FROM login_groups
GROUP BY user_id, grp
ORDER BY consecutive_days DESC;
How it works: For a consecutive date sequence, date - ROW_NUMBER() yields a constant. For example, Jan 1 (rn=1), Jan 2 (rn=2), Jan 3 (rn=3) all produce Dec 31 after subtraction. When there is a gap, the difference changes, forming a new group.
2 Running Total
SELECT
transaction_date,
amount,
SUM(amount) OVER (
ORDER BY transaction_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM transactions
WHERE account_id = 1001
ORDER BY transaction_date;
3 7-Day Moving Average
SELECT
sale_date,
daily_revenue,
ROUND(AVG(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS ma_7
FROM daily_sales;
Note: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means 7 rows including the current one.
4 Top-N Per Group (Top 3 Salaries Per Department)
SELECT * FROM (
SELECT
department,
name,
salary,
ROW_NUMBER() OVER (
PARTITION BY department ORDER BY salary DESC
) AS rn
FROM employees
) ranked
WHERE rn <= 3;
If ties are allowed, replace ROW_NUMBER() with DENSE_RANK().
5 Year-over-Year (YoY) Comparison
SELECT
YEAR(order_date) AS yr,
MONTH(order_date) AS mo,
SUM(amount) AS monthly_revenue,
LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)) AS same_month_last_year,
ROUND(
(SUM(amount) - LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)))
/ LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)) * 100, 1
) AS yoy_pct
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY yr, mo;
6 Month-over-Month (MoM) Comparison
WITH monthly AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
)
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
/ LAG(revenue, 1) OVER (ORDER BY month) * 100, 1) AS mom_pct
FROM monthly;
7 Percentile Rank
SELECT
name,
salary,
ROUND(PERCENT_RANK() OVER (ORDER BY salary) * 100, 1) AS percentile,
CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;
PERCENT_RANK(): (rank - 1) / (total_rows - 1), range [0, 1]. CUME_DIST(): rank / total_rows, range (0, 1].
8 Gap Detection
Find gaps in an ID sequence.
SELECT
id AS gap_start,
next_id - 1 AS gap_end,
next_id - id - 1 AS gap_size
FROM (
SELECT
id,
LEAD(id, 1) OVER (ORDER BY id) AS next_id
FROM orders
) t
WHERE next_id - id > 1
ORDER BY gap_start;
9 Island Detection
Find consecutive "islands" of the same status in a status sequence.
WITH numbered AS (
SELECT
server_id,
check_time,
status,
ROW_NUMBER() OVER (PARTITION BY server_id ORDER BY check_time) -
ROW_NUMBER() OVER (PARTITION BY server_id, status ORDER BY check_time) AS island_id
FROM server_status
)
SELECT
server_id,
status,
MIN(check_time) AS island_start,
MAX(check_time) AS island_end,
COUNT(*) AS duration_checks
FROM numbered
GROUP BY server_id, status, island_id
ORDER BY server_id, island_start;
How it works: The difference between a global row number and a per-status row number stays constant within consecutive rows of the same status. When the status changes, the difference changes, forming a new group.
10 Deduplicate Keeping Latest Record
DELETE t1 FROM user_addresses t1
INNER JOIN (
SELECT id,
ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY updated_at DESC
) AS rn
FROM user_addresses
) t2 ON t1.id = t2.id
WHERE t2.rn > 1;
Keeps only the most recently updated address record per user_id.
11 Cumulative Percentage (Pareto Analysis)
SELECT
product_name,
revenue,
SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
ROUND(SUM(revenue) OVER (ORDER BY revenue DESC)
/ SUM(revenue) OVER () * 100, 1) AS cumulative_pct
FROM product_sales
ORDER BY revenue DESC;
Quickly identify the top 20% of products contributing 80% of revenue.
12 Percentage Within Group
SELECT
department,
name,
salary,
ROUND(salary / SUM(salary) OVER (PARTITION BY department) * 100, 1) AS pct_of_dept
FROM employees;
13 Sessionization
Split user behavior logs into sessions using a 30-minute timeout.
WITH events_with_gap AS (
SELECT
user_id,
event_time,
TIMESTAMPDIFF(MINUTE,
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time),
event_time
) AS gap_minutes
FROM user_events
),
sessions AS (
SELECT
user_id,
event_time,
SUM(CASE WHEN gap_minutes > 30 OR gap_minutes IS NULL THEN 1 ELSE 0 END)
OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM events_with_gap
)
SELECT
user_id,
session_id,
MIN(event_time) AS session_start,
MAX(event_time) AS session_end,
COUNT(*) AS events_count,
TIMESTAMPDIFF(MINUTE, MIN(event_time), MAX(event_time)) AS duration_min
FROM sessions
GROUP BY user_id, session_id;
14 Funnel Analysis
WITH funnel AS (
SELECT
user_id,
event_name,
event_time,
ROW_NUMBER() OVER (
PARTITION BY user_id, event_name ORDER BY event_time
) AS event_seq
FROM user_events
WHERE event_name IN ('page_view', 'add_to_cart', 'checkout', 'payment')
)
SELECT
event_name,
COUNT(DISTINCT user_id) AS users,
ROUND(COUNT(DISTINCT user_id) /
FIRST_VALUE(COUNT(DISTINCT user_id)) OVER (ORDER BY
FIELD(event_name, 'page_view', 'add_to_cart', 'checkout', 'payment')
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) * 100, 1) AS conversion_pct
FROM funnel
WHERE event_seq = 1
GROUP BY event_name
ORDER BY FIELD(event_name, 'page_view', 'add_to_cart', 'checkout', 'payment');
15 Median Calculation
WITH ordered AS (
SELECT
salary,
ROW_NUMBER() OVER (ORDER BY salary) AS rn,
COUNT(*) OVER () AS total
FROM employees
)
SELECT AVG(salary) AS median_salary
FROM ordered
WHERE rn IN (FLOOR((total + 1) / 2), CEIL((total + 1) / 2));
16 Row-to-Row Delta
SELECT
reading_time,
meter_value,
meter_value - LAG(meter_value, 1, meter_value) OVER (ORDER BY reading_time) AS delta
FROM electricity_readings
WHERE meter_id = 'M001'
ORDER BY reading_time;
17 Leaderboard Rank Change Tracking
WITH weekly_rank AS (
SELECT
week_num,
player_name,
score,
RANK() OVER (PARTITION BY week_num ORDER BY score DESC) AS rnk
FROM weekly_scores
)
SELECT
w.week_num,
w.player_name,
w.score,
w.rnk AS current_rank,
LAG(w.rnk) OVER (PARTITION BY w.player_name ORDER BY w.week_num) AS prev_rank,
LAG(w.rnk) OVER (PARTITION BY w.player_name ORDER BY w.week_num) - w.rnk AS rank_change
FROM weekly_rank w
ORDER BY w.week_num, w.rnk;
18 First Purchase vs Repeat Purchase
SELECT
order_id,
customer_id,
order_date,
amount,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) = 1
THEN 'First Purchase'
ELSE 'Repeat Purchase'
END AS purchase_type
FROM orders;
19 Inventory Movement and Balance
SELECT
transaction_time,
product_id,
CASE type WHEN 'IN' THEN qty ELSE -qty END AS change,
SUM(CASE type WHEN 'IN' THEN qty ELSE -qty END) OVER (
PARTITION BY product_id
ORDER BY transaction_time
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS balance
FROM inventory_transactions
ORDER BY product_id, transaction_time;
20 Employee Salary vs Peer Level Comparison
SELECT
name,
department,
job_level,
salary,
AVG(salary) OVER (PARTITION BY job_level) AS level_avg,
MIN(salary) OVER (PARTITION BY job_level) AS level_min,
MAX(salary) OVER (PARTITION BY job_level) AS level_max,
NTILE(4) OVER (PARTITION BY job_level ORDER BY salary) AS salary_quartile,
ROUND(PERCENT_RANK() OVER (PARTITION BY job_level ORDER BY salary) * 100, 0) AS percentile
FROM employees;
21 Page Dwell Time Calculation
SELECT
user_id,
page_url,
view_time,
LEAD(view_time) OVER (PARTITION BY user_id ORDER BY view_time) AS next_view_time,
TIMESTAMPDIFF(SECOND, view_time,
LEAD(view_time) OVER (PARTITION BY user_id ORDER BY view_time)
) AS stay_seconds
FROM page_views
WHERE session_id = 'abc123'
ORDER BY view_time;
22 Bollinger Bands (Moving Standard Deviation)
SELECT
trade_date,
close_price,
ROUND(AVG(close_price) OVER w, 2) AS ma_20,
ROUND(STDDEV_POP(close_price) OVER w, 2) AS std_20,
ROUND(AVG(close_price) OVER w + 2 * STDDEV_POP(close_price) OVER w, 2) AS upper_band,
ROUND(AVG(close_price) OVER w - 2 * STDDEV_POP(close_price) OVER w, 2) AS lower_band
FROM stock_prices
WHERE ticker = 'AAPL'
WINDOW w AS (ORDER BY trade_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW)
ORDER BY trade_date;
Bollinger Bands: 20-day moving average +/- 2 standard deviations.
Performance Analysis
Window Functions vs Subqueries vs Temp Tables
Window Functions
- Single scan for multiple computations
- Optimizer can merge multiple OVER clauses
- Clean code, high maintainability
- Sorting is the primary cost
Correlated Subqueries
- Executes subquery per row (O(n^2))
- Acceptable for small datasets, extremely slow for large tables
- Can leverage indexes
- Compatible with MySQL 5.7
Performance Benchmark
-- Scenario: 1 million row orders table, compute running total per user
-- Approach 1: Window function (~0.8s)
SELECT order_id, user_id, amount,
SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS running_total
FROM orders;
-- Approach 2: Correlated subquery (~45s)
SELECT o1.order_id, o1.user_id, o1.amount,
(SELECT SUM(o2.amount) FROM orders o2
WHERE o2.user_id = o1.user_id AND o2.order_date <= o1.order_date) AS running_total
FROM orders o1;
-- Approach 3: Temp table + variables (MySQL 5.7 compatible, ~1.2s, unreliable)
SET @running := 0, @prev_user := '';
SELECT order_id, user_id, amount,
@running := IF(@prev_user = user_id, @running + amount, amount) AS running_total,
@prev_user := user_id
FROM orders ORDER BY user_id, order_date;
User variable assignment behavior is undefined in MySQL 8.0: The MySQL optimizer may evaluate expressions in the SELECT list in any order. Approach 3 is unreliable in MySQL 8.0 and should be replaced with window functions.
EXPLAIN Analysis for Window Functions
EXPLAIN FORMAT=TREE
SELECT department, name, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;
/* Example output:
-> Window aggregate with buffering: rank() OVER (
PARTITION BY department ORDER BY salary DESC)
-> Sort: employees.department, employees.salary DESC
-> Table scan on employees (cost=10.25 rows=100)
*/
Key observations:
- Sort node: Window functions almost always require sorting. A suitable index may avoid the sort
- Window aggregate with buffering: MySQL needs to buffer rows for window computation. For large datasets, this means more memory usage
- Multiple windows: If PARTITION BY and ORDER BY differ across windows, multiple sorts may be needed
Optimization Tips
- Create indexes for PARTITION BY + ORDER BY -- If the window function is used in a subquery/CTE that can be pushed down, a suitable composite index can avoid sorting
- Reduce the number of rows participating in window computation -- Filter with WHERE first, then apply window functions
- Consolidate window definitions -- Use the WINDOW clause to reuse the same PARTITION BY + ORDER BY so the optimizer sorts only once
- Avoid unnecessary ORDER BY -- If you only need global aggregation (e.g., SUM() OVER()), omit ORDER BY
- Control frame size -- ROWS frames are typically more efficient than RANGE frames
- Mind memory usage --
sort_buffer_sizeandtmp_table_sizeaffect window function performance
-- Create an index to optimize window functions
ALTER TABLE orders ADD INDEX idx_user_date (user_id, order_date);
-- Filter first, then rank
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn
FROM orders
WHERE order_date >= '2024-01-01' -- Filter first to reduce data volume
) t WHERE rn = 1;
MySQL 8.0 Features and Limitations
MySQL 8.0 Supported Window Functions
| Category | Functions | Minimum Version |
|---|---|---|
| Category | Functions | Minimum Version |
| Ranking | ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE() | 8.0.2 |
| Value | LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE() | 8.0.2 |
| Distribution | PERCENT_RANK(), CUME_DIST() | 8.0.2 |
| Aggregate Windows | SUM(), AVG(), COUNT(), MIN(), MAX(), BIT_AND(), BIT_OR(), BIT_XOR(), STDDEV_POP(), STDDEV_SAMP(), VAR_POP(), VAR_SAMP(), JSON_ARRAYAGG(), JSON_OBJECTAGG() | 8.0.2 |
| Frame Modes | ROWS, RANGE, GROUPS | 8.0.2 (GROUPS 8.0.2) |
| Frame Exclusion | EXCLUDE CURRENT ROW, EXCLUDE GROUP, EXCLUDE TIES, EXCLUDE NO OTHERS | Not supported |
| Named Windows | WINDOW w AS (...) | 8.0.2 |
Known Limitations
- No EXCLUDE clause: The SQL standard's
EXCLUDE CURRENT ROW,EXCLUDE GROUP, and other frame exclusion clauses are not implemented in MySQL - No FILTER clause: The SQL standard's
COUNT(*) FILTER (WHERE ...)is unavailable; useCOUNT(CASE WHEN ... THEN 1 END)instead - No DISTINCT window aggregate:
COUNT(DISTINCT col) OVER ()raises an error; requires a subquery workaround - Window functions cannot be nested: You cannot call a window function inside another window function
- Cannot be used in WHERE/HAVING: Window functions can only appear in SELECT and ORDER BY clauses. To filter on results, wrap in a subquery
- GROUP BY + window functions: Window functions execute after GROUP BY, so they operate on the aggregated result rows
FILTER Clause Workaround
-- Standard SQL (PostgreSQL supports, MySQL does not):
SELECT COUNT(*) FILTER (WHERE status = 'active') OVER (PARTITION BY dept)
FROM employees;
-- MySQL workaround:
SELECT
SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) OVER (PARTITION BY dept) AS active_count
FROM employees;
COUNT(DISTINCT) OVER() Workaround
-- Not supported:
-- SELECT COUNT(DISTINCT category) OVER (PARTITION BY dept) FROM products;
-- Workaround 1: Subquery
SELECT p.*,
(SELECT COUNT(DISTINCT p2.category) FROM products p2 WHERE p2.dept = p.dept) AS distinct_cats
FROM products p;
-- Workaround 2: DENSE_RANK trick
SELECT *,
MAX(dr) OVER (PARTITION BY dept) AS distinct_cats
FROM (
SELECT *, DENSE_RANK() OVER (PARTITION BY dept ORDER BY category) AS dr
FROM products
) t;
Execution Order
Window functions sit at the SELECT stage in SQL's logical execution order:
- FROM / JOIN
- WHERE
- GROUP BY
- HAVING
- Window Functions (WINDOW)
- SELECT
- DISTINCT
- ORDER BY
- LIMIT
This means window functions can reference aliased aggregate results in the SELECT list, but you cannot use window function results in WHERE or HAVING.
Frequently Asked Questions (FAQ)
Q1: What is the difference between window functions and GROUP BY?
GROUP BY collapses multiple rows into one, outputting a single result per group. Window functions preserve all original rows and attach aggregate/ranking computations to each row. For example, GROUP BY department returns one row per department, while SUM(salary) OVER (PARTITION BY department) preserves every employee row with the department total attached to each.
Q2: Why does LAST_VALUE() always return the current row's value?
Because the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The end of the frame is the current row, so LAST_VALUE returns the current row's value. To get the true last value in the partition, you must explicitly set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
Q3: Can I use window functions in the WHERE clause?
No. Window functions execute after WHERE. To filter on window function results, wrap the query in a subquery or CTE: SELECT * FROM (SELECT *, ROW_NUMBER() OVER (...) AS rn FROM t) sub WHERE rn = 1;
Q4: Do window functions cause sorting? How to optimize?
Almost always. Both PARTITION BY and ORDER BY require sorting. Optimization strategies: (1) Create composite indexes for PARTITION BY + ORDER BY columns; (2) Use the WINDOW clause to reuse the same window definition so the optimizer sorts only once; (3) Filter with WHERE first to reduce data volume; (4) If you only need aggregation without ordering, omit ORDER BY.
Q5: How to handle non-deterministic ROW_NUMBER() ordering for equal values?
Add a unique column (typically the primary key) as a tiebreaker in ORDER BY: ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, id ASC). This ensures deterministic ordering even when salary values are identical.
Q6: How to achieve window function equivalents in MySQL 5.7?
MySQL 5.7 does not support window functions. Common workarounds: (1) Ranking via user variables @rn := @rn + 1 (unreliable in MySQL 8.0); (2) Running totals via correlated subqueries; (3) LAG/LEAD via self-joins. These alternatives typically perform much worse. We recommend upgrading to MySQL 8.0.
Q7: Do multiple window functions cause multiple sorts?
It depends on whether PARTITION BY and ORDER BY are the same. Multiple functions with the same window definition only sort once. Different window definitions require separate sorts. Use the WINDOW clause to reuse window definitions and let the optimizer merge sorts. Check actual sort counts via EXPLAIN FORMAT=TREE.
Q8: How to choose between ROWS and RANGE frame modes?
ROWS: Based on physical row count; more predictable behavior and better performance. Preferred in most cases. RANGE: Based on value range; use when the ORDER BY column has duplicates and you need to treat equal values as a group. Required for time-interval-based frames (e.g., RANGE BETWEEN INTERVAL 7 DAY PRECEDING). GROUPS: Based on peer groups; between ROWS and RANGE, suitable for peer-group-level computation.
Q9: Can window functions be used in UPDATE statements?
Not directly in the UPDATE SET clause. But you can use a subquery JOIN: UPDATE orders o JOIN (SELECT id, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn FROM orders) t ON o.id = t.id SET o.seq = t.rn;
Q10: Any caveats when combining window functions with LIMIT?
LIMIT executes after window functions, so window functions compute on all rows first before truncation. If you only need Top-N, the window function still processes all data. For large tables, filter with WHERE first, or consider whether a simple ORDER BY + LIMIT approach (without window functions) meets your needs.