Chapter 44

Window Functions in Practice

MySQL Window Functions — Complete Guide

Window functions are one of the most important features introduced in MySQL 8.0. Unlike regular aggregate functions, window functions do not collapse rows -- every row remains in the result set while simultaneously having access to a set of related rows (the "window") for computation. This makes ranking, running totals, moving averages, year-over-year comparisons, and other analytical tasks that previously required complex subqueries both concise and efficient.

Version requirement: Window functions require MySQL 8.0 or higher. MariaDB 10.2+ also supports window functions, but with minor syntax and behavior differences. All examples in this chapter are based on MySQL 8.0/8.4.

This chapter systematically covers the syntax and semantics of all window functions, demonstrates 20+ real-world business scenarios, and provides in-depth analysis of execution internals and performance optimization strategies.

Window Function Syntax

The complete syntax of a window function consists of three parts: the function call, the OVER clause, and an optional named window.

function_name([args]) OVER ( [PARTITION BY partition_expression, ...] [ORDER BY sort_expression [ASC|DESC], ...] [frame_clause] )

-- frame_clause: {ROWS | RANGE | GROUPS} BETWEEN frame_start AND frame_end

-- frame_start / frame_end: UNBOUNDED PRECEDING | N PRECEDING | CURRENT ROW | N FOLLOWING | UNBOUNDED FOLLOWING

Core Concepts

Named Windows (WINDOW Clause)

When multiple window functions share the same window definition, use the WINDOW clause to name and reuse it:

SELECT
  employee_id,
  department,
  salary,
  ROW_NUMBER() OVER w AS rn,
  RANK()       OVER w AS rnk,
  DENSE_RANK() OVER w AS dense_rnk
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);

Ranking Functions

ROW_NUMBER()

Assigns a unique sequential number (1, 2, 3, ...) to each row within a partition, even when values are identical -- no ties.

Syntax

ROW_NUMBER() OVER (
  [PARTITION BY col, ...]
  ORDER BY col [ASC|DESC], ...
)

Sample Data

id department name salary
id department name salary
1 Engineering Alice 12000
2 Engineering Bob 11000
3 Engineering Charlie 12000
4 Sales Diana 9000
5 Sales Eve 9500

Query and Result

SELECT
  name, department, salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees;
name department salary rn
name department salary rn
Alice Engineering 12000 1
Charlie Engineering 12000 2
Bob Engineering 11000 3
Eve Sales 9500 1
Diana Sales 9000 2

Note: Alice and Charlie have the same salary, but ROW_NUMBER() still assigns different numbers (1 and 2). The specific ordering among equal values is non-deterministic and depends on the order in which the storage engine returns rows. For deterministic ordering, add an additional column (such as the primary key) to the ORDER BY clause.

RANK()

Similar to ROW_NUMBER(), but allows ties. Rows with the same value receive the same rank, and the next different value's rank skips past the tied rows.

SELECT
  name, department, salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;
name department salary rnk
name department salary rnk
Alice Engineering 12000 1
Charlie Engineering 12000 1
Bob Engineering 11000 3
Eve Sales 9500 1
Diana Sales 9000 2

Note that Bob's rank is 3 (not 2), because two people are tied at rank 1.

DENSE_RANK()

Like RANK(), allows ties, but the next different value's rank does not skip -- it stays consecutive.

SELECT
  name, department, salary,
  DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees;
name department salary dense_rnk
name department salary dense_rnk
Alice Engineering 12000 1
Charlie Engineering 12000 1
Bob Engineering 11000 2
Eve Sales 9500 1
Diana Sales 9000 2

Bob is now rank 2 (immediately following the tied rank 1).

ROW_NUMBER vs RANK vs DENSE_RANK Comparison

Function Ties Gaps Use Case
Function Ties Gaps Use Case
ROW_NUMBER() No ties None Pagination, deduplication, unique numbering
RANK() Allows ties Gaps after ties Competition ranking, exam scores
DENSE_RANK() Allows ties No gaps Salary grades, continuous ranking

NTILE(n)

Divides rows within a partition into n roughly equal buckets and returns the bucket number (1 to n). If the row count is not evenly divisible by n, earlier buckets receive one extra row.

SELECT
  name, salary,
  NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
name salary quartile
name salary quartile
Alice 12000 1
Charlie 12000 1
Bob 11000 2
Eve 9500 3
Diana 9000 4

Typical uses: Computing percentiles (NTILE(100)), quartiles (NTILE(4)), splitting users into A/B test groups.

Value Functions

LAG(expr, offset, default)

Accesses the value from a row before the current row by offset positions. offset defaults to 1, default defaults to NULL.

Syntax

LAG(expression [, offset [, default_value]])
  OVER (
    [PARTITION BY col, ...]
    ORDER BY col [ASC|DESC], ...
  )

Example: Month-over-Month Revenue

SELECT
  month,
  revenue,
  LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month,
  ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
        / LAG(revenue, 1) OVER (ORDER BY month) * 100, 1) AS mom_pct
FROM monthly_sales;
month revenue prev_month mom_pct
month revenue prev_month mom_pct
2024-01 100000 0 NULL
2024-02 120000 100000 20.0
2024-03 115000 120000 -4.2
2024-04 130000 115000 13.0

LEAD(expr, offset, default)

Accesses the value from a row after the current row by offset positions. The opposite direction of LAG.

SELECT
  order_id,
  customer_id,
  order_date,
  LEAD(order_date, 1) OVER (
    PARTITION BY customer_id ORDER BY order_date
  ) AS next_order_date,
  DATEDIFF(
    LEAD(order_date, 1) OVER (PARTITION BY customer_id ORDER BY order_date),
    order_date
  ) AS days_to_next
FROM orders;
order_id customer_id order_date next_order_date days_to_next
order_id customer_id order_date next_order_date days_to_next
101 1 2024-01-05 2024-01-20 15
105 1 2024-01-20 2024-03-10 50
112 1 2024-03-10 NULL NULL

FIRST_VALUE(expr)

Returns the value from the first row in the window frame.

SELECT
  name, department, salary,
  FIRST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS top_earner
FROM employees;
name department salary top_earner
name department salary top_earner
Alice Engineering 12000 Alice
Charlie Engineering 12000 Alice
Bob Engineering 11000 Alice
Eve Sales 9500 Eve
Diana Sales 9000 Eve

LAST_VALUE(expr)

Returns the value from the last row in the window frame.

Common pitfall: The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means LAST_VALUE() will only return the last value up to the current row (i.e., the current row itself). To get the true last row of the partition, you must explicitly specify the frame as ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

-- Wrong usage (common mistake):
SELECT name, salary,
  LAST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
  ) AS lowest_earner   -- Always returns the current row's name!
FROM employees;

-- Correct usage:
SELECT name, salary,
  LAST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS lowest_earner   -- Returns the person with lowest salary in partition
FROM employees;

NTH_VALUE(expr, n)

Returns the value from the n-th row in the window frame. n starts at 1.

SELECT
  name, department, salary,
  NTH_VALUE(name, 2) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS second_highest
FROM employees;
name department salary second_highest
name department salary second_highest
Alice Engineering 12000 Charlie
Charlie Engineering 12000 Charlie
Bob Engineering 11000 Charlie
Eve Sales 9500 Diana
Diana Sales 9000 Diana

Value Functions Comparison

Function Direction Use Case Notes
Function Direction Use Case Notes
LAG Look back (previous row) MoM, delta calculations Returns default at partition start
LEAD Look ahead (next row) Predict next event, gap calculation Returns default at partition end
FIRST_VALUE First row in frame Group min/max Usually no frame change needed
LAST_VALUE Last row in frame Group min/max Must set frame explicitly!
NTH_VALUE N-th row in frame 2nd place, 3rd place Needs explicit frame

Aggregate Window Functions

All standard aggregate functions can be used as window functions with OVER(). Unlike regular aggregates, window aggregates do not collapse rows.

SUM() OVER()

Computes a cumulative sum within the window frame.

SELECT
  order_date,
  amount,
  SUM(amount) OVER (ORDER BY order_date) AS running_total,
  SUM(amount) OVER () AS grand_total
FROM orders;
order_date amount running_total grand_total
order_date amount running_total grand_total
2024-01-01 100 100 850
2024-01-02 200 300 850
2024-01-03 150 450 850
2024-01-04 250 700 850
2024-01-05 150 850 850

AVG() OVER()

Computes the average within the window frame.

-- Compare each employee's salary against department average
SELECT
  name, department, salary,
  ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
  salary - ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS diff_from_avg
FROM employees;
name department salary dept_avg diff_from_avg
name department salary dept_avg diff_from_avg
Alice Engineering 12000 11667 333
Charlie Engineering 12000 11667 333
Bob Engineering 11000 11667 -667
Eve Sales 9500 9250 250
Diana Sales 9000 9250 -250

COUNT() OVER()

-- Each order's share of department total
SELECT
  order_id, department,
  COUNT(*) OVER (PARTITION BY department) AS dept_orders,
  ROUND(1.0 / COUNT(*) OVER (PARTITION BY department) * 100, 1) AS pct
FROM orders;

MIN() / MAX() OVER()

-- Compare against partition extremes
SELECT
  name, department, salary,
  MIN(salary) OVER (PARTITION BY department) AS dept_min,
  MAX(salary) OVER (PARTITION BY department) AS dept_max,
  salary - MIN(salary) OVER (PARTITION BY department) AS above_min
FROM employees;

PARTITION BY Deep Dive

PARTITION BY divides the result set into multiple partitions where the window function computes independently. It is similar to GROUP BY but does not collapse rows.

Multi-Column Partition

-- Partition by year and department for ranking
SELECT
  YEAR(hire_date) AS hire_year,
  department,
  name,
  salary,
  RANK() OVER (
    PARTITION BY YEAR(hire_date), department
    ORDER BY salary DESC
  ) AS rank_in_year_dept
FROM employees;

PARTITION BY Expressions

-- Partition by salary tier
SELECT
  name, salary,
  CASE
    WHEN salary >= 10000 THEN 'High'
    WHEN salary >= 7000  THEN 'Mid'
    ELSE 'Low'
  END AS tier,
  ROW_NUMBER() OVER (
    PARTITION BY CASE
      WHEN salary >= 10000 THEN 'High'
      WHEN salary >= 7000  THEN 'Mid'
      ELSE 'Low'
    END
    ORDER BY salary DESC
  ) AS rank_in_tier
FROM employees;

No PARTITION BY (Global Window)

When PARTITION BY is omitted, the entire result set is treated as a single partition:

-- Global ranking
SELECT name, salary,
  ROW_NUMBER() OVER (ORDER BY salary DESC) AS global_rank
FROM employees;

Frame Clauses

Frame clauses define the range of rows within a partition that the window function can "see". Only aggregate window functions (SUM, AVG, COUNT, MIN, MAX) and value functions (FIRST_VALUE, LAST_VALUE, NTH_VALUE) are affected by frame clauses. Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) and LAG/LEAD ignore frame clauses.

ROWS Mode

Defines the frame based on physical row count.

-- 3-day moving average
SELECT
  order_date,
  amount,
  ROUND(AVG(amount) OVER (
    ORDER BY order_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ), 2) AS moving_avg_3
FROM daily_sales;
order_date amount moving_avg_3
order_date amount moving_avg_3
2024-01-01 100 100.00
2024-01-02 200 150.00
2024-01-03 150 150.00
2024-01-04 300 216.67
2024-01-05 250 233.33

RANGE Mode

Defines the frame based on value range. Unlike ROWS, RANGE considers value magnitude.

-- Cumulative sum up to current value (includes all rows with same value)
SELECT
  salary,
  name,
  SUM(salary) OVER (
    ORDER BY salary
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_salary
FROM employees;

ROWS vs RANGE: When the ORDER BY column has duplicate values, RANGE treats all rows with the same value as a group and includes them all, while ROWS strictly counts physical rows. In most cases, prefer ROWS because its behavior is more predictable and performance is better.

GROUPS Mode (MySQL 8.0.2+)

Defines the frame based on peer groups (groups of rows with the same ORDER BY value).

-- Current group + 1 group before and after
SELECT
  department, salary,
  SUM(salary) OVER (
    ORDER BY salary
    GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) AS group_sum
FROM employees;

Common Frame Clauses Quick Reference

Frame Definition Meaning Typical Use
Frame Definition Meaning Typical Use
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW From partition start to current row Running total
ROWS BETWEEN N PRECEDING AND CURRENT ROW Previous N rows to current row Moving average
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING Current row to partition end Remaining sum
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING Entire partition LAST_VALUE, NTH_VALUE
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING 1 row before + current + 1 row after Smoothing
RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW Within 7 days before current date 7-day rolling aggregate

Default Frame Behavior

When the frame clause is explicitly omitted, MySQL uses these defaults:

20+ Real-World Examples

1 Consecutive Login Days

Find each user's maximum consecutive login streak. Key idea: use ROW_NUMBER() to generate a sequence, then subtract it from the date to create a "group identifier".

WITH login_groups AS (
  SELECT
    user_id,
    login_date,
    login_date - INTERVAL ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY login_date
    ) DAY AS grp
  FROM (
    SELECT DISTINCT user_id, DATE(login_time) AS login_date
    FROM user_logins
  ) t
)
SELECT
  user_id,
  MIN(login_date) AS streak_start,
  MAX(login_date) AS streak_end,
  COUNT(*)         AS consecutive_days
FROM login_groups
GROUP BY user_id, grp
ORDER BY consecutive_days DESC;

How it works: For a consecutive date sequence, date - ROW_NUMBER() yields a constant. For example, Jan 1 (rn=1), Jan 2 (rn=2), Jan 3 (rn=3) all produce Dec 31 after subtraction. When there is a gap, the difference changes, forming a new group.

2 Running Total

SELECT
  transaction_date,
  amount,
  SUM(amount) OVER (
    ORDER BY transaction_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM transactions
WHERE account_id = 1001
ORDER BY transaction_date;

3 7-Day Moving Average

SELECT
  sale_date,
  daily_revenue,
  ROUND(AVG(daily_revenue) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ), 2) AS ma_7
FROM daily_sales;

Note: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means 7 rows including the current one.

4 Top-N Per Group (Top 3 Salaries Per Department)

SELECT * FROM (
  SELECT
    department,
    name,
    salary,
    ROW_NUMBER() OVER (
      PARTITION BY department ORDER BY salary DESC
    ) AS rn
  FROM employees
) ranked
WHERE rn <= 3;

If ties are allowed, replace ROW_NUMBER() with DENSE_RANK().

5 Year-over-Year (YoY) Comparison

SELECT
  YEAR(order_date) AS yr,
  MONTH(order_date) AS mo,
  SUM(amount) AS monthly_revenue,
  LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)) AS same_month_last_year,
  ROUND(
    (SUM(amount) - LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)))
    / LAG(SUM(amount), 12) OVER (ORDER BY YEAR(order_date), MONTH(order_date)) * 100, 1
  ) AS yoy_pct
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY yr, mo;

6 Month-over-Month (MoM) Comparison

WITH monthly AS (
  SELECT
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    SUM(amount) AS revenue
  FROM orders
  GROUP BY DATE_FORMAT(order_date, '%Y-%m')
)
SELECT
  month,
  revenue,
  LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
  ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
        / LAG(revenue, 1) OVER (ORDER BY month) * 100, 1) AS mom_pct
FROM monthly;

7 Percentile Rank

SELECT
  name,
  salary,
  ROUND(PERCENT_RANK() OVER (ORDER BY salary) * 100, 1) AS percentile,
  CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;

PERCENT_RANK(): (rank - 1) / (total_rows - 1), range [0, 1]. CUME_DIST(): rank / total_rows, range (0, 1].

8 Gap Detection

Find gaps in an ID sequence.

SELECT
  id AS gap_start,
  next_id - 1 AS gap_end,
  next_id - id - 1 AS gap_size
FROM (
  SELECT
    id,
    LEAD(id, 1) OVER (ORDER BY id) AS next_id
  FROM orders
) t
WHERE next_id - id > 1
ORDER BY gap_start;

9 Island Detection

Find consecutive "islands" of the same status in a status sequence.

WITH numbered AS (
  SELECT
    server_id,
    check_time,
    status,
    ROW_NUMBER() OVER (PARTITION BY server_id ORDER BY check_time) -
    ROW_NUMBER() OVER (PARTITION BY server_id, status ORDER BY check_time) AS island_id
  FROM server_status
)
SELECT
  server_id,
  status,
  MIN(check_time) AS island_start,
  MAX(check_time) AS island_end,
  COUNT(*)         AS duration_checks
FROM numbered
GROUP BY server_id, status, island_id
ORDER BY server_id, island_start;

How it works: The difference between a global row number and a per-status row number stays constant within consecutive rows of the same status. When the status changes, the difference changes, forming a new group.

10 Deduplicate Keeping Latest Record

DELETE t1 FROM user_addresses t1
INNER JOIN (
  SELECT id,
    ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY updated_at DESC
    ) AS rn
  FROM user_addresses
) t2 ON t1.id = t2.id
WHERE t2.rn > 1;

Keeps only the most recently updated address record per user_id.

11 Cumulative Percentage (Pareto Analysis)

SELECT
  product_name,
  revenue,
  SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
  ROUND(SUM(revenue) OVER (ORDER BY revenue DESC)
        / SUM(revenue) OVER () * 100, 1) AS cumulative_pct
FROM product_sales
ORDER BY revenue DESC;

Quickly identify the top 20% of products contributing 80% of revenue.

12 Percentage Within Group

SELECT
  department,
  name,
  salary,
  ROUND(salary / SUM(salary) OVER (PARTITION BY department) * 100, 1) AS pct_of_dept
FROM employees;

13 Sessionization

Split user behavior logs into sessions using a 30-minute timeout.

WITH events_with_gap AS (
  SELECT
    user_id,
    event_time,
    TIMESTAMPDIFF(MINUTE,
      LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time),
      event_time
    ) AS gap_minutes
  FROM user_events
),
sessions AS (
  SELECT
    user_id,
    event_time,
    SUM(CASE WHEN gap_minutes > 30 OR gap_minutes IS NULL THEN 1 ELSE 0 END)
      OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
  FROM events_with_gap
)
SELECT
  user_id,
  session_id,
  MIN(event_time) AS session_start,
  MAX(event_time) AS session_end,
  COUNT(*)         AS events_count,
  TIMESTAMPDIFF(MINUTE, MIN(event_time), MAX(event_time)) AS duration_min
FROM sessions
GROUP BY user_id, session_id;

14 Funnel Analysis

WITH funnel AS (
  SELECT
    user_id,
    event_name,
    event_time,
    ROW_NUMBER() OVER (
      PARTITION BY user_id, event_name ORDER BY event_time
    ) AS event_seq
  FROM user_events
  WHERE event_name IN ('page_view', 'add_to_cart', 'checkout', 'payment')
)
SELECT
  event_name,
  COUNT(DISTINCT user_id) AS users,
  ROUND(COUNT(DISTINCT user_id) /
    FIRST_VALUE(COUNT(DISTINCT user_id)) OVER (ORDER BY
      FIELD(event_name, 'page_view', 'add_to_cart', 'checkout', 'payment')
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) * 100, 1) AS conversion_pct
FROM funnel
WHERE event_seq = 1
GROUP BY event_name
ORDER BY FIELD(event_name, 'page_view', 'add_to_cart', 'checkout', 'payment');

15 Median Calculation

WITH ordered AS (
  SELECT
    salary,
    ROW_NUMBER() OVER (ORDER BY salary) AS rn,
    COUNT(*) OVER () AS total
  FROM employees
)
SELECT AVG(salary) AS median_salary
FROM ordered
WHERE rn IN (FLOOR((total + 1) / 2), CEIL((total + 1) / 2));

16 Row-to-Row Delta

SELECT
  reading_time,
  meter_value,
  meter_value - LAG(meter_value, 1, meter_value) OVER (ORDER BY reading_time) AS delta
FROM electricity_readings
WHERE meter_id = 'M001'
ORDER BY reading_time;

17 Leaderboard Rank Change Tracking

WITH weekly_rank AS (
  SELECT
    week_num,
    player_name,
    score,
    RANK() OVER (PARTITION BY week_num ORDER BY score DESC) AS rnk
  FROM weekly_scores
)
SELECT
  w.week_num,
  w.player_name,
  w.score,
  w.rnk AS current_rank,
  LAG(w.rnk) OVER (PARTITION BY w.player_name ORDER BY w.week_num) AS prev_rank,
  LAG(w.rnk) OVER (PARTITION BY w.player_name ORDER BY w.week_num) - w.rnk AS rank_change
FROM weekly_rank w
ORDER BY w.week_num, w.rnk;

18 First Purchase vs Repeat Purchase

SELECT
  order_id,
  customer_id,
  order_date,
  amount,
  CASE
    WHEN ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) = 1
    THEN 'First Purchase'
    ELSE 'Repeat Purchase'
  END AS purchase_type
FROM orders;

19 Inventory Movement and Balance

SELECT
  transaction_time,
  product_id,
  CASE type WHEN 'IN' THEN qty ELSE -qty END AS change,
  SUM(CASE type WHEN 'IN' THEN qty ELSE -qty END) OVER (
    PARTITION BY product_id
    ORDER BY transaction_time
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS balance
FROM inventory_transactions
ORDER BY product_id, transaction_time;

20 Employee Salary vs Peer Level Comparison

SELECT
  name,
  department,
  job_level,
  salary,
  AVG(salary) OVER (PARTITION BY job_level) AS level_avg,
  MIN(salary) OVER (PARTITION BY job_level) AS level_min,
  MAX(salary) OVER (PARTITION BY job_level) AS level_max,
  NTILE(4) OVER (PARTITION BY job_level ORDER BY salary) AS salary_quartile,
  ROUND(PERCENT_RANK() OVER (PARTITION BY job_level ORDER BY salary) * 100, 0) AS percentile
FROM employees;

21 Page Dwell Time Calculation

SELECT
  user_id,
  page_url,
  view_time,
  LEAD(view_time) OVER (PARTITION BY user_id ORDER BY view_time) AS next_view_time,
  TIMESTAMPDIFF(SECOND, view_time,
    LEAD(view_time) OVER (PARTITION BY user_id ORDER BY view_time)
  ) AS stay_seconds
FROM page_views
WHERE session_id = 'abc123'
ORDER BY view_time;

22 Bollinger Bands (Moving Standard Deviation)

SELECT
  trade_date,
  close_price,
  ROUND(AVG(close_price) OVER w, 2) AS ma_20,
  ROUND(STDDEV_POP(close_price) OVER w, 2) AS std_20,
  ROUND(AVG(close_price) OVER w + 2 * STDDEV_POP(close_price) OVER w, 2) AS upper_band,
  ROUND(AVG(close_price) OVER w - 2 * STDDEV_POP(close_price) OVER w, 2) AS lower_band
FROM stock_prices
WHERE ticker = 'AAPL'
WINDOW w AS (ORDER BY trade_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW)
ORDER BY trade_date;

Bollinger Bands: 20-day moving average +/- 2 standard deviations.

Performance Analysis

Window Functions vs Subqueries vs Temp Tables

Window Functions
Correlated Subqueries

Performance Benchmark

-- Scenario: 1 million row orders table, compute running total per user

-- Approach 1: Window function (~0.8s)
SELECT order_id, user_id, amount,
  SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS running_total
FROM orders;

-- Approach 2: Correlated subquery (~45s)
SELECT o1.order_id, o1.user_id, o1.amount,
  (SELECT SUM(o2.amount) FROM orders o2
   WHERE o2.user_id = o1.user_id AND o2.order_date <= o1.order_date) AS running_total
FROM orders o1;

-- Approach 3: Temp table + variables (MySQL 5.7 compatible, ~1.2s, unreliable)
SET @running := 0, @prev_user := '';
SELECT order_id, user_id, amount,
  @running := IF(@prev_user = user_id, @running + amount, amount) AS running_total,
  @prev_user := user_id
FROM orders ORDER BY user_id, order_date;

User variable assignment behavior is undefined in MySQL 8.0: The MySQL optimizer may evaluate expressions in the SELECT list in any order. Approach 3 is unreliable in MySQL 8.0 and should be replaced with window functions.

EXPLAIN Analysis for Window Functions

EXPLAIN FORMAT=TREE
SELECT department, name, salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;

/* Example output:
-> Window aggregate with buffering: rank() OVER (
      PARTITION BY department ORDER BY salary DESC)
  -> Sort: employees.department, employees.salary DESC
    -> Table scan on employees  (cost=10.25 rows=100)
*/

Key observations:

Optimization Tips

  1. Create indexes for PARTITION BY + ORDER BY -- If the window function is used in a subquery/CTE that can be pushed down, a suitable composite index can avoid sorting
  2. Reduce the number of rows participating in window computation -- Filter with WHERE first, then apply window functions
  3. Consolidate window definitions -- Use the WINDOW clause to reuse the same PARTITION BY + ORDER BY so the optimizer sorts only once
  4. Avoid unnecessary ORDER BY -- If you only need global aggregation (e.g., SUM() OVER()), omit ORDER BY
  5. Control frame size -- ROWS frames are typically more efficient than RANGE frames
  6. Mind memory usage -- sort_buffer_size and tmp_table_size affect window function performance
-- Create an index to optimize window functions
ALTER TABLE orders ADD INDEX idx_user_date (user_id, order_date);

-- Filter first, then rank
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn
  FROM orders
  WHERE order_date >= '2024-01-01'  -- Filter first to reduce data volume
) t WHERE rn = 1;

MySQL 8.0 Features and Limitations

MySQL 8.0 Supported Window Functions

Category Functions Minimum Version
Category Functions Minimum Version
Ranking ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE() 8.0.2
Value LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE() 8.0.2
Distribution PERCENT_RANK(), CUME_DIST() 8.0.2
Aggregate Windows SUM(), AVG(), COUNT(), MIN(), MAX(), BIT_AND(), BIT_OR(), BIT_XOR(), STDDEV_POP(), STDDEV_SAMP(), VAR_POP(), VAR_SAMP(), JSON_ARRAYAGG(), JSON_OBJECTAGG() 8.0.2
Frame Modes ROWS, RANGE, GROUPS 8.0.2 (GROUPS 8.0.2)
Frame Exclusion EXCLUDE CURRENT ROW, EXCLUDE GROUP, EXCLUDE TIES, EXCLUDE NO OTHERS Not supported
Named Windows WINDOW w AS (...) 8.0.2

Known Limitations

FILTER Clause Workaround

-- Standard SQL (PostgreSQL supports, MySQL does not):
SELECT COUNT(*) FILTER (WHERE status = 'active') OVER (PARTITION BY dept)
FROM employees;

-- MySQL workaround:
SELECT
  SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) OVER (PARTITION BY dept) AS active_count
FROM employees;

COUNT(DISTINCT) OVER() Workaround

-- Not supported:
-- SELECT COUNT(DISTINCT category) OVER (PARTITION BY dept) FROM products;

-- Workaround 1: Subquery
SELECT p.*,
  (SELECT COUNT(DISTINCT p2.category) FROM products p2 WHERE p2.dept = p.dept) AS distinct_cats
FROM products p;

-- Workaround 2: DENSE_RANK trick
SELECT *,
  MAX(dr) OVER (PARTITION BY dept) AS distinct_cats
FROM (
  SELECT *, DENSE_RANK() OVER (PARTITION BY dept ORDER BY category) AS dr
  FROM products
) t;

Execution Order

Window functions sit at the SELECT stage in SQL's logical execution order:

  1. FROM / JOIN
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. Window Functions (WINDOW)
  6. SELECT
  7. DISTINCT
  8. ORDER BY
  9. LIMIT

This means window functions can reference aliased aggregate results in the SELECT list, but you cannot use window function results in WHERE or HAVING.

Frequently Asked Questions (FAQ)

Q1: What is the difference between window functions and GROUP BY?

GROUP BY collapses multiple rows into one, outputting a single result per group. Window functions preserve all original rows and attach aggregate/ranking computations to each row. For example, GROUP BY department returns one row per department, while SUM(salary) OVER (PARTITION BY department) preserves every employee row with the department total attached to each.

Q2: Why does LAST_VALUE() always return the current row's value?

Because the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The end of the frame is the current row, so LAST_VALUE returns the current row's value. To get the true last value in the partition, you must explicitly set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

Q3: Can I use window functions in the WHERE clause?

No. Window functions execute after WHERE. To filter on window function results, wrap the query in a subquery or CTE: SELECT * FROM (SELECT *, ROW_NUMBER() OVER (...) AS rn FROM t) sub WHERE rn = 1;

Q4: Do window functions cause sorting? How to optimize?

Almost always. Both PARTITION BY and ORDER BY require sorting. Optimization strategies: (1) Create composite indexes for PARTITION BY + ORDER BY columns; (2) Use the WINDOW clause to reuse the same window definition so the optimizer sorts only once; (3) Filter with WHERE first to reduce data volume; (4) If you only need aggregation without ordering, omit ORDER BY.

Q5: How to handle non-deterministic ROW_NUMBER() ordering for equal values?

Add a unique column (typically the primary key) as a tiebreaker in ORDER BY: ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, id ASC). This ensures deterministic ordering even when salary values are identical.

Q6: How to achieve window function equivalents in MySQL 5.7?

MySQL 5.7 does not support window functions. Common workarounds: (1) Ranking via user variables @rn := @rn + 1 (unreliable in MySQL 8.0); (2) Running totals via correlated subqueries; (3) LAG/LEAD via self-joins. These alternatives typically perform much worse. We recommend upgrading to MySQL 8.0.

Q7: Do multiple window functions cause multiple sorts?

It depends on whether PARTITION BY and ORDER BY are the same. Multiple functions with the same window definition only sort once. Different window definitions require separate sorts. Use the WINDOW clause to reuse window definitions and let the optimizer merge sorts. Check actual sort counts via EXPLAIN FORMAT=TREE.

Q8: How to choose between ROWS and RANGE frame modes?

ROWS: Based on physical row count; more predictable behavior and better performance. Preferred in most cases. RANGE: Based on value range; use when the ORDER BY column has duplicates and you need to treat equal values as a group. Required for time-interval-based frames (e.g., RANGE BETWEEN INTERVAL 7 DAY PRECEDING). GROUPS: Based on peer groups; between ROWS and RANGE, suitable for peer-group-level computation.

Q9: Can window functions be used in UPDATE statements?

Not directly in the UPDATE SET clause. But you can use a subquery JOIN: UPDATE orders o JOIN (SELECT id, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn FROM orders) t ON o.id = t.id SET o.seq = t.rn;

Q10: Any caveats when combining window functions with LIMIT?

LIMIT executes after window functions, so window functions compute on all rows first before truncation. If you only need Top-N, the window function still processes all data. For large tables, filter with WHERE first, or consider whether a simple ORDER BY + LIMIT approach (without window functions) meets your needs.

Rate this chapter
4.8  / 5  (3 ratings)

💬 Comments