Chapter 3

Prefix Sums and Difference Arrays

Chapter 3: Prefix Sums and Difference Arrays

You have an array of 1 million elements and need to answer 100,000 queries of the form "what is the sum from index l to index r?" The naive approachโ€”iterating from l to r for each queryโ€”amounts to 10ยนยน operations total. But if you spend O(n) time on a single preprocessing pass, every subsequent query costs just O(1). This is not wizardry; it is a devastatingly simple idea: store the running total of everything before each position.

Prefix sums and difference arrays are among the most elegant examples of the "precompute to accelerate" paradigm in algorithm design. They require no sophisticated data structures, no advanced mathematicsโ€”just one addition and one subtraction. Yet this very simplicity makes them a frequent interview trap, because simple concepts breed subtle implementation errors.


Level 1 ยท What You Need to Know

3.1 The Core Idea Behind Prefix Sums

Principle: If you precompute the cumulative sum from the beginning of the array to every position, then the sum of any contiguous subarray can be obtained by subtracting two precomputed values.

Think of kilometer markers on a highway. The marker tells you the distance from the starting point to your current location. To find the distance between marker 30 and marker 75, you don't re-measureโ€”you subtract: 75 - 30 = 45 km.

Prefix sums are those kilometer markers.

Given an array nums of length n, the prefix sum array prefix is defined as:

prefix[0] = 0
prefix[1] = nums[0]
prefix[2] = nums[0] + nums[1]
...
prefix[i] = nums[0] + nums[1] + ... + nums[i-1]

That is, prefix[i] stores the sum of the first i elements. Note that we define prefix[0] = 0, making the prefix array of length n+1.

3.2 One-Dimensional Prefix Sums: Construction and Queries

Building the prefix sum array:

def build_prefix_sum(nums: list[int]) -> list[int]:
    """Build prefix sum array. prefix[i] = sum(nums[0..i-1])"""
    n = len(nums)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + nums[i]
    return prefix

Time complexity: O(n) โ€” single pass. Space complexity: O(n) โ€” storing the prefix array.

Range queries:

To query the sum of nums[l..r] (inclusive on both ends):

def range_sum(prefix: list[int], l: int, r: int) -> int:
    """Query sum of nums[l..r] in O(1)"""
    return prefix[r + 1] - prefix[l]

Why prefix[r+1] - prefix[l]?

prefix[r+1] = nums[0] + nums[1] + ... + nums[r]
prefix[l]   = nums[0] + nums[1] + ... + nums[l-1]
Subtracting: nums[l] + nums[l+1] + ... + nums[r]

Exactly the sum of the interval [l, r].

Complete example:

nums = [3, 1, 4, 1, 5, 9, 2, 6]
prefix = build_prefix_sum(nums)
# prefix = [0, 3, 4, 8, 9, 14, 23, 25, 31]

# Query nums[2..5] = 4 + 1 + 5 + 9 = 19
print(range_sum(prefix, 2, 5))  # prefix[6] - prefix[2] = 23 - 4 = 19

3.3 Performance Comparison: Why Prefix Sums Matter

Suppose the array has n = 10โถ elements and there are q = 10โต queries:

Approach Preprocessing Per Query Total
Brute force O(1) O(n) O(q ร— n) = 10ยนยน
Prefix sum O(n) O(1) O(n + q) = 10โถ + 10โต

The gap is 100,000ร—. This is why prefix sums should be your first instinct whenever you see "multiple range sum queries."

3.4 Two-Dimensional Prefix Sums: Matrix Region Queries

When data extends from one dimension to two, the prefix sum idea still applies. Given an mร—n matrix, we want to quickly query the sum of any rectangular subregion.

Definition: prefix[i][j] represents the sum of all elements in the rectangle from (0,0) to (i-1, j-1).

Construction:

def build_2d_prefix_sum(matrix: list[list[int]]) -> list[list[int]]:
    """Build 2D prefix sum. prefix[i][j] = sum of matrix[0..i-1][0..j-1]"""
    if not matrix or not matrix[0]:
        return [[]]
    m, n = len(matrix), len(matrix[0])
    prefix = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            prefix[i+1][j+1] = (matrix[i][j]
                                + prefix[i][j+1]
                                + prefix[i+1][j]
                                - prefix[i][j])
    return prefix

Inclusion-exclusion principle: During construction, prefix[i+1][j+1] equals the current element plus the "prefix sum above" and the "prefix sum to the left," minus the "prefix sum at the upper-left corner" (which was counted twice).

Visualizing the inclusion-exclusion:

+-------+-------+
|       |       |
|   A   |   B   |
|       |       |
+-------+-------+
|       |       |
|   C   |   D   |
|       |       |
+-------+-------+

prefix(bottom-right of D) = A + B + C + D
prefix(bottom-right of B) = A + B
prefix(bottom-right of C) = A + C
prefix(bottom-right of A) = A

Therefore: D = prefix(D) - prefix(B) - prefix(C) + prefix(A)

Query: Sum of the rectangle from (r1, c1) to (r2, c2):

def range_sum_2d(prefix: list[list[int]], r1: int, c1: int, r2: int, c2: int) -> int:
    """Query sum of matrix[r1..r2][c1..c2] in O(1)"""
    return (prefix[r2+1][c2+1]
            - prefix[r1][c2+1]
            - prefix[r2+1][c1]
            + prefix[r1][c1])

Complete example:

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
prefix = build_2d_prefix_sum(matrix)
# Query (1,1) to (2,2): 5+6+8+9 = 28
print(range_sum_2d(prefix, 1, 1, 2, 2))  # 28

3.5 Difference Arrays: The Weapon for Range Updates

Problem: Given an array, perform q range update operations (add val to every element in [l, r]), then output the final array.

Naive approach: iterate [l, r] for each operation โ€” total O(q ร— n).

A difference array is the inverse of a prefix sum. If prefix sums "accumulate," difference arrays "decompose."

Definition: Given array nums, its difference array diff is:

diff[0] = nums[0]
diff[i] = nums[i] - nums[i-1]  (i >= 1)

Key property: Taking the prefix sum of the difference array reconstructs the original array.

nums[0] = diff[0]
nums[1] = diff[0] + diff[1]
nums[2] = diff[0] + diff[1] + diff[2]
...
nums[i] = sum(diff[0..i])

O(1) range update:

To add val to every element in nums[l..r]:

diff[l] += val
if r + 1 < n:
    diff[r + 1] -= val

Why does this work? When we reconstruct the array via prefix sum:

Net effect: only elements in [l, r] receive the addition.

Complete implementation:

def range_add(n: int, operations: list[tuple[int, int, int]]) -> list[int]:
    """
    Apply multiple range-add operations to a zero-initialized array of length n.
    operations: [(l, r, val), ...] means add val to every element in [l, r]
    Returns the final array.
    """
    diff = [0] * n
    for l, r, val in operations:
        diff[l] += val
        if r + 1 < n:
            diff[r + 1] -= val
    
    # Reconstruct via prefix sum
    result = [0] * n
    result[0] = diff[0]
    for i in range(1, n):
        result[i] = result[i - 1] + diff[i]
    return result


# Example:
# Apply three operations to array of length 6:
# [1,3] +2, [2,5] +3, [0,2] -1
ops = [(1, 3, 2), (2, 5, 3), (0, 2, -1)]
print(range_add(6, ops))
# [-1, 1, 4, 5, 3, 3]
# Verification: start with [0,0,0,0,0,0]
# +2 on [1,3]: [0,2,2,2,0,0]
# +3 on [2,5]: [0,2,5,5,3,3]
# -1 on [0,2]: [-1,1,4,5,3,3] โœ“

3.6 Common Mistakes

Mistake 1: Off-by-one errors

The most frequent bug with prefix sums involves index offsets. Memorize this core formula:

# Sum of interval [l, r] (inclusive on both ends)
sum_lr = prefix[r + 1] - prefix[l]

Common incorrect versions:

# Wrong! This misses nums[l]
sum_lr = prefix[r] - prefix[l]

# Wrong! This includes nums[r+1] (if r+1 < n)
sum_lr = prefix[r + 1] - prefix[l - 1]  # Also out of bounds when l=0

Tip: Make the prefix array one element longer (length n+1) with prefix[0] = 0. This eliminates the need to special-case l=0.

Mistake 2: Integer overflow

In Python, integers have arbitrary precision so overflow never occurs. But in C++/Java/Go:

# No problem in Python
nums = [10**9] * 100000
prefix = build_prefix_sum(nums)  # prefix[-1] = 10^14, Python handles it

# In C++ with int (32-bit), 10^14 far exceeds 2^31 - 1 โ‰ˆ 2.1 ร— 10^9
# Must use long long

Mistake 3: Boundary handling in difference arrays

# When r is the last element, r+1 = n, no need to subtract
diff[l] += val
if r + 1 < n:        # Do NOT forget this check!
    diff[r + 1] -= val

Level 2 ยท Going Deeper

3.7 Prefix Sum Variants

The core insight of prefix sumsโ€”"precompute prefix information to accelerate range queries"โ€”generalizes to any operator that has an inverse operation.

Prefix XOR:

XOR satisfies a ^ a = 0 and a ^ 0 = a, giving it a "self-cancellation" property analogous to subtraction for addition.

def build_prefix_xor(nums: list[int]) -> list[int]:
    """Build prefix XOR array"""
    n = len(nums)
    prefix_xor = [0] * (n + 1)
    for i in range(n):
        prefix_xor[i + 1] = prefix_xor[i] ^ nums[i]
    return prefix_xor

def range_xor(prefix_xor: list[int], l: int, r: int) -> int:
    """Query XOR of nums[l..r]"""
    return prefix_xor[r + 1] ^ prefix_xor[l]

Why can XOR serve as a "prefix sum"? Because XOR is its own inverse: a ^ b ^ b = a. So prefix_xor[r+1] ^ prefix_xor[l] cancels out the [0, l-1] portion.

Prefix Product:

Multiplication's inverse is division. If the array contains no zeros, prefix products work:

def build_prefix_product(nums: list[int]) -> list[float]:
    """Prefix product, assuming no zero elements"""
    n = len(nums)
    prefix_prod = [1.0] * (n + 1)
    for i in range(n):
        prefix_prod[i + 1] = prefix_prod[i] * nums[i]
    return prefix_prod

def range_product(prefix_prod: list[float], l: int, r: int) -> float:
    """Range product query"""
    return prefix_prod[r + 1] / prefix_prod[l]

Note: division introduces floating-point precision issues, and zeros cannot be handled. LeetCode 238 ("Product of Array Except Self") exploits this limitation to motivate a more clever approach (see Level 4).

Prefix GCD:

GCD has no inverse, so you cannot use "right endpoint minus left endpoint" for arbitrary range queries. Prefix GCD can only answer "GCD of the first i elements," not arbitrary range GCD queries. This is the boundary of applicability for the prefix sum technique.

Summary โ€” which operations support the prefix sum trick:

Operation Inverse Supports range query?
Addition (+) Subtraction (-) Yes
XOR (^) XOR (^) Yes
Multiplication (ร—) Division (รท) Yes (if no zeros)
Maximum (max) None No
Minimum (min) None No
GCD None No

For operations without inverses (max, min, GCD), range queries require Sparse Tables or Segment Trees.

3.8 The Subarray Sum Problem: Subarrays with Sum K

Problem (LeetCode 560): Given an integer array nums and an integer k, find the total number of contiguous subarrays whose sum equals k.

Key insight: Subarray [l, r] has sum k if and only if prefix[r+1] - prefix[l] = k, i.e., prefix[l] = prefix[r+1] - k.

So the question becomes: for each position r, how many previous positions l satisfy prefix[l] = prefix[r+1] - k?

This is a "two-sum difference" problem, solvable with a hash map in O(1) per lookup.

from collections import defaultdict

def subarray_sum(nums: list[int], k: int) -> int:
    """
    Count contiguous subarrays with sum equal to k.
    Time O(n), Space O(n).
    """
    count = 0
    prefix_sum = 0
    # Track frequency of each prefix sum value
    # Initialize {0: 1} for the "empty prefix" with sum 0
    freq = defaultdict(int)
    freq[0] = 1
    
    for num in nums:
        prefix_sum += num
        # If prefix_sum - k appeared c times before,
        # there are c starting positions yielding sum k
        count += freq[prefix_sum - k]
        freq[prefix_sum] += 1
    
    return count

Why initialize freq[0] = 1?

This accounts for the "empty prefix." When prefix_sum itself equals k, the subarray from the start to the current position has sum k. We need freq[prefix_sum - k] = freq[0] to be 1.

Time complexity: O(n) โ€” one pass, hash map operations amortized O(1). Space complexity: O(n) โ€” the hash map stores at most n distinct prefix sums.

This technique reappears in Chapter 8 (Hash Tables) as a core case of the "prefix sum + hash map" combination.

3.9 Difference Arrays in Practice: Corporate Flight Bookings

Problem (LeetCode 1109): There are n flights (numbered 1 to n). Given a list of bookings bookings[i] = [first_i, last_i, seats_i] meaning seats_i seats are booked on every flight from first_i to last_i (inclusive), return an array answer where answer[i] is the total seats reserved for flight i+1.

This is the canonical use case for difference arrays: batch range updates, single final reconstruction.

def corp_flight_bookings(bookings: list[list[int]], n: int) -> list[int]:
    """Corporate Flight Bookings โ€” difference array approach"""
    diff = [0] * n
    for first, last, seats in bookings:
        # Problem uses 1-indexed flights; convert to 0-indexed
        diff[first - 1] += seats
        if last < n:
            diff[last] -= seats
    
    # Reconstruct via prefix sum
    answer = [0] * n
    answer[0] = diff[0]
    for i in range(1, n):
        answer[i] = answer[i - 1] + diff[i]
    return answer

Performance analysis:

When m = 2 ร— 10โด and n = 2 ร— 10โด, brute force = 4 ร— 10โธ operations (TLE), difference = 4 ร— 10โด operations.

3.10 Two-Dimensional Difference Arrays

One-dimensional difference handles range updates; two-dimensional difference handles rectangular region updates.

Problem: Given an mร—n matrix, perform q operations each adding val to all elements in rectangle (r1,c1)-(r2,c2), then output the final matrix.

def apply_2d_range_add(m: int, n: int, 
                       operations: list[tuple[int,int,int,int,int]]) -> list[list[int]]:
    """
    2D difference array for rectangular region updates.
    operations: [(r1, c1, r2, c2, val), ...]
    """
    # Build 2D difference array
    diff = [[0] * (n + 1) for _ in range(m + 1)]
    
    for r1, c1, r2, c2, val in operations:
        diff[r1][c1] += val
        diff[r1][c2 + 1] -= val
        diff[r2 + 1][c1] -= val
        diff[r2 + 1][c2 + 1] += val
    
    # Reconstruct via 2D prefix sum
    result = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            diff[i][j] += (diff[i-1][j] if i > 0 else 0)
            diff[i][j] += (diff[i][j-1] if j > 0 else 0)
            diff[i][j] -= (diff[i-1][j-1] if i > 0 and j > 0 else 0)
            result[i][j] = diff[i][j]
    
    return result

Inclusion-exclusion again: The 2D difference update marks four corners:

This is perfectly symmetric with the 2D prefix sum inclusion-exclusion.

3.11 Prefix Sums vs. Sliding Window

Prefix sums and sliding windows (Chapter 23) solve the same category of problemsโ€”subarray/range queriesโ€”but under different constraints:

Dimension Prefix Sum Sliding Window
Applicability Unrestricted (handles negatives) Usually needs monotonicity (non-negative elements, increasing subarray sums, etc.)
Query type Exact value (sum equals K) Longest/shortest interval satisfying condition
Time complexity O(n) (with hash map) O(n)
Space complexity O(n) (hash map / prefix array) O(1) to O(k)

Rules of thumb:

The two techniques are sometimes interchangeable, but when negatives are present, the sliding window typically fails (expanding the window doesn't guarantee increasing sum), making prefix sums the only O(n) option.


Level 3 ยท Mathematical and Engineering Perspectives

3.12 Mathematical Foundations of Prefix Sums

Prefix sums are the discrete version of partial sums. In mathematics:

$$S_n = \sum_{i=0}^{n-1} a_i$$

And a range sum is:

$$\sum_{i=l}^{r} a_i = S_{r+1} - S_l$$

This identity holds because addition satisfies associativity and commutativity, and subtraction is the inverse of additionโ€”properties of the additive group (Z, +).

More generally, if an operation โŠ• forms a group (identity element, closure, associativity, inverse elements), then the prefix "sum" technique generalizes.

But (Z, max) does not form a group (max has no inverse), so the prefix sum trick cannot provide O(1) range maximum queries.

The duality between differencing and summation:

The difference operator ฮ” is defined as: ฮ”(f)(i) = f(i) - f(i-1)

The summation operator ฮฃ is defined as: ฮฃ(f)(i) = f(0) + f(1) + ... + f(i)

They are mutual inverses: ฮฃ(ฮ”(f)) = f and ฮ”(ฮฃ(f)) = f

This is a direct analogy to differentiation and integration in calculus.

3.13 The Analogy Between Prefix Sums and Integration

Discrete (Arrays) Continuous (Functions)
Array a[i] Function f(x)
Prefix sum S[n] = ฮฃa[i] Integral F(x) = โˆซf(t)dt
Difference ฮ”a[i] = a[i] - a[i-1] Derivative f'(x) = df/dx
Range sum = S[r+1] - S[l] Definite integral = F(b) - F(a)
Reconstruct from difference = prefix sum Antiderivative + differentiation = identity

This analogy is not merely conceptualโ€”the Fundamental Theorem of Calculus โˆซ[a,b] f(x)dx = F(b) - F(a) is structurally identical to the prefix sum formula sum(l,r) = prefix[r+1] - prefix[l].

The discrete Newton-Leibniz formula is exactly the prefix sum range query formula. This unified perspective helps you understand why differencing and prefix sums are inverse operationsโ€”just as differentiation and integration are inverse operations.

3.14 Prefix Sums in Database Systems

In database systems, prefix sums appear in multiple forms:

Running Aggregation:

-- Compute cumulative daily sales
SELECT 
    date,
    daily_sales,
    SUM(daily_sales) OVER (ORDER BY date) AS cumulative_sales
FROM sales;

The window function SUM(...) OVER (ORDER BY ...) is essentially computing a prefix sum. Internally, the database engine typically maintains an accumulator and scans the data onceโ€”identical to our build_prefix_sum.

Window functions for range queries:

-- Compute 7-day moving average
SELECT 
    date,
    AVG(daily_sales) OVER (
        ORDER BY date 
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM sales;

Although SQL window function syntax hides implementation details, many query optimizers use prefix sum ideas under the hood: compute the running total first, then subtract two running totals to obtain any window's sum.

Materialized views and incremental maintenance:

In data warehouses, precomputed aggregates (materialized views) are essentially "prefix information computed in advance." When underlying data changes, the differencing idea is used for incremental updatesโ€”processing only the delta rather than recomputing everything.

3.15 Prefix Sums on Trees

When data structures extend from linear arrays to trees, prefix sums can be realized via DFS ordering.

Problem: Given a weighted tree, answer multiple queries for the sum of edge weights (or node values) on the path from the root to a given node.

Approach 1: Precompute distances via DFS

def build_tree_prefix(graph: dict, root: int, 
                      weights: dict) -> dict[int, int]:
    """
    Precompute path weight sum from root to every node.
    graph: adjacency list
    weights: edge weights {(u,v): w}
    """
    dist = {root: 0}
    stack = [root]
    visited = {root}
    
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in visited:
                visited.add(v)
                edge_weight = weights.get((u, v), weights.get((v, u), 0))
                dist[v] = dist[u] + edge_weight
                stack.append(v)
    return dist

With root-to-node "prefix sums," the path sum between any two nodes becomes:

def path_sum(dist: dict, lca_func, u: int, v: int) -> int:
    """Path sum from u to v = dist[u] + dist[v] - 2 * dist[LCA(u,v)]"""
    ancestor = lca_func(u, v)
    return dist[u] + dist[v] - 2 * dist[ancestor]

Here LCA (Lowest Common Ancestor) plays the role analogous to "subtracting the common prefix" in 1D prefix sums.

Approach 2: Euler tour (DFS order) + 1D prefix sums

Flattening a tree into a DFS sequence converts subtree operations into contiguous range operations:

def dfs_order(graph: dict, root: int) -> tuple[list[int], dict, dict]:
    """
    Compute DFS order (Euler tour).
    Returns: (sequence, entry_time tin, exit_time tout)
    Subtree of node u corresponds to interval [tin[u], tout[u]]
    """
    order = []
    tin = {}
    tout = {}
    timer = [0]
    visited = set()
    
    def dfs(u):
        visited.add(u)
        tin[u] = timer[0]
        order.append(u)
        timer[0] += 1
        for v in graph[u]:
            if v not in visited:
                dfs(v)
        tout[u] = timer[0] - 1
    
    dfs(root)
    return order, tin, tout

Once the tree is linearized, "subtree sum" becomes "range sum," directly solvable with 1D prefix sums. This technique is ubiquitous in competitive programming and forms the foundation for segment tree operations on trees.

3.16 Higher-Dimensional Prefix Sums

Two-dimensional prefix sums generalize to arbitrary dimensions. In d-dimensional space, inclusion-exclusion involves 2^d vertices.

Three-dimensional prefix sum construction and query:

def build_3d_prefix_sum(arr, X, Y, Z):
    """3D prefix sum"""
    prefix = [[[0]*(Z+1) for _ in range(Y+1)] for _ in range(X+1)]
    for i in range(X):
        for j in range(Y):
            for k in range(Z):
                prefix[i+1][j+1][k+1] = (arr[i][j][k]
                    + prefix[i][j+1][k+1]
                    + prefix[i+1][j][k+1]
                    + prefix[i+1][j+1][k]
                    - prefix[i][j][k+1]
                    - prefix[i][j+1][k]
                    - prefix[i+1][j][k]
                    + prefix[i][j][k])
    return prefix

The 3D inclusion-exclusion has 8 terms (2ยณ = 8), alternating signs. In practice, prefix sums beyond three dimensions are rare because space complexity is O(n^d), growing exponentially.

Practical applications of higher-dimensional prefix sums:


Level 4 ยท Interview Practice

3.17 High-Frequency Interview Problems

Problem 1: Subarray Sum Equals K (LeetCode 560)

We presented the solution in Section 3.8. Here we address common follow-up questions in interviews.

Follow-up 1: If all elements are positive, is there a better approach?

All positives means the prefix sum is strictly increasing, allowing a sliding window (two pointers) with O(1) space. But the original problem includes negatives and zeros, making the prefix sum non-monotonic and invalidating the sliding window.

Follow-up 2: Can we find all subarrays with sum K (not just count)?

Yes, but we need to record the position list for each prefix sum value, not just the count:

from collections import defaultdict

def find_all_subarrays_with_sum_k(nums: list[int], k: int) -> list[tuple[int, int]]:
    """Return all (start, end) pairs of subarrays with sum k"""
    result = []
    prefix_sum = 0
    # Record positions where each prefix sum value occurred
    positions = defaultdict(list)
    positions[0].append(-1)  # "Empty prefix" position is -1
    
    for i, num in enumerate(nums):
        prefix_sum += num
        target = prefix_sum - k
        if target in positions:
            for start_pos in positions[target]:
                result.append((start_pos + 1, i))
        positions[prefix_sum].append(i)
    
    return result

Note: In the worst case (e.g., all zeros with k=0), the number of results is O(nยฒ), so time complexity is also O(nยฒ).

Problem 2: Continuous Subarray Sum (LeetCode 523)

Problem: Given a non-negative integer array nums and an integer k, determine whether there exists a contiguous subarray of length at least 2 whose sum is a multiple of k.

Key insight: If prefix[j] - prefix[i] is a multiple of k (with j - i >= 2), this is equivalent to prefix[j] % k == prefix[i] % k.

So we use a hash map to record the first occurrence of each "prefix sum modulo k" remainder. If the same remainder appears again with a gap of at least 2, we found the answer.

def check_subarray_sum(nums: list[int], k: int) -> bool:
    """
    Check if subarray of length >= 2 with sum divisible by k exists.
    Time O(n), Space O(min(n, k)).
    """
    # Record first index where each remainder was seen
    remainder_index = {0: -1}  # Empty prefix has remainder 0 at position -1
    prefix_sum = 0
    
    for i, num in enumerate(nums):
        prefix_sum += num
        remainder = prefix_sum % k if k != 0 else prefix_sum
        
        if remainder in remainder_index:
            # Ensure subarray length is at least 2
            if i - remainder_index[remainder] >= 2:
                return True
        else:
            remainder_index[remainder] = i
    
    return False

Important details:

Problem 3: Product of Array Except Self (LeetCode 238)

Problem: Given array nums, return array answer where answer[i] equals the product of all elements except nums[i]. Required: O(n) time and no division.

Approach: answer[i] = product of all elements to the left ร— product of all elements to the right. Use prefix product and suffix product.

def product_except_self(nums: list[int]) -> list[int]:
    """
    Product of array except self.
    Time O(n), Space O(1) (excluding output array).
    """
    n = len(nums)
    answer = [1] * n
    
    # First pass: left to right, answer[i] = product of all elements left of i
    left_product = 1
    for i in range(n):
        answer[i] = left_product
        left_product *= nums[i]
    
    # Second pass: right to left, multiply by product of all elements right of i
    right_product = 1
    for i in range(n - 1, -1, -1):
        answer[i] *= right_product
        right_product *= nums[i]
    
    return answer

Why no division?

  1. The problem explicitly forbids it
  2. If the array contains zeros, division breaks
  3. This solution demonstrates the general "prefix and suffix" bidirectional scan technique

Space optimization: The above uses only the output array and two variablesโ€”O(1) extra space (the problem states the output array doesn't count). This is more elegant than maintaining two full prefix/suffix arrays.

Problem 4: Corporate Flight Bookings (LeetCode 1109)

Covered in detail in Section 3.9. Interview follow-up:

Follow-up: What if updates and queries are interleaved rather than batched?

Difference arrays are ideal for "do all updates first, then reconstruct once." If updates and queries alternate, you need more powerful structuresโ€”segment trees (supporting O(log n) range updates and range queries) or Binary Indexed Trees (Fenwick trees).

3.18 Handling Overflow in Prefix Sums

In languages with fixed-precision integers (C++/Java/Go), prefix sums are prone to overflow.

Scenario: Element values |a[i]| โ‰ค 10โน, array length n โ‰ค 10โต. Maximum prefix sum is approximately 10ยนโด, exceeding 32-bit integer range (โ‰ˆ 2.1 ร— 10โน) but within 64-bit range (โ‰ˆ 9.2 ร— 10ยนโธ).

Python advantage: Python integers never overflow, at the cost of slower arithmetic (large integers require more memory allocation and computation).

Modular arithmetic considerations:

Some problems require "answer modulo 10โน+7," in which case prefix sums must also be taken modulo:

MOD = 10**9 + 7

def build_prefix_sum_mod(nums: list[int]) -> list[int]:
    n = len(nums)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = (prefix[i] + nums[i]) % MOD
    return prefix

Trap: Subtraction after modulo can produce negative results!

def range_sum_mod(prefix: list[int], l: int, r: int) -> int:
    """Range sum under modular arithmetic"""
    return (prefix[r + 1] - prefix[l]) % MOD
    # In Python, % always returns non-negative, so this is fine
    # But in C++: (-3) % 7 = -3 (not 4), must add MOD manually
    # C++ version: return ((prefix[r+1] - prefix[l]) % MOD + MOD) % MOD

Python's % behavior: (-3) % 7 = 4 (Python) vs (-3) % 7 = -3 (C++/Java). Python's behavior is more convenient for algorithm problems, but if you use other languages, you must be aware of this difference.

3.19 Prefix Sums vs. Segment Trees: Static vs. Dynamic

Dimension Prefix Sum Segment Tree
Build time O(n) O(n)
Single query O(1) O(log n)
Single update O(n) (rebuild) O(log n)
Space O(n) O(n)
Implementation complexity Very low (few lines) Moderate (50-100 lines)
Use case Static data โ€” query only Dynamic data โ€” interleaved updates and queries

When to use prefix sums:

When to use segment trees:

Decision flowchart:

Are there update operations?
โ”œโ”€โ”€ No โ†’ Use prefix sums (O(1) queries)
โ””โ”€โ”€ Yes โ†’ Are updates batched or interleaved?
    โ”œโ”€โ”€ All updates first, then query once โ†’ Difference array
    โ””โ”€โ”€ Updates and queries interleaved โ†’ Segment tree / BIT

3.20 Comprehensive Practice and Pattern Recognition

Pattern 1: Prefix sum + hash map (counting problems)

# Template: count subarrays satisfying some range property
def count_subarrays_with_property(nums, target):
    count = 0
    prefix = 0  # or another cumulative quantity
    seen = defaultdict(int)
    seen[0] = 1  # empty prefix
    
    for num in nums:
        prefix = update(prefix, num)  # accumulate
        # Check how many previous prefix values satisfy the condition
        count += seen[complement(prefix, target)]
        seen[prefix] += 1
    
    return count

Applicable problems: LeetCode 560, 523, 974, 930, 1248

Pattern 2: Prefix/suffix bidirectional scan

# Template: need information from both left and right
def bidirectional_scan(nums):
    n = len(nums)
    left_info = compute_prefix(nums)    # left to right
    right_info = compute_suffix(nums)   # right to left
    
    answer = [0] * n
    for i in range(n):
        answer[i] = combine(left_info[i], right_info[i])
    return answer

Applicable problems: LeetCode 238, 42 (Trapping Rain Water via prefix max), 135

Pattern 3: Difference array (batch range updates)

# Template: multiple range updates + final reconstruction
def batch_range_update(n, operations):
    diff = [0] * (n + 1)  # extra element avoids boundary checks
    for l, r, val in operations:
        diff[l] += val
        diff[r + 1] -= val
    
    # Prefix sum to reconstruct
    for i in range(1, n):
        diff[i] += diff[i - 1]
    return diff[:n]

Applicable problems: LeetCode 1109, 1094, 370, 253

3.21 Chapter Summary

Prefix sums and difference arrays are a dual pair of inverse techniques:

Their domain of applicability is static or semi-static scenariosโ€”if data changes frequently, upgrade to segment trees or Binary Indexed Trees.

Key takeaways:

  1. Make the prefix array one element longer; prefix[0] = 0
  2. Sum of [l, r] = prefix[r+1] - prefix[l]
  3. Difference array update on [l, r]: diff[l] += val, diff[r+1] -= val
  4. Prefix sum + hash map = universal combo for "sum equals K" problems
  5. 2D operations use inclusion-exclusion: add, add, subtract, add

In the next chapter, we will study sorting algorithms. From the simplest bubble sort to industrial-grade Timsort, sorting is not only the most fundamental algorithmic problem but also an excellent training ground for understanding divide-and-conquer thinking and algorithm analysis.

Rate this chapter
4.8  / 5  (95 ratings)

๐Ÿ’ฌ Comments