Prefix Sums and Difference Arrays
Chapter 3: Prefix Sums and Difference Arrays
You have an array of 1 million elements and need to answer 100,000 queries of the form "what is the sum from index l to index r?" The naive approachโiterating from l to r for each queryโamounts to 10ยนยน operations total. But if you spend O(n) time on a single preprocessing pass, every subsequent query costs just O(1). This is not wizardry; it is a devastatingly simple idea: store the running total of everything before each position.
Prefix sums and difference arrays are among the most elegant examples of the "precompute to accelerate" paradigm in algorithm design. They require no sophisticated data structures, no advanced mathematicsโjust one addition and one subtraction. Yet this very simplicity makes them a frequent interview trap, because simple concepts breed subtle implementation errors.
Level 1 ยท What You Need to Know
3.1 The Core Idea Behind Prefix Sums
Principle: If you precompute the cumulative sum from the beginning of the array to every position, then the sum of any contiguous subarray can be obtained by subtracting two precomputed values.
Think of kilometer markers on a highway. The marker tells you the distance from the starting point to your current location. To find the distance between marker 30 and marker 75, you don't re-measureโyou subtract: 75 - 30 = 45 km.
Prefix sums are those kilometer markers.
Given an array nums of length n, the prefix sum array prefix is defined as:
prefix[0] = 0
prefix[1] = nums[0]
prefix[2] = nums[0] + nums[1]
...
prefix[i] = nums[0] + nums[1] + ... + nums[i-1]
That is, prefix[i] stores the sum of the first i elements. Note that we define prefix[0] = 0, making the prefix array of length n+1.
3.2 One-Dimensional Prefix Sums: Construction and Queries
Building the prefix sum array:
def build_prefix_sum(nums: list[int]) -> list[int]:
"""Build prefix sum array. prefix[i] = sum(nums[0..i-1])"""
n = len(nums)
prefix = [0] * (n + 1)
for i in range(n):
prefix[i + 1] = prefix[i] + nums[i]
return prefix
Time complexity: O(n) โ single pass. Space complexity: O(n) โ storing the prefix array.
Range queries:
To query the sum of nums[l..r] (inclusive on both ends):
def range_sum(prefix: list[int], l: int, r: int) -> int:
"""Query sum of nums[l..r] in O(1)"""
return prefix[r + 1] - prefix[l]
Why prefix[r+1] - prefix[l]?
prefix[r+1] = nums[0] + nums[1] + ... + nums[r]
prefix[l] = nums[0] + nums[1] + ... + nums[l-1]
Subtracting: nums[l] + nums[l+1] + ... + nums[r]
Exactly the sum of the interval [l, r].
Complete example:
nums = [3, 1, 4, 1, 5, 9, 2, 6]
prefix = build_prefix_sum(nums)
# prefix = [0, 3, 4, 8, 9, 14, 23, 25, 31]
# Query nums[2..5] = 4 + 1 + 5 + 9 = 19
print(range_sum(prefix, 2, 5)) # prefix[6] - prefix[2] = 23 - 4 = 19
3.3 Performance Comparison: Why Prefix Sums Matter
Suppose the array has n = 10โถ elements and there are q = 10โต queries:
| Approach | Preprocessing | Per Query | Total |
|---|---|---|---|
| Brute force | O(1) | O(n) | O(q ร n) = 10ยนยน |
| Prefix sum | O(n) | O(1) | O(n + q) = 10โถ + 10โต |
The gap is 100,000ร. This is why prefix sums should be your first instinct whenever you see "multiple range sum queries."
3.4 Two-Dimensional Prefix Sums: Matrix Region Queries
When data extends from one dimension to two, the prefix sum idea still applies. Given an mรn matrix, we want to quickly query the sum of any rectangular subregion.
Definition: prefix[i][j] represents the sum of all elements in the rectangle from (0,0) to (i-1, j-1).
Construction:
def build_2d_prefix_sum(matrix: list[list[int]]) -> list[list[int]]:
"""Build 2D prefix sum. prefix[i][j] = sum of matrix[0..i-1][0..j-1]"""
if not matrix or not matrix[0]:
return [[]]
m, n = len(matrix), len(matrix[0])
prefix = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(m):
for j in range(n):
prefix[i+1][j+1] = (matrix[i][j]
+ prefix[i][j+1]
+ prefix[i+1][j]
- prefix[i][j])
return prefix
Inclusion-exclusion principle: During construction, prefix[i+1][j+1] equals the current element plus the "prefix sum above" and the "prefix sum to the left," minus the "prefix sum at the upper-left corner" (which was counted twice).
Visualizing the inclusion-exclusion:
+-------+-------+
| | |
| A | B |
| | |
+-------+-------+
| | |
| C | D |
| | |
+-------+-------+
prefix(bottom-right of D) = A + B + C + D
prefix(bottom-right of B) = A + B
prefix(bottom-right of C) = A + C
prefix(bottom-right of A) = A
Therefore: D = prefix(D) - prefix(B) - prefix(C) + prefix(A)
Query: Sum of the rectangle from (r1, c1) to (r2, c2):
def range_sum_2d(prefix: list[list[int]], r1: int, c1: int, r2: int, c2: int) -> int:
"""Query sum of matrix[r1..r2][c1..c2] in O(1)"""
return (prefix[r2+1][c2+1]
- prefix[r1][c2+1]
- prefix[r2+1][c1]
+ prefix[r1][c1])
Complete example:
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
prefix = build_2d_prefix_sum(matrix)
# Query (1,1) to (2,2): 5+6+8+9 = 28
print(range_sum_2d(prefix, 1, 1, 2, 2)) # 28
3.5 Difference Arrays: The Weapon for Range Updates
Problem: Given an array, perform q range update operations (add val to every element in [l, r]), then output the final array.
Naive approach: iterate [l, r] for each operation โ total O(q ร n).
A difference array is the inverse of a prefix sum. If prefix sums "accumulate," difference arrays "decompose."
Definition: Given array nums, its difference array diff is:
diff[0] = nums[0]
diff[i] = nums[i] - nums[i-1] (i >= 1)
Key property: Taking the prefix sum of the difference array reconstructs the original array.
nums[0] = diff[0]
nums[1] = diff[0] + diff[1]
nums[2] = diff[0] + diff[1] + diff[2]
...
nums[i] = sum(diff[0..i])
O(1) range update:
To add val to every element in nums[l..r]:
diff[l] += val
if r + 1 < n:
diff[r + 1] -= val
Why does this work? When we reconstruct the array via prefix sum:
diff[l] += valcauses every position from l onward to gain valdiff[r+1] -= valcancels the extra val from position r+1 onward
Net effect: only elements in [l, r] receive the addition.
Complete implementation:
def range_add(n: int, operations: list[tuple[int, int, int]]) -> list[int]:
"""
Apply multiple range-add operations to a zero-initialized array of length n.
operations: [(l, r, val), ...] means add val to every element in [l, r]
Returns the final array.
"""
diff = [0] * n
for l, r, val in operations:
diff[l] += val
if r + 1 < n:
diff[r + 1] -= val
# Reconstruct via prefix sum
result = [0] * n
result[0] = diff[0]
for i in range(1, n):
result[i] = result[i - 1] + diff[i]
return result
# Example:
# Apply three operations to array of length 6:
# [1,3] +2, [2,5] +3, [0,2] -1
ops = [(1, 3, 2), (2, 5, 3), (0, 2, -1)]
print(range_add(6, ops))
# [-1, 1, 4, 5, 3, 3]
# Verification: start with [0,0,0,0,0,0]
# +2 on [1,3]: [0,2,2,2,0,0]
# +3 on [2,5]: [0,2,5,5,3,3]
# -1 on [0,2]: [-1,1,4,5,3,3] โ
3.6 Common Mistakes
Mistake 1: Off-by-one errors
The most frequent bug with prefix sums involves index offsets. Memorize this core formula:
# Sum of interval [l, r] (inclusive on both ends)
sum_lr = prefix[r + 1] - prefix[l]
Common incorrect versions:
# Wrong! This misses nums[l]
sum_lr = prefix[r] - prefix[l]
# Wrong! This includes nums[r+1] (if r+1 < n)
sum_lr = prefix[r + 1] - prefix[l - 1] # Also out of bounds when l=0
Tip: Make the prefix array one element longer (length n+1) with prefix[0] = 0. This eliminates the need to special-case l=0.
Mistake 2: Integer overflow
In Python, integers have arbitrary precision so overflow never occurs. But in C++/Java/Go:
# No problem in Python
nums = [10**9] * 100000
prefix = build_prefix_sum(nums) # prefix[-1] = 10^14, Python handles it
# In C++ with int (32-bit), 10^14 far exceeds 2^31 - 1 โ 2.1 ร 10^9
# Must use long long
Mistake 3: Boundary handling in difference arrays
# When r is the last element, r+1 = n, no need to subtract
diff[l] += val
if r + 1 < n: # Do NOT forget this check!
diff[r + 1] -= val
Level 2 ยท Going Deeper
3.7 Prefix Sum Variants
The core insight of prefix sumsโ"precompute prefix information to accelerate range queries"โgeneralizes to any operator that has an inverse operation.
Prefix XOR:
XOR satisfies a ^ a = 0 and a ^ 0 = a, giving it a "self-cancellation" property analogous to subtraction for addition.
def build_prefix_xor(nums: list[int]) -> list[int]:
"""Build prefix XOR array"""
n = len(nums)
prefix_xor = [0] * (n + 1)
for i in range(n):
prefix_xor[i + 1] = prefix_xor[i] ^ nums[i]
return prefix_xor
def range_xor(prefix_xor: list[int], l: int, r: int) -> int:
"""Query XOR of nums[l..r]"""
return prefix_xor[r + 1] ^ prefix_xor[l]
Why can XOR serve as a "prefix sum"? Because XOR is its own inverse: a ^ b ^ b = a. So prefix_xor[r+1] ^ prefix_xor[l] cancels out the [0, l-1] portion.
Prefix Product:
Multiplication's inverse is division. If the array contains no zeros, prefix products work:
def build_prefix_product(nums: list[int]) -> list[float]:
"""Prefix product, assuming no zero elements"""
n = len(nums)
prefix_prod = [1.0] * (n + 1)
for i in range(n):
prefix_prod[i + 1] = prefix_prod[i] * nums[i]
return prefix_prod
def range_product(prefix_prod: list[float], l: int, r: int) -> float:
"""Range product query"""
return prefix_prod[r + 1] / prefix_prod[l]
Note: division introduces floating-point precision issues, and zeros cannot be handled. LeetCode 238 ("Product of Array Except Self") exploits this limitation to motivate a more clever approach (see Level 4).
Prefix GCD:
GCD has no inverse, so you cannot use "right endpoint minus left endpoint" for arbitrary range queries. Prefix GCD can only answer "GCD of the first i elements," not arbitrary range GCD queries. This is the boundary of applicability for the prefix sum technique.
Summary โ which operations support the prefix sum trick:
| Operation | Inverse | Supports range query? |
|---|---|---|
| Addition (+) | Subtraction (-) | Yes |
| XOR (^) | XOR (^) | Yes |
| Multiplication (ร) | Division (รท) | Yes (if no zeros) |
| Maximum (max) | None | No |
| Minimum (min) | None | No |
| GCD | None | No |
For operations without inverses (max, min, GCD), range queries require Sparse Tables or Segment Trees.
3.8 The Subarray Sum Problem: Subarrays with Sum K
Problem (LeetCode 560): Given an integer array nums and an integer k, find the total number of contiguous subarrays whose sum equals k.
Key insight: Subarray [l, r] has sum k if and only if prefix[r+1] - prefix[l] = k, i.e., prefix[l] = prefix[r+1] - k.
So the question becomes: for each position r, how many previous positions l satisfy prefix[l] = prefix[r+1] - k?
This is a "two-sum difference" problem, solvable with a hash map in O(1) per lookup.
from collections import defaultdict
def subarray_sum(nums: list[int], k: int) -> int:
"""
Count contiguous subarrays with sum equal to k.
Time O(n), Space O(n).
"""
count = 0
prefix_sum = 0
# Track frequency of each prefix sum value
# Initialize {0: 1} for the "empty prefix" with sum 0
freq = defaultdict(int)
freq[0] = 1
for num in nums:
prefix_sum += num
# If prefix_sum - k appeared c times before,
# there are c starting positions yielding sum k
count += freq[prefix_sum - k]
freq[prefix_sum] += 1
return count
Why initialize freq[0] = 1?
This accounts for the "empty prefix." When prefix_sum itself equals k, the subarray from the start to the current position has sum k. We need freq[prefix_sum - k] = freq[0] to be 1.
Time complexity: O(n) โ one pass, hash map operations amortized O(1). Space complexity: O(n) โ the hash map stores at most n distinct prefix sums.
This technique reappears in Chapter 8 (Hash Tables) as a core case of the "prefix sum + hash map" combination.
3.9 Difference Arrays in Practice: Corporate Flight Bookings
Problem (LeetCode 1109): There are n flights (numbered 1 to n). Given a list of bookings bookings[i] = [first_i, last_i, seats_i] meaning seats_i seats are booked on every flight from first_i to last_i (inclusive), return an array answer where answer[i] is the total seats reserved for flight i+1.
This is the canonical use case for difference arrays: batch range updates, single final reconstruction.
def corp_flight_bookings(bookings: list[list[int]], n: int) -> list[int]:
"""Corporate Flight Bookings โ difference array approach"""
diff = [0] * n
for first, last, seats in bookings:
# Problem uses 1-indexed flights; convert to 0-indexed
diff[first - 1] += seats
if last < n:
diff[last] -= seats
# Reconstruct via prefix sum
answer = [0] * n
answer[0] = diff[0]
for i in range(1, n):
answer[i] = answer[i - 1] + diff[i]
return answer
Performance analysis:
- With m bookings, brute force is O(m ร n)
- Difference array approach is O(m + n)
When m = 2 ร 10โด and n = 2 ร 10โด, brute force = 4 ร 10โธ operations (TLE), difference = 4 ร 10โด operations.
3.10 Two-Dimensional Difference Arrays
One-dimensional difference handles range updates; two-dimensional difference handles rectangular region updates.
Problem: Given an mรn matrix, perform q operations each adding val to all elements in rectangle (r1,c1)-(r2,c2), then output the final matrix.
def apply_2d_range_add(m: int, n: int,
operations: list[tuple[int,int,int,int,int]]) -> list[list[int]]:
"""
2D difference array for rectangular region updates.
operations: [(r1, c1, r2, c2, val), ...]
"""
# Build 2D difference array
diff = [[0] * (n + 1) for _ in range(m + 1)]
for r1, c1, r2, c2, val in operations:
diff[r1][c1] += val
diff[r1][c2 + 1] -= val
diff[r2 + 1][c1] -= val
diff[r2 + 1][c2 + 1] += val
# Reconstruct via 2D prefix sum
result = [[0] * n for _ in range(m)]
for i in range(m):
for j in range(n):
diff[i][j] += (diff[i-1][j] if i > 0 else 0)
diff[i][j] += (diff[i][j-1] if j > 0 else 0)
diff[i][j] -= (diff[i-1][j-1] if i > 0 and j > 0 else 0)
result[i][j] = diff[i][j]
return result
Inclusion-exclusion again: The 2D difference update marks four corners:
- (r1, c1) add val: effect propagates to the bottom-right
- (r1, c2+1) subtract val: blocks rightward propagation
- (r2+1, c1) subtract val: blocks downward propagation
- (r2+1, c2+1) add val: compensates for double subtraction
This is perfectly symmetric with the 2D prefix sum inclusion-exclusion.
3.11 Prefix Sums vs. Sliding Window
Prefix sums and sliding windows (Chapter 23) solve the same category of problemsโsubarray/range queriesโbut under different constraints:
| Dimension | Prefix Sum | Sliding Window |
|---|---|---|
| Applicability | Unrestricted (handles negatives) | Usually needs monotonicity (non-negative elements, increasing subarray sums, etc.) |
| Query type | Exact value (sum equals K) | Longest/shortest interval satisfying condition |
| Time complexity | O(n) (with hash map) | O(n) |
| Space complexity | O(n) (hash map / prefix array) | O(1) to O(k) |
Rules of thumb:
- Array contains negative numbers? โ Prefix sum + hash map
- Find subarrays with "sum exactly K"? โ Prefix sum
- Find "longest subarray with sum โค K" and elements non-negative? โ Sliding window
- Find "shortest substring containing all characters"? โ Sliding window
The two techniques are sometimes interchangeable, but when negatives are present, the sliding window typically fails (expanding the window doesn't guarantee increasing sum), making prefix sums the only O(n) option.
Level 3 ยท Mathematical and Engineering Perspectives
3.12 Mathematical Foundations of Prefix Sums
Prefix sums are the discrete version of partial sums. In mathematics:
$$S_n = \sum_{i=0}^{n-1} a_i$$
And a range sum is:
$$\sum_{i=l}^{r} a_i = S_{r+1} - S_l$$
This identity holds because addition satisfies associativity and commutativity, and subtraction is the inverse of additionโproperties of the additive group (Z, +).
More generally, if an operation โ forms a group (identity element, closure, associativity, inverse elements), then the prefix "sum" technique generalizes.
- (Z, +) โ Prefix sums
- (Z, โ) where โ = XOR โ Prefix XOR (each element is its own inverse)
- (Rโบ, ร) โ Prefix products (inverse is reciprocal)
But (Z, max) does not form a group (max has no inverse), so the prefix sum trick cannot provide O(1) range maximum queries.
The duality between differencing and summation:
The difference operator ฮ is defined as: ฮ(f)(i) = f(i) - f(i-1)
The summation operator ฮฃ is defined as: ฮฃ(f)(i) = f(0) + f(1) + ... + f(i)
They are mutual inverses: ฮฃ(ฮ(f)) = f and ฮ(ฮฃ(f)) = f
This is a direct analogy to differentiation and integration in calculus.
3.13 The Analogy Between Prefix Sums and Integration
| Discrete (Arrays) | Continuous (Functions) |
|---|---|
| Array a[i] | Function f(x) |
| Prefix sum S[n] = ฮฃa[i] | Integral F(x) = โซf(t)dt |
| Difference ฮa[i] = a[i] - a[i-1] | Derivative f'(x) = df/dx |
| Range sum = S[r+1] - S[l] | Definite integral = F(b) - F(a) |
| Reconstruct from difference = prefix sum | Antiderivative + differentiation = identity |
This analogy is not merely conceptualโthe Fundamental Theorem of Calculus โซ[a,b] f(x)dx = F(b) - F(a) is structurally identical to the prefix sum formula sum(l,r) = prefix[r+1] - prefix[l].
The discrete Newton-Leibniz formula is exactly the prefix sum range query formula. This unified perspective helps you understand why differencing and prefix sums are inverse operationsโjust as differentiation and integration are inverse operations.
3.14 Prefix Sums in Database Systems
In database systems, prefix sums appear in multiple forms:
Running Aggregation:
-- Compute cumulative daily sales
SELECT
date,
daily_sales,
SUM(daily_sales) OVER (ORDER BY date) AS cumulative_sales
FROM sales;
The window function SUM(...) OVER (ORDER BY ...) is essentially computing a prefix sum. Internally, the database engine typically maintains an accumulator and scans the data onceโidentical to our build_prefix_sum.
Window functions for range queries:
-- Compute 7-day moving average
SELECT
date,
AVG(daily_sales) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM sales;
Although SQL window function syntax hides implementation details, many query optimizers use prefix sum ideas under the hood: compute the running total first, then subtract two running totals to obtain any window's sum.
Materialized views and incremental maintenance:
In data warehouses, precomputed aggregates (materialized views) are essentially "prefix information computed in advance." When underlying data changes, the differencing idea is used for incremental updatesโprocessing only the delta rather than recomputing everything.
3.15 Prefix Sums on Trees
When data structures extend from linear arrays to trees, prefix sums can be realized via DFS ordering.
Problem: Given a weighted tree, answer multiple queries for the sum of edge weights (or node values) on the path from the root to a given node.
Approach 1: Precompute distances via DFS
def build_tree_prefix(graph: dict, root: int,
weights: dict) -> dict[int, int]:
"""
Precompute path weight sum from root to every node.
graph: adjacency list
weights: edge weights {(u,v): w}
"""
dist = {root: 0}
stack = [root]
visited = {root}
while stack:
u = stack.pop()
for v in graph[u]:
if v not in visited:
visited.add(v)
edge_weight = weights.get((u, v), weights.get((v, u), 0))
dist[v] = dist[u] + edge_weight
stack.append(v)
return dist
With root-to-node "prefix sums," the path sum between any two nodes becomes:
def path_sum(dist: dict, lca_func, u: int, v: int) -> int:
"""Path sum from u to v = dist[u] + dist[v] - 2 * dist[LCA(u,v)]"""
ancestor = lca_func(u, v)
return dist[u] + dist[v] - 2 * dist[ancestor]
Here LCA (Lowest Common Ancestor) plays the role analogous to "subtracting the common prefix" in 1D prefix sums.
Approach 2: Euler tour (DFS order) + 1D prefix sums
Flattening a tree into a DFS sequence converts subtree operations into contiguous range operations:
def dfs_order(graph: dict, root: int) -> tuple[list[int], dict, dict]:
"""
Compute DFS order (Euler tour).
Returns: (sequence, entry_time tin, exit_time tout)
Subtree of node u corresponds to interval [tin[u], tout[u]]
"""
order = []
tin = {}
tout = {}
timer = [0]
visited = set()
def dfs(u):
visited.add(u)
tin[u] = timer[0]
order.append(u)
timer[0] += 1
for v in graph[u]:
if v not in visited:
dfs(v)
tout[u] = timer[0] - 1
dfs(root)
return order, tin, tout
Once the tree is linearized, "subtree sum" becomes "range sum," directly solvable with 1D prefix sums. This technique is ubiquitous in competitive programming and forms the foundation for segment tree operations on trees.
3.16 Higher-Dimensional Prefix Sums
Two-dimensional prefix sums generalize to arbitrary dimensions. In d-dimensional space, inclusion-exclusion involves 2^d vertices.
Three-dimensional prefix sum construction and query:
def build_3d_prefix_sum(arr, X, Y, Z):
"""3D prefix sum"""
prefix = [[[0]*(Z+1) for _ in range(Y+1)] for _ in range(X+1)]
for i in range(X):
for j in range(Y):
for k in range(Z):
prefix[i+1][j+1][k+1] = (arr[i][j][k]
+ prefix[i][j+1][k+1]
+ prefix[i+1][j][k+1]
+ prefix[i+1][j+1][k]
- prefix[i][j][k+1]
- prefix[i][j+1][k]
- prefix[i+1][j][k]
+ prefix[i][j][k])
return prefix
The 3D inclusion-exclusion has 8 terms (2ยณ = 8), alternating signs. In practice, prefix sums beyond three dimensions are rare because space complexity is O(n^d), growing exponentially.
Practical applications of higher-dimensional prefix sums:
- 2D: Integral images in computer vision, used for fast computation of rectangular region pixel sums/averages (Viola-Jones face detection)
- 3D: Voxel data analysis in medical imaging, computing statistics over cuboid regions
Level 4 ยท Interview Practice
3.17 High-Frequency Interview Problems
Problem 1: Subarray Sum Equals K (LeetCode 560)
We presented the solution in Section 3.8. Here we address common follow-up questions in interviews.
Follow-up 1: If all elements are positive, is there a better approach?
All positives means the prefix sum is strictly increasing, allowing a sliding window (two pointers) with O(1) space. But the original problem includes negatives and zeros, making the prefix sum non-monotonic and invalidating the sliding window.
Follow-up 2: Can we find all subarrays with sum K (not just count)?
Yes, but we need to record the position list for each prefix sum value, not just the count:
from collections import defaultdict
def find_all_subarrays_with_sum_k(nums: list[int], k: int) -> list[tuple[int, int]]:
"""Return all (start, end) pairs of subarrays with sum k"""
result = []
prefix_sum = 0
# Record positions where each prefix sum value occurred
positions = defaultdict(list)
positions[0].append(-1) # "Empty prefix" position is -1
for i, num in enumerate(nums):
prefix_sum += num
target = prefix_sum - k
if target in positions:
for start_pos in positions[target]:
result.append((start_pos + 1, i))
positions[prefix_sum].append(i)
return result
Note: In the worst case (e.g., all zeros with k=0), the number of results is O(nยฒ), so time complexity is also O(nยฒ).
Problem 2: Continuous Subarray Sum (LeetCode 523)
Problem: Given a non-negative integer array nums and an integer k, determine whether there exists a contiguous subarray of length at least 2 whose sum is a multiple of k.
Key insight: If prefix[j] - prefix[i] is a multiple of k (with j - i >= 2), this is equivalent to prefix[j] % k == prefix[i] % k.
So we use a hash map to record the first occurrence of each "prefix sum modulo k" remainder. If the same remainder appears again with a gap of at least 2, we found the answer.
def check_subarray_sum(nums: list[int], k: int) -> bool:
"""
Check if subarray of length >= 2 with sum divisible by k exists.
Time O(n), Space O(min(n, k)).
"""
# Record first index where each remainder was seen
remainder_index = {0: -1} # Empty prefix has remainder 0 at position -1
prefix_sum = 0
for i, num in enumerate(nums):
prefix_sum += num
remainder = prefix_sum % k if k != 0 else prefix_sum
if remainder in remainder_index:
# Ensure subarray length is at least 2
if i - remainder_index[remainder] >= 2:
return True
else:
remainder_index[remainder] = i
return False
Important details:
- Initialize
{0: -1}to handle subarrays starting from index 0 - Only record the first occurrence (maximizes gap)
- k = 0 requires special handling (find two positions with identical prefix sum)
Problem 3: Product of Array Except Self (LeetCode 238)
Problem: Given array nums, return array answer where answer[i] equals the product of all elements except nums[i]. Required: O(n) time and no division.
Approach: answer[i] = product of all elements to the left ร product of all elements to the right. Use prefix product and suffix product.
def product_except_self(nums: list[int]) -> list[int]:
"""
Product of array except self.
Time O(n), Space O(1) (excluding output array).
"""
n = len(nums)
answer = [1] * n
# First pass: left to right, answer[i] = product of all elements left of i
left_product = 1
for i in range(n):
answer[i] = left_product
left_product *= nums[i]
# Second pass: right to left, multiply by product of all elements right of i
right_product = 1
for i in range(n - 1, -1, -1):
answer[i] *= right_product
right_product *= nums[i]
return answer
Why no division?
- The problem explicitly forbids it
- If the array contains zeros, division breaks
- This solution demonstrates the general "prefix and suffix" bidirectional scan technique
Space optimization: The above uses only the output array and two variablesโO(1) extra space (the problem states the output array doesn't count). This is more elegant than maintaining two full prefix/suffix arrays.
Problem 4: Corporate Flight Bookings (LeetCode 1109)
Covered in detail in Section 3.9. Interview follow-up:
Follow-up: What if updates and queries are interleaved rather than batched?
Difference arrays are ideal for "do all updates first, then reconstruct once." If updates and queries alternate, you need more powerful structuresโsegment trees (supporting O(log n) range updates and range queries) or Binary Indexed Trees (Fenwick trees).
3.18 Handling Overflow in Prefix Sums
In languages with fixed-precision integers (C++/Java/Go), prefix sums are prone to overflow.
Scenario: Element values |a[i]| โค 10โน, array length n โค 10โต. Maximum prefix sum is approximately 10ยนโด, exceeding 32-bit integer range (โ 2.1 ร 10โน) but within 64-bit range (โ 9.2 ร 10ยนโธ).
Python advantage: Python integers never overflow, at the cost of slower arithmetic (large integers require more memory allocation and computation).
Modular arithmetic considerations:
Some problems require "answer modulo 10โน+7," in which case prefix sums must also be taken modulo:
MOD = 10**9 + 7
def build_prefix_sum_mod(nums: list[int]) -> list[int]:
n = len(nums)
prefix = [0] * (n + 1)
for i in range(n):
prefix[i + 1] = (prefix[i] + nums[i]) % MOD
return prefix
Trap: Subtraction after modulo can produce negative results!
def range_sum_mod(prefix: list[int], l: int, r: int) -> int:
"""Range sum under modular arithmetic"""
return (prefix[r + 1] - prefix[l]) % MOD
# In Python, % always returns non-negative, so this is fine
# But in C++: (-3) % 7 = -3 (not 4), must add MOD manually
# C++ version: return ((prefix[r+1] - prefix[l]) % MOD + MOD) % MOD
Python's % behavior: (-3) % 7 = 4 (Python) vs (-3) % 7 = -3 (C++/Java). Python's behavior is more convenient for algorithm problems, but if you use other languages, you must be aware of this difference.
3.19 Prefix Sums vs. Segment Trees: Static vs. Dynamic
| Dimension | Prefix Sum | Segment Tree |
|---|---|---|
| Build time | O(n) | O(n) |
| Single query | O(1) | O(log n) |
| Single update | O(n) (rebuild) | O(log n) |
| Space | O(n) | O(n) |
| Implementation complexity | Very low (few lines) | Moderate (50-100 lines) |
| Use case | Static data โ query only | Dynamic data โ interleaved updates and queries |
When to use prefix sums:
- Array is built once and never modified
- Multiple range sum queries
- Need ultimate query performance (O(1) vs O(log n))
- Simplicity and maintainability matter
When to use segment trees:
- Data changes dynamically (point updates + range queries)
- Need range updates + range queries
- Need to maintain non-invertible operations (max, min) over ranges
Decision flowchart:
Are there update operations?
โโโ No โ Use prefix sums (O(1) queries)
โโโ Yes โ Are updates batched or interleaved?
โโโ All updates first, then query once โ Difference array
โโโ Updates and queries interleaved โ Segment tree / BIT
3.20 Comprehensive Practice and Pattern Recognition
Pattern 1: Prefix sum + hash map (counting problems)
# Template: count subarrays satisfying some range property
def count_subarrays_with_property(nums, target):
count = 0
prefix = 0 # or another cumulative quantity
seen = defaultdict(int)
seen[0] = 1 # empty prefix
for num in nums:
prefix = update(prefix, num) # accumulate
# Check how many previous prefix values satisfy the condition
count += seen[complement(prefix, target)]
seen[prefix] += 1
return count
Applicable problems: LeetCode 560, 523, 974, 930, 1248
Pattern 2: Prefix/suffix bidirectional scan
# Template: need information from both left and right
def bidirectional_scan(nums):
n = len(nums)
left_info = compute_prefix(nums) # left to right
right_info = compute_suffix(nums) # right to left
answer = [0] * n
for i in range(n):
answer[i] = combine(left_info[i], right_info[i])
return answer
Applicable problems: LeetCode 238, 42 (Trapping Rain Water via prefix max), 135
Pattern 3: Difference array (batch range updates)
# Template: multiple range updates + final reconstruction
def batch_range_update(n, operations):
diff = [0] * (n + 1) # extra element avoids boundary checks
for l, r, val in operations:
diff[l] += val
diff[r + 1] -= val
# Prefix sum to reconstruct
for i in range(1, n):
diff[i] += diff[i - 1]
return diff[:n]
Applicable problems: LeetCode 1109, 1094, 370, 253
3.21 Chapter Summary
Prefix sums and difference arrays are a dual pair of inverse techniques:
- Prefix sums: O(n) preprocessing accelerates range queries to O(1)
- Difference arrays: Compress range updates to O(1), reconstruct in O(n)
Their domain of applicability is static or semi-static scenariosโif data changes frequently, upgrade to segment trees or Binary Indexed Trees.
Key takeaways:
- Make the prefix array one element longer; prefix[0] = 0
- Sum of [l, r] = prefix[r+1] - prefix[l]
- Difference array update on [l, r]: diff[l] += val, diff[r+1] -= val
- Prefix sum + hash map = universal combo for "sum equals K" problems
- 2D operations use inclusion-exclusion: add, add, subtract, add
In the next chapter, we will study sorting algorithms. From the simplest bubble sort to industrial-grade Timsort, sorting is not only the most fundamental algorithmic problem but also an excellent training ground for understanding divide-and-conquer thinking and algorithm analysis.