Chapter 15

Union-Find: The Connectivity Powerhouse

Chapter 15: Union-Find — The Connectivity Powerhouse

Imagine you have a map with hundreds of cities and some roads. Someone keeps asking you: "Can city A reach city B?" If you run BFS/DFS for every query, that's O(V+E) per query. But what if there are a million such queries?

Union-Find (also called Disjoint Set Union, or DSU) is the data structure designed exactly for this class of problems. It supports two core operations:

Find(x): Find the representative of the set containing x
Union(x, y): Merge the sets containing x and y

With path compression and union by rank, both operations run in amortized O(α(n)) time, where α is the inverse Ackermann function — a function that grows so incredibly slowly that for all practical inputs (even if n = the number of atoms in the universe), α(n) ≤ 4. In other words, each Union-Find operation is essentially constant time.

In this chapter, we'll start from the naive implementation, progressively add optimizations, understand why it's so fast, and then apply it to graph theory and interview problems.

Level 1 · What You Need to Know

15.1 The Starting Point: Dynamic Connectivity

Consider this scenario: you manage a social network where users can become friends. You need to frequently answer: "Are users A and B in the same friend circle?" (Not necessarily direct friends — transitive connections count.)

More abstractly, we have n elements, initially each in its own set. We need to support:

Union: Merge two sets into one
Find/Connected: Determine whether two elements are in the same set

This is the Dynamic Connectivity Problem.

Using an array where each element stores its component ID? Sure, but merging two sets requires traversing the entire array to update IDs — O(n). Using an adjacency list with BFS? Each query costs O(V+E). The elegance of Union-Find is that it achieves near-O(1) for both operations using a remarkably simple tree structure.

15.2 Core Idea: Representing Sets as Trees

The fundamental idea is:

Each set is represented as a tree
The root of the tree is the representative of that set
Two elements are in the same set if and only if they have the same root
Merging two sets = making one tree's root point to the other tree's root

We use a parent[] array to represent these trees: parent[i] stores the parent of node i. If parent[i] == i, then i is a root.

15.3 The Naive Implementation

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # Initially, each element is its own parent
    
    def find(self, x):
        """Find the root of x"""
        while self.parent[x] != x:
            x = self.parent[x]
        return x
    
    def union(self, x, y):
        """Merge the sets containing x and y"""
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x != root_y:
            self.parent[root_x] = root_y
    
    def connected(self, x, y):
        """Check if x and y are in the same set"""
        return self.find(x) == self.find(y)

What's the problem? This naive version can degenerate into a linked list! If you execute union(0,1), union(1,2), union(2,3), ..., the tree height becomes O(n), making each find O(n).

15.4 Optimization 1: Path Compression

The idea behind path compression is simple: since we only care about the root, during find we can directly attach every node on the path to the root. Next time we query any of these nodes, we reach the root in one step.

def find(self, x):
    """Find with path compression"""
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])  # Recursively point x directly to root
    return self.parent[x]

One line of recursion achieves full path compression. After execution, every node from x to the root points directly to the root.

If you're worried about recursion depth (Python has a default limit of 1000), here's the iterative version:

def find(self, x):
    """Iterative path compression"""
    root = x
    while self.parent[root] != root:
        root = self.parent[root]
    # Path compression: point all nodes from x to root directly to root
    while self.parent[x] != root:
        next_x = self.parent[x]
        self.parent[x] = root
        x = next_x
    return root

15.5 Optimization 2: Union by Rank

Path compression is already powerful, but we can also optimize the union operation: always attach the shorter tree under the root of the taller tree, so the merged tree's height doesn't increase.

We use rank to approximate height (path compression changes actual height, but rank is only updated during unions — it's an upper bound on height).

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n  # rank[i] is an upper bound on the height of tree rooted at i
    
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    
    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return False  # Already in the same set
        # Union by rank: attach shorter tree under taller tree
        if self.rank[root_x] < self.rank[root_y]:
            self.parent[root_x] = root_y
        elif self.rank[root_x] > self.rank[root_y]:
            self.parent[root_y] = root_x
        else:
            self.parent[root_y] = root_x
            self.rank[root_x] += 1  # Only increment rank when both ranks are equal
        return True
    
    def connected(self, x, y):
        return self.find(x) == self.find(y)

Why does rank only increase when both ranks are equal? Because a tree of rank h has at least 2^h nodes (provable by induction). When merging trees of different heights, attaching the shorter one under the taller one doesn't change the overall height. Only when two trees of equal height merge does the height increase by 1.

15.6 Complete Implementation with Component Counting

In many applications, we need to know "how many distinct connected components exist." Just maintain a counter that decrements on each successful union:

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n  # Initially n components
    
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    
    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return False
        if self.rank[root_x] < self.rank[root_y]:
            self.parent[root_x] = root_y
        elif self.rank[root_x] > self.rank[root_y]:
            self.parent[root_y] = root_x
        else:
            self.parent[root_y] = root_x
            self.rank[root_x] += 1
        self.count -= 1  # Successful merge: one fewer component
        return True
    
    def connected(self, x, y):
        return self.find(x) == self.find(y)
    
    def get_count(self):
        return self.count

15.7 Usage Example

# 5 nodes, numbered 0-4
uf = UnionFind(5)
print(uf.get_count())  # 5, each node is its own component

uf.union(0, 1)  # {0,1}, {2}, {3}, {4}
uf.union(2, 3)  # {0,1}, {2,3}, {4}
print(uf.get_count())  # 3

print(uf.connected(0, 1))  # True
print(uf.connected(0, 2))  # False

uf.union(1, 3)  # {0,1,2,3}, {4}
print(uf.connected(0, 2))  # True
print(uf.get_count())  # 2

15.8 Union by Size Variant

Sometimes we need to know the size of each component. In such cases, "union by size" is more natural than "union by rank":

class UnionFindBySize:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n  # size[i] = size of set rooted at i
        self.count = n
    
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    
    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return False
        # Attach smaller set under larger set
        if self.size[root_x] < self.size[root_y]:
            self.parent[root_x] = root_y
            self.size[root_y] += self.size[root_x]
        else:
            self.parent[root_y] = root_x
            self.size[root_x] += self.size[root_y]
        self.count -= 1
        return True
    
    def get_size(self, x):
        return self.size[self.find(x)]

Union by size and union by rank have the same time complexity: O(α(n)). The difference is that union by size can answer "how large is this set?" in O(1).

15.9 Union-Find vs Other Approaches

Approach	Union Time	Find/Connected Time	Space
Array labeling	O(n)	O(1)	O(n)
Adjacency list + BFS	O(1)	O(V+E)	O(V+E)
Naive Union-Find	O(n) worst	O(n) worst	O(n)
Path compression + rank	O(α(n))	O(α(n))	O(n)

Since α(n) ≤ 4 for all practical inputs, the last row is effectively O(1).

15.10 When to Use Union-Find

Union-Find is ideal when:

You only need connectivity, not the actual path
Edges are added dynamically (add-only — standard Union-Find doesn't support deletion)
Both offline and online processing work
You need efficient set merging

Not suitable when:

You need shortest paths → use BFS/Dijkstra
You need edge deletion → use Link-Cut Tree or offline reverse processing
You need to enumerate all elements in a set → Union-Find doesn't directly support this

Level 2 · How It Works Under the Hood

15.11 Complexity Analysis: Why It's Nearly O(1)

Path Compression Only

With only path compression (no rank/size-based union), m operations on n elements take O(m log n) total time, amortized O(log n) per operation.

Intuition: Path compression makes trees increasingly flat. The first find on a deep path is expensive, but it simultaneously flattens every node on that path directly to the root. Subsequent finds on those nodes become O(1). This is classic amortized analysis — occasional expensive operations pay for future cheap operations.

Union by Rank Only

With only union by rank (no path compression), tree height is guaranteed O(log n).

Proof: By induction, a tree of rank r has at least 2^r nodes.

Base: rank = 0, tree has only the root, 1 = 2^0 nodes ✓
Inductive step: A tree of rank r can only be formed by merging two trees of rank r-1. By the inductive hypothesis, each has at least 2^(r-1) nodes. After merging: at least 2^(r-1) + 2^(r-1) = 2^r nodes ✓

Therefore rank ≤ log₂(n), so find is O(log n).

Path Compression + Union by Rank

When both optimizations are combined, m operations take O(m · α(n)) total time, where α is the inverse Ackermann function.

15.12 The Inverse Ackermann Function α(n)

The Ackermann function A(m, n) grows astronomically fast:

A(0, n) = n + 1
A(m, 0) = A(m-1, 1)                     (m > 0)
A(m, n) = A(m-1, A(m, n-1))             (m > 0, n > 0)

Some values:

A(1, n) = n + 2
A(2, n) = 2n + 3
A(3, n) = 2^(n+3) - 3
A(4, 1) = 2^(2^(2^(65536))) - 3 — an astronomically large number
A(4, 2) — cannot even be written down

The inverse Ackermann function α(n) = min{k : A(k, k) ≥ n}. Since A grows so fast, α grows incredibly slowly:

n	α(n)
1-2	0
3	1
4-7	2
8-2047	3
2048 to A(4,1) ≈ 2^65536	4
A(4,1) to A(5,1)	5

Since the number of atoms in the observable universe is approximately 10^80 ≈ 2^265, far less than 2^65536 = A(4,1), for all practical inputs α(n) ≤ 4.

What does this mean? Union-Find with path compression and union by rank needs at most 4 extra steps per operation (in the amortized sense). This is why we say it's "essentially O(1)."

15.13 The Core Idea of Tarjan's Proof

Robert Tarjan proved the O(m · α(n)) upper bound in 1975. His proof approach (greatly simplified):

Define a potential function Φ based on each node's "level" in the tree
Partition nodes into "blocks" by rank, where block sizes increase according to the Ackermann function
Show that path compression either decreases potential (banking savings for future operations) or has low actual cost itself
Apply amortized analysis via the potential method to obtain total cost O(m · α(n))

For the lower bound, Fredman and Saks (1989) proved that in the cell-probe model, any data structure supporting Union and Find requires Ω(m · α(n)) time in the worst case for m operations. This means path compression + union by rank is asymptotically optimal.

15.14 Weighted Union-Find

Standard Union-Find only answers "are two elements in the same set?" But some problems require maintaining relative relationships between elements (e.g., relative distances, relative weights). Weighted Union-Find is designed for this.

Core idea: Besides parent, maintain a weight array where weight[x] represents the "distance" (or other relational value) from x to parent[x]. By accumulating weights along the find path, we can determine the total weight from any node to the root.

class WeightedUnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.weight = [0] * n  # weight[x] = weight from x to parent[x]
    
    def find(self, x):
        """Return root of x, with path compression and weight update"""
        if self.parent[x] != x:
            root = self.find(self.parent[x])
            self.weight[x] += self.weight[self.parent[x]]  # Accumulate weights
            self.parent[x] = root
        return self.parent[x]
    
    def union(self, x, y, w):
        """
        Merge x and y with weight(x) - weight(y) = w
        i.e., the weight difference of x relative to y is w
        """
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return  # Already in same set; could check consistency
        # After find: weight[x] = distance from x to root_x
        # After find: weight[y] = distance from y to root_y
        # We want root_x → root_y with weight = w + weight[y] - weight[x]
        if self.rank[root_x] < self.rank[root_y]:
            self.parent[root_x] = root_y
            self.weight[root_x] = w + self.weight[y] - self.weight[x]
        elif self.rank[root_x] > self.rank[root_y]:
            self.parent[root_y] = root_x
            self.weight[root_y] = -w + self.weight[x] - self.weight[y]
        else:
            self.parent[root_x] = root_y
            self.weight[root_x] = w + self.weight[y] - self.weight[x]
            self.rank[root_y] += 1
    
    def query(self, x, y):
        """Query weight(x) - weight(y); requires x and y in the same set"""
        if self.find(x) != self.find(y):
            return None  # Not in same set
        return self.weight[x] - self.weight[y]

Typical applications of Weighted Union-Find:

LeetCode 399 - Evaluate Division: Given a/b = 2, b/c = 3, find a/c. Model division relationships as multiplicative weights.
Food Chain Problem (NOI 2001): Three species A, B, C form a food chain (A eats B, B eats C, C eats A). Use weight mod 3 to represent relative relationships.

15.15 Kruskal's Minimum Spanning Tree Algorithm

The most classic graph theory application of Union-Find is Kruskal's algorithm for finding the Minimum Spanning Tree (MST).

Algorithm idea:

Sort all edges by weight in ascending order
For each edge (u, v, w): if u and v are not in the same component, add this edge (union u and v)
Repeat until n-1 edges are added (n = number of vertices)

def kruskal(n, edges):
    """
    n: number of vertices
    edges: [(weight, u, v), ...]
    Returns MST edges and total weight
    """
    edges.sort()  # Sort by weight
    uf = UnionFind(n)
    mst = []
    total_weight = 0
    
    for weight, u, v in edges:
        if not uf.connected(u, v):
            uf.union(u, v)
            mst.append((u, v, weight))
            total_weight += weight
            if len(mst) == n - 1:
                break  # Found n-1 edges, MST complete
    
    return mst, total_weight

Time complexity: Sorting O(E log E) + iterating edges with Union/Find O(E · α(V)). Total: O(E log E), since sorting dominates.

Why is Kruskal correct? This is a classic application of the greedy paradigm. Its correctness is guaranteed by the Cut Property: for any cut of the graph (partition of vertices into two groups), the minimum-weight edge crossing the cut belongs to some MST. Each time Kruskal adds an edge, it's the minimum-weight edge connecting two different components — exactly the cut property in action.

For deeper discussion of minimum spanning trees, see Chapter 18.

15.16 Variants of Path Compression

Besides full path compression (all nodes point directly to root), there are several variants:

Path Splitting: Make every node on the path point to its grandparent.

def find_path_splitting(self, x):
    while self.parent[x] != x:
        self.parent[x], x = self.parent[self.parent[x]], self.parent[x]
    return x

Path Halving: Every other node on the path points to its grandparent.

def find_path_halving(self, x):
    while self.parent[x] != x:
        self.parent[x] = self.parent[self.parent[x]]
        x = self.parent[x]
    return x

All three variants (full compression, path splitting, path halving) have the same O(m · α(n)) amortized complexity in theory. In practice, path halving is often fastest because it modifies only half the pointers, reducing memory writes.

15.17 Space Optimization

For consecutively numbered nodes, Union-Find needs just two arrays (parent and rank/size) — strictly O(n) space.

For non-integer keys (e.g., strings), use a hash map instead:

class UnionFindMap:
    def __init__(self):
        self.parent = {}
        self.rank = {}
    
    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0
    
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    
    def union(self, x, y):
        self.add(x)
        self.add(y)
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return False
        if self.rank[root_x] < self.rank[root_y]:
            self.parent[root_x] = root_y
        elif self.rank[root_x] > self.rank[root_y]:
            self.parent[root_y] = root_x
        else:
            self.parent[root_y] = root_x
            self.rank[root_x] += 1
        return True

Level 3 · What the Specification Says

15.18 History: Galler and Fischer's 1964 Paper

The invention of Union-Find traces back to Bernard A. Galler and Michael J. Fischer, who published "An Improved Equivalence Algorithm" in 1964 (Communications of the ACM, Vol. 7, No. 5, pp. 301-303).

Their motivation came from FORTRAN compilers handling EQUIVALENCE statements. FORTRAN's EQUIVALENCE statement allows programmers to declare that two variables share the same memory. The compiler needs to maintain the transitive closure of this equivalence relation — if A is equivalent to B, and B is equivalent to C, then A is equivalent to C.

Galler and Fischer proposed representing equivalence classes as trees, and introduced union by size as an optimization. They proved that m operations on n elements take O(m log n) total time. This was a major improvement over the previous O(mn) methods.

Original quote (Galler & Fischer, 1964):

"The improved algorithm which we describe here represents each class of equivalent identifiers as a tree structure, where the root of each tree serves as the canonical representative of its class."

This is one of the most concise and elegant data structure inventions in computer science history: just two pages long, core code under 20 lines, yet solving a practical and important problem.

15.19 Tarjan's 1975 Breakthrough

Robert Endre Tarjan (yes, the same Tarjan who invented the strongly connected components algorithm) published his landmark paper "Efficiency of a Good But Not Linear Set Union Algorithm" in 1975 (Journal of the ACM, Vol. 22, No. 2, pp. 215-225).

Tarjan's contributions:

Introduced formal analysis of path compression: While the idea of path compression existed before, Tarjan was the first to rigorously analyze its complexity.
Proved the O(m · α(n)) upper bound: This was a stunning result. People knew union by rank gives O(m log n), and that path compression is fast in practice, but no one could prove exactly how fast the combination is. Tarjan used an ingenious potential method, defining a hierarchical structure based on the Ackermann function.
Conjectured optimality: Tarjan conjectured that O(m · α(n)) is optimal, but could not prove the lower bound at the time.

The key tool — the Ackermann function: Tarjan's choice of the Ackermann function was not arbitrary. His analysis requires a fast-growing function to create "layers" — partitioning node ranks into increasingly large blocks. The super-exponential growth of the Ackermann function provides exactly the right granularity for the layering.

15.20 The Lower Bound: Fredman-Saks 1989

Michael Fredman and Michael Saks proved in their 1989 paper "The Cell Probe Complexity of Dynamic Data Structures" (Proceedings of STOC, pp. 345-354):

In the cell-probe model, any data structure supporting m Union-Find operations requires Ω(m · α(n)) time in the worst case.

This means Union-Find with path compression + union by rank is asymptotically optimal — you cannot invent a data structure that is asymptotically faster for the dynamic connectivity problem in the worst case.

The cell-probe model is extremely powerful (it only counts memory accesses; computation is free), so lower bounds in this model apply to all weaker models (including the RAM model).

15.21 Union-Find in Percolation

Percolation is a classic model in statistical physics and a beautiful application of Union-Find.

Problem description: Consider an n×n grid where each cell is "open" (passable) with probability p and "closed" with probability 1-p. Question: Is there a connected path from top to bottom?

Key concept — percolation threshold: There exists a critical probability p* ≈ 0.5927 (for the 2D square lattice) such that when p > p*, there is almost certainly a path from top to bottom, and when p < p*, there is almost certainly not. This is a phase transition phenomenon.

Solving with Union-Find:

import random

class Percolation:
    def __init__(self, n):
        self.n = n
        self.grid = [[False] * n for _ in range(n)]
        # n*n cells + 2 virtual nodes (top source, bottom sink)
        self.uf = UnionFind(n * n + 2)
        self.top = n * n          # Virtual top node
        self.bottom = n * n + 1   # Virtual bottom node
        self.open_count = 0
    
    def _index(self, row, col):
        return row * self.n + col
    
    def open(self, row, col):
        if self.grid[row][col]:
            return
        self.grid[row][col] = True
        self.open_count += 1
        
        # If in first row, connect to virtual top
        if row == 0:
            self.uf.union(self._index(row, col), self.top)
        # If in last row, connect to virtual bottom
        if row == self.n - 1:
            self.uf.union(self._index(row, col), self.bottom)
        
        # Connect to adjacent open cells
        for dr, dc in [(-1,0), (1,0), (0,-1), (0,1)]:
            nr, nc = row + dr, col + dc
            if 0 <= nr < self.n and 0 <= nc < self.n and self.grid[nr][nc]:
                self.uf.union(self._index(row, col), self._index(nr, nc))
    
    def percolates(self):
        """Does a path exist from top to bottom?"""
        return self.uf.connected(self.top, self.bottom)


def estimate_threshold(n, trials=1000):
    """Monte Carlo estimation of percolation threshold"""
    total = 0
    for _ in range(trials):
        perc = Percolation(n)
        sites = [(r, c) for r in range(n) for c in range(n)]
        random.shuffle(sites)
        for r, c in sites:
            perc.open(r, c)
            if perc.percolates():
                total += perc.open_count / (n * n)
                break
    return total / trials

This algorithm is used by Princeton's Robert Sedgewick as a classic programming assignment in his Algorithms course. It beautifully demonstrates how Union-Find powers scientific simulations — each cell opening requires only constant union operations, and the entire simulation runs in O(n² · α(n²)) time.

15.22 Academic Variants of Union-Find

Rollback Union-Find (Persistent Union-Find):

Standard Union-Find merges are irreversible. But in some algorithms (e.g., divide-and-conquer for edge deletion problems), we need to "undo" the last merge. With union by rank (without path compression), we can use a stack to record each modification and restore them in reverse order when rolling back.

class RollbackUnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.history = []  # Record all modifications
    
    def find(self, x):
        # Note: NO path compression! Otherwise rollback is impossible
        while self.parent[x] != x:
            x = self.parent[x]
        return x
    
    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            self.history.append(None)  # Mark: no actual merge
            return False
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        # Record state before modification
        self.history.append((root_y, self.parent[root_y], root_x, self.rank[root_x]))
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        return True
    
    def rollback(self):
        """Undo the most recent union operation"""
        record = self.history.pop()
        if record is None:
            return  # That union didn't actually merge
        root_y, old_parent, root_x, old_rank = record
        self.parent[root_y] = old_parent
        self.rank[root_x] = old_rank

Note that rollback Union-Find cannot use path compression (a single find might modify many nodes, making simple rollback impossible), so its complexity is O(log n) rather than O(α(n)).

Offline edge deletion: If a problem involves edge deletion, a common trick is "time reversal" — convert deletions into additions and process from back to front using standard Union-Find. This avoids complex persistent data structures.

15.23 Relationship with Other Data Structures

Relationship with BFS/DFS on graphs: Union-Find can be seen as "preprocessing" for graph connectivity queries. If the edge set is fixed, one DFS marking components in O(V+E) suffices. Union-Find's advantage is when the edge set changes dynamically — new edges are continuously added.

Relationship with MST algorithms: Kruskal uses Union-Find; Prim uses a priority queue. Both have similar time complexity, but Kruskal is better for sparse graphs (E close to V), while Prim is better for dense graphs (E close to V²).

Relationship with Link-Cut Trees: Union-Find only supports merging, not splitting. Link-Cut Trees (Sleator & Tarjan, 1983) support both dynamic linking and cutting, but each operation costs O(log n) with larger constants. If you don't need deletion, Union-Find is the better choice.

Level 4 · Edge Cases and Pitfalls

15.24 Interview Problem 1: Number of Provinces (LeetCode 547)

Problem: Given n cities with isConnected[i][j] = 1 indicating cities i and j are directly connected, find the number of provinces (connected components).

Analysis: This is the most direct application of Union-Find — iterate over all edges, merge connected cities, and return the component count.

def findCircleNum(isConnected):
    n = len(isConnected)
    uf = UnionFind(n)
    
    for i in range(n):
        for j in range(i + 1, n):  # Only traverse upper triangle
            if isConnected[i][j] == 1:
                uf.union(i, j)
    
    return uf.get_count()

Time complexity: O(n² · α(n)), iterating the adjacency matrix. Space complexity: O(n).

Common mistakes:

Forgetting to only traverse the upper (or lower) triangle, causing redundant processing. Doesn't affect correctness but wastes time.
Forgetting that isConnected[i][i] = 1 doesn't need processing (a city is always connected to itself).

Alternative: DFS/BFS also works — traverse each unvisited node and DFS to mark the entire component. Same O(n²) time, but Union-Find has the advantage of cleaner code and flexibility for dynamic edge additions.

15.25 Interview Problem 2: Redundant Connection (LeetCode 684)

Problem: A tree with one extra edge becomes a graph. Find the redundant edge (if multiple answers exist, return the one appearing last in input).

Analysis: A tree has n nodes and n-1 edges. The problem gives n edges (one extra). Find the edge that creates a cycle.

Key insight: Process edges in input order; the first edge whose endpoints are already connected forms the cycle. Since the problem guarantees exactly one redundant edge, there's exactly one such edge.

def findRedundantConnection(edges):
    n = len(edges)
    uf = UnionFind(n + 1)  # Nodes numbered from 1
    
    for u, v in edges:
        if not uf.union(u, v):
            return [u, v]  # Both endpoints already connected — this edge is redundant
    
    return []  # Won't reach here

Why is this correct? A tree is an acyclic connected graph. Processing edges in order, earlier edges form a forest (acyclic). When an edge (u, v) is encountered where u and v are already in the same tree, this edge creates a cycle — it's the redundant one.

Follow-up: What about directed graphs? (LeetCode 685 - Redundant Connection II) This is significantly more complex, requiring case analysis: whether a node has in-degree 2, whether a directed cycle exists.

15.26 Interview Problem 3: Satisfiability of Equality Equations (LeetCode 990)

Problem: Given a set of equations and inequalities (e.g., ["a==b", "b!=c", "c==a"]), determine if there exists an assignment satisfying all constraints.

Analysis:

Equation a==b means a and b must be in the same set
Inequality a!=b means a and b must be in different sets

Strategy: First process all equations (union), then check all inequalities for conflicts.

def equationsPossible(equations):
    uf = UnionFind(26)  # 26 letters
    
    # First pass: process all equations
    for eq in equations:
        if eq[1] == '=':  # "x==y"
            x = ord(eq[0]) - ord('a')
            y = ord(eq[3]) - ord('a')
            uf.union(x, y)
    
    # Second pass: check all inequalities
    for eq in equations:
        if eq[1] == '!':  # "x!=y"
            x = ord(eq[0]) - ord('a')
            y = ord(eq[3]) - ord('a')
            if uf.connected(x, y):
                return False  # Contradiction! x and y are already in the same set
    
    return True

Why process equations first, then inequalities? Because equality is transitive (a==b, b==c → a==c). We need to establish all transitive relationships before checking inequalities for contradictions. Interleaving might miss transitive relationships.

Follow-up: What if inequalities are also transitive? Then simple Union-Find won't work — you'd need 2-SAT or constraint propagation.

15.27 Interview Problem 4: Largest Component Size by Common Factor (LeetCode 952)

Problem: Given array nums, if nums[i] and nums[j] share a common factor greater than 1, connect them with an edge. Find the largest connected component size.

Brute force: Check GCD for all pairs: O(n² log(max)). Too slow.

Optimized approach: Instead of checking GCD for every pair, factorize each number and put the number and its factors in the same set.

def largestComponentSize(nums):
    max_val = max(nums)
    uf = UnionFind(max_val + 1)
    
    for num in nums:
        # Factorize num and merge num with all its factors
        d = 2
        while d * d <= num:
            if num % d == 0:
                uf.union(num, d)
                uf.union(num, num // d)
            d += 1
    
    # Count how many nums elements are in each component
    from collections import Counter
    count = Counter(uf.find(num) for num in nums)
    return max(count.values())

Why is this correct? If nums[i] and nums[j] share a common factor p, then nums[i] and p are in the same set, and nums[j] and p are in the same set, so nums[i] and nums[j] are in the same set. Transitivity is automatically guaranteed by Union-Find.

Time complexity: Factorizing each number is O(√num), total O(n · √max_val). Much better than O(n²).

Space note: The implementation above creates a UnionFind of size max_val + 1. If numbers are large but few, use the hash map version instead.

Follow-up: What if values reach 10^9? Use a sieve for prime factorization preprocessing, or only factorize numbers in nums and use index mapping to compress the space.

15.28 Common Pitfalls with Union-Find

Pitfall 1: Comparing parents instead of roots

# WRONG!
if uf.parent[x] == uf.parent[y]:  # Compares direct parents, not roots!
    ...

# CORRECT
if uf.find(x) == uf.find(y):
    ...

Pitfall 2: Merging original nodes instead of roots

# WRONG!
def union(self, x, y):
    self.parent[x] = y  # Directly connects x to y, breaking tree structure

# CORRECT
def union(self, x, y):
    root_x = self.find(x)
    root_y = self.find(y)
    self.parent[root_x] = root_y  # Connect roots

Pitfall 3: Off-by-one with 1-indexed nodes

Many graph problems use nodes numbered 1 to n. Create UnionFind with size n + 1, or subtract 1 from all indices.

Pitfall 4: Python recursion depth

Python's default recursion limit is 1000. For deep trees (e.g., 10^5 nodes in a chain before path compression kicks in), recursive find will overflow. Solutions:

Use iterative find
Or import sys; sys.setrecursionlimit(200000) (not recommended — may actually stack overflow)

Pitfall 5: Assuming Union-Find supports edge deletion

Union-Find only supports merging, not splitting. If the problem involves edge deletion, either:

Process offline in reverse (convert deletions to additions)
Use Link-Cut Tree
Use rollback Union-Find + divide and conquer

15.29 Recognition Patterns for Interviews

When you encounter these signals in an interview, consider Union-Find:

Signal	Example
"connected", "reachable", "same group"	Number of Provinces, Number of Islands
"grouping", "classify", "equivalence"	Equality Equations, Account Merge
"common factor", "shared property"	Largest Component by Common Factor
"redundant edge", "cycle"	Redundant Connection
"minimum spanning tree", "minimum cost to connect"	Min Cost to Connect All Points
"earliest time fully connected"	Earliest Moment When Everyone Becomes Friends

Interview template:

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n
    
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    
    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return False
        if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1
        self.count -= 1
        return True
    
    def connected(self, x, y):
        return self.find(x) == self.find(y)

This template is about 20 lines. In an interview, you should be able to write it from memory in under 2 minutes. Practice until it becomes muscle memory.

15.30 Comprehensive Comparison and Summary

Dimension	Union-Find	BFS/DFS	Adjacency List
Connectivity check	O(α(n)) ≈ O(1)	O(V+E)	O(V+E)
Dynamic edge addition	O(α(n))	Requires re-traversal	O(1) add, slow query
Edge deletion	Not supported	Requires re-traversal	O(1) delete, slow query
Find actual path	Not supported	Supported	Supported
Space	O(n)	O(V+E)	O(V+E)
Code complexity	Very low (20 lines)	Medium	Medium

One-sentence summary: Union-Find is the optimal data structure for dynamic connectivity — O(α(n)) near-constant time, O(n) space, and 20 lines of code. Its limitations are that it only supports merging (not splitting) and can only determine connectivity (not produce paths). In interviews, once you identify the "dynamic merging + connectivity query" pattern, Union-Find is your go-to weapon.

Rate this chapter

4.8 / 5 (20 ratings)