Strongly Connected Components and Network Flow
Chapter 19: Strongly Connected Components and Network Flow
The previous chapter covered the two classic optimization problems on weighted graphs โ shortest paths and minimum spanning trees. This chapter ventures into the "advanced territory" of graph theory: Strongly Connected Components (SCC) and Network Flow.
Strongly connected components reveal the "macro-structure" of directed graphs โ which nodes can reach each other? If we "contract" each SCC into a super-node, the entire graph becomes a DAG (directed acyclic graph), dramatically reducing problem complexity. This is the core technique of "simplification through structure" in graph theory.
Network flow is a class of models that appears theoretical but has remarkably wide applications. From "how much water can flow through a pipe network" to "how to optimally assign workers to tasks," network flow unifies a vast array of combinatorial optimization problems into one elegant framework. The Max-Flow Min-Cut theorem perfectly connects two seemingly unrelated questions: "how to maximize throughput" and "how to minimize the cost of disruption."
This chapter's core objectives: (1) master Tarjan's algorithm for SCCs, articulation points, and bridges; (2) understand Kosaraju's two-pass DFS design; (3) understand the maximum flow problem and the Ford-Fulkerson method; (4) learn bipartite matching and the Hungarian algorithm; (5) assess these algorithms' interview frequency and learning priority.
Level 1 ยท What You Need to Know
19.1 The Concept of Strongly Connected Components
Strongly connected: In a directed graph, if node u can reach v AND v can reach u, then u and v are strongly connected.
Strongly Connected Component (SCC): A maximal strongly connected subgraph โ any two nodes within it are strongly connected, and no external node can be added while maintaining strong connectivity.
Directed graph example:
0 โ 1 โ 2
โ โ
3 โ 4 โ 2
โ
5 โ 6
โ โ
8 โ 7
Strongly connected components:
SCC1: {0, 1, 2, 3, 4} (any two of these 5 nodes can reach each other)
SCC2: {5} (only itself)
SCC3: {6, 7, 8} (6โ7โ8โ6 forms a cycle)
Condensation DAG:
SCC1 โ SCC2 โ SCC3
Why are SCCs important?
- Problem simplification: Contracting SCCs into single nodes turns the graph into a DAG. Many problems on DAGs (topological sort, longest path, etc.) are far simpler than on general directed graphs.
- 2-SAT problems: Determining satisfiability of Boolean formulas (2-SAT) directly reduces to SCC computation.
- Reachability analysis: Nodes within an SCC can all reach each other; only inter-SCC reachability needs analysis.
19.2 Tarjan's Algorithm
Tarjan's algorithm is the most classic and efficient method for finding SCCs. It requires only a single DFS with O(V + E) time complexity.
Core concepts:
- DFS order (dfn): The timestamp when a node is first visited.
- low value: The smallest dfn reachable from the current node's subtree through tree edges or back edges.
- Stack: Maintains all nodes on the current DFS path that haven't been assigned to an SCC yet.
Key condition: If dfn[u] == low[u], then u is the "root" of some SCC โ from u, we cannot reach any earlier ancestor. Pop from the stack until u (inclusive); these nodes form one SCC.
from typing import List, Dict, Set, Tuple
def tarjan_scc(graph: Dict[int, List[int]], n: int) -> List[List[int]]:
"""
Tarjan's Algorithm for Strongly Connected Components
Args:
graph: Directed graph adjacency list
n: Number of nodes (0 to n-1)
Returns:
List of all SCCs (each SCC is a list of nodes)
"""
dfn = [0] * n # DFS order
low = [0] * n # low value
on_stack = [False] * n
stack = []
timer = [0] # Mutable counter via list
sccs = []
def dfs(u: int):
timer[0] += 1
dfn[u] = low[u] = timer[0]
stack.append(u)
on_stack[u] = True
for v in graph.get(u, []):
if dfn[v] == 0: # Unvisited
dfs(v)
low[u] = min(low[u], low[v])
elif on_stack[v]: # On stack = candidate for current SCC
low[u] = min(low[u], dfn[v])
# u is the root of an SCC
if dfn[u] == low[u]:
scc = []
while True:
v = stack.pop()
on_stack[v] = False
scc.append(v)
if v == u:
break
sccs.append(scc)
# Process all unvisited nodes (graph may be disconnected)
for i in range(n):
if dfn[i] == 0:
dfs(i)
return sccs
# Usage example
graph = {
0: [1],
1: [2],
2: [3],
3: [0, 4],
4: [5],
5: [6],
6: [4, 7],
7: []
}
sccs = tarjan_scc(graph, 8)
print("Strongly connected components:", sccs)
# Possible output: [[7], [4, 6, 5], [0, 3, 2, 1]]
# Note: SCCs are output in reverse topological order
Iterative version (avoids recursion depth limits):
def tarjan_scc_iterative(graph: Dict[int, List[int]], n: int) -> List[List[int]]:
"""Iterative Tarjan's Algorithm - avoids stack overflow"""
dfn = [0] * n
low = [0] * n
on_stack = [False] * n
stack = []
timer = [0]
sccs = []
for start in range(n):
if dfn[start] != 0:
continue
# Manually simulate recursion stack: (node, neighbor iterator index)
call_stack = [(start, 0)]
timer[0] += 1
dfn[start] = low[start] = timer[0]
stack.append(start)
on_stack[start] = True
while call_stack:
u, idx = call_stack[-1]
neighbors = graph.get(u, [])
if idx < len(neighbors):
call_stack[-1] = (u, idx + 1)
v = neighbors[idx]
if dfn[v] == 0:
timer[0] += 1
dfn[v] = low[v] = timer[0]
stack.append(v)
on_stack[v] = True
call_stack.append((v, 0))
elif on_stack[v]:
low[u] = min(low[u], dfn[v])
else:
# All neighbors of u processed
if dfn[u] == low[u]:
scc = []
while True:
v = stack.pop()
on_stack[v] = False
scc.append(v)
if v == u:
break
sccs.append(scc)
call_stack.pop()
if call_stack:
parent = call_stack[-1][0]
low[parent] = min(low[parent], low[u])
return sccs
19.3 Articulation Points and Bridges
Articulation Point (Cut Vertex): A node whose removal (along with its edges) increases the number of connected components.
Bridge (Cut Edge): An edge whose removal increases the number of connected components.
Articulation points and bridges reveal the "vulnerability" of a graph โ critical nodes/links in a network.
def find_bridges_and_articulation_points(graph: Dict[int, List[int]], n: int):
"""
Find articulation points and bridges in an undirected graph
using Tarjan's algorithm concepts.
Note: This operates on an UNDIRECTED graph!
"""
dfn = [0] * n
low = [0] * n
timer = [0]
bridges = []
articulation_points = set()
def dfs(u: int, parent: int):
timer[0] += 1
dfn[u] = low[u] = timer[0]
children = 0 # Number of children of u in DFS tree
for v in graph.get(u, []):
if v == parent:
continue # Skip the edge we came from (undirected graph)
if dfn[v] == 0: # Tree edge
children += 1
dfs(v, u)
low[u] = min(low[u], low[v])
# Bridge condition: v cannot reach u or u's ancestors without (u,v)
if low[v] > dfn[u]:
bridges.append((u, v))
# Articulation point (non-root case)
if parent != -1 and low[v] >= dfn[u]:
articulation_points.add(u)
else:
# Back edge
low[u] = min(low[u], dfn[v])
# Root articulation point: has >= 2 children in DFS tree
if parent == -1 and children >= 2:
articulation_points.add(u)
for i in range(n):
if dfn[i] == 0:
dfs(i, -1)
return bridges, articulation_points
# Example
graph = {
0: [1, 2],
1: [0, 2],
2: [0, 1, 3],
3: [2, 4],
4: [3, 5, 6],
5: [4, 6],
6: [4, 5]
}
bridges, aps = find_bridges_and_articulation_points(graph, 7)
print("Bridges:", bridges) # [(2, 3), (3, 4)] or similar
print("Articulation points:", aps) # {2, 3, 4} or similar
Logic explanation:
- Bridge: For tree edge (u, v), if
low[v] > dfn[u], then v's subtree cannot reach u or u's ancestors via back edges โ removing (u, v) disconnects v's subtree. - Articulation point (non-root): For tree edge (u, v), if
low[v] >= dfn[u], then v's subtree cannot reach u's ancestors without going through u โ removing u disconnects v's subtree. - Articulation point (root): The root is an articulation point iff it has >= 2 children in the DFS tree (removing the root disconnects these subtrees from each other).
19.4 Maximum Flow Overview
Problem definition: Given a directed graph (flow network) with a source s, a sink t, and a capacity on each edge, find the maximum flow from s to t.
Constraints:
- Capacity constraint: Flow on each edge โค capacity.
- Flow conservation: For every node except s and t, flow in = flow out.
Flow network example:
s ---10--โ A ---8--โ t
| โ โ โ
5 2 4 6
| | โ |
โ | โ |
B ---3--โ C ---6--โ D โ t
Maximum flow = maximum total flow pushable from s to t
19.5 Ford-Fulkerson Method
Ford-Fulkerson is not a specific algorithm but a method framework: repeatedly find an augmenting path from s to t in the residual graph, push flow along it, until no augmenting path exists.
Residual graph: For each edge (u, v) with capacity c and flow f:
- Forward residual capacity = c - f (how much more can be pushed)
- Backward residual capacity = f (how much flow can be "returned")
Augmenting path: A path from s to t in the residual graph.
from collections import deque
def bfs_find_path(residual: List[List[int]], source: int, sink: int, parent: List[int]) -> bool:
"""BFS to find augmenting path in residual graph (Edmonds-Karp)"""
n = len(residual)
visited = [False] * n
visited[source] = True
queue = deque([source])
while queue:
u = queue.popleft()
for v in range(n):
if not visited[v] and residual[u][v] > 0:
visited[v] = True
parent[v] = u
if v == sink:
return True
queue.append(v)
return False
def edmonds_karp(capacity: List[List[int]], source: int, sink: int) -> int:
"""
Edmonds-Karp Algorithm (Ford-Fulkerson + BFS)
Args:
capacity: capacity[u][v] = capacity of edge (u,v)
source: Source node
sink: Sink node
Returns:
Maximum flow value
"""
n = len(capacity)
residual = [row[:] for row in capacity]
max_flow = 0
parent = [-1] * n
while bfs_find_path(residual, source, sink, parent):
# Find bottleneck capacity along the path
path_flow = float('inf')
v = sink
while v != source:
u = parent[v]
path_flow = min(path_flow, residual[u][v])
v = u
# Update residual graph
v = sink
while v != source:
u = parent[v]
residual[u][v] -= path_flow # Decrease forward
residual[v][u] += path_flow # Increase backward (allow flow cancellation)
v = u
max_flow += path_flow
parent = [-1] * n
return max_flow
# Example
# 4 nodes: 0=source, 3=sink
capacity = [
[0, 10, 10, 0], # 0โ1: 10, 0โ2: 10
[0, 0, 2, 8], # 1โ2: 2, 1โ3: 8
[0, 0, 0, 10], # 2โ3: 10
[0, 0, 0, 0]
]
print(edmonds_karp(capacity, 0, 3)) # 18
Edmonds-Karp complexity: O(VEยฒ). Using BFS to find shortest augmenting paths guarantees at most O(VE) augmentations.
19.6 Bipartite Maximum Matching (Hungarian Algorithm)
Bipartite graph: Nodes can be split into two groups L and R where all edges connect an L-node to an R-node.
Maximum matching: Find the maximum number of edges such that each node is covered by at most one edge.
Classic applications: Worker-task assignment, student-course matching, etc.
def hungarian(graph: Dict[int, List[int]], n_left: int, n_right: int) -> int:
"""
Hungarian Algorithm for Maximum Bipartite Matching
Args:
graph: Left-side adjacency list, graph[u] = [matchable right-side nodes]
n_left: Number of left nodes (0 to n_left-1)
n_right: Number of right nodes (0 to n_right-1)
Returns:
Maximum matching size
"""
match_right = [-1] * n_right
def dfs(u: int, visited: List[bool]) -> bool:
"""Try to find a match for left node u"""
for v in graph.get(u, []):
if visited[v]:
continue
visited[v] = True
# If v is unmatched, or v's current match can find another partner
if match_right[v] == -1 or dfs(match_right[v], visited):
match_right[v] = u
return True
return False
result = 0
for u in range(n_left):
visited = [False] * n_right
if dfs(u, visited):
result += 1
return result
# Example: 3 workers, 4 tasks
graph = {
0: [0, 1], # Worker 0: can do tasks 0, 1
1: [1, 2], # Worker 1: can do tasks 1, 2
2: [2, 3] # Worker 2: can do tasks 2, 3
}
print(hungarian(graph, 3, 4)) # 3 (optimal: worker0โtask0, worker1โtask1, worker2โtask2)
Core idea of the Hungarian algorithm โ augmenting paths:
When finding a match for left node u, if the directly available right node v is already taken, try to make v's current match "yield" (find another match). This "yielding" process recurses, essentially finding an alternating path from an unmatched node, alternating between non-matching and matching edges, ending at another unmatched node โ an "augmenting path."
Time complexity: O(V ร E). One DFS per left node, each DFS visits at most all edges.
19.7 Complete Application Example
"""
Comprehensive example: Social network analysis
"""
def social_network_analysis():
"""
Analyze community structure in social networks (SCC = mutual-follow groups)
"""
# Follow relationships (directed graph)
follow_graph = {
0: [1],
1: [2],
2: [0, 3], # 0,1,2 form a mutual-follow cycle
3: [4],
4: [3], # 3,4 mutual-follow pair
5: [4] # 5 follows 4, nobody follows 5
}
sccs = tarjan_scc(follow_graph, 6)
print("Mutual-follow groups:", sccs)
# [[3, 4], [0, 2, 1], [5]]
print("Information flow: group{0,1,2} โ group{3,4}")
print("group{5} โ group{3,4}")
def task_assignment():
"""
Task assignment: maximize completed tasks
"""
developer_skills = {
0: [0, 1, 2],
1: [1, 3],
2: [2, 4],
3: [3, 4, 5],
4: [5]
}
max_tasks = hungarian(developer_skills, 5, 6)
print(f"Maximum tasks completable simultaneously: {max_tasks}")
social_network_analysis()
task_assignment()
Level 2 ยท How It Works Under the Hood
19.8 Kosaraju's Algorithm (Two-Pass DFS)
Kosaraju's algorithm is another classic method for finding SCCs, conceptually more intuitive than Tarjan's but requiring two DFS passes.
Core idea:
- First DFS: On the original graph, record nodes in DFS finish order (post-order).
- Reverse the graph: Reverse all edge directions.
- Second DFS: In reverse finish order (topological order), DFS on the reversed graph. Each DFS tree corresponds to one SCC.
Why is it correct? Intuition:
- Nodes within an SCC can reach each other in both the original and reversed graph.
- Processing in reverse finish order ensures: if SCC_A has an edge to SCC_B (in the original graph), then SCC_A is processed first. In the reversed graph, this edge becomes SCC_B โ SCC_A, so processing SCC_A won't "leak into" SCC_B.
def kosaraju_scc(graph: Dict[int, List[int]], n: int) -> List[List[int]]:
"""
Kosaraju's Algorithm for Strongly Connected Components
Args:
graph: Directed graph adjacency list
n: Number of nodes
Returns:
List of all SCCs
"""
# Step 1: DFS on original graph, record finish order
visited = [False] * n
finish_order = []
def dfs1(u: int):
visited[u] = True
for v in graph.get(u, []):
if not visited[v]:
dfs1(v)
finish_order.append(u)
for i in range(n):
if not visited[i]:
dfs1(i)
# Step 2: Build reversed graph
reverse_graph: Dict[int, List[int]] = {i: [] for i in range(n)}
for u in range(n):
for v in graph.get(u, []):
reverse_graph[v].append(u)
# Step 3: DFS on reversed graph in reverse finish order
visited = [False] * n
sccs = []
def dfs2(u: int, scc: List[int]):
visited[u] = True
scc.append(u)
for v in reverse_graph[u]:
if not visited[v]:
dfs2(v, scc)
for u in reversed(finish_order):
if not visited[u]:
scc = []
dfs2(u, scc)
sccs.append(scc)
return sccs
# Test
graph = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3]}
print(kosaraju_scc(graph, 5))
# [[0, 1, 2], [3, 4]] โ two SCCs
Tarjan vs Kosaraju comparison:
| Property | Tarjan | Kosaraju |
|---|---|---|
| DFS passes | 1 | 2 |
| Needs reversed graph? | No | Yes |
| Extra space | Stack + dfn/low arrays | Reversed graph + finish_order |
| Implementation difficulty | Higher (low value understanding) | Lower (two simple DFS) |
| Interview recommendation | Yes (more efficient) | Yes (easier to explain) |
| Output order | Reverse topological | Topological |
19.9 Deep Dive into Tarjan's low Value
The precise definition of low[u]: "The smallest dfn reachable from u's subtree through exactly one back edge (or cross edge to a stack node)."
Common confusion:
# For tree edge (u โ v):
low[u] = min(low[u], low[v]) # Earliest ancestor reachable through subtree v
# For back edge (u โ v), v on stack:
low[u] = min(low[u], dfn[v]) # Note: dfn[v], NOT low[v]!
# Why dfn[v] and not low[v]?
# Because the back edge goes directly to v, reaching v itself (dfn[v])
# Using low[v] would transitively "pass through" multiple back edges,
# potentially incorrectly merging nodes from different SCCs
However, in practice, using low[u] = min(low[u], low[v]) for back edges is also correct (in most cases) โ because if v is on the stack and can reach an earlier node via v, then v and u must be in the same SCC. Many implementations (including our earlier code) use dfn[v] because it's Tarjan's original definition and theoretically more precise.
Execution trace:
Graph: 0โ1โ2โ0, 2โ3
DFS starting at node 0:
Visit 0: dfn[0]=1, low[0]=1, stack=[0]
Visit 1: dfn[1]=2, low[1]=2, stack=[0,1]
Visit 2: dfn[2]=3, low[2]=3, stack=[0,1,2]
Neighbor 0: on stack, low[2] = min(3, dfn[0]) = min(3,1) = 1
Neighbor 3: unvisited
Visit 3: dfn[3]=4, low[3]=4, stack=[0,1,2,3]
No neighbors
dfn[3]==low[3] โ SCC={3}, pop 3, stack=[0,1,2]
low[2] = min(low[2], low[3]) = min(1, 4) = 1
Return to 1: low[1] = min(low[1], low[2]) = min(2, 1) = 1
Return to 0: low[0] = min(low[0], low[1]) = min(1, 1) = 1
dfn[0]==low[0] โ SCC={2,1,0}, pop until 0
Result: SCC1={3}, SCC2={0,1,2}
19.10 The Max-Flow Min-Cut Theorem
Theorem (Ford & Fulkerson, 1956): In any flow network, the value of the maximum flow equals the capacity of the minimum cut.
Cut: Partition nodes into S (containing s) and T (containing t). Cut capacity = sum of capacities of all edges from S to T.
Intuitive explanation:
- Max flow โค any cut's capacity (flow cannot exceed the bottleneck).
- When Ford-Fulkerson terminates, s cannot reach t in the residual graph. At this point, define S = {nodes reachable from s in the residual graph}, T = V \ S. All original edges from S to T are saturated (otherwise residual capacity > 0 and s could reach a node in T). This cut's capacity exactly equals the max flow.
Applications of Min-Cut:
- Network security: Minimum number of communication links to cut to disconnect s from t.
- Image segmentation: Classify pixels as foreground/background minimizing segmentation cost.
- Project selection: Select a set of projects to maximize net revenue (equivalent to dual of min-cut).
def find_min_cut(capacity: List[List[int]], source: int, sink: int):
"""
Find min-cut: first compute max flow, then BFS from source in residual graph
"""
n = len(capacity)
residual = [row[:] for row in capacity]
parent = [-1] * n
# Run Edmonds-Karp to get final residual graph
while bfs_find_path(residual, source, sink, parent):
path_flow = float('inf')
v = sink
while v != source:
u = parent[v]
path_flow = min(path_flow, residual[u][v])
v = u
v = sink
while v != source:
u = parent[v]
residual[u][v] -= path_flow
residual[v][u] += path_flow
v = u
parent = [-1] * n
# BFS from source in residual graph to find S set
visited = [False] * n
queue = deque([source])
visited[source] = True
while queue:
u = queue.popleft()
for v in range(n):
if not visited[v] and residual[u][v] > 0:
visited[v] = True
queue.append(v)
S = {i for i in range(n) if visited[i]}
T = {i for i in range(n) if not visited[i]}
# Min-cut edges = SโT edges with original capacity > 0
cut_edges = []
for u in S:
for v in T:
if capacity[u][v] > 0:
cut_edges.append((u, v, capacity[u][v]))
return S, T, cut_edges
19.11 Dinic's Algorithm
Dinic's algorithm is more efficient than Edmonds-Karp. Key improvement: instead of finding one augmenting path at a time, first use BFS to build a level graph, then use DFS on the level graph to find multiple augmenting paths at once.
def dinic(capacity: List[List[int]], source: int, sink: int) -> int:
"""
Dinic's Algorithm for Maximum Flow
Time complexity: O(VยฒE), O(EโV) for unit-capacity graphs
"""
n = len(capacity)
residual = [row[:] for row in capacity]
def bfs_level() -> bool:
"""BFS to build level graph"""
nonlocal level
level = [-1] * n
level[source] = 0
queue = deque([source])
while queue:
u = queue.popleft()
for v in range(n):
if level[v] == -1 and residual[u][v] > 0:
level[v] = level[u] + 1
queue.append(v)
return level[sink] != -1
def dfs_blocking(u: int, pushed: int) -> int:
"""DFS to find blocking flow in level graph"""
if u == sink:
return pushed
while iter_ptr[u] < n:
v = iter_ptr[u]
if level[v] == level[u] + 1 and residual[u][v] > 0:
flow = dfs_blocking(v, min(pushed, residual[u][v]))
if flow > 0:
residual[u][v] -= flow
residual[v][u] += flow
return flow
iter_ptr[u] += 1
return 0
max_flow = 0
level = [-1] * n
while bfs_level():
iter_ptr = [0] * n
while True:
flow = dfs_blocking(source, float('inf'))
if flow == 0:
break
max_flow += flow
return max_flow
Dinic vs Edmonds-Karp complexity comparison:
| Algorithm | Time Complexity | Use Case |
|---|---|---|
| Edmonds-Karp | O(VEยฒ) | General |
| Dinic | O(VยฒE) | General (faster) |
| Dinic (unit capacity) | O(EโV) | Bipartite matching etc. |
Why is Dinic faster? After each BFS phase, the shortest path length from source to sink increases by at least 1 (because the blocking flow blocks all paths at the current level). Since the shortest path is at most V-1, there are at most V-1 BFS phases. Each phase (BFS + blocking flow) takes O(VE), giving O(VยฒE) total.
19.12 Bipartite Matching from a Network Flow Perspective
The Hungarian algorithm essentially finds augmenting paths in a bipartite graph โ exactly the same as the augmenting path method in network flow! In fact, maximum bipartite matching directly reduces to maximum flow:
Construct flow network:
1. Add super-source s, connect to all left nodes (capacity 1)
2. Add super-sink t, all right nodes connect to t (capacity 1)
3. Original LโR edges get capacity 1
Maximum flow = Maximum matching
s โ L1 โ R1 โ t
s โ L2 โ R2 โ t
s โ L3 โ R3 โ t
(cross-connections omitted)
Kรถnig's Theorem: In a bipartite graph,
- Maximum matching = Minimum vertex cover
- Maximum independent set = n - Maximum matching
This is the classic manifestation of LP duality in combinatorial optimization.
19.13 Applications of the Condensation DAG
After contracting SCCs, the resulting DAG enables:
- Reachability queries: Use topological sort + DP on the DAG to determine inter-SCC reachability.
- Longest path: Longest path on DAG (critical path method).
- 2-SAT: If variable x and its negation ยฌx are in the same SCC, no solution exists.
def scc_dag(graph: Dict[int, List[int]], n: int):
"""
Condensation: Transform original graph into DAG of SCCs
"""
sccs = tarjan_scc(graph, n)
# Which SCC each node belongs to
scc_id = [0] * n
for i, scc in enumerate(sccs):
for node in scc:
scc_id[node] = i
# Build DAG (edges between SCCs)
num_sccs = len(sccs)
dag = {i: set() for i in range(num_sccs)}
for u in range(n):
for v in graph.get(u, []):
if scc_id[u] != scc_id[v]:
dag[scc_id[u]].add(scc_id[v])
dag = {k: list(v) for k, v in dag.items()}
return sccs, scc_id, dag
# 2-SAT application
def solve_2sat(n: int, clauses: List[Tuple[int, int]]) -> List[bool]:
"""
2-SAT Solver
Variables numbered 0 to n-1
Positive literal x โ number x, negative literal ยฌx โ number x+n
clauses: [(a, b), ...] representing (a โจ b)
where a, b are literal numbers
"""
total = 2 * n
graph = {i: [] for i in range(total)}
def neg(x):
return x + n if x < n else x - n
for a, b in clauses:
graph[neg(a)].append(b) # ยฌa โ b
graph[neg(b)].append(a) # ยฌb โ a
sccs = tarjan_scc(graph, total)
scc_id = [0] * total
for i, scc in enumerate(sccs):
for node in scc:
scc_id[node] = i
# If x and ยฌx are in the same SCC, unsatisfiable
for x in range(n):
if scc_id[x] == scc_id[x + n]:
return None
# Assignment: if scc_id[x] > scc_id[ยฌx], then x = True
assignment = [False] * n
for x in range(n):
assignment[x] = scc_id[x] > scc_id[x + n]
return assignment
Level 3 ยท What the Specifications Say
19.14 Tarjan's 1972 Paper
Paper: Robert E. Tarjan, "Depth-First Search and Linear Graph Algorithms", SIAM Journal on Computing, 1972.
This paper is a milestone in graph algorithms. It systematically demonstrated the power of DFS. Tarjan presented:
- Strongly connected components algorithm (the core of this chapter)
- Biconnected components algorithm (identifying articulation points and bridges)
- Systematic classification of DFS edges: tree edges, back edges, forward edges, cross edges
Tarjan's key insight: The DFS search tree has excellent structural properties. In a directed graph's DFS tree, all non-tree edges are either back edges (to ancestors), forward edges (to descendants), or cross edges (to fully processed subtrees). Exploiting these properties, many graph problems can be solved in O(V+E) time.
Mathematical definition of the low value (Tarjan's original):
low(v) = min(
dfn(v),
min{dfn(w) | (u, w) is a back edge, u is a descendant of v (including v itself)}
)
That is, low(v) is the earliest ancestor's DFS number reachable from v's subtree through at most one back edge.
Tarjan's academic contributions: Robert Tarjan (1948-) is one of the giants of theoretical computer science. His contributions include:
- Union-Find O(ฮฑ(n)) analysis (1975)
- Fibonacci heaps (with Fredman, 1984)
- Splay trees (with Sleator, 1985)
- Strongly connected components algorithm (1972)
- Offline LCA algorithm (1979)
He received the Turing Award in 1986 (jointly with John Hopcroft), recognizing their fundamental contributions to algorithms and data structures.
19.15 The Max-Flow Min-Cut Theorem (Ford & Fulkerson 1956)
Paper: L.R. Ford Jr. & D.R. Fulkerson, "Maximal Flow through a Network", Canadian Journal of Mathematics, 1956.
Context: This theorem has Cold War origins. Ford and Fulkerson worked at RAND Corporation, and the theorem was initially used to analyze the bottleneck capacity of the Soviet railway network โ determining how many railway lines needed to be cut to cripple the Soviet transport system.
Formal statement:
Let f* be the maximum flow in a network, and C* be the minimum cut capacity. Then f* = C*.
Proof framework:
-
Weak duality (f โค C for any flow f and cut C):
- Any flow from s to t must completely "pass through" any s-t cut.
- The flow through a cut โค the cut's capacity (each cut edge's flow โค its capacity).
- Therefore |f| โค cap(S, T) for any cut (S, T).
-
Strong duality (there exist f, C such that f = C):
- When Ford-Fulkerson terminates, s cannot reach t in the residual graph.
- Define S = {nodes reachable from s in residual graph}, T = V \ S.
- All original edges from S to T are saturated (residual capacity = 0).
- All original edges from T to S have zero flow (otherwise backward residual capacity would let s reach nodes in T).
- So |f| = cap(S, T). Combined with weak duality: f is max flow, (S,T) is min cut.
Termination issue of the Ford-Fulkerson method:
If all capacities are integers, each augmentation increases flow by at least 1, so at most |f*| augmentations occur and the algorithm terminates. But if capacities are irrational, Ford-Fulkerson (using DFS for augmenting paths) may not terminate! Zwick (1995) constructed such counterexamples.
Edmonds-Karp (using BFS) guarantees termination with O(VEยฒ) complexity, independent of capacity magnitudes.
19.16 Network Flow in Practice
Application 1: Bipartite Matching (discussed earlier)
Application 2: Project Selection Problem
Given n projects, each with profit p_i (positive or negative). Projects have dependencies: choosing project i requires choosing project j. Find maximum net profit.
Reduction to min-cut:
- Source s connects to positive-profit projects (capacity = profit)
- Negative-profit projects connect to sink t (capacity = |profit|)
- Dependency iโj gets capacity โ
- Maximum net profit = sum of all positive profits - min cut
Application 3: Image Segmentation
Classify pixels as foreground/background. Each pixel has a "preference" for foreground and background, and adjacent pixels in different classes incur a "penalty."
Reduction to min-cut:
- s โ pixel: capacity = foreground preference
- pixel โ t: capacity = background preference
- Between adjacent pixels: capacity = different-class penalty
Application 4: Baseball Elimination
Determine whether a team has been mathematically eliminated (cannot win the league even by winning all remaining games). This can be determined using max flow!
19.17 History of the Hungarian Algorithm
The Hungarian Algorithm was first proposed by Harold Kuhn in 1955, but his algorithm was based on work by Hungarian mathematicians Dรฉnes Kรถnig (1931) and Jenล Egervรกry (1931). Kuhn named it to honor these Hungarian mathematicians.
Kuhn's original algorithm solved the weighted bipartite optimal matching (Assignment Problem) in O(nยณ) time. The simplified version we showed in Level 1 (unweighted maximum matching) is the O(VE) augmenting-path algorithm.
Hopcroft-Karp algorithm (1973) improved unweighted bipartite maximum matching to O(EโV) โ its core idea is nearly identical to Dinic's algorithm (level graph + multiple augmenting paths).
Kรถnig's Theorem (1931): In a bipartite graph,
- Maximum matching = Minimum vertex cover
- Maximum independent set = n - Maximum matching
This is a classic manifestation of LP duality in combinatorial optimization.
19.18 Theoretical Comparison of SCC Algorithms
| Algorithm | Time | Space | DFS Passes | Published |
|---|---|---|---|---|
| Tarjan | O(V+E) | O(V) | 1 | 1972 |
| Kosaraju-Sharir | O(V+E) | O(V+E) | 2 + reversed graph | 1981 (Sharir) |
| Path-based (Gabow) | O(V+E) | O(V) | 1 | 2000 |
| Dijkstra (partition) | O(V+E) | O(V) | 1 | 1976 |
Kosaraju's algorithm was published in Sharir's 1981 technical report, but Kosaraju reportedly described it in a 1978 lecture (unpublished).
Path-based algorithm (Gabow 2000) uses two stacks instead of Tarjan's low-value array, conceptually cleaner but with negligible practical performance difference.
Level 4 ยท Edge Cases and Pitfalls
19.19 Interview Frequency of These Algorithms
Let's talk data. The following frequency analysis is based on LeetCode tag statistics and mainstream interview experience sharing:
High frequency (must know):
- Dijkstra: ~50+ LeetCode problems, commonly tested. Variants (minimax, multi-source) are hidden test points.
- Topological sort: ~30+ problems, extremely high frequency.
- Union-Find (MST-related): ~70+ problems, high frequency.
- BFS/DFS: ubiquitous.
Medium frequency (should know):
- Bellman-Ford/SPFA: ~15 problems. Required when problem mentions "negative weights."
- Kruskal/Prim (MST): ~10 problems. Usually direct template application.
- Bipartite graph detection: ~10 problems.
Low frequency (awareness level, rarely directly tested in interviews):
- Strongly connected components (Tarjan/Kosaraju): ~5 LeetCode problems. Almost never in interviews, but concepts used in competitive programming and system design.
- Maximum flow/network flow: ~5 LeetCode problems. Extremely rare in interviews, but the ultimate test of modeling ability.
- Hungarian algorithm: Almost never in interviews. But the "can this be modeled as matching?" mindset is valuable.
- Articulation points/bridges: ~5 problems (e.g., LeetCode 1192). Occasionally in system design for network vulnerability analysis.
Conclusions:
- Dijkstra + MST (Kruskal) are interview essentials โ must be writable from memory in 20 minutes.
- SCC and network flow are "knowledge reserves" โ direct interview probability < 5%.
- Max-flow min-cut theorem thinking is occasionally useful in system design interviews (bottleneck analysis).
- If preparing for Google/Meta hard interviews or competitive programming, SCC and network flow are worth deep study.
19.20 When Do You Need to Learn These
Recommended learning priority:
Priority 1 (Interview essentials):
- Dijkstra (and variants)
- BFS/DFS
- Topological sort
- Union-Find + Kruskal
Priority 2 (Advanced bonus):
- Bellman-Ford (negative weight scenarios)
- Floyd-Warshall (all-pairs)
- Bipartite detection
- Tarjan for bridges/articulation points
Priority 3 (Advanced/competitive):
- SCC (Tarjan/Kosaraju)
- 2-SAT
- Maximum flow (Dinic)
- Bipartite maximum matching
Priority 4 (Theoretical/PhD-level):
- Minimum cost maximum flow
- General graph matching (Blossom algorithm)
- Gomory-Hu tree
- Maximum weight closure
Signals: When do you need these algorithms?
| Problem Characteristic | Likely Algorithm |
|---|---|
| "Groups that can mutually reach each other" | SCC |
| "Removal disconnects the graph" | Articulation points/bridges |
| "Maximum amount that can be transmitted/assigned" | Maximum flow |
| "Optimal one-to-one correspondence" | Bipartite matching |
| "Interdependent select/don't-select decisions" | 2-SAT or network flow |
19.21 LeetCode 1192: Critical Connections in a Network
Problem: Given n servers and connections (undirected graph), find all "critical connections" โ connections whose removal disconnects some servers. I.e., find all bridges.
def criticalConnections(n: int, connections: List[List[int]]) -> List[List[int]]:
"""
LeetCode 1192: Critical Connections in a Network
Direct application of Tarjan's bridge-finding algorithm
"""
graph = {i: [] for i in range(n)}
for u, v in connections:
graph[u].append(v)
graph[v].append(u)
dfn = [0] * n
low = [0] * n
timer = [0]
bridges = []
def dfs(u: int, parent: int):
timer[0] += 1
dfn[u] = low[u] = timer[0]
for v in graph[u]:
if v == parent:
continue
if dfn[v] == 0:
dfs(v, u)
low[u] = min(low[u], low[v])
if low[v] > dfn[u]:
bridges.append([u, v])
else:
low[u] = min(low[u], dfn[v])
dfs(0, -1)
return bridges
# Test
n = 4
connections = [[0,1],[1,2],[2,0],[1,3]]
print(criticalConnections(n, connections)) # [[1, 3]]
# Explanation: 0-1-2 forms a cycle, removing any one doesn't affect connectivity
# But 1-3 is a bridge โ removing it isolates node 3
Interview note: This problem may hit recursion depth limits (n can reach 10^5). In production, use the iterative version or increase the recursion limit (sys.setrecursionlimit).
19.22 Real Engineering Applications
Scenario 1: Microservice Dependency Analysis (SCC)
def analyze_microservice_dependencies(services: List[str], calls: List[Tuple[str, str]]):
"""
Analyze circular dependencies among microservices.
If a group of services forms an SCC (mutual calls), they should be
merged or decoupled.
"""
name_to_id = {name: i for i, name in enumerate(services)}
n = len(services)
graph = {i: [] for i in range(n)}
for caller, callee in calls:
graph[name_to_id[caller]].append(name_to_id[callee])
sccs = tarjan_scc(graph, n)
# Find groups with circular dependencies (SCC size > 1)
circular_groups = []
for scc in sccs:
if len(scc) > 1:
group_names = [services[i] for i in scc]
circular_groups.append(group_names)
return circular_groups
# Example
services = ["auth", "user", "order", "payment", "notification"]
calls = [
("auth", "user"),
("user", "auth"), # Circular!
("order", "payment"),
("payment", "notification"),
("notification", "order"), # Circular: orderโpaymentโnotificationโorder
]
groups = analyze_microservice_dependencies(services, calls)
print("Circular dependency groups:", groups)
# [["order", "notification", "payment"], ["auth", "user"]]
Scenario 2: CDN Bandwidth Allocation (Network Flow)
def cdn_bandwidth_allocation():
"""
Simplified CDN traffic allocation problem.
Multiple origin servers โ CDN nodes โ user groups
Goal: maximize total bandwidth
"""
# Nodes: 0=super-source, 1-3=origin servers, 4-6=CDN nodes, 7-9=user groups, 10=super-sink
n = 11
capacity = [[0] * n for _ in range(n)]
# Origin server capacities
capacity[0][1] = 100 # Server 1: 100 Mbps
capacity[0][2] = 80 # Server 2: 80 Mbps
capacity[0][3] = 60 # Server 3: 60 Mbps
# ServerโCDN link capacities
capacity[1][4] = 50; capacity[1][5] = 70
capacity[2][4] = 40; capacity[2][6] = 60
capacity[3][5] = 30; capacity[3][6] = 50
# CDNโUser group link capacities
capacity[4][7] = 60; capacity[4][8] = 30
capacity[5][7] = 40; capacity[5][9] = 50
capacity[6][8] = 45; capacity[6][9] = 35
# User group demands
capacity[7][10] = 80
capacity[8][10] = 70
capacity[9][10] = 60
max_bw = edmonds_karp(capacity, 0, 10)
print(f"Maximum allocatable bandwidth: {max_bw} Mbps")
cdn_bandwidth_allocation()
19.23 Common Pitfalls and Summary
Pitfall 1: Handling parallel edges in Tarjan's
# In undirected graphs, if edge (u,v) appears twice (parallel edges),
# you cannot simply skip with "if v == parent: continue"
# because only ONE traversal should be skipped (the edge we came from),
# the second parallel edge should be considered
# Correct approach: use edge IDs to avoid double-counting
def dfs_with_edge_id(u, parent_edge_id):
for v, edge_id in graph[u]:
if edge_id == parent_edge_id:
continue # Only skip the specific edge we came from
# ...
Pitfall 2: Recursion depth limits
import sys
sys.setrecursionlimit(200000) # For graphs with n โค 10^5
# Or use iterative implementation (recommended for production)
Pitfall 3: Adjacency matrix limitation of Edmonds-Karp
# Adjacency matrix Edmonds-Karp uses O(Vยฒ) space
# If V = 10^5, that's 10^10 bytes โ infeasible!
# Large-scale network flow must use adjacency list implementation
# For interviews with V โค a few hundred, adjacency matrix is clearest
# For V โค 10^4-10^5, must use adjacency list + forward star
Pitfall 4: Confusing SCC with connected components
# Undirected graph connected components: simple BFS/DFS
# Directed graph strongly connected components: need Tarjan/Kosaraju
# Directed graph weakly connected components: connected components ignoring edge direction
# In interviews, CONFIRM whether it's directed or undirected!
Algorithm quick reference:
| Algorithm | Problem | Time | Interview Frequency |
|---|---|---|---|
| Tarjan (SCC) | Directed graph SCCs | O(V+E) | Low |
| Tarjan (bridges/APs) | Undirected critical edges/nodes | O(V+E) | Medium-low |
| Kosaraju | Directed graph SCCs | O(V+E) | Low |
| Edmonds-Karp | Maximum flow | O(VEยฒ) | Very low |
| Dinic | Maximum flow | O(VยฒE) | Very low |
| Hungarian | Bipartite max matching | O(VE) | Low |
| Hopcroft-Karp | Bipartite max matching | O(EโV) | Very low |
Core takeaways:
- SCC is the key to understanding directed graph structure โ contraction yields a DAG, simplifying problems.
- Network flow is the Swiss army knife of combinatorial optimization โ many problems that "don't look like flow" reduce to network flow.
- Max flow = Min cut โ this duality is central to understanding network flow applications.
- These algorithms are rarely tested in interviews, but knowing they exist matters โ some hard problems can only be efficiently solved by modeling as network flow.
- In real engineering, SCC is most useful โ circular dependency detection, deadlock analysis, compiler optimization all use it.