Chapter 19

Strongly Connected Components and Network Flow

Chapter 19: Strongly Connected Components and Network Flow

The previous chapter covered the two classic optimization problems on weighted graphs — shortest paths and minimum spanning trees. This chapter ventures into the "advanced territory" of graph theory: Strongly Connected Components (SCC) and Network Flow.

Strongly connected components reveal the "macro-structure" of directed graphs — which nodes can reach each other? If we "contract" each SCC into a super-node, the entire graph becomes a DAG (directed acyclic graph), dramatically reducing problem complexity. This is the core technique of "simplification through structure" in graph theory.

Network flow is a class of models that appears theoretical but has remarkably wide applications. From "how much water can flow through a pipe network" to "how to optimally assign workers to tasks," network flow unifies a vast array of combinatorial optimization problems into one elegant framework. The Max-Flow Min-Cut theorem perfectly connects two seemingly unrelated questions: "how to maximize throughput" and "how to minimize the cost of disruption."

This chapter's core objectives: (1) master Tarjan's algorithm for SCCs, articulation points, and bridges; (2) understand Kosaraju's two-pass DFS design; (3) understand the maximum flow problem and the Ford-Fulkerson method; (4) learn bipartite matching and the Hungarian algorithm; (5) assess these algorithms' interview frequency and learning priority.

Level 1 · What You Need to Know

19.1 The Concept of Strongly Connected Components

Strongly connected: In a directed graph, if node u can reach v AND v can reach u, then u and v are strongly connected.

Strongly Connected Component (SCC): A maximal strongly connected subgraph — any two nodes within it are strongly connected, and no external node can be added while maintaining strong connectivity.

Directed graph example:

    0 → 1 → 2
    ↑       ↓
    3 ← 4 ← 2
        ↓
        5 → 6
            ↑ ↓
            8 ← 7

Strongly connected components:
  SCC1: {0, 1, 2, 3, 4}  (any two of these 5 nodes can reach each other)
  SCC2: {5}              (only itself)
  SCC3: {6, 7, 8}        (6→7→8→6 forms a cycle)

Condensation DAG:
  SCC1 → SCC2 → SCC3

Why are SCCs important?

Problem simplification: Contracting SCCs into single nodes turns the graph into a DAG. Many problems on DAGs (topological sort, longest path, etc.) are far simpler than on general directed graphs.
2-SAT problems: Determining satisfiability of Boolean formulas (2-SAT) directly reduces to SCC computation.
Reachability analysis: Nodes within an SCC can all reach each other; only inter-SCC reachability needs analysis.

19.2 Tarjan's Algorithm

Tarjan's algorithm is the most classic and efficient method for finding SCCs. It requires only a single DFS with O(V + E) time complexity.

Core concepts:

DFS order (dfn): The timestamp when a node is first visited.
low value: The smallest dfn reachable from the current node's subtree through tree edges or back edges.
Stack: Maintains all nodes on the current DFS path that haven't been assigned to an SCC yet.

Key condition: If dfn[u] == low[u], then u is the "root" of some SCC — from u, we cannot reach any earlier ancestor. Pop from the stack until u (inclusive); these nodes form one SCC.

from typing import List, Dict, Set, Tuple

def tarjan_scc(graph: Dict[int, List[int]], n: int) -> List[List[int]]:
    """
    Tarjan's Algorithm for Strongly Connected Components
    
    Args:
        graph: Directed graph adjacency list
        n: Number of nodes (0 to n-1)
    
    Returns:
        List of all SCCs (each SCC is a list of nodes)
    """
    dfn = [0] * n       # DFS order
    low = [0] * n       # low value
    on_stack = [False] * n
    stack = []
    timer = [0]         # Mutable counter via list
    sccs = []
    
    def dfs(u: int):
        timer[0] += 1
        dfn[u] = low[u] = timer[0]
        stack.append(u)
        on_stack[u] = True
        
        for v in graph.get(u, []):
            if dfn[v] == 0:  # Unvisited
                dfs(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:  # On stack = candidate for current SCC
                low[u] = min(low[u], dfn[v])
        
        # u is the root of an SCC
        if dfn[u] == low[u]:
            scc = []
            while True:
                v = stack.pop()
                on_stack[v] = False
                scc.append(v)
                if v == u:
                    break
            sccs.append(scc)
    
    # Process all unvisited nodes (graph may be disconnected)
    for i in range(n):
        if dfn[i] == 0:
            dfs(i)
    
    return sccs


# Usage example
graph = {
    0: [1],
    1: [2],
    2: [3],
    3: [0, 4],
    4: [5],
    5: [6],
    6: [4, 7],
    7: []
}

sccs = tarjan_scc(graph, 8)
print("Strongly connected components:", sccs)
# Possible output: [[7], [4, 6, 5], [0, 3, 2, 1]]
# Note: SCCs are output in reverse topological order

Iterative version (avoids recursion depth limits):

def tarjan_scc_iterative(graph: Dict[int, List[int]], n: int) -> List[List[int]]:
    """Iterative Tarjan's Algorithm - avoids stack overflow"""
    dfn = [0] * n
    low = [0] * n
    on_stack = [False] * n
    stack = []
    timer = [0]
    sccs = []
    
    for start in range(n):
        if dfn[start] != 0:
            continue
        
        # Manually simulate recursion stack: (node, neighbor iterator index)
        call_stack = [(start, 0)]
        timer[0] += 1
        dfn[start] = low[start] = timer[0]
        stack.append(start)
        on_stack[start] = True
        
        while call_stack:
            u, idx = call_stack[-1]
            neighbors = graph.get(u, [])
            
            if idx < len(neighbors):
                call_stack[-1] = (u, idx + 1)
                v = neighbors[idx]
                
                if dfn[v] == 0:
                    timer[0] += 1
                    dfn[v] = low[v] = timer[0]
                    stack.append(v)
                    on_stack[v] = True
                    call_stack.append((v, 0))
                elif on_stack[v]:
                    low[u] = min(low[u], dfn[v])
            else:
                # All neighbors of u processed
                if dfn[u] == low[u]:
                    scc = []
                    while True:
                        v = stack.pop()
                        on_stack[v] = False
                        scc.append(v)
                        if v == u:
                            break
                    sccs.append(scc)
                
                call_stack.pop()
                if call_stack:
                    parent = call_stack[-1][0]
                    low[parent] = min(low[parent], low[u])
    
    return sccs

19.3 Articulation Points and Bridges

Articulation Point (Cut Vertex): A node whose removal (along with its edges) increases the number of connected components.

Bridge (Cut Edge): An edge whose removal increases the number of connected components.

Articulation points and bridges reveal the "vulnerability" of a graph — critical nodes/links in a network.

def find_bridges_and_articulation_points(graph: Dict[int, List[int]], n: int):
    """
    Find articulation points and bridges in an undirected graph
    using Tarjan's algorithm concepts.
    
    Note: This operates on an UNDIRECTED graph!
    """
    dfn = [0] * n
    low = [0] * n
    timer = [0]
    bridges = []
    articulation_points = set()
    
    def dfs(u: int, parent: int):
        timer[0] += 1
        dfn[u] = low[u] = timer[0]
        children = 0  # Number of children of u in DFS tree
        
        for v in graph.get(u, []):
            if v == parent:
                continue  # Skip the edge we came from (undirected graph)
            
            if dfn[v] == 0:  # Tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                
                # Bridge condition: v cannot reach u or u's ancestors without (u,v)
                if low[v] > dfn[u]:
                    bridges.append((u, v))
                
                # Articulation point (non-root case)
                if parent != -1 and low[v] >= dfn[u]:
                    articulation_points.add(u)
            else:
                # Back edge
                low[u] = min(low[u], dfn[v])
        
        # Root articulation point: has >= 2 children in DFS tree
        if parent == -1 and children >= 2:
            articulation_points.add(u)
    
    for i in range(n):
        if dfn[i] == 0:
            dfs(i, -1)
    
    return bridges, articulation_points


# Example
graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5, 6],
    5: [4, 6],
    6: [4, 5]
}

bridges, aps = find_bridges_and_articulation_points(graph, 7)
print("Bridges:", bridges)              # [(2, 3), (3, 4)] or similar
print("Articulation points:", aps)      # {2, 3, 4} or similar

Logic explanation:

Bridge: For tree edge (u, v), if low[v] > dfn[u], then v's subtree cannot reach u or u's ancestors via back edges — removing (u, v) disconnects v's subtree.
Articulation point (non-root): For tree edge (u, v), if low[v] >= dfn[u], then v's subtree cannot reach u's ancestors without going through u — removing u disconnects v's subtree.
Articulation point (root): The root is an articulation point iff it has >= 2 children in the DFS tree (removing the root disconnects these subtrees from each other).

19.4 Maximum Flow Overview

Problem definition: Given a directed graph (flow network) with a source s, a sink t, and a capacity on each edge, find the maximum flow from s to t.

Constraints:

Capacity constraint: Flow on each edge ≤ capacity.
Flow conservation: For every node except s and t, flow in = flow out.

Flow network example:

    s ---10--→ A ---8--→ t
    |          ↑  ↘      ↑
    5          2    4     6
    |          |      ↘   |
    ↓          |       ↘  |
    B ---3--→  C ---6--→ D → t

Maximum flow = maximum total flow pushable from s to t

19.5 Ford-Fulkerson Method

Ford-Fulkerson is not a specific algorithm but a method framework: repeatedly find an augmenting path from s to t in the residual graph, push flow along it, until no augmenting path exists.

Residual graph: For each edge (u, v) with capacity c and flow f:

Forward residual capacity = c - f (how much more can be pushed)
Backward residual capacity = f (how much flow can be "returned")

Augmenting path: A path from s to t in the residual graph.

from collections import deque

def bfs_find_path(residual: List[List[int]], source: int, sink: int, parent: List[int]) -> bool:
    """BFS to find augmenting path in residual graph (Edmonds-Karp)"""
    n = len(residual)
    visited = [False] * n
    visited[source] = True
    queue = deque([source])
    
    while queue:
        u = queue.popleft()
        for v in range(n):
            if not visited[v] and residual[u][v] > 0:
                visited[v] = True
                parent[v] = u
                if v == sink:
                    return True
                queue.append(v)
    
    return False


def edmonds_karp(capacity: List[List[int]], source: int, sink: int) -> int:
    """
    Edmonds-Karp Algorithm (Ford-Fulkerson + BFS)
    
    Args:
        capacity: capacity[u][v] = capacity of edge (u,v)
        source: Source node
        sink: Sink node
    
    Returns:
        Maximum flow value
    """
    n = len(capacity)
    residual = [row[:] for row in capacity]
    
    max_flow = 0
    parent = [-1] * n
    
    while bfs_find_path(residual, source, sink, parent):
        # Find bottleneck capacity along the path
        path_flow = float('inf')
        v = sink
        while v != source:
            u = parent[v]
            path_flow = min(path_flow, residual[u][v])
            v = u
        
        # Update residual graph
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= path_flow  # Decrease forward
            residual[v][u] += path_flow  # Increase backward (allow flow cancellation)
            v = u
        
        max_flow += path_flow
        parent = [-1] * n
    
    return max_flow


# Example
# 4 nodes: 0=source, 3=sink
capacity = [
    [0, 10, 10, 0],  # 0→1: 10, 0→2: 10
    [0, 0, 2, 8],    # 1→2: 2, 1→3: 8
    [0, 0, 0, 10],   # 2→3: 10
    [0, 0, 0, 0]
]

print(edmonds_karp(capacity, 0, 3))  # 18

Edmonds-Karp complexity: O(VE²). Using BFS to find shortest augmenting paths guarantees at most O(VE) augmentations.

19.6 Bipartite Maximum Matching (Hungarian Algorithm)

Bipartite graph: Nodes can be split into two groups L and R where all edges connect an L-node to an R-node.

Maximum matching: Find the maximum number of edges such that each node is covered by at most one edge.

Classic applications: Worker-task assignment, student-course matching, etc.

def hungarian(graph: Dict[int, List[int]], n_left: int, n_right: int) -> int:
    """
    Hungarian Algorithm for Maximum Bipartite Matching
    
    Args:
        graph: Left-side adjacency list, graph[u] = [matchable right-side nodes]
        n_left: Number of left nodes (0 to n_left-1)
        n_right: Number of right nodes (0 to n_right-1)
    
    Returns:
        Maximum matching size
    """
    match_right = [-1] * n_right
    
    def dfs(u: int, visited: List[bool]) -> bool:
        """Try to find a match for left node u"""
        for v in graph.get(u, []):
            if visited[v]:
                continue
            visited[v] = True
            
            # If v is unmatched, or v's current match can find another partner
            if match_right[v] == -1 or dfs(match_right[v], visited):
                match_right[v] = u
                return True
        
        return False
    
    result = 0
    for u in range(n_left):
        visited = [False] * n_right
        if dfs(u, visited):
            result += 1
    
    return result


# Example: 3 workers, 4 tasks
graph = {
    0: [0, 1],   # Worker 0: can do tasks 0, 1
    1: [1, 2],   # Worker 1: can do tasks 1, 2
    2: [2, 3]    # Worker 2: can do tasks 2, 3
}

print(hungarian(graph, 3, 4))  # 3 (optimal: worker0→task0, worker1→task1, worker2→task2)

Core idea of the Hungarian algorithm — augmenting paths:

When finding a match for left node u, if the directly available right node v is already taken, try to make v's current match "yield" (find another match). This "yielding" process recurses, essentially finding an alternating path from an unmatched node, alternating between non-matching and matching edges, ending at another unmatched node — an "augmenting path."

Time complexity: O(V × E). One DFS per left node, each DFS visits at most all edges.

19.7 Complete Application Example

"""
Comprehensive example: Social network analysis
"""

def social_network_analysis():
    """
    Analyze community structure in social networks (SCC = mutual-follow groups)
    """
    # Follow relationships (directed graph)
    follow_graph = {
        0: [1],
        1: [2],
        2: [0, 3],  # 0,1,2 form a mutual-follow cycle
        3: [4],
        4: [3],     # 3,4 mutual-follow pair
        5: [4]      # 5 follows 4, nobody follows 5
    }
    
    sccs = tarjan_scc(follow_graph, 6)
    print("Mutual-follow groups:", sccs)
    # [[3, 4], [0, 2, 1], [5]]
    
    print("Information flow: group{0,1,2} → group{3,4}")
    print("group{5} → group{3,4}")


def task_assignment():
    """
    Task assignment: maximize completed tasks
    """
    developer_skills = {
        0: [0, 1, 2],
        1: [1, 3],
        2: [2, 4],
        3: [3, 4, 5],
        4: [5]
    }
    
    max_tasks = hungarian(developer_skills, 5, 6)
    print(f"Maximum tasks completable simultaneously: {max_tasks}")


social_network_analysis()
task_assignment()

Level 2 · How It Works Under the Hood

19.8 Kosaraju's Algorithm (Two-Pass DFS)

Kosaraju's algorithm is another classic method for finding SCCs, conceptually more intuitive than Tarjan's but requiring two DFS passes.

Core idea:

First DFS: On the original graph, record nodes in DFS finish order (post-order).
Reverse the graph: Reverse all edge directions.
Second DFS: In reverse finish order (topological order), DFS on the reversed graph. Each DFS tree corresponds to one SCC.

Why is it correct? Intuition:

Nodes within an SCC can reach each other in both the original and reversed graph.
Processing in reverse finish order ensures: if SCC_A has an edge to SCC_B (in the original graph), then SCC_A is processed first. In the reversed graph, this edge becomes SCC_B → SCC_A, so processing SCC_A won't "leak into" SCC_B.

def kosaraju_scc(graph: Dict[int, List[int]], n: int) -> List[List[int]]:
    """
    Kosaraju's Algorithm for Strongly Connected Components
    
    Args:
        graph: Directed graph adjacency list
        n: Number of nodes
    
    Returns:
        List of all SCCs
    """
    # Step 1: DFS on original graph, record finish order
    visited = [False] * n
    finish_order = []
    
    def dfs1(u: int):
        visited[u] = True
        for v in graph.get(u, []):
            if not visited[v]:
                dfs1(v)
        finish_order.append(u)
    
    for i in range(n):
        if not visited[i]:
            dfs1(i)
    
    # Step 2: Build reversed graph
    reverse_graph: Dict[int, List[int]] = {i: [] for i in range(n)}
    for u in range(n):
        for v in graph.get(u, []):
            reverse_graph[v].append(u)
    
    # Step 3: DFS on reversed graph in reverse finish order
    visited = [False] * n
    sccs = []
    
    def dfs2(u: int, scc: List[int]):
        visited[u] = True
        scc.append(u)
        for v in reverse_graph[u]:
            if not visited[v]:
                dfs2(v, scc)
    
    for u in reversed(finish_order):
        if not visited[u]:
            scc = []
            dfs2(u, scc)
            sccs.append(scc)
    
    return sccs


# Test
graph = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3]}
print(kosaraju_scc(graph, 5))
# [[0, 1, 2], [3, 4]] — two SCCs

Tarjan vs Kosaraju comparison:

Property	Tarjan	Kosaraju
DFS passes	1	2
Needs reversed graph?	No	Yes
Extra space	Stack + dfn/low arrays	Reversed graph + finish_order
Implementation difficulty	Higher (low value understanding)	Lower (two simple DFS)
Interview recommendation	Yes (more efficient)	Yes (easier to explain)
Output order	Reverse topological	Topological

19.9 Deep Dive into Tarjan's low Value

The precise definition of low[u]: "The smallest dfn reachable from u's subtree through exactly one back edge (or cross edge to a stack node)."

Common confusion:

# For tree edge (u → v):
low[u] = min(low[u], low[v])   # Earliest ancestor reachable through subtree v

# For back edge (u → v), v on stack:
low[u] = min(low[u], dfn[v])   # Note: dfn[v], NOT low[v]!
# Why dfn[v] and not low[v]?
# Because the back edge goes directly to v, reaching v itself (dfn[v])
# Using low[v] would transitively "pass through" multiple back edges,
# potentially incorrectly merging nodes from different SCCs

However, in practice, using low[u] = min(low[u], low[v]) for back edges is also correct (in most cases) — because if v is on the stack and can reach an earlier node via v, then v and u must be in the same SCC. Many implementations (including our earlier code) use dfn[v] because it's Tarjan's original definition and theoretically more precise.

Execution trace:

Graph: 0→1→2→0, 2→3

DFS starting at node 0:
  Visit 0: dfn[0]=1, low[0]=1, stack=[0]
    Visit 1: dfn[1]=2, low[1]=2, stack=[0,1]
      Visit 2: dfn[2]=3, low[2]=3, stack=[0,1,2]
        Neighbor 0: on stack, low[2] = min(3, dfn[0]) = min(3,1) = 1
        Neighbor 3: unvisited
          Visit 3: dfn[3]=4, low[3]=4, stack=[0,1,2,3]
            No neighbors
            dfn[3]==low[3] → SCC={3}, pop 3, stack=[0,1,2]
        low[2] = min(low[2], low[3]) = min(1, 4) = 1
      Return to 1: low[1] = min(low[1], low[2]) = min(2, 1) = 1
    Return to 0: low[0] = min(low[0], low[1]) = min(1, 1) = 1
    dfn[0]==low[0] → SCC={2,1,0}, pop until 0

Result: SCC1={3}, SCC2={0,1,2}

19.10 The Max-Flow Min-Cut Theorem

Theorem (Ford & Fulkerson, 1956): In any flow network, the value of the maximum flow equals the capacity of the minimum cut.

Cut: Partition nodes into S (containing s) and T (containing t). Cut capacity = sum of capacities of all edges from S to T.

Intuitive explanation:

Max flow ≤ any cut's capacity (flow cannot exceed the bottleneck).
When Ford-Fulkerson terminates, s cannot reach t in the residual graph. At this point, define S = {nodes reachable from s in the residual graph}, T = V \ S. All original edges from S to T are saturated (otherwise residual capacity > 0 and s could reach a node in T). This cut's capacity exactly equals the max flow.

Applications of Min-Cut:

Network security: Minimum number of communication links to cut to disconnect s from t.
Image segmentation: Classify pixels as foreground/background minimizing segmentation cost.
Project selection: Select a set of projects to maximize net revenue (equivalent to dual of min-cut).

def find_min_cut(capacity: List[List[int]], source: int, sink: int):
    """
    Find min-cut: first compute max flow, then BFS from source in residual graph
    """
    n = len(capacity)
    residual = [row[:] for row in capacity]
    parent = [-1] * n
    
    # Run Edmonds-Karp to get final residual graph
    while bfs_find_path(residual, source, sink, parent):
        path_flow = float('inf')
        v = sink
        while v != source:
            u = parent[v]
            path_flow = min(path_flow, residual[u][v])
            v = u
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= path_flow
            residual[v][u] += path_flow
            v = u
        parent = [-1] * n
    
    # BFS from source in residual graph to find S set
    visited = [False] * n
    queue = deque([source])
    visited[source] = True
    while queue:
        u = queue.popleft()
        for v in range(n):
            if not visited[v] and residual[u][v] > 0:
                visited[v] = True
                queue.append(v)
    
    S = {i for i in range(n) if visited[i]}
    T = {i for i in range(n) if not visited[i]}
    
    # Min-cut edges = S→T edges with original capacity > 0
    cut_edges = []
    for u in S:
        for v in T:
            if capacity[u][v] > 0:
                cut_edges.append((u, v, capacity[u][v]))
    
    return S, T, cut_edges

19.11 Dinic's Algorithm

Dinic's algorithm is more efficient than Edmonds-Karp. Key improvement: instead of finding one augmenting path at a time, first use BFS to build a level graph, then use DFS on the level graph to find multiple augmenting paths at once.

def dinic(capacity: List[List[int]], source: int, sink: int) -> int:
    """
    Dinic's Algorithm for Maximum Flow
    
    Time complexity: O(V²E), O(E√V) for unit-capacity graphs
    """
    n = len(capacity)
    residual = [row[:] for row in capacity]
    
    def bfs_level() -> bool:
        """BFS to build level graph"""
        nonlocal level
        level = [-1] * n
        level[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if level[v] == -1 and residual[u][v] > 0:
                    level[v] = level[u] + 1
                    queue.append(v)
        return level[sink] != -1
    
    def dfs_blocking(u: int, pushed: int) -> int:
        """DFS to find blocking flow in level graph"""
        if u == sink:
            return pushed
        
        while iter_ptr[u] < n:
            v = iter_ptr[u]
            if level[v] == level[u] + 1 and residual[u][v] > 0:
                flow = dfs_blocking(v, min(pushed, residual[u][v]))
                if flow > 0:
                    residual[u][v] -= flow
                    residual[v][u] += flow
                    return flow
            iter_ptr[u] += 1
        
        return 0
    
    max_flow = 0
    level = [-1] * n
    
    while bfs_level():
        iter_ptr = [0] * n
        while True:
            flow = dfs_blocking(source, float('inf'))
            if flow == 0:
                break
            max_flow += flow
    
    return max_flow

Dinic vs Edmonds-Karp complexity comparison:

Algorithm	Time Complexity	Use Case
Edmonds-Karp	O(VE²)	General
Dinic	O(V²E)	General (faster)
Dinic (unit capacity)	O(E√V)	Bipartite matching etc.

Why is Dinic faster? After each BFS phase, the shortest path length from source to sink increases by at least 1 (because the blocking flow blocks all paths at the current level). Since the shortest path is at most V-1, there are at most V-1 BFS phases. Each phase (BFS + blocking flow) takes O(VE), giving O(V²E) total.

19.12 Bipartite Matching from a Network Flow Perspective

The Hungarian algorithm essentially finds augmenting paths in a bipartite graph — exactly the same as the augmenting path method in network flow! In fact, maximum bipartite matching directly reduces to maximum flow:

Construct flow network:
1. Add super-source s, connect to all left nodes (capacity 1)
2. Add super-sink t, all right nodes connect to t (capacity 1)
3. Original L→R edges get capacity 1

Maximum flow = Maximum matching

    s → L1 → R1 → t
    s → L2 → R2 → t
    s → L3 → R3 → t
    (cross-connections omitted)

König's Theorem: In a bipartite graph,

Maximum matching = Minimum vertex cover
Maximum independent set = n - Maximum matching

This is the classic manifestation of LP duality in combinatorial optimization.

19.13 Applications of the Condensation DAG

After contracting SCCs, the resulting DAG enables:

Reachability queries: Use topological sort + DP on the DAG to determine inter-SCC reachability.
Longest path: Longest path on DAG (critical path method).
2-SAT: If variable x and its negation ¬x are in the same SCC, no solution exists.

def scc_dag(graph: Dict[int, List[int]], n: int):
    """
    Condensation: Transform original graph into DAG of SCCs
    """
    sccs = tarjan_scc(graph, n)
    
    # Which SCC each node belongs to
    scc_id = [0] * n
    for i, scc in enumerate(sccs):
        for node in scc:
            scc_id[node] = i
    
    # Build DAG (edges between SCCs)
    num_sccs = len(sccs)
    dag = {i: set() for i in range(num_sccs)}
    
    for u in range(n):
        for v in graph.get(u, []):
            if scc_id[u] != scc_id[v]:
                dag[scc_id[u]].add(scc_id[v])
    
    dag = {k: list(v) for k, v in dag.items()}
    return sccs, scc_id, dag


# 2-SAT application
def solve_2sat(n: int, clauses: List[Tuple[int, int]]) -> List[bool]:
    """
    2-SAT Solver
    
    Variables numbered 0 to n-1
    Positive literal x → number x, negative literal ¬x → number x+n
    
    clauses: [(a, b), ...] representing (a ∨ b)
    where a, b are literal numbers
    """
    total = 2 * n
    graph = {i: [] for i in range(total)}
    
    def neg(x):
        return x + n if x < n else x - n
    
    for a, b in clauses:
        graph[neg(a)].append(b)   # ¬a → b
        graph[neg(b)].append(a)   # ¬b → a
    
    sccs = tarjan_scc(graph, total)
    scc_id = [0] * total
    for i, scc in enumerate(sccs):
        for node in scc:
            scc_id[node] = i
    
    # If x and ¬x are in the same SCC, unsatisfiable
    for x in range(n):
        if scc_id[x] == scc_id[x + n]:
            return None
    
    # Assignment: if scc_id[x] > scc_id[¬x], then x = True
    assignment = [False] * n
    for x in range(n):
        assignment[x] = scc_id[x] > scc_id[x + n]
    
    return assignment

Level 3 · What the Specifications Say

19.14 Tarjan's 1972 Paper

Paper: Robert E. Tarjan, "Depth-First Search and Linear Graph Algorithms", SIAM Journal on Computing, 1972.

This paper is a milestone in graph algorithms. It systematically demonstrated the power of DFS. Tarjan presented:

Strongly connected components algorithm (the core of this chapter)
Biconnected components algorithm (identifying articulation points and bridges)
Systematic classification of DFS edges: tree edges, back edges, forward edges, cross edges

Tarjan's key insight: The DFS search tree has excellent structural properties. In a directed graph's DFS tree, all non-tree edges are either back edges (to ancestors), forward edges (to descendants), or cross edges (to fully processed subtrees). Exploiting these properties, many graph problems can be solved in O(V+E) time.

Mathematical definition of the low value (Tarjan's original):

low(v) = min(
    dfn(v),
    min{dfn(w) | (u, w) is a back edge, u is a descendant of v (including v itself)}
)

That is, low(v) is the earliest ancestor's DFS number reachable from v's subtree through at most one back edge.

Tarjan's academic contributions: Robert Tarjan (1948-) is one of the giants of theoretical computer science. His contributions include:

Union-Find O(α(n)) analysis (1975)
Fibonacci heaps (with Fredman, 1984)
Splay trees (with Sleator, 1985)
Strongly connected components algorithm (1972)
Offline LCA algorithm (1979)

He received the Turing Award in 1986 (jointly with John Hopcroft), recognizing their fundamental contributions to algorithms and data structures.

19.15 The Max-Flow Min-Cut Theorem (Ford & Fulkerson 1956)

Paper: L.R. Ford Jr. & D.R. Fulkerson, "Maximal Flow through a Network", Canadian Journal of Mathematics, 1956.

Context: This theorem has Cold War origins. Ford and Fulkerson worked at RAND Corporation, and the theorem was initially used to analyze the bottleneck capacity of the Soviet railway network — determining how many railway lines needed to be cut to cripple the Soviet transport system.

Formal statement:

Let f* be the maximum flow in a network, and C* be the minimum cut capacity. Then f* = C*.

Proof framework:

Weak duality (f ≤ C for any flow f and cut C):
- Any flow from s to t must completely "pass through" any s-t cut.
- The flow through a cut ≤ the cut's capacity (each cut edge's flow ≤ its capacity).
- Therefore |f| ≤ cap(S, T) for any cut (S, T).
Strong duality (there exist f, C such that f = C):
- When Ford-Fulkerson terminates, s cannot reach t in the residual graph.
- Define S = {nodes reachable from s in residual graph}, T = V \ S.
- All original edges from S to T are saturated (residual capacity = 0).
- All original edges from T to S have zero flow (otherwise backward residual capacity would let s reach nodes in T).
- So |f| = cap(S, T). Combined with weak duality: f is max flow, (S,T) is min cut.

Termination issue of the Ford-Fulkerson method:

If all capacities are integers, each augmentation increases flow by at least 1, so at most |f*| augmentations occur and the algorithm terminates. But if capacities are irrational, Ford-Fulkerson (using DFS for augmenting paths) may not terminate! Zwick (1995) constructed such counterexamples.

Edmonds-Karp (using BFS) guarantees termination with O(VE²) complexity, independent of capacity magnitudes.

19.16 Network Flow in Practice

Application 1: Bipartite Matching (discussed earlier)

Application 2: Project Selection Problem

Given n projects, each with profit p_i (positive or negative). Projects have dependencies: choosing project i requires choosing project j. Find maximum net profit.

Reduction to min-cut:

Source s connects to positive-profit projects (capacity = profit)
Negative-profit projects connect to sink t (capacity = |profit|)
Dependency i→j gets capacity ∞
Maximum net profit = sum of all positive profits - min cut

Application 3: Image Segmentation

Classify pixels as foreground/background. Each pixel has a "preference" for foreground and background, and adjacent pixels in different classes incur a "penalty."

Reduction to min-cut:

s → pixel: capacity = foreground preference
pixel → t: capacity = background preference
Between adjacent pixels: capacity = different-class penalty

Application 4: Baseball Elimination

Determine whether a team has been mathematically eliminated (cannot win the league even by winning all remaining games). This can be determined using max flow!

19.17 History of the Hungarian Algorithm

The Hungarian Algorithm was first proposed by Harold Kuhn in 1955, but his algorithm was based on work by Hungarian mathematicians Dénes König (1931) and Jenő Egerváry (1931). Kuhn named it to honor these Hungarian mathematicians.

Kuhn's original algorithm solved the weighted bipartite optimal matching (Assignment Problem) in O(n³) time. The simplified version we showed in Level 1 (unweighted maximum matching) is the O(VE) augmenting-path algorithm.

Hopcroft-Karp algorithm (1973) improved unweighted bipartite maximum matching to O(E√V) — its core idea is nearly identical to Dinic's algorithm (level graph + multiple augmenting paths).

König's Theorem (1931): In a bipartite graph,

Maximum matching = Minimum vertex cover
Maximum independent set = n - Maximum matching

This is a classic manifestation of LP duality in combinatorial optimization.

19.18 Theoretical Comparison of SCC Algorithms

Algorithm	Time	Space	DFS Passes	Published
Tarjan	O(V+E)	O(V)	1	1972
Kosaraju-Sharir	O(V+E)	O(V+E)	2 + reversed graph	1981 (Sharir)
Path-based (Gabow)	O(V+E)	O(V)	1	2000
Dijkstra (partition)	O(V+E)	O(V)	1	1976

Kosaraju's algorithm was published in Sharir's 1981 technical report, but Kosaraju reportedly described it in a 1978 lecture (unpublished).

Path-based algorithm (Gabow 2000) uses two stacks instead of Tarjan's low-value array, conceptually cleaner but with negligible practical performance difference.

Level 4 · Edge Cases and Pitfalls

19.19 Interview Frequency of These Algorithms

Let's talk data. The following frequency analysis is based on LeetCode tag statistics and mainstream interview experience sharing:

High frequency (must know):

Dijkstra: ~50+ LeetCode problems, commonly tested. Variants (minimax, multi-source) are hidden test points.
Topological sort: ~30+ problems, extremely high frequency.
Union-Find (MST-related): ~70+ problems, high frequency.
BFS/DFS: ubiquitous.

Medium frequency (should know):

Bellman-Ford/SPFA: ~15 problems. Required when problem mentions "negative weights."
Kruskal/Prim (MST): ~10 problems. Usually direct template application.
Bipartite graph detection: ~10 problems.

Low frequency (awareness level, rarely directly tested in interviews):

Strongly connected components (Tarjan/Kosaraju): ~5 LeetCode problems. Almost never in interviews, but concepts used in competitive programming and system design.
Maximum flow/network flow: ~5 LeetCode problems. Extremely rare in interviews, but the ultimate test of modeling ability.
Hungarian algorithm: Almost never in interviews. But the "can this be modeled as matching?" mindset is valuable.
Articulation points/bridges: ~5 problems (e.g., LeetCode 1192). Occasionally in system design for network vulnerability analysis.

Conclusions:

Dijkstra + MST (Kruskal) are interview essentials — must be writable from memory in 20 minutes.
SCC and network flow are "knowledge reserves" — direct interview probability < 5%.
Max-flow min-cut theorem thinking is occasionally useful in system design interviews (bottleneck analysis).
If preparing for Google/Meta hard interviews or competitive programming, SCC and network flow are worth deep study.

19.20 When Do You Need to Learn These

Recommended learning priority:

Priority 1 (Interview essentials):
  - Dijkstra (and variants)
  - BFS/DFS
  - Topological sort
  - Union-Find + Kruskal
  
Priority 2 (Advanced bonus):
  - Bellman-Ford (negative weight scenarios)
  - Floyd-Warshall (all-pairs)
  - Bipartite detection
  - Tarjan for bridges/articulation points

Priority 3 (Advanced/competitive):
  - SCC (Tarjan/Kosaraju)
  - 2-SAT
  - Maximum flow (Dinic)
  - Bipartite maximum matching
  
Priority 4 (Theoretical/PhD-level):
  - Minimum cost maximum flow
  - General graph matching (Blossom algorithm)
  - Gomory-Hu tree
  - Maximum weight closure

Signals: When do you need these algorithms?

Problem Characteristic	Likely Algorithm
"Groups that can mutually reach each other"	SCC
"Removal disconnects the graph"	Articulation points/bridges
"Maximum amount that can be transmitted/assigned"	Maximum flow
"Optimal one-to-one correspondence"	Bipartite matching
"Interdependent select/don't-select decisions"	2-SAT or network flow

19.21 LeetCode 1192: Critical Connections in a Network

Problem: Given n servers and connections (undirected graph), find all "critical connections" — connections whose removal disconnects some servers. I.e., find all bridges.

def criticalConnections(n: int, connections: List[List[int]]) -> List[List[int]]:
    """
    LeetCode 1192: Critical Connections in a Network
    
    Direct application of Tarjan's bridge-finding algorithm
    """
    graph = {i: [] for i in range(n)}
    for u, v in connections:
        graph[u].append(v)
        graph[v].append(u)
    
    dfn = [0] * n
    low = [0] * n
    timer = [0]
    bridges = []
    
    def dfs(u: int, parent: int):
        timer[0] += 1
        dfn[u] = low[u] = timer[0]
        
        for v in graph[u]:
            if v == parent:
                continue
            if dfn[v] == 0:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > dfn[u]:
                    bridges.append([u, v])
            else:
                low[u] = min(low[u], dfn[v])
    
    dfs(0, -1)
    return bridges


# Test
n = 4
connections = [[0,1],[1,2],[2,0],[1,3]]
print(criticalConnections(n, connections))  # [[1, 3]]
# Explanation: 0-1-2 forms a cycle, removing any one doesn't affect connectivity
# But 1-3 is a bridge — removing it isolates node 3

Interview note: This problem may hit recursion depth limits (n can reach 10^5). In production, use the iterative version or increase the recursion limit (sys.setrecursionlimit).

19.22 Real Engineering Applications

Scenario 1: Microservice Dependency Analysis (SCC)

def analyze_microservice_dependencies(services: List[str], calls: List[Tuple[str, str]]):
    """
    Analyze circular dependencies among microservices.
    If a group of services forms an SCC (mutual calls), they should be
    merged or decoupled.
    """
    name_to_id = {name: i for i, name in enumerate(services)}
    n = len(services)
    
    graph = {i: [] for i in range(n)}
    for caller, callee in calls:
        graph[name_to_id[caller]].append(name_to_id[callee])
    
    sccs = tarjan_scc(graph, n)
    
    # Find groups with circular dependencies (SCC size > 1)
    circular_groups = []
    for scc in sccs:
        if len(scc) > 1:
            group_names = [services[i] for i in scc]
            circular_groups.append(group_names)
    
    return circular_groups

# Example
services = ["auth", "user", "order", "payment", "notification"]
calls = [
    ("auth", "user"),
    ("user", "auth"),              # Circular!
    ("order", "payment"),
    ("payment", "notification"),
    ("notification", "order"),     # Circular: order→payment→notification→order
]

groups = analyze_microservice_dependencies(services, calls)
print("Circular dependency groups:", groups)
# [["order", "notification", "payment"], ["auth", "user"]]

Scenario 2: CDN Bandwidth Allocation (Network Flow)

def cdn_bandwidth_allocation():
    """
    Simplified CDN traffic allocation problem.
    Multiple origin servers → CDN nodes → user groups
    Goal: maximize total bandwidth
    """
    # Nodes: 0=super-source, 1-3=origin servers, 4-6=CDN nodes, 7-9=user groups, 10=super-sink
    n = 11
    capacity = [[0] * n for _ in range(n)]
    
    # Origin server capacities
    capacity[0][1] = 100  # Server 1: 100 Mbps
    capacity[0][2] = 80   # Server 2: 80 Mbps
    capacity[0][3] = 60   # Server 3: 60 Mbps
    
    # Server→CDN link capacities
    capacity[1][4] = 50; capacity[1][5] = 70
    capacity[2][4] = 40; capacity[2][6] = 60
    capacity[3][5] = 30; capacity[3][6] = 50
    
    # CDN→User group link capacities
    capacity[4][7] = 60; capacity[4][8] = 30
    capacity[5][7] = 40; capacity[5][9] = 50
    capacity[6][8] = 45; capacity[6][9] = 35
    
    # User group demands
    capacity[7][10] = 80
    capacity[8][10] = 70
    capacity[9][10] = 60
    
    max_bw = edmonds_karp(capacity, 0, 10)
    print(f"Maximum allocatable bandwidth: {max_bw} Mbps")

cdn_bandwidth_allocation()

19.23 Common Pitfalls and Summary

Pitfall 1: Handling parallel edges in Tarjan's

# In undirected graphs, if edge (u,v) appears twice (parallel edges),
# you cannot simply skip with "if v == parent: continue"
# because only ONE traversal should be skipped (the edge we came from),
# the second parallel edge should be considered

# Correct approach: use edge IDs to avoid double-counting
def dfs_with_edge_id(u, parent_edge_id):
    for v, edge_id in graph[u]:
        if edge_id == parent_edge_id:
            continue  # Only skip the specific edge we came from
        # ...

Pitfall 2: Recursion depth limits

import sys
sys.setrecursionlimit(200000)  # For graphs with n ≤ 10^5

# Or use iterative implementation (recommended for production)

Pitfall 3: Adjacency matrix limitation of Edmonds-Karp

# Adjacency matrix Edmonds-Karp uses O(V²) space
# If V = 10^5, that's 10^10 bytes — infeasible!
# Large-scale network flow must use adjacency list implementation

# For interviews with V ≤ a few hundred, adjacency matrix is clearest
# For V ≤ 10^4-10^5, must use adjacency list + forward star

Pitfall 4: Confusing SCC with connected components

# Undirected graph connected components: simple BFS/DFS
# Directed graph strongly connected components: need Tarjan/Kosaraju
# Directed graph weakly connected components: connected components ignoring edge direction

# In interviews, CONFIRM whether it's directed or undirected!

Algorithm quick reference:

Algorithm	Problem	Time	Interview Frequency
Tarjan (SCC)	Directed graph SCCs	O(V+E)	Low
Tarjan (bridges/APs)	Undirected critical edges/nodes	O(V+E)	Medium-low
Kosaraju	Directed graph SCCs	O(V+E)	Low
Edmonds-Karp	Maximum flow	O(VE²)	Very low
Dinic	Maximum flow	O(V²E)	Very low
Hungarian	Bipartite max matching	O(VE)	Low
Hopcroft-Karp	Bipartite max matching	O(E√V)	Very low

Core takeaways:

SCC is the key to understanding directed graph structure — contraction yields a DAG, simplifying problems.
Network flow is the Swiss army knife of combinatorial optimization — many problems that "don't look like flow" reduce to network flow.
Max flow = Min cut — this duality is central to understanding network flow applications.
These algorithms are rarely tested in interviews, but knowing they exist matters — some hard problems can only be efficiently solved by modeling as network flow.
In real engineering, SCC is most useful — circular dependency detection, deadlock analysis, compiler optimization all use it.

Rate this chapter

4.6 / 5 (12 ratings)