Chapter 18

Shortest Paths and Minimum Spanning Trees

Chapter 18: Shortest Paths and Minimum Spanning Trees

Graph traversal (BFS/DFS) tells us "which nodes are reachable," but in the real world, we care about cost: What is the fastest route from A to B? What is the cheapest fiber network connecting all cities? These two questions correspond to the two most classic optimization problems in graph theory — Shortest Path and Minimum Spanning Tree (MST).

These problems appear similar (both find "optimal" structures on weighted graphs), but are fundamentally different: shortest path is path optimization from source to target, while MST is edge set optimization for global connectivity. Their algorithmic design philosophies are also distinct — shortest path relies on relaxation to progressively approach the optimal solution, while MST relies on greedy strategies to progressively build the optimal solution.

In this chapter, we will thoroughly master five core algorithms: Dijkstra (single-source shortest path, non-negative weights), Bellman-Ford (single-source shortest path, allows negative weights), Floyd-Warshall (all-pairs shortest path), Prim (MST, vertex-centric), and Kruskal (MST, edge-centric). For each algorithm, we will answer: What problem does it solve? Why is it correct? What is its time complexity? When should you choose it?

Level 1 · What You Need to Know

18.1 Problem Definitions

Single-Source Shortest Path (SSSP): Given a weighted directed graph G=(V, E) and a source node s, find the shortest path distances from s to all other nodes.

All-Pairs Shortest Path (APSP): Find the shortest path distances between all pairs of nodes.

Minimum Spanning Tree (MST): Given a weighted undirected connected graph G=(V, E), find a tree containing all nodes that minimizes the sum of edge weights.

Weighted directed graph example (SSSP):

    A --2--> B --3--> D
    |        ^        ^
    4        1        2
    |        |        |
    v        |        |
    C -------+   E ---+

Weighted undirected connected graph example (MST):

    A ---4--- B
    |  \      |  \
    1    3    6    5
    |      \  |      \
    C ---2--- D ---7--- E

Key differences:

Shortest path: Can have cycles, can have negative-weight edges (Bellman-Ford), goal is a path.
MST: Defined only on undirected connected graphs, goal is a set of edges (a tree).

18.2 Dijkstra's Algorithm

Applicability: Graphs with non-negative edge weights (undirected/directed).

Core idea: Maintain a set S of nodes whose shortest distances have been finalized. Each iteration, select the node u with the smallest distance from among unfinalized nodes, add it to S, then use u's outgoing edges to relax its neighbors' distances.

Relaxation operation: If dist[u] + w(u,v) < dist[v], update dist[v] = dist[u] + w(u,v).

import heapq
from typing import List, Tuple, Dict

def dijkstra(graph: Dict[int, List[Tuple[int, int]]], source: int, n: int) -> List[float]:
    """
    Dijkstra's Algorithm - Priority Queue Implementation
    
    Args:
        graph: Adjacency list, graph[u] = [(v, weight), ...]
        source: Source node
        n: Number of nodes (0 to n-1)
    
    Returns:
        dist: dist[i] = shortest distance from source to node i
    """
    dist = [float('inf')] * n
    dist[source] = 0
    
    # Priority queue: (distance, node)
    pq = [(0, source)]
    
    while pq:
        d, u = heapq.heappop(pq)
        
        # Key optimization: skip if outdated entry
        if d > dist[u]:
            continue
        
        # Relax all neighbors
        for v, weight in graph[u]:
            new_dist = dist[u] + weight
            if new_dist < dist[v]:
                dist[v] = new_dist
                heapq.heappush(pq, (new_dist, v))
    
    return dist

Usage example:

# Build graph
graph = {
    0: [(1, 2), (2, 4)],  # 0->1 weight 2, 0->2 weight 4
    1: [(2, 1), (3, 7)],  # 1->2 weight 1, 1->3 weight 7
    2: [(3, 3)],           # 2->3 weight 3
    3: []
}

dist = dijkstra(graph, 0, 4)
print(dist)  # [0, 2, 3, 6]
# Explanation: 0->0=0, 0->1=2, 0->1->2=3, 0->1->2->3=6

Time complexity: O((V + E) log V). Each node is enqueued at most V times (far fewer in practice), each enqueue/dequeue is O(log V).

Common mistakes:

# Mistake 1: Using Dijkstra on graphs with negative edges
graph = {0: [(1, -3), (2, 2)], 1: [(2, 1)], 2: []}
# Dijkstra gives dist[2] = 2 (direct 0->2)
# But actual shortest is 0->1->2 = -3+1 = -2
# Because once Dijkstra finalizes a node, it never updates it!

# Mistake 2: Forgetting to skip stale queue entries
def dijkstra_wrong(graph, source, n):
    dist = [float('inf')] * n
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        # Missing: if d > dist[u]: continue
        # Results in redundant processing, correct output but much slower
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                heapq.heappush(pq, (dist[v], v))
    return dist

18.3 Bellman-Ford Algorithm

Applicability: Graphs with negative-weight edges. Can detect negative cycles reachable from the source.

Core idea: Repeatedly relax all edges V-1 times. Since a shortest path contains at most V-1 edges (in the acyclic case), V-1 rounds of relaxation are guaranteed to find all shortest paths.

def bellman_ford(edges: List[Tuple[int, int, int]], source: int, n: int) -> Tuple[List[float], bool]:
    """
    Bellman-Ford Algorithm
    
    Args:
        edges: Edge list [(u, v, weight), ...]
        source: Source node
        n: Number of nodes
    
    Returns:
        (dist, has_negative_cycle): Distance array and whether a negative cycle exists
    """
    dist = [float('inf')] * n
    dist[source] = 0
    
    # V-1 rounds of relaxation
    for i in range(n - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] != float('inf') and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        # Early termination: if no updates in a round, already converged
        if not updated:
            break
    
    # V-th round: detect negative cycles
    for u, v, w in edges:
        if dist[u] != float('inf') and dist[u] + w < dist[v]:
            return dist, True  # Negative cycle exists
    
    return dist, False

Why are V-1 rounds sufficient? If the shortest path uses k edges, it will be correctly computed by round k. A simple path uses at most V-1 edges (visiting all V nodes), so V-1 rounds suffice.

Time complexity: O(VE). V-1 rounds, each traversing all E edges.

Usage example:

edges = [
    (0, 1, 4),
    (0, 2, 5),
    (1, 2, -3),  # Negative weight edge
    (2, 3, 4),
    (3, 1, -2),  # Note: 1->2->3->1 forms cycle, weight = -3+4-2 = -1 < 0, negative cycle!
]

dist, has_neg_cycle = bellman_ford(edges, 0, 4)
if has_neg_cycle:
    print("Graph contains a negative cycle, shortest paths undefined")
else:
    print(dist)

18.4 Floyd-Warshall Algorithm

Applicability: All-pairs shortest paths. Allows negative edges but no negative cycles.

Core idea: Dynamic programming. Define dp[k][i][j] = shortest path from i to j using only intermediate nodes numbered ≤ k. Transition:

dp[k][i][j] = min(dp[k-1][i][j], dp[k-1][i][k] + dp[k-1][k][j])

Either don't go through node k (inherit dp[k-1][i][j]), or go through k (concatenate i→k and k→j paths).

def floyd_warshall(n: int, edges: List[Tuple[int, int, int]]) -> List[List[float]]:
    """
    Floyd-Warshall All-Pairs Shortest Paths
    
    Args:
        n: Number of nodes
        edges: Edge list [(u, v, weight), ...]
    
    Returns:
        dist: dist[i][j] = shortest distance from i to j
    """
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]
    
    # Initialization
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # Handle parallel edges
    
    # DP: progressively add intermediate nodes
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    
    return dist

Time complexity: O(V³). Space-optimized to O(V²) (in-place updates).

Use cases:

Small number of nodes (V ≤ a few hundred) but need all-pairs shortest paths
Dense graphs (adjacency matrix representation more convenient)
Need to detect negative cycles (if dist[i][i] < 0, there's a negative cycle through i)

18.5 Minimum Spanning Tree: Prim's Algorithm

Applicability: Weighted undirected connected graphs.

Core idea: Starting from any node, repeatedly select the minimum-weight edge connecting a node "in the tree" to a node "not in the tree," and add it to the MST. This is a greedy strategy.

def prim(graph: Dict[int, List[Tuple[int, int]]], n: int) -> Tuple[int, List[Tuple[int, int, int]]]:
    """
    Prim's Algorithm - Priority Queue Implementation
    
    Args:
        graph: Undirected graph adjacency list, graph[u] = [(v, weight), ...]
        n: Number of nodes
    
    Returns:
        (total_weight, mst_edges): MST total weight and edge list
    """
    visited = [False] * n
    mst_edges = []
    total_weight = 0
    
    # (weight, target_node, source_node)
    pq = [(0, 0, -1)]  # Start from node 0
    
    while pq and len(mst_edges) < n - 1:
        weight, u, parent = heapq.heappop(pq)
        
        if visited[u]:
            continue
        
        visited[u] = True
        total_weight += weight
        if parent != -1:
            mst_edges.append((parent, u, weight))
        
        for v, w in graph[u]:
            if not visited[v]:
                heapq.heappush(pq, (w, v, u))
    
    return total_weight, mst_edges

Time complexity: O((V + E) log V) (binary heap), O(E + V log V) (Fibonacci heap).

18.6 Minimum Spanning Tree: Kruskal's Algorithm

Core idea: Sort all edges by weight. Consider each edge from smallest to largest. If adding this edge doesn't form a cycle (checked using Union-Find), add it to the MST.

class UnionFind:
    """Union-Find with path compression + union by rank"""
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n
    
    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    
    def union(self, x: int, y: int) -> bool:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # Already connected, would form cycle
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True


def kruskal(n: int, edges: List[Tuple[int, int, int]]) -> Tuple[int, List[Tuple[int, int, int]]]:
    """
    Kruskal's Algorithm
    
    Args:
        n: Number of nodes
        edges: Edge list [(u, v, weight), ...]
    
    Returns:
        (total_weight, mst_edges): MST total weight and edge list
    """
    # Sort by weight
    edges.sort(key=lambda e: e[2])
    
    uf = UnionFind(n)
    mst_edges = []
    total_weight = 0
    
    for u, v, w in edges:
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            total_weight += w
            if len(mst_edges) == n - 1:
                break
    
    return total_weight, mst_edges

Time complexity: O(E log E), i.e., O(E log V) (since E ≤ V², log E = O(log V)).

Prim vs Kruskal quick selection guide:

Scenario	Recommended	Reason
Dense graph (E ≈ V²)	Prim	No need to sort all edges
Sparse graph (E ≈ V)	Kruskal	Low sorting cost, simple Union-Find
Already have Union-Find	Kruskal	More concise code
Need to add nodes online	Prim	Easy to extend from current tree
Interview whiteboard	Kruskal	Shorter to implement, fewer bugs

18.7 Complete Example: Building from Scratch

"""
Comprehensive example: A city road network
"""

def demo():
    # City network: 6 cities, bidirectional roads
    # Edges: (cityA, cityB, distance)
    roads = [
        (0, 1, 7), (0, 2, 9), (0, 5, 14),
        (1, 2, 10), (1, 3, 15),
        (2, 3, 11), (2, 5, 2),
        (3, 4, 6),
        (4, 5, 9)
    ]
    n = 6
    
    # 1. Single-source shortest path (Dijkstra)
    graph = {i: [] for i in range(n)}
    for u, v, w in roads:
        graph[u].append((v, w))
        graph[v].append((u, w))  # Undirected graph
    
    dist = dijkstra(graph, 0, n)
    print("Shortest distances from city 0:", dist)
    # [0, 7, 9, 20, 20, 11]
    
    # 2. All-pairs shortest path (Floyd-Warshall)
    directed_edges = []
    for u, v, w in roads:
        directed_edges.append((u, v, w))
        directed_edges.append((v, u, w))
    
    all_dist = floyd_warshall(n, directed_edges)
    print("Shortest distance from city 2 to city 4:", all_dist[2][4])
    
    # 3. Minimum spanning tree (Kruskal)
    total, mst = kruskal(n, roads[:])  # Pass copy since sort is in-place
    print(f"Minimum total road length to connect all cities: {total}")
    print("Selected roads:", mst)

demo()

Level 2 · How It Works Under the Hood

18.8 Correctness Proof of Dijkstra's Algorithm

Theorem: For graphs with non-negative edge weights, Dijkstra's algorithm correctly computes the shortest distances to all nodes reachable from the source.

Proof (by induction):

Let S be the set of nodes whose shortest distances have been finalized. We prove: each time a node u is added to S, dist[u] is indeed the shortest distance from s to u.

Base case: When source s is added to S, dist[s] = 0, which is correct.

Inductive step: Assume all nodes in S have correctly computed distances. Let u be the next selected node (the unfinalized node with smallest dist value).

Proof by contradiction: Suppose there exists a shorter path P: s → ... → x → y → ... → u, where x ∈ S, y ∉ S.

Since y is the first node on P not in S, the length of s → ... → x → y ≤ total length of P.
Since all edge weights are non-negative, length of s → ... → x → y ≤ length of s → ... → x → y → ... → u.
By inductive hypothesis, dist[x] is correct, so when processing x, y was relaxed: dist[y] ≤ dist[x] + w(x,y) = length of s→...→x→y.
But u is the unfinalized node with minimum dist, so dist[u] ≤ dist[y].
Combining: dist[u] ≤ dist[y] ≤ length of P < dist[u]. Contradiction!

Therefore dist[u] is the shortest distance. ∎

Why do negative edges break correctness? If there's a negative edge on the path (y, ..., u), then the length of "s → ... → x → y → ... → u" could be less than "s → ... → x → y" (later edges reduce total distance). In this case, dist[y] ≤ dist[u] no longer holds, and the inductive hypothesis fails.

18.9 SPFA: Queue-Optimized Bellman-Ford

SPFA (Shortest Path Faster Algorithm) is a practical optimization of Bellman-Ford. Core observation: if node u's dist value was not updated in some round, then u's outgoing edges cannot produce new relaxations in the next round. So we only relax from "recently updated nodes," maintained in a queue.

from collections import deque

def spfa(graph: Dict[int, List[Tuple[int, int]]], source: int, n: int) -> Tuple[List[float], bool]:
    """
    SPFA (Shortest Path Faster Algorithm)
    Queue-optimized Bellman-Ford
    
    Returns:
        (dist, has_negative_cycle)
    """
    dist = [float('inf')] * n
    dist[source] = 0
    
    in_queue = [False] * n
    count = [0] * n  # count[v] = times v enqueued; > n-1 indicates negative cycle
    
    queue = deque([source])
    in_queue[source] = True
    
    while queue:
        u = queue.popleft()
        in_queue[u] = False
        
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if not in_queue[v]:
                    queue.append(v)
                    in_queue[v] = True
                    count[v] += 1
                    if count[v] >= n:
                        return dist, True  # Negative cycle
    
    return dist, False

Complexity analysis:

Worst case: O(VE), same as Bellman-Ford (carefully constructed graphs can make SPFA degenerate).
Average case: O(kE), where k is a small constant (empirically around 2).
Warning: Do not blindly trust SPFA's "average complexity" in competitive programming with negative-weight graphs — problem setters can construct adversarial inputs.

SLF (Shortest Label First) optimization: Compare the newly enqueued node with the front of the queue; if its dist is smaller, insert at front instead of back.

# SLF optimization
if not in_queue[v]:
    if queue and dist[v] < dist[queue[0]]:
        queue.appendleft(v)  # Insert at front
    else:
        queue.append(v)      # Insert at back
    in_queue[v] = True

18.10 Deep Dive into Negative Cycle Detection

Method 1: Bellman-Ford V-th round detection (shown earlier)

Method 2: SPFA enqueue count — If a node is enqueued more than V-1 times, a negative cycle exists.

Method 3: Virtual source node (detect negative cycles anywhere in the graph)

def detect_negative_cycle(graph: Dict[int, List[Tuple[int, int]]], n: int) -> bool:
    """
    Detect negative cycles anywhere in the graph.
    Strategy: Add a virtual source node connected to all nodes with weight 0.
    """
    edges = []
    for u in range(n):
        edges.append((n, u, 0))  # Virtual source -> all nodes
        for v, w in graph[u]:
            edges.append((u, v, w))
    
    dist, has_neg_cycle = bellman_ford(edges, n, n + 1)
    return has_neg_cycle

Practical application: Arbitrage detection. In foreign exchange markets, exchange rates can be modeled as edge weights (taking logarithms converts multiplication to addition). A negative cycle means an arbitrage opportunity exists — converting currencies around the cycle yields more of the starting currency.

import math

def detect_arbitrage(currencies: List[str], rates: List[List[float]]) -> bool:
    """
    Detect foreign exchange arbitrage opportunities.
    rates[i][j] = how much of currency j you get for 1 unit of currency i
    
    Taking -log: multiplication becomes addition, rate > 1 becomes negative weight.
    Arbitrage ≡ negative cycle.
    """
    n = len(currencies)
    edges = []
    
    for i in range(n):
        for j in range(n):
            if i != j and rates[i][j] > 0:
                edges.append((i, j, -math.log(rates[i][j])))
    
    # Use Bellman-Ford to detect negative cycles
    dist = [0.0] * n  # Start from all nodes simultaneously (equivalent to virtual source)
    
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return True  # Arbitrage opportunity exists
    
    return False

18.11 Prim vs Kruskal: In-Depth Comparison

Theoretical foundation — Cut Property:

For any cut of a graph (partitioning vertices into two non-empty sets S and V\S), the minimum-weight edge crossing the cut must belong to some MST.

Prim uses the cut property: S = nodes already in MST. Each step selects the minimum edge between S and V\S.
Kruskal uses the cut property: Each step selects the globally minimum non-cycle-forming edge, which is equivalent to selecting "the minimum crossing edge of some current cut."

Why are both correct? Because both exploit the cut property. Each selected edge must belong to an MST. This is the mathematical guarantee of greedy correctness.

Implementation complexity comparison:

Operation	Prim (binary heap)	Prim (Fibonacci heap)	Kruskal
Total time	O((V+E) log V)	O(E + V log V)	O(E log E)
Best for	Dense graphs	Theoretical optimal	Sparse graphs
Code complexity	Medium	High	Low
Space	O(V)	O(V)	O(V+E)

Key insight: For dense graphs (E ≈ V²), Prim is O(V² log V) and Kruskal is O(V² log V), roughly equal. But with an adjacency matrix Prim implementation (no heap, linear scan for minimum), Prim achieves O(V²), clearly superior to Kruskal.

def prim_dense(adj_matrix: List[List[float]], n: int) -> int:
    """
    Prim's with adjacency matrix - for dense graphs
    Time O(V²), no heap needed
    """
    INF = float('inf')
    in_mst = [False] * n
    min_edge = [INF] * n  # min_edge[v] = minimum edge weight from v to MST set
    min_edge[0] = 0
    total = 0
    
    for _ in range(n):
        # Find non-MST node with minimum min_edge
        u = -1
        for v in range(n):
            if not in_mst[v] and (u == -1 or min_edge[v] < min_edge[u]):
                u = v
        
        in_mst[u] = True
        total += min_edge[u]
        
        # Update neighbors' min_edge
        for v in range(n):
            if not in_mst[v] and adj_matrix[u][v] < min_edge[v]:
                min_edge[v] = adj_matrix[u][v]
    
    return total

18.12 Path Reconstruction

Algorithms often need to output the actual path, not just the distance. The method is to record each node's predecessor.

def dijkstra_with_path(graph: Dict[int, List[Tuple[int, int]]], source: int, n: int):
    """Dijkstra + path reconstruction"""
    dist = [float('inf')] * n
    dist[source] = 0
    prev = [-1] * n  # Predecessor array
    
    pq = [(0, source)]
    
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                prev[v] = u
                heapq.heappush(pq, (dist[v], v))
    
    return dist, prev


def reconstruct_path(prev: List[int], target: int) -> List[int]:
    """Reconstruct path from predecessor array"""
    path = []
    node = target
    while node != -1:
        path.append(node)
        node = prev[node]
    path.reverse()
    return path


# Usage
dist, prev = dijkstra_with_path(graph, 0, 6)
path = reconstruct_path(prev, 5)
print(f"Shortest path from 0 to 5: {' -> '.join(map(str, path))}")
print(f"Distance: {dist[5]}")

18.13 Algorithm Selection Decision Tree

When facing a shortest path problem, decide in this order:

1. Need all-pairs shortest paths?
   ├─ Yes → V ≤ a few hundred?
   │     ├─ Yes → Floyd-Warshall O(V³)
   │     └─ No → Run V times Dijkstra O(VE log V) or Johnson's algorithm
   └─ No → Single-source shortest path
         ├─ Negative edges?
         │   ├─ Yes → Bellman-Ford O(VE) or SPFA
         │   └─ No → Dijkstra O((V+E) log V)
         └─ All edge weights = 1?
               └─ BFS O(V+E) (simplest and fastest)

Level 3 · What the Specifications Say

18.14 Dijkstra's 1959 Paper

Paper: Edsger W. Dijkstra, "A Note on Two Problems in Connexion with Graphs", Numerische Mathematik, 1959.

This three-page paper solved two problems: (1) shortest paths, (2) minimum spanning trees. Dijkstra described the algorithm concisely without a formal proof — he considered the correctness "obvious."

Historical context: Dijkstra was working at the Mathematical Center in Amsterdam. The algorithm was inspired by a practical problem — designing a demonstration program for the ARMAC computer that would compute the shortest railway routes between 64 Dutch cities.

The algorithm as described in the original (restated in modern terms):

Mark all nodes as "unvisited," set distances to ∞, source distance to 0.
Select the unvisited node u with minimum distance.
Relax all neighbors of u.
Mark u as "visited."
Repeat until all nodes are visited.

Dijkstra's original did not use a priority queue — he used linear scanning to find the minimum, giving O(V²) time complexity. The priority queue optimization came later (Johnson 1977, Fredman & Tarjan 1987 with Fibonacci heaps).

Fun fact: Dijkstra later said in an interview: "The algorithm took about twenty minutes. One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee, and I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path."

18.15 Bellman 1958 / Ford 1956

Richard Bellman described this algorithm in his 1958 paper "On a Routing Problem," as part of his development of dynamic programming theory.

Lester R. Ford Jr. independently described the same algorithm in a 1956 RAND Corporation memorandum.

In fact, Alfonso Shimbel described a similar algorithm in 1955. The algorithm is sometimes called the Bellman-Ford-Moore algorithm (Edward F. Moore also independently discovered it in 1959).

Bellman's dynamic programming perspective: Define d_k(v) = shortest path from s to v using at most k edges. Recurrence:

d_k(v) = min(d_{k-1}(v), min_{(u,v)∈E} (d_{k-1}(u) + w(u,v)))

This is the essence of Bellman-Ford — a dynamic programming "relax by stages" process.

Principle of Optimality (Bellman, 1957): "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy." Applied to shortest paths: if s→...→u→v is the shortest path from s to v, then s→...→u must be the shortest path from s to u. This is the foundation of correctness for all shortest path algorithms.

18.16 Floyd 1962

Paper: Robert W. Floyd, "Algorithm 97: Shortest Path", Communications of the ACM, 1962.

Floyd's paper is remarkably brief (less than one page), presenting only the triple-loop implementation. But the underlying idea — progressively relaxing "which intermediate nodes are allowed" — is a classic application of dynamic programming in graph theory.

Relationship to Warshall: Stephen Warshall's 1962 paper "A Theorem on Boolean Matrices" solved the transitive closure problem (determining reachability between nodes) using the exact same triple-loop structure. Floyd extended Warshall's Boolean matrices to weighted matrices, generalizing "reachability" to "shortest distance."

Relationship to Kleene's algorithm: At a deeper level, Floyd-Warshall is essentially Kleene's algorithm from regular expression/finite automata theory generalized over semirings. The Kleene algorithm on the (min, +) semiring gives Floyd-Warshall; on the (∨, ∧) Boolean semiring gives Warshall's transitive closure.

18.17 Kruskal 1956 and Prim 1957

Joseph B. Kruskal, "On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem", Proceedings of the American Mathematical Society, 1956.

Kruskal's paper not only proposed the MST algorithm but also discussed its relationship to the Traveling Salesman Problem. He proved the correctness of the greedy strategy: selecting edges in order of increasing weight that don't form cycles.

Robert C. Prim, "Shortest Connection Networks And Some Generalizations", Bell System Technical Journal, 1957.

Prim independently discovered this algorithm while working at Bell Labs. Interestingly, the Czech mathematician Vojtěch Jarník described essentially the same algorithm in 1930 in his paper "O jistém problému minimálním." The algorithm is therefore sometimes called the Jarník-Prim algorithm or DJP algorithm (Dijkstra-Jarník-Prim, since Dijkstra's 1959 paper also independently described it).

Formal proof of the Cut Property (cornerstone of MST correctness):

Theorem (Cut Property): Let (S, V\S) be any cut of graph G, and let e be the unique minimum-weight edge crossing the cut. Then e belongs to every MST of G.

Proof: By contradiction. Suppose T is an MST and e = (u,v) ∉ T. In T, there is a path P between u and v (since T is connected). P must cross the cut (since u ∈ S, v ∈ V\S), so P contains a crossing edge e' ≠ e. Since e is the unique minimum crossing edge, w(e) < w(e'). Remove e' from T and add e, obtaining a new spanning tree T'. w(T') = w(T) - w(e') + w(e) < w(T), contradicting T being an MST. ∎

Generalization: If the minimum crossing edge is not unique (ties exist), at least one MST exists that includes that edge.

18.18 Complexity Lower Bounds and Optimal Algorithms

Shortest path complexity lower bounds:

The comparison-model lower bound for SSSP is Ω(V + E) (must examine all edges at least once).
Dijkstra + Fibonacci heap achieves O(E + V log V), which is optimal for dense graphs (E = Θ(V²)).
Whether a deterministic O(V + E) algorithm exists for sparse graphs remains an open problem.

MST complexity lower bounds:

Comparison-model lower bound Ω(E) (must examine all edges).
Chazelle (2000) gave an O(E α(V)) algorithm (α is the inverse Ackermann function, practically a constant).
Pettie & Ramachandran (2002) gave the optimal deterministic algorithm, but its complexity cannot be expressed using standard functions.
In the randomized model, Karger, Klein & Tarjan (1995) gave an expected O(V + E) linear-time algorithm.

Level 4 · Edge Cases and Pitfalls

18.19 LeetCode 743: Network Delay Time

Problem: There are n network nodes. Given directed edges times[i] = [u, v, w] (transmission time from u to v is w), a signal is sent from node k. Return the minimum time for all nodes to receive the signal. If any node is unreachable, return -1.

Analysis: Classic SSSP problem. All edge weights are positive (transmission times), so use Dijkstra directly.

def networkDelayTime(times: List[List[int]], n: int, k: int) -> int:
    """
    LeetCode 743: Network Delay Time
    
    Straightforward Dijkstra application
    """
    # Build graph (note: nodes numbered from 1)
    graph = {i: [] for i in range(1, n + 1)}
    for u, v, w in times:
        graph[u].append((v, w))
    
    # Dijkstra
    dist = {i: float('inf') for i in range(1, n + 1)}
    dist[k] = 0
    pq = [(0, k)]
    
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                heapq.heappush(pq, (dist[v], v))
    
    result = max(dist.values())
    return result if result < float('inf') else -1


# Test
times = [[2,1,1],[2,3,1],[3,4,1]]
print(networkDelayTime(times, 4, 2))  # 2
# From node 2: 2->1(1), 2->3(1), 2->3->4(2)
# Maximum distance to any node = 2

Complexity: O((V + E) log V), where V = n, E = len(times).

Interview follow-ups:

Q: "What if edges can be negative?" → Use Bellman-Ford.
Q: "What if you need to return the actual propagation path?" → Record prev array.
Q: "What if the graph is very large (millions of nodes)?" → If only need path to a specific target, use A* or bidirectional Dijkstra.

18.20 LeetCode 1631: Path With Minimum Effort

Problem: Given a rows×columns height matrix, find a path from top-left (0,0) to bottom-right (rows-1, columns-1). The "effort" of a path is the maximum absolute height difference between consecutive cells. Minimize this effort.

Analysis: This is not a traditional "minimize sum of path weights" problem, but a "minimize the maximum edge weight on a path" (minimax path). The core idea can still be solved with a Dijkstra variant!

Key insight: Redefine dist[v] from "sum of path weights" to "maximum edge weight on path," and change relaxation from dist[u] + w to max(dist[u], w). Dijkstra's greedy strategy still holds — the node with minimum dist extracted from the priority queue cannot achieve a smaller "maximum edge weight" through any other path.

def minimumEffortPath(heights: List[List[int]]) -> int:
    """
    LeetCode 1631: Path With Minimum Effort
    
    Dijkstra variant: dist = maximum edge weight on path
    """
    rows, cols = len(heights), len(heights[0])
    
    dist = [[float('inf')] * cols for _ in range(rows)]
    dist[0][0] = 0
    
    # (effort, row, col)
    pq = [(0, 0, 0)]
    
    directions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    
    while pq:
        effort, r, c = heapq.heappop(pq)
        
        if r == rows - 1 and c == cols - 1:
            return effort
        
        if effort > dist[r][c]:
            continue
        
        for dr, dc in directions:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                edge_weight = abs(heights[nr][nc] - heights[r][c])
                new_effort = max(effort, edge_weight)
                if new_effort < dist[nr][nc]:
                    dist[nr][nc] = new_effort
                    heapq.heappush(pq, (new_effort, nr, nc))
    
    return 0


# Test
heights = [[1,2,2],[3,8,2],[5,3,5]]
print(minimumEffortPath(heights))  # 2

Alternative solution — Binary search + BFS:

def minimumEffortPath_binary_search(heights: List[List[int]]) -> int:
    """Binary search on answer: guess max effort mid, BFS to verify feasibility"""
    rows, cols = len(heights), len(heights[0])
    
    def can_reach(max_effort: int) -> bool:
        visited = [[False] * cols for _ in range(rows)]
        queue = deque([(0, 0)])
        visited[0][0] = True
        
        while queue:
            r, c = queue.popleft()
            if r == rows - 1 and c == cols - 1:
                return True
            for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                    if abs(heights[nr][nc] - heights[r][c]) <= max_effort:
                        visited[nr][nc] = True
                        queue.append((nr, nc))
        return False
    
    lo, hi = 0, 10**6
    while lo < hi:
        mid = (lo + hi) // 2
        if can_reach(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

Alternative solution — Union-Find + edge sorting (Kruskal-style):

def minimumEffortPath_kruskal(heights: List[List[int]]) -> int:
    """
    Kruskal approach: sort all edges by weight, add until source and target are connected
    """
    rows, cols = len(heights), len(heights[0])
    
    edges = []
    for r in range(rows):
        for c in range(cols):
            idx = r * cols + c
            if c + 1 < cols:
                w = abs(heights[r][c+1] - heights[r][c])
                edges.append((w, idx, idx + 1))
            if r + 1 < rows:
                w = abs(heights[r+1][c] - heights[r][c])
                edges.append((w, idx, idx + cols))
    
    edges.sort()
    uf = UnionFind(rows * cols)
    
    source = 0
    target = rows * cols - 1
    
    for w, u, v in edges:
        uf.union(u, v)
        if uf.find(source) == uf.find(target):
            return w
    
    return 0

This problem perfectly demonstrates how three different algorithmic paradigms can solve the same problem: Dijkstra variant, binary search + BFS, and Kruskal variant. Being able to present multiple solutions and analyze trade-offs is an interview differentiator.

18.21 LeetCode 1584: Min Cost to Connect All Points

Problem: Given n points points[i] = [xi, yi], the connection cost between two points is the Manhattan distance |xi-xj| + |yi-yj|. Return the minimum cost to connect all points.

Analysis: Classic MST problem. Complete graph (any two points can be connected), n points yield n(n-1)/2 edges.

def minCostConnectPoints(points: List[List[int]]) -> int:
    """
    LeetCode 1584: Min Cost to Connect All Points
    
    Method 1: Kruskal (sort all edges)
    """
    n = len(points)
    
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = abs(points[i][0] - points[j][0]) + abs(points[i][1] - points[j][1])
            edges.append((dist, i, j))
    
    edges.sort()
    
    uf = UnionFind(n)
    total = 0
    count = 0
    
    for w, u, v in edges:
        if uf.union(u, v):
            total += w
            count += 1
            if count == n - 1:
                break
    
    return total


def minCostConnectPoints_prim(points: List[List[int]]) -> int:
    """
    Method 2: Prim (no heap), optimal for complete graphs O(n²)
    """
    n = len(points)
    
    in_mst = [False] * n
    min_dist = [float('inf')] * n
    min_dist[0] = 0
    total = 0
    
    for _ in range(n):
        u = -1
        for v in range(n):
            if not in_mst[v] and (u == -1 or min_dist[v] < min_dist[u]):
                u = v
        
        in_mst[u] = True
        total += min_dist[u]
        
        for v in range(n):
            if not in_mst[v]:
                d = abs(points[u][0] - points[v][0]) + abs(points[u][1] - points[v][1])
                min_dist[v] = min(min_dist[v], d)
    
    return total


# Test
points = [[0,0],[2,2],[3,10],[5,2],[7,0]]
print(minCostConnectPoints(points))  # 20
print(minCostConnectPoints_prim(points))  # 20

Performance comparison:

Kruskal: O(n² log n) time, O(n²) space (store all edges)
Prim (no heap): O(n²) time, O(n) space
For this problem with n ≤ 1000, both pass. But Prim without heap is optimal for complete graphs.

18.22 Common Pitfalls and Interview Tips Summary

Pitfall 1: Dijkstra with negative edges

# Classic counterexample
# 0 -> 1 (weight 1)
# 0 -> 2 (weight 5)
# 1 -> 2 (weight -10)
# Dijkstra: finalizes dist[1]=1, then dist[2]=5 is finalized
# Actual shortest: 0->1->2 = 1+(-10) = -9

In interviews: if the problem guarantees non-negative weights, use Dijkstra; otherwise use Bellman-Ford.

Pitfall 2: Floyd-Warshall loop order

# Correct: k is the outermost loop
for k in range(n):
    for i in range(n):
        for j in range(n):
            ...

# Wrong: k not outermost produces incorrect results
# Because dp[k] depends on dp[k-1], k must be the outer loop

Pitfall 3: MST ≠ Shortest Path Tree

    A --1-- B --1-- C
    |               |
    3               3
    |               |
    D ------1------ E

MST includes A-B, B-C, D-E, and A-D or C-E (total weight 6)
But shortest path from A to E is A-B-C-E (weight 5), not following MST edges

Pitfall 4: Graph not connected → MST doesn't exist

def kruskal_safe(n, edges):
    total, mst = kruskal(n, edges)
    if len(mst) < n - 1:
        return -1  # Graph not connected, MST doesn't exist
    return total

Interview template summary:

Problem Type	Algorithm	Key Code
SSSP (positive weights)	Dijkstra	heapq + dist array
SSSP (negative weights)	Bellman-Ford/SPFA	V-1 rounds relaxation
APSP	Floyd-Warshall	Triple loop, k outermost
MST	Kruskal	Sort + Union-Find
MST (dense graph)	Prim	min_dist array
Minimax path	Dijkstra variant	max instead of +
Connectivity check	Union-Find/BFS	union + find

Time complexity quick reference:

Algorithm	Time	Space	Applicability
Dijkstra (heap)	O((V+E) log V)	O(V+E)	Non-negative weights
Bellman-Ford	O(VE)	O(V)	Any weights (detects negative cycles)
SPFA	Avg O(kE), worst O(VE)	O(V+E)	Any weights
Floyd-Warshall	O(V³)	O(V²)	APSP, small V
Prim (heap)	O((V+E) log V)	O(V+E)	Undirected connected
Prim (no heap)	O(V²)	O(V)	Dense graphs
Kruskal	O(E log E)	O(V+E)	Undirected connected

Rate this chapter

4.7 / 5 (13 ratings)