Shortest Paths and Minimum Spanning Trees
Chapter 18: Shortest Paths and Minimum Spanning Trees
Graph traversal (BFS/DFS) tells us "which nodes are reachable," but in the real world, we care about cost: What is the fastest route from A to B? What is the cheapest fiber network connecting all cities? These two questions correspond to the two most classic optimization problems in graph theory — Shortest Path and Minimum Spanning Tree (MST).
These problems appear similar (both find "optimal" structures on weighted graphs), but are fundamentally different: shortest path is path optimization from source to target, while MST is edge set optimization for global connectivity. Their algorithmic design philosophies are also distinct — shortest path relies on relaxation to progressively approach the optimal solution, while MST relies on greedy strategies to progressively build the optimal solution.
In this chapter, we will thoroughly master five core algorithms: Dijkstra (single-source shortest path, non-negative weights), Bellman-Ford (single-source shortest path, allows negative weights), Floyd-Warshall (all-pairs shortest path), Prim (MST, vertex-centric), and Kruskal (MST, edge-centric). For each algorithm, we will answer: What problem does it solve? Why is it correct? What is its time complexity? When should you choose it?
Level 1 · What You Need to Know
18.1 Problem Definitions
Single-Source Shortest Path (SSSP): Given a weighted directed graph G=(V, E) and a source node s, find the shortest path distances from s to all other nodes.
All-Pairs Shortest Path (APSP): Find the shortest path distances between all pairs of nodes.
Minimum Spanning Tree (MST): Given a weighted undirected connected graph G=(V, E), find a tree containing all nodes that minimizes the sum of edge weights.
Weighted directed graph example (SSSP):
A --2--> B --3--> D
| ^ ^
4 1 2
| | |
v | |
C -------+ E ---+
Weighted undirected connected graph example (MST):
A ---4--- B
| \ | \
1 3 6 5
| \ | \
C ---2--- D ---7--- E
Key differences:
- Shortest path: Can have cycles, can have negative-weight edges (Bellman-Ford), goal is a path.
- MST: Defined only on undirected connected graphs, goal is a set of edges (a tree).
18.2 Dijkstra's Algorithm
Applicability: Graphs with non-negative edge weights (undirected/directed).
Core idea: Maintain a set S of nodes whose shortest distances have been finalized. Each iteration, select the node u with the smallest distance from among unfinalized nodes, add it to S, then use u's outgoing edges to relax its neighbors' distances.
Relaxation operation: If dist[u] + w(u,v) < dist[v], update dist[v] = dist[u] + w(u,v).
import heapq
from typing import List, Tuple, Dict
def dijkstra(graph: Dict[int, List[Tuple[int, int]]], source: int, n: int) -> List[float]:
"""
Dijkstra's Algorithm - Priority Queue Implementation
Args:
graph: Adjacency list, graph[u] = [(v, weight), ...]
source: Source node
n: Number of nodes (0 to n-1)
Returns:
dist: dist[i] = shortest distance from source to node i
"""
dist = [float('inf')] * n
dist[source] = 0
# Priority queue: (distance, node)
pq = [(0, source)]
while pq:
d, u = heapq.heappop(pq)
# Key optimization: skip if outdated entry
if d > dist[u]:
continue
# Relax all neighbors
for v, weight in graph[u]:
new_dist = dist[u] + weight
if new_dist < dist[v]:
dist[v] = new_dist
heapq.heappush(pq, (new_dist, v))
return dist
Usage example:
# Build graph
graph = {
0: [(1, 2), (2, 4)], # 0->1 weight 2, 0->2 weight 4
1: [(2, 1), (3, 7)], # 1->2 weight 1, 1->3 weight 7
2: [(3, 3)], # 2->3 weight 3
3: []
}
dist = dijkstra(graph, 0, 4)
print(dist) # [0, 2, 3, 6]
# Explanation: 0->0=0, 0->1=2, 0->1->2=3, 0->1->2->3=6
Time complexity: O((V + E) log V). Each node is enqueued at most V times (far fewer in practice), each enqueue/dequeue is O(log V).
Common mistakes:
# Mistake 1: Using Dijkstra on graphs with negative edges
graph = {0: [(1, -3), (2, 2)], 1: [(2, 1)], 2: []}
# Dijkstra gives dist[2] = 2 (direct 0->2)
# But actual shortest is 0->1->2 = -3+1 = -2
# Because once Dijkstra finalizes a node, it never updates it!
# Mistake 2: Forgetting to skip stale queue entries
def dijkstra_wrong(graph, source, n):
dist = [float('inf')] * n
dist[source] = 0
pq = [(0, source)]
while pq:
d, u = heapq.heappop(pq)
# Missing: if d > dist[u]: continue
# Results in redundant processing, correct output but much slower
for v, w in graph[u]:
if dist[u] + w < dist[v]:
dist[v] = dist[u] + w
heapq.heappush(pq, (dist[v], v))
return dist
18.3 Bellman-Ford Algorithm
Applicability: Graphs with negative-weight edges. Can detect negative cycles reachable from the source.
Core idea: Repeatedly relax all edges V-1 times. Since a shortest path contains at most V-1 edges (in the acyclic case), V-1 rounds of relaxation are guaranteed to find all shortest paths.
def bellman_ford(edges: List[Tuple[int, int, int]], source: int, n: int) -> Tuple[List[float], bool]:
"""
Bellman-Ford Algorithm
Args:
edges: Edge list [(u, v, weight), ...]
source: Source node
n: Number of nodes
Returns:
(dist, has_negative_cycle): Distance array and whether a negative cycle exists
"""
dist = [float('inf')] * n
dist[source] = 0
# V-1 rounds of relaxation
for i in range(n - 1):
updated = False
for u, v, w in edges:
if dist[u] != float('inf') and dist[u] + w < dist[v]:
dist[v] = dist[u] + w
updated = True
# Early termination: if no updates in a round, already converged
if not updated:
break
# V-th round: detect negative cycles
for u, v, w in edges:
if dist[u] != float('inf') and dist[u] + w < dist[v]:
return dist, True # Negative cycle exists
return dist, False
Why are V-1 rounds sufficient? If the shortest path uses k edges, it will be correctly computed by round k. A simple path uses at most V-1 edges (visiting all V nodes), so V-1 rounds suffice.
Time complexity: O(VE). V-1 rounds, each traversing all E edges.
Usage example:
edges = [
(0, 1, 4),
(0, 2, 5),
(1, 2, -3), # Negative weight edge
(2, 3, 4),
(3, 1, -2), # Note: 1->2->3->1 forms cycle, weight = -3+4-2 = -1 < 0, negative cycle!
]
dist, has_neg_cycle = bellman_ford(edges, 0, 4)
if has_neg_cycle:
print("Graph contains a negative cycle, shortest paths undefined")
else:
print(dist)
18.4 Floyd-Warshall Algorithm
Applicability: All-pairs shortest paths. Allows negative edges but no negative cycles.
Core idea: Dynamic programming. Define dp[k][i][j] = shortest path from i to j using only intermediate nodes numbered ≤ k. Transition:
dp[k][i][j] = min(dp[k-1][i][j], dp[k-1][i][k] + dp[k-1][k][j])
Either don't go through node k (inherit dp[k-1][i][j]), or go through k (concatenate i→k and k→j paths).
def floyd_warshall(n: int, edges: List[Tuple[int, int, int]]) -> List[List[float]]:
"""
Floyd-Warshall All-Pairs Shortest Paths
Args:
n: Number of nodes
edges: Edge list [(u, v, weight), ...]
Returns:
dist: dist[i][j] = shortest distance from i to j
"""
INF = float('inf')
dist = [[INF] * n for _ in range(n)]
# Initialization
for i in range(n):
dist[i][i] = 0
for u, v, w in edges:
dist[u][v] = min(dist[u][v], w) # Handle parallel edges
# DP: progressively add intermediate nodes
for k in range(n):
for i in range(n):
for j in range(n):
if dist[i][k] + dist[k][j] < dist[i][j]:
dist[i][j] = dist[i][k] + dist[k][j]
return dist
Time complexity: O(V³). Space-optimized to O(V²) (in-place updates).
Use cases:
- Small number of nodes (V ≤ a few hundred) but need all-pairs shortest paths
- Dense graphs (adjacency matrix representation more convenient)
- Need to detect negative cycles (if
dist[i][i] < 0, there's a negative cycle through i)
18.5 Minimum Spanning Tree: Prim's Algorithm
Applicability: Weighted undirected connected graphs.
Core idea: Starting from any node, repeatedly select the minimum-weight edge connecting a node "in the tree" to a node "not in the tree," and add it to the MST. This is a greedy strategy.
def prim(graph: Dict[int, List[Tuple[int, int]]], n: int) -> Tuple[int, List[Tuple[int, int, int]]]:
"""
Prim's Algorithm - Priority Queue Implementation
Args:
graph: Undirected graph adjacency list, graph[u] = [(v, weight), ...]
n: Number of nodes
Returns:
(total_weight, mst_edges): MST total weight and edge list
"""
visited = [False] * n
mst_edges = []
total_weight = 0
# (weight, target_node, source_node)
pq = [(0, 0, -1)] # Start from node 0
while pq and len(mst_edges) < n - 1:
weight, u, parent = heapq.heappop(pq)
if visited[u]:
continue
visited[u] = True
total_weight += weight
if parent != -1:
mst_edges.append((parent, u, weight))
for v, w in graph[u]:
if not visited[v]:
heapq.heappush(pq, (w, v, u))
return total_weight, mst_edges
Time complexity: O((V + E) log V) (binary heap), O(E + V log V) (Fibonacci heap).
18.6 Minimum Spanning Tree: Kruskal's Algorithm
Core idea: Sort all edges by weight. Consider each edge from smallest to largest. If adding this edge doesn't form a cycle (checked using Union-Find), add it to the MST.
class UnionFind:
"""Union-Find with path compression + union by rank"""
def __init__(self, n: int):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x: int) -> int:
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x: int, y: int) -> bool:
rx, ry = self.find(x), self.find(y)
if rx == ry:
return False # Already connected, would form cycle
if self.rank[rx] < self.rank[ry]:
rx, ry = ry, rx
self.parent[ry] = rx
if self.rank[rx] == self.rank[ry]:
self.rank[rx] += 1
return True
def kruskal(n: int, edges: List[Tuple[int, int, int]]) -> Tuple[int, List[Tuple[int, int, int]]]:
"""
Kruskal's Algorithm
Args:
n: Number of nodes
edges: Edge list [(u, v, weight), ...]
Returns:
(total_weight, mst_edges): MST total weight and edge list
"""
# Sort by weight
edges.sort(key=lambda e: e[2])
uf = UnionFind(n)
mst_edges = []
total_weight = 0
for u, v, w in edges:
if uf.union(u, v):
mst_edges.append((u, v, w))
total_weight += w
if len(mst_edges) == n - 1:
break
return total_weight, mst_edges
Time complexity: O(E log E), i.e., O(E log V) (since E ≤ V², log E = O(log V)).
Prim vs Kruskal quick selection guide:
| Scenario | Recommended | Reason |
|---|---|---|
| Dense graph (E ≈ V²) | Prim | No need to sort all edges |
| Sparse graph (E ≈ V) | Kruskal | Low sorting cost, simple Union-Find |
| Already have Union-Find | Kruskal | More concise code |
| Need to add nodes online | Prim | Easy to extend from current tree |
| Interview whiteboard | Kruskal | Shorter to implement, fewer bugs |
18.7 Complete Example: Building from Scratch
"""
Comprehensive example: A city road network
"""
def demo():
# City network: 6 cities, bidirectional roads
# Edges: (cityA, cityB, distance)
roads = [
(0, 1, 7), (0, 2, 9), (0, 5, 14),
(1, 2, 10), (1, 3, 15),
(2, 3, 11), (2, 5, 2),
(3, 4, 6),
(4, 5, 9)
]
n = 6
# 1. Single-source shortest path (Dijkstra)
graph = {i: [] for i in range(n)}
for u, v, w in roads:
graph[u].append((v, w))
graph[v].append((u, w)) # Undirected graph
dist = dijkstra(graph, 0, n)
print("Shortest distances from city 0:", dist)
# [0, 7, 9, 20, 20, 11]
# 2. All-pairs shortest path (Floyd-Warshall)
directed_edges = []
for u, v, w in roads:
directed_edges.append((u, v, w))
directed_edges.append((v, u, w))
all_dist = floyd_warshall(n, directed_edges)
print("Shortest distance from city 2 to city 4:", all_dist[2][4])
# 3. Minimum spanning tree (Kruskal)
total, mst = kruskal(n, roads[:]) # Pass copy since sort is in-place
print(f"Minimum total road length to connect all cities: {total}")
print("Selected roads:", mst)
demo()
Level 2 · How It Works Under the Hood
18.8 Correctness Proof of Dijkstra's Algorithm
Theorem: For graphs with non-negative edge weights, Dijkstra's algorithm correctly computes the shortest distances to all nodes reachable from the source.
Proof (by induction):
Let S be the set of nodes whose shortest distances have been finalized. We prove: each time a node u is added to S, dist[u] is indeed the shortest distance from s to u.
Base case: When source s is added to S, dist[s] = 0, which is correct.
Inductive step: Assume all nodes in S have correctly computed distances. Let u be the next selected node (the unfinalized node with smallest dist value).
Proof by contradiction: Suppose there exists a shorter path P: s → ... → x → y → ... → u, where x ∈ S, y ∉ S.
- Since y is the first node on P not in S, the length of s → ... → x → y ≤ total length of P.
- Since all edge weights are non-negative, length of s → ... → x → y ≤ length of s → ... → x → y → ... → u.
- By inductive hypothesis, dist[x] is correct, so when processing x, y was relaxed: dist[y] ≤ dist[x] + w(x,y) = length of s→...→x→y.
- But u is the unfinalized node with minimum dist, so dist[u] ≤ dist[y].
- Combining: dist[u] ≤ dist[y] ≤ length of P < dist[u]. Contradiction!
Therefore dist[u] is the shortest distance. ∎
Why do negative edges break correctness? If there's a negative edge on the path (y, ..., u), then the length of "s → ... → x → y → ... → u" could be less than "s → ... → x → y" (later edges reduce total distance). In this case, dist[y] ≤ dist[u] no longer holds, and the inductive hypothesis fails.
18.9 SPFA: Queue-Optimized Bellman-Ford
SPFA (Shortest Path Faster Algorithm) is a practical optimization of Bellman-Ford. Core observation: if node u's dist value was not updated in some round, then u's outgoing edges cannot produce new relaxations in the next round. So we only relax from "recently updated nodes," maintained in a queue.
from collections import deque
def spfa(graph: Dict[int, List[Tuple[int, int]]], source: int, n: int) -> Tuple[List[float], bool]:
"""
SPFA (Shortest Path Faster Algorithm)
Queue-optimized Bellman-Ford
Returns:
(dist, has_negative_cycle)
"""
dist = [float('inf')] * n
dist[source] = 0
in_queue = [False] * n
count = [0] * n # count[v] = times v enqueued; > n-1 indicates negative cycle
queue = deque([source])
in_queue[source] = True
while queue:
u = queue.popleft()
in_queue[u] = False
for v, w in graph[u]:
if dist[u] + w < dist[v]:
dist[v] = dist[u] + w
if not in_queue[v]:
queue.append(v)
in_queue[v] = True
count[v] += 1
if count[v] >= n:
return dist, True # Negative cycle
return dist, False
Complexity analysis:
- Worst case: O(VE), same as Bellman-Ford (carefully constructed graphs can make SPFA degenerate).
- Average case: O(kE), where k is a small constant (empirically around 2).
- Warning: Do not blindly trust SPFA's "average complexity" in competitive programming with negative-weight graphs — problem setters can construct adversarial inputs.
SLF (Shortest Label First) optimization: Compare the newly enqueued node with the front of the queue; if its dist is smaller, insert at front instead of back.
# SLF optimization
if not in_queue[v]:
if queue and dist[v] < dist[queue[0]]:
queue.appendleft(v) # Insert at front
else:
queue.append(v) # Insert at back
in_queue[v] = True
18.10 Deep Dive into Negative Cycle Detection
Method 1: Bellman-Ford V-th round detection (shown earlier)
Method 2: SPFA enqueue count — If a node is enqueued more than V-1 times, a negative cycle exists.
Method 3: Virtual source node (detect negative cycles anywhere in the graph)
def detect_negative_cycle(graph: Dict[int, List[Tuple[int, int]]], n: int) -> bool:
"""
Detect negative cycles anywhere in the graph.
Strategy: Add a virtual source node connected to all nodes with weight 0.
"""
edges = []
for u in range(n):
edges.append((n, u, 0)) # Virtual source -> all nodes
for v, w in graph[u]:
edges.append((u, v, w))
dist, has_neg_cycle = bellman_ford(edges, n, n + 1)
return has_neg_cycle
Practical application: Arbitrage detection. In foreign exchange markets, exchange rates can be modeled as edge weights (taking logarithms converts multiplication to addition). A negative cycle means an arbitrage opportunity exists — converting currencies around the cycle yields more of the starting currency.
import math
def detect_arbitrage(currencies: List[str], rates: List[List[float]]) -> bool:
"""
Detect foreign exchange arbitrage opportunities.
rates[i][j] = how much of currency j you get for 1 unit of currency i
Taking -log: multiplication becomes addition, rate > 1 becomes negative weight.
Arbitrage ≡ negative cycle.
"""
n = len(currencies)
edges = []
for i in range(n):
for j in range(n):
if i != j and rates[i][j] > 0:
edges.append((i, j, -math.log(rates[i][j])))
# Use Bellman-Ford to detect negative cycles
dist = [0.0] * n # Start from all nodes simultaneously (equivalent to virtual source)
for _ in range(n - 1):
for u, v, w in edges:
if dist[u] + w < dist[v]:
dist[v] = dist[u] + w
for u, v, w in edges:
if dist[u] + w < dist[v]:
return True # Arbitrage opportunity exists
return False
18.11 Prim vs Kruskal: In-Depth Comparison
Theoretical foundation — Cut Property:
For any cut of a graph (partitioning vertices into two non-empty sets S and V\S), the minimum-weight edge crossing the cut must belong to some MST.
- Prim uses the cut property: S = nodes already in MST. Each step selects the minimum edge between S and V\S.
- Kruskal uses the cut property: Each step selects the globally minimum non-cycle-forming edge, which is equivalent to selecting "the minimum crossing edge of some current cut."
Why are both correct? Because both exploit the cut property. Each selected edge must belong to an MST. This is the mathematical guarantee of greedy correctness.
Implementation complexity comparison:
| Operation | Prim (binary heap) | Prim (Fibonacci heap) | Kruskal |
|---|---|---|---|
| Total time | O((V+E) log V) | O(E + V log V) | O(E log E) |
| Best for | Dense graphs | Theoretical optimal | Sparse graphs |
| Code complexity | Medium | High | Low |
| Space | O(V) | O(V) | O(V+E) |
Key insight: For dense graphs (E ≈ V²), Prim is O(V² log V) and Kruskal is O(V² log V), roughly equal. But with an adjacency matrix Prim implementation (no heap, linear scan for minimum), Prim achieves O(V²), clearly superior to Kruskal.
def prim_dense(adj_matrix: List[List[float]], n: int) -> int:
"""
Prim's with adjacency matrix - for dense graphs
Time O(V²), no heap needed
"""
INF = float('inf')
in_mst = [False] * n
min_edge = [INF] * n # min_edge[v] = minimum edge weight from v to MST set
min_edge[0] = 0
total = 0
for _ in range(n):
# Find non-MST node with minimum min_edge
u = -1
for v in range(n):
if not in_mst[v] and (u == -1 or min_edge[v] < min_edge[u]):
u = v
in_mst[u] = True
total += min_edge[u]
# Update neighbors' min_edge
for v in range(n):
if not in_mst[v] and adj_matrix[u][v] < min_edge[v]:
min_edge[v] = adj_matrix[u][v]
return total
18.12 Path Reconstruction
Algorithms often need to output the actual path, not just the distance. The method is to record each node's predecessor.
def dijkstra_with_path(graph: Dict[int, List[Tuple[int, int]]], source: int, n: int):
"""Dijkstra + path reconstruction"""
dist = [float('inf')] * n
dist[source] = 0
prev = [-1] * n # Predecessor array
pq = [(0, source)]
while pq:
d, u = heapq.heappop(pq)
if d > dist[u]:
continue
for v, w in graph[u]:
if dist[u] + w < dist[v]:
dist[v] = dist[u] + w
prev[v] = u
heapq.heappush(pq, (dist[v], v))
return dist, prev
def reconstruct_path(prev: List[int], target: int) -> List[int]:
"""Reconstruct path from predecessor array"""
path = []
node = target
while node != -1:
path.append(node)
node = prev[node]
path.reverse()
return path
# Usage
dist, prev = dijkstra_with_path(graph, 0, 6)
path = reconstruct_path(prev, 5)
print(f"Shortest path from 0 to 5: {' -> '.join(map(str, path))}")
print(f"Distance: {dist[5]}")
18.13 Algorithm Selection Decision Tree
When facing a shortest path problem, decide in this order:
1. Need all-pairs shortest paths?
├─ Yes → V ≤ a few hundred?
│ ├─ Yes → Floyd-Warshall O(V³)
│ └─ No → Run V times Dijkstra O(VE log V) or Johnson's algorithm
└─ No → Single-source shortest path
├─ Negative edges?
│ ├─ Yes → Bellman-Ford O(VE) or SPFA
│ └─ No → Dijkstra O((V+E) log V)
└─ All edge weights = 1?
└─ BFS O(V+E) (simplest and fastest)
Level 3 · What the Specifications Say
18.14 Dijkstra's 1959 Paper
Paper: Edsger W. Dijkstra, "A Note on Two Problems in Connexion with Graphs", Numerische Mathematik, 1959.
This three-page paper solved two problems: (1) shortest paths, (2) minimum spanning trees. Dijkstra described the algorithm concisely without a formal proof — he considered the correctness "obvious."
Historical context: Dijkstra was working at the Mathematical Center in Amsterdam. The algorithm was inspired by a practical problem — designing a demonstration program for the ARMAC computer that would compute the shortest railway routes between 64 Dutch cities.
The algorithm as described in the original (restated in modern terms):
- Mark all nodes as "unvisited," set distances to ∞, source distance to 0.
- Select the unvisited node u with minimum distance.
- Relax all neighbors of u.
- Mark u as "visited."
- Repeat until all nodes are visited.
Dijkstra's original did not use a priority queue — he used linear scanning to find the minimum, giving O(V²) time complexity. The priority queue optimization came later (Johnson 1977, Fredman & Tarjan 1987 with Fibonacci heaps).
Fun fact: Dijkstra later said in an interview: "The algorithm took about twenty minutes. One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee, and I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path."
18.15 Bellman 1958 / Ford 1956
Richard Bellman described this algorithm in his 1958 paper "On a Routing Problem," as part of his development of dynamic programming theory.
Lester R. Ford Jr. independently described the same algorithm in a 1956 RAND Corporation memorandum.
In fact, Alfonso Shimbel described a similar algorithm in 1955. The algorithm is sometimes called the Bellman-Ford-Moore algorithm (Edward F. Moore also independently discovered it in 1959).
Bellman's dynamic programming perspective: Define d_k(v) = shortest path from s to v using at most k edges. Recurrence:
d_k(v) = min(d_{k-1}(v), min_{(u,v)∈E} (d_{k-1}(u) + w(u,v)))
This is the essence of Bellman-Ford — a dynamic programming "relax by stages" process.
Principle of Optimality (Bellman, 1957): "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy." Applied to shortest paths: if s→...→u→v is the shortest path from s to v, then s→...→u must be the shortest path from s to u. This is the foundation of correctness for all shortest path algorithms.
18.16 Floyd 1962
Paper: Robert W. Floyd, "Algorithm 97: Shortest Path", Communications of the ACM, 1962.
Floyd's paper is remarkably brief (less than one page), presenting only the triple-loop implementation. But the underlying idea — progressively relaxing "which intermediate nodes are allowed" — is a classic application of dynamic programming in graph theory.
Relationship to Warshall: Stephen Warshall's 1962 paper "A Theorem on Boolean Matrices" solved the transitive closure problem (determining reachability between nodes) using the exact same triple-loop structure. Floyd extended Warshall's Boolean matrices to weighted matrices, generalizing "reachability" to "shortest distance."
Relationship to Kleene's algorithm: At a deeper level, Floyd-Warshall is essentially Kleene's algorithm from regular expression/finite automata theory generalized over semirings. The Kleene algorithm on the (min, +) semiring gives Floyd-Warshall; on the (∨, ∧) Boolean semiring gives Warshall's transitive closure.
18.17 Kruskal 1956 and Prim 1957
Joseph B. Kruskal, "On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem", Proceedings of the American Mathematical Society, 1956.
Kruskal's paper not only proposed the MST algorithm but also discussed its relationship to the Traveling Salesman Problem. He proved the correctness of the greedy strategy: selecting edges in order of increasing weight that don't form cycles.
Robert C. Prim, "Shortest Connection Networks And Some Generalizations", Bell System Technical Journal, 1957.
Prim independently discovered this algorithm while working at Bell Labs. Interestingly, the Czech mathematician Vojtěch Jarník described essentially the same algorithm in 1930 in his paper "O jistém problému minimálním." The algorithm is therefore sometimes called the Jarník-Prim algorithm or DJP algorithm (Dijkstra-Jarník-Prim, since Dijkstra's 1959 paper also independently described it).
Formal proof of the Cut Property (cornerstone of MST correctness):
Theorem (Cut Property): Let (S, V\S) be any cut of graph G, and let e be the unique minimum-weight edge crossing the cut. Then e belongs to every MST of G.
Proof: By contradiction. Suppose T is an MST and e = (u,v) ∉ T. In T, there is a path P between u and v (since T is connected). P must cross the cut (since u ∈ S, v ∈ V\S), so P contains a crossing edge e' ≠ e. Since e is the unique minimum crossing edge, w(e) < w(e'). Remove e' from T and add e, obtaining a new spanning tree T'. w(T') = w(T) - w(e') + w(e) < w(T), contradicting T being an MST. ∎
Generalization: If the minimum crossing edge is not unique (ties exist), at least one MST exists that includes that edge.
18.18 Complexity Lower Bounds and Optimal Algorithms
Shortest path complexity lower bounds:
- The comparison-model lower bound for SSSP is Ω(V + E) (must examine all edges at least once).
- Dijkstra + Fibonacci heap achieves O(E + V log V), which is optimal for dense graphs (E = Θ(V²)).
- Whether a deterministic O(V + E) algorithm exists for sparse graphs remains an open problem.
MST complexity lower bounds:
- Comparison-model lower bound Ω(E) (must examine all edges).
- Chazelle (2000) gave an O(E α(V)) algorithm (α is the inverse Ackermann function, practically a constant).
- Pettie & Ramachandran (2002) gave the optimal deterministic algorithm, but its complexity cannot be expressed using standard functions.
- In the randomized model, Karger, Klein & Tarjan (1995) gave an expected O(V + E) linear-time algorithm.
Level 4 · Edge Cases and Pitfalls
18.19 LeetCode 743: Network Delay Time
Problem: There are n network nodes. Given directed edges times[i] = [u, v, w] (transmission time from u to v is w), a signal is sent from node k. Return the minimum time for all nodes to receive the signal. If any node is unreachable, return -1.
Analysis: Classic SSSP problem. All edge weights are positive (transmission times), so use Dijkstra directly.
def networkDelayTime(times: List[List[int]], n: int, k: int) -> int:
"""
LeetCode 743: Network Delay Time
Straightforward Dijkstra application
"""
# Build graph (note: nodes numbered from 1)
graph = {i: [] for i in range(1, n + 1)}
for u, v, w in times:
graph[u].append((v, w))
# Dijkstra
dist = {i: float('inf') for i in range(1, n + 1)}
dist[k] = 0
pq = [(0, k)]
while pq:
d, u = heapq.heappop(pq)
if d > dist[u]:
continue
for v, w in graph[u]:
if dist[u] + w < dist[v]:
dist[v] = dist[u] + w
heapq.heappush(pq, (dist[v], v))
result = max(dist.values())
return result if result < float('inf') else -1
# Test
times = [[2,1,1],[2,3,1],[3,4,1]]
print(networkDelayTime(times, 4, 2)) # 2
# From node 2: 2->1(1), 2->3(1), 2->3->4(2)
# Maximum distance to any node = 2
Complexity: O((V + E) log V), where V = n, E = len(times).
Interview follow-ups:
- Q: "What if edges can be negative?" → Use Bellman-Ford.
- Q: "What if you need to return the actual propagation path?" → Record prev array.
- Q: "What if the graph is very large (millions of nodes)?" → If only need path to a specific target, use A* or bidirectional Dijkstra.
18.20 LeetCode 1631: Path With Minimum Effort
Problem: Given a rows×columns height matrix, find a path from top-left (0,0) to bottom-right (rows-1, columns-1). The "effort" of a path is the maximum absolute height difference between consecutive cells. Minimize this effort.
Analysis: This is not a traditional "minimize sum of path weights" problem, but a "minimize the maximum edge weight on a path" (minimax path). The core idea can still be solved with a Dijkstra variant!
Key insight: Redefine dist[v] from "sum of path weights" to "maximum edge weight on path," and change relaxation from dist[u] + w to max(dist[u], w). Dijkstra's greedy strategy still holds — the node with minimum dist extracted from the priority queue cannot achieve a smaller "maximum edge weight" through any other path.
def minimumEffortPath(heights: List[List[int]]) -> int:
"""
LeetCode 1631: Path With Minimum Effort
Dijkstra variant: dist = maximum edge weight on path
"""
rows, cols = len(heights), len(heights[0])
dist = [[float('inf')] * cols for _ in range(rows)]
dist[0][0] = 0
# (effort, row, col)
pq = [(0, 0, 0)]
directions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
while pq:
effort, r, c = heapq.heappop(pq)
if r == rows - 1 and c == cols - 1:
return effort
if effort > dist[r][c]:
continue
for dr, dc in directions:
nr, nc = r + dr, c + dc
if 0 <= nr < rows and 0 <= nc < cols:
edge_weight = abs(heights[nr][nc] - heights[r][c])
new_effort = max(effort, edge_weight)
if new_effort < dist[nr][nc]:
dist[nr][nc] = new_effort
heapq.heappush(pq, (new_effort, nr, nc))
return 0
# Test
heights = [[1,2,2],[3,8,2],[5,3,5]]
print(minimumEffortPath(heights)) # 2
Alternative solution — Binary search + BFS:
def minimumEffortPath_binary_search(heights: List[List[int]]) -> int:
"""Binary search on answer: guess max effort mid, BFS to verify feasibility"""
rows, cols = len(heights), len(heights[0])
def can_reach(max_effort: int) -> bool:
visited = [[False] * cols for _ in range(rows)]
queue = deque([(0, 0)])
visited[0][0] = True
while queue:
r, c = queue.popleft()
if r == rows - 1 and c == cols - 1:
return True
for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
nr, nc = r + dr, c + dc
if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
if abs(heights[nr][nc] - heights[r][c]) <= max_effort:
visited[nr][nc] = True
queue.append((nr, nc))
return False
lo, hi = 0, 10**6
while lo < hi:
mid = (lo + hi) // 2
if can_reach(mid):
hi = mid
else:
lo = mid + 1
return lo
Alternative solution — Union-Find + edge sorting (Kruskal-style):
def minimumEffortPath_kruskal(heights: List[List[int]]) -> int:
"""
Kruskal approach: sort all edges by weight, add until source and target are connected
"""
rows, cols = len(heights), len(heights[0])
edges = []
for r in range(rows):
for c in range(cols):
idx = r * cols + c
if c + 1 < cols:
w = abs(heights[r][c+1] - heights[r][c])
edges.append((w, idx, idx + 1))
if r + 1 < rows:
w = abs(heights[r+1][c] - heights[r][c])
edges.append((w, idx, idx + cols))
edges.sort()
uf = UnionFind(rows * cols)
source = 0
target = rows * cols - 1
for w, u, v in edges:
uf.union(u, v)
if uf.find(source) == uf.find(target):
return w
return 0
This problem perfectly demonstrates how three different algorithmic paradigms can solve the same problem: Dijkstra variant, binary search + BFS, and Kruskal variant. Being able to present multiple solutions and analyze trade-offs is an interview differentiator.
18.21 LeetCode 1584: Min Cost to Connect All Points
Problem: Given n points points[i] = [xi, yi], the connection cost between two points is the Manhattan distance |xi-xj| + |yi-yj|. Return the minimum cost to connect all points.
Analysis: Classic MST problem. Complete graph (any two points can be connected), n points yield n(n-1)/2 edges.
def minCostConnectPoints(points: List[List[int]]) -> int:
"""
LeetCode 1584: Min Cost to Connect All Points
Method 1: Kruskal (sort all edges)
"""
n = len(points)
edges = []
for i in range(n):
for j in range(i + 1, n):
dist = abs(points[i][0] - points[j][0]) + abs(points[i][1] - points[j][1])
edges.append((dist, i, j))
edges.sort()
uf = UnionFind(n)
total = 0
count = 0
for w, u, v in edges:
if uf.union(u, v):
total += w
count += 1
if count == n - 1:
break
return total
def minCostConnectPoints_prim(points: List[List[int]]) -> int:
"""
Method 2: Prim (no heap), optimal for complete graphs O(n²)
"""
n = len(points)
in_mst = [False] * n
min_dist = [float('inf')] * n
min_dist[0] = 0
total = 0
for _ in range(n):
u = -1
for v in range(n):
if not in_mst[v] and (u == -1 or min_dist[v] < min_dist[u]):
u = v
in_mst[u] = True
total += min_dist[u]
for v in range(n):
if not in_mst[v]:
d = abs(points[u][0] - points[v][0]) + abs(points[u][1] - points[v][1])
min_dist[v] = min(min_dist[v], d)
return total
# Test
points = [[0,0],[2,2],[3,10],[5,2],[7,0]]
print(minCostConnectPoints(points)) # 20
print(minCostConnectPoints_prim(points)) # 20
Performance comparison:
- Kruskal: O(n² log n) time, O(n²) space (store all edges)
- Prim (no heap): O(n²) time, O(n) space
- For this problem with n ≤ 1000, both pass. But Prim without heap is optimal for complete graphs.
18.22 Common Pitfalls and Interview Tips Summary
Pitfall 1: Dijkstra with negative edges
# Classic counterexample
# 0 -> 1 (weight 1)
# 0 -> 2 (weight 5)
# 1 -> 2 (weight -10)
# Dijkstra: finalizes dist[1]=1, then dist[2]=5 is finalized
# Actual shortest: 0->1->2 = 1+(-10) = -9
In interviews: if the problem guarantees non-negative weights, use Dijkstra; otherwise use Bellman-Ford.
Pitfall 2: Floyd-Warshall loop order
# Correct: k is the outermost loop
for k in range(n):
for i in range(n):
for j in range(n):
...
# Wrong: k not outermost produces incorrect results
# Because dp[k] depends on dp[k-1], k must be the outer loop
Pitfall 3: MST ≠ Shortest Path Tree
A --1-- B --1-- C
| |
3 3
| |
D ------1------ E
MST includes A-B, B-C, D-E, and A-D or C-E (total weight 6)
But shortest path from A to E is A-B-C-E (weight 5), not following MST edges
Pitfall 4: Graph not connected → MST doesn't exist
def kruskal_safe(n, edges):
total, mst = kruskal(n, edges)
if len(mst) < n - 1:
return -1 # Graph not connected, MST doesn't exist
return total
Interview template summary:
| Problem Type | Algorithm | Key Code |
|---|---|---|
| SSSP (positive weights) | Dijkstra | heapq + dist array |
| SSSP (negative weights) | Bellman-Ford/SPFA | V-1 rounds relaxation |
| APSP | Floyd-Warshall | Triple loop, k outermost |
| MST | Kruskal | Sort + Union-Find |
| MST (dense graph) | Prim | min_dist array |
| Minimax path | Dijkstra variant | max instead of + |
| Connectivity check | Union-Find/BFS | union + find |
Time complexity quick reference:
| Algorithm | Time | Space | Applicability |
|---|---|---|---|
| Dijkstra (heap) | O((V+E) log V) | O(V+E) | Non-negative weights |
| Bellman-Ford | O(VE) | O(V) | Any weights (detects negative cycles) |
| SPFA | Avg O(kE), worst O(VE) | O(V+E) | Any weights |
| Floyd-Warshall | O(V³) | O(V²) | APSP, small V |
| Prim (heap) | O((V+E) log V) | O(V+E) | Undirected connected |
| Prim (no heap) | O(V²) | O(V) | Dense graphs |
| Kruskal | O(E log E) | O(V+E) | Undirected connected |