Complexity Theory: P, NP and NP-Complete
Chapter 37: Complexity Theory — P, NP and NP-Complete
You are asked to find the shortest route visiting 100 cities (the Traveling Salesman Problem). You tried greedy, dynamic programming, various heuristics, but found no polynomial-time exact algorithm. You suspect "maybe this problem simply has no efficient algorithm." But how do you prove something "doesn't exist"?
This is the core question complexity theory aims to answer: how hard is a problem? Specifically — are some problems inherently without efficient algorithms, rather than "we just haven't found one yet"?
P = NP is the central open problem in computer science and one of the seven Millennium Prize Problems (Clay Mathematics Institute, $1 million prize). This chapter is not about solving this problem — it's about understanding what it says, why it matters, and how to respond in interviews when you encounter NP-hard problems.
Level 1 · What You Need to Know
1.1 Class P
Definition: P (Polynomial time) is the set of decision problems that can be solved in polynomial time.
A "decision problem" has a YES or NO answer. For example, "Does this graph have a path from s to t?" rather than "What is the shortest path length?" (though the latter can usually be converted to the former via binary search).
"Polynomial time" means there exists an algorithm with running time O(n^k), where n is input size and k is some constant.
Typical problems in P:
| Problem | Algorithm | Time |
|---|---|---|
| Sorting | Merge sort | O(n log n) |
| Shortest path | Dijkstra | O((V+E) log V) |
| Maximum flow | Ford-Fulkerson | O(VE^2) |
| Matching (bipartite) | Hungarian | O(V^3) |
| Primality testing | AKS | O(log^12 n) |
| Linear programming | Ellipsoid/Interior point | Polynomial |
| 2-SAT | Strongly connected components | O(V+E) |
Intuition: P problems are "tractable" — solvable exactly in reasonable time. Even O(n^5) may be slow in practice, but it at least remains "manageable" as hardware improves and inputs grow.
1.2 Class NP
Definition: NP (Nondeterministic Polynomial time) is the set of decision problems whose YES answers can be verified in polynomial time.
Specifically: if the answer is YES, there exists a "certificate" (witness) such that a polynomial-time verifier can check that this certificate indeed proves the answer is YES.
Key distinction:
- P: "Can find the answer in polynomial time"
- NP: "Can verify the answer in polynomial time"
Clearly P is a subset of NP (if you can find the answer quickly, you can certainly verify it quickly — find it, then verify). The question is whether P = NP or P is a strict subset of NP.
Typical problems in NP:
| Problem | Certificate | Verification |
|---|---|---|
| Graph coloring (k colors) | A specific coloring | Check each edge has different-colored endpoints |
| Hamiltonian cycle | A specific cycle | Check it visits each vertex exactly once |
| Subset sum | A specific subset | Check the subset sums to target |
| SAT | A specific assignment | Substitute and check all clauses are true |
| TSP (decision version) | A specific tour | Check total distance <= target |
Sudoku intuition:
- Solving sudoku: might be very hard (generalized n x n sudoku is NP-Complete)
- Verifying sudoku: given a filled grid, check each row/column/box is valid — O(n^2) suffices
1.3 NP-Complete (NPC)
Informal definition: NP-Complete problems are the "hardest" in NP — if any NPC problem has a polynomial-time algorithm, then ALL problems in NP have polynomial-time algorithms (i.e., P = NP).
Formal definition: Problem L is NP-Complete if and only if:
- L is in NP (answers can be verified in polynomial time)
- Every problem in NP can be polynomial-time reduced to L (L is at least as hard as anything in NP)
Classic NPC problems:
- SAT (Boolean Satisfiability)
- 3-SAT
- Graph Coloring (3+ colors)
- Hamiltonian Cycle/Path
- Subset Sum
- Knapsack (decision version)
- Traveling Salesman (decision version)
- Vertex Cover
- Set Cover
- Independent Set
Intuition: NPC problems form an "equivalence class" — either they ALL have polynomial algorithms, or NONE of them do. This is because a polynomial algorithm for any one NPC problem can be transferred to all others via reductions.
1.4 NP-Hard
Definition: Problem H is NP-Hard if and only if every problem in NP can be polynomial-time reduced to H.
Note: NP-Hard problems are not necessarily in NP! NP-Complete = NP-Hard intersect NP.
Examples of NP-Hard but not in NP:
- The Halting Problem (undecidable, not even in NP)
- Optimization version of TSP ("what is the shortest tour?" rather than "is there a tour of length <= k?") — answer is not YES/NO, not a decision problem
Relationships:
P is a subset of NP
NPC = NP-Hard intersect NP (the hardest problems in NP)
NP-Hard contains NPC (problems possibly even harder than NP count too)
If P = NP, then P = NP = NPC (all NP problems are tractable)
If P != NP, then P is strict subset of NP, and NPC not subset of P
1.5 Intuitive Understanding: Finding vs Verifying
Analogy: Imagine finding your car in a massive parking lot.
- P problem: The lot is well-organized (alphabetical order), you can find your car quickly
- NP problem: The lot is chaotic, you don't know how to find your car quickly, but if someone tells you "your car is at E-37," you can immediately go verify
- NP-Complete: If you invent a method to quickly find cars in chaotic lots, that method can solve ALL "search in chaotic environments" problems
- NP-Hard: Not only is the lot chaotic, but your "car" might not even exist (verifying an answer may itself be difficult)
1.6 Common Misconceptions
Misconception 1: "NP means 'Non-Polynomial'"
Wrong! NP stands for "Nondeterministic Polynomial" — problems solvable in polynomial time on a nondeterministic Turing machine. Many NP problems do have polynomial algorithms (all P problems are in NP).
Misconception 2: "NP problems are always hard"
Wrong! P is a subset of NP, so sorting (O(n log n)) is also an NP problem. NP is just an upper bound — it contains problems ranging from "easy" to "hard." The truly "hard" ones are NPC problems.
Misconception 3: "If a problem is NP-Hard, it's completely hopeless"
Wrong! NP-Hard means no known polynomial-time exact algorithm, but:
- Approximation algorithms (e.g., 2-approximation for TSP)
- Heuristics (e.g., genetic algorithms, simulated annealing)
- For small inputs, exponential algorithms are acceptable
- Some special cases have polynomial algorithms (e.g., independent set on trees)
Misconception 4: "All instances of NP-Hard problems are hard"
Wrong! NP-Hard is a worst-case concept. Many NPC problems' "typical" instances may be easy. For example:
- SAT in practice (industrial SAT solvers like MiniSat) can quickly solve most instances
- Knapsack with DP is pseudo-polynomial, fast when values are not too large
Level 2 · How It Works Under the Hood
2.1 The Concept of Reduction
Core idea: If problem A can be "reduced" to problem B (written A <=_p B), it means: if we have an algorithm for B, we can use it to solve A.
Polynomial-time reduction (Karp reduction): There exists a polynomial-time computable function f such that for any input x:
- x is a YES instance of A if and only if f(x) is a YES instance of B
# Conceptual example: Reducing Independent Set to Vertex Cover
#
# Independent Set: Does the graph have an independent set of size >= k?
# Vertex Cover: Does the graph have a vertex cover of size <= n-k?
#
# Reduction: S is an independent set iff V\S is a vertex cover
# So "is there independent set >= k" iff "is there vertex cover <= n-k"
def reduce_independent_set_to_vertex_cover(graph, k):
"""Reduce Independent Set to Vertex Cover
Input: Graph G=(V,E), parameter k
Output: Same graph G, parameter n-k
Original: Does G have independent set of size >= k?
Transformed: Does G have vertex cover of size <= |V|-k?
"""
n = len(graph.vertices)
return graph, n - k
# Correctness:
# If S is an independent set (no adjacent nodes in S),
# then every edge has at least one endpoint in V\S
# (otherwise both endpoints in S, contradiction)
# So V\S is a vertex cover. The reverse is similar.
Direction of reduction matters:
A <=_p B means B is at least as hard as A. If A is NPC and A <=_p B and B is in NP, then B is also NPC.
Intuition: Reduction says "the ability to solve B implies the ability to solve A." So if A is hard (no polynomial algorithm), then B is at least as hard.
Transitivity of reductions:
If A <=_p B and B <=_p C, then A <=_p C.
This is why we only need to show a known NPC problem reduces to a new problem to prove the new problem is NPC — by transitivity, all NP problems (through the known NPC problem as a bridge) reduce to the new problem.
2.2 The SAT Problem
SAT (Boolean Satisfiability): Given a Boolean formula (usually in Conjunctive Normal Form, CNF), is there a variable assignment that makes the formula true?
CNF form: (x_1 OR NOT x_2 OR x_3) AND (NOT x_1 OR x_2) AND (x_2 OR NOT x_3)
- Each parenthesized group is a "clause" — a disjunction (OR) of literals
- The entire formula is a conjunction (AND) of all clauses
- Need to find an assignment making all clauses satisfied
SAT was the first problem proved NPC (Cook-Levin Theorem, 1971).
3-SAT: Special case where each clause has exactly 3 literals. Also NPC (SAT reduces to 3-SAT).
2-SAT: Each clause has exactly 2 literals. 2-SAT is in P! Solvable in O(V+E) using strongly connected components.
# 2-SAT solver (polynomial time!)
def solve_2sat(num_vars: int, clauses: list) -> list:
"""2-SAT Solver
Each clause is (a, b), representing (a OR b)
Variables are positive integers, negations are negative
Time: O(V + E) where V = 2n, E = 2m
Key insight: (a OR b) is equivalent to (NOT a -> b) AND (NOT b -> a)
Build implication graph, check for contradictions using SCC
"""
def var_idx(x):
if x > 0:
return 2 * (x - 1)
else:
return 2 * (-x - 1) + 1
def neg_idx(idx):
return idx ^ 1
n = 2 * num_vars
graph = [[] for _ in range(n)]
reverse_graph = [[] for _ in range(n)]
for a, b in clauses:
na = neg_idx(var_idx(a))
nb = neg_idx(var_idx(b))
va = var_idx(a)
vb = var_idx(b)
graph[na].append(vb) # NOT a -> b
graph[nb].append(va) # NOT b -> a
reverse_graph[vb].append(na)
reverse_graph[va].append(nb)
# Kosaraju's algorithm for SCC
visited = [False] * n
order = []
def dfs1(v):
stack = [(v, False)]
while stack:
node, processed = stack.pop()
if processed:
order.append(node)
continue
if visited[node]:
continue
visited[node] = True
stack.append((node, True))
for u in graph[node]:
if not visited[u]:
stack.append((u, False))
for i in range(n):
if not visited[i]:
dfs1(i)
comp = [-1] * n
comp_id = 0
visited = [False] * n
def dfs2(v, c):
stack = [v]
while stack:
node = stack.pop()
if visited[node]:
continue
visited[node] = True
comp[node] = c
for u in reverse_graph[node]:
if not visited[u]:
stack.append(u)
for v in reversed(order):
if not visited[v]:
dfs2(v, comp_id)
comp_id += 1
# Check contradiction: x and NOT x in same SCC -> no solution
for i in range(num_vars):
if comp[2 * i] == comp[2 * i + 1]:
return None
# Assignment: comp[x] > comp[NOT x] -> x = True
result = [False] * (num_vars + 1)
for i in range(num_vars):
result[i + 1] = comp[2 * i] > comp[2 * i + 1]
return result
The "phase transition" between 2-SAT and 3-SAT:
- 2-SAT: Polynomial (in P)
- 3-SAT: NP-Complete
- The only difference: 2 literals per clause vs 3
This seemingly minor difference causes a qualitative leap in complexity. Similar "phase transitions":
- 2-coloring vs 3-coloring (P vs NPC)
- Matching vs 3-dimensional matching (P vs NPC)
2.3 Why P != NP Is a Millennium Problem
If P = NP (almost all experts disbelieve this):
- All modern cryptography collapses (RSA, elliptic curves are based on NP-Hard problems)
- All creative activity can be automated (verification equals creation)
- Mathematical proofs can be automatically discovered
- Optimization problems (logistics, scheduling, design) can be solved exactly and efficiently
Why proving P != NP is so hard:
Proving P != NP requires showing some NPC problem has no polynomial algorithm. This is a "non-existence" proof — you must rule out all possible algorithms, not construct one specific algorithm.
Known barriers:
- Relativization barrier (Baker, Gill, Solovay, 1975): There exist oracles A with P^A = NP^A and B with P^B != NP^B. So diagonalization-style methods cannot resolve this.
- Natural Proofs barrier (Razborov, Rudich, 1994): If one-way functions exist (a basic cryptographic assumption), then certain "natural" proof methods cannot prove P != NP.
- Algebrization barrier (Aaronson, Wigderson, 2009): Even stronger — arithmetization methods are also insufficient.
These barriers don't say P != NP is unprovable — they say entirely new techniques are needed, beyond all known methods developed over decades of complexity theory.
2.4 Practical Examples of Reduction
Example: Proving Vertex Cover is NPC
Known: 3-SAT is NPC. Construct a reduction from 3-SAT to Vertex Cover.
Given a 3-SAT instance with n variables x_1,...,x_n and m clauses C_1,...,C_m:
Construct graph G:
- For each variable x_i, create edge (x_i, NOT x_i) ("variable gadget")
- For each clause C_j = (l_1 OR l_2 OR l_3), create a triangle (a, b, c) and edges (a,l_1), (b,l_2), (c,l_3) ("clause gadget")
Set k = n + 2m.
Proof: 3-SAT is satisfiable if and only if G has a vertex cover of size <= k.
- (=>) Satisfying assignment -> for each variable gadget, select the TRUE literal's node (n total); for each clause gadget, select two triangle nodes not connected to the "satisfied literal" (2m total). Total: n+2m = k.
- (<=) Cover of size <= k -> each variable gadget needs at least 1 (total n), each triangle needs at least 2 (total 2m), so exactly k. From the selection, extract a consistent satisfying assignment.
2.5 Recognizing NP-Complete Problems in Practice
In interviews or actual work, how do you recognize that a problem is NP-Hard?
Pattern recognition:
| Problem feature | Likely NPC | P variant |
|---|---|---|
| Graph coloring | k >= 3 colors | 2 colors (bipartiteness) |
| SAT | 3-SAT and above | 2-SAT, Horn-SAT |
| Path/cycle | Hamiltonian cycle | Eulerian cycle (O(V+E)) |
| Subset selection | Subset sum, knapsack | Greedy-solvable variants |
| Scheduling | General scheduling | Special structure (e.g., single machine EDD) |
| Matching | 3D matching | 2D matching (Hungarian) |
The key difference is often the "dimension" of constraints:
- 2 choices vs 3 choices
- 2D vs 3D
- Special structure vs general structure
# In practice: quick heuristic check for NP-Hardness
def might_be_np_hard(problem_description: str) -> str:
"""Heuristic judgment (not rigorous)"""
np_hard_keywords = [
"all subsets", "all permutations", "all assignments",
"optimal partition", "minimum colors", "longest path",
"exact cover", "minimum set cover",
"traveling salesman", "knapsack", "scheduling"
]
p_keywords = [
"shortest path", "maximum flow", "minimum spanning tree",
"sorting", "binary search", "bipartite matching",
"topological sort", "strongly connected components"
]
for keyword in np_hard_keywords:
if keyword in problem_description:
return "Likely NP-Hard, consider approximation or heuristics"
for keyword in p_keywords:
if keyword in problem_description:
return "Likely in P, look for polynomial algorithm"
return "Needs further analysis"
2.6 Pseudo-polynomial Time and Strong NPC
Pseudo-polynomial time: Running time is polynomial in the numerical value of the input (not the encoding length).
Classic example: Knapsack DP solution, time O(nW). Here W is the knapsack capacity value. If W is encoded in b bits, then W can be up to 2^b, so O(nW) = O(n * 2^b) — exponential in input length.
Weak NPC vs Strong NPC:
- Weakly NP-Complete: Has a pseudo-polynomial time algorithm. Examples: Subset Sum, Knapsack
- Strongly NP-Complete: Even if all numerical values are unary-encoded (size only poly(n)), the problem remains NPC. Examples: 3-SAT, Graph Coloring, TSP
# Knapsack: Weakly NPC (has pseudo-polynomial algorithm)
def knapsack_dp(weights: list, values: list, capacity: int) -> int:
"""0/1 Knapsack - Dynamic Programming
Time: O(n * W) — pseudo-polynomial
Polynomial when W is poly(n), but W can be exponentially large
"""
n = len(weights)
dp = [0] * (capacity + 1)
for i in range(n):
for w in range(capacity, weights[i] - 1, -1):
dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
return dp[capacity]
# Contrast: 3-SAT has no known pseudo-polynomial algorithm (strongly NPC)
Level 3 · What the Specifications Say
3.1 Cook-Levin Theorem (1971)
Theorem (Stephen Cook, 1971; Leonid Levin, 1973 independently):
SAT is NP-Complete. That is, every problem in NP can be reduced to SAT in polynomial time.
Proof sketch (highly simplified):
Any NP problem L has a polynomial-time verifier V. Given input x and certificate y, V(x, y) decides to accept or reject in polynomial steps.
V is a Turing machine whose computation can be represented as a "computation tableau":
- Rows = time steps t (at most p(|x|) steps)
- Columns = tape positions i (at most p(|x|))
- Each cell encodes the current state and tape content
Key insight: The Turing machine's transition function is local — each cell's value depends only on 3 adjacent cells in the previous row. Such "local constraints" can be encoded as CNF clauses.
Construction:
- Introduce Boolean variables for each cell (encoding state/symbol)
- Initial conditions: row 0 encodes input x
- Transition rules: each cell's relationship to 3 cells above encoded in CNF
- Termination: last row is in accept state
Total formula size is O(p(|x|)^2 * log|Sigma|) = polynomial.
Significance: This establishes a "benchmark" — once SAT is proved NPC, subsequent NPC proofs only need reductions from SAT (or other known NPC problems), without repeating this complex "Turing machine simulation" argument.
3.2 Karp's 21 NP-Complete Problems
Richard Karp's 1972 landmark paper "Reducibility Among Combinatorial Problems" proved 21 classic combinatorial optimization problems are NPC through a chain of reductions from SAT:
SAT
├── 3-SAT
│ ├── Graph Coloring
│ ├── Exact Cover
│ │ ├── 3D Matching
│ │ └── Steiner Tree
│ ├── Vertex Cover
│ │ ├── Independent Set
│ │ └── Clique
│ ├── Hamiltonian Cycle
│ │ └── TSP
│ └── Subset Sum
│ └── Knapsack
│ └── Job Scheduling
├── Integer Programming
└── ...
Karp's paper established the practical foundation of NPC theory. Before it, Cook-Levin only proved SAT is NPC — one problem. Karp demonstrated NPC problems pervade all areas of combinatorial optimization, suggesting "computational hardness" is a universal phenomenon, not a peculiarity of individual problems.
Complete reduction chain example:
Proving Independent Set is NPC:
1. Known: 3-SAT is NPC (Cook-Levin)
2. Construction: polynomial reduction from 3-SAT to Independent Set
- For each clause, create a 3-node group (corresponding to 3 literals)
- Nodes within same clause are interconnected (triangle)
- Complementary literals connected by edges (x_i and NOT x_i)
- Question: is there an independent set of size = m (number of clauses)?
3. Correctness:
- 3-SAT satisfiable -> pick one TRUE literal per clause
-> these nodes form size-m independent set
- Size-m independent set exists -> exactly one node per clause group
-> extract consistent satisfying assignment
3.3 The Turing Machine Model
Formal definition: A Turing machine M = (Q, Sigma, Gamma, delta, q_0, q_accept, q_reject) where:
- Q: finite set of states
- Sigma: input alphabet
- Gamma: tape alphabet (Sigma subset of Gamma, Gamma includes blank symbol)
- delta: Q x Gamma -> Q x Gamma x {L, R} (transition function)
- q_0: initial state
- q_accept, q_reject: accept/reject states
Nondeterministic Turing Machine (NTM): delta becomes Q x Gamma -> P(Q x Gamma x {L, R}) — multiple possible transitions per step. NTM accepts input x if and only if there exists some computation path reaching q_accept.
Equivalent definitions of NP:
- Exists a polynomial-time verifier (the definition above)
- Exists a nondeterministic Turing machine that accepts in polynomial time
Equivalence intuition: NTM's "guess" is the certificate, "verification" is the deterministic computation along the guessed path.
Why Turing machines instead of "real computers"?
The Church-Turing thesis tells us: all "reasonable" computational models (RAM model, lambda calculus, recursive functions, etc.) are equivalent at the computability level. At the polynomial-time level, the "Extended Church-Turing thesis" states: all reasonable models differ by at most polynomial factors. So defining complexity classes via Turing machines is universal.
3.4 coNP and Asymmetry of Evidence
coNP: The complement of NP — if the answer is NO, it can be verified in polynomial time.
Examples:
- "Is this formula unsatisfiable?" (UNSAT) is in coNP
- If the formula is unsatisfiable, what's the certificate? May require exponentially long proofs
- But if satisfiable, a satisfying assignment is the certificate (that's NP)
NP intersect coNP: Problems where both YES and NO can be efficiently verified.
- Primality testing was proved in P (AKS, 2002), hence also in NP intersect coNP
- Linear programming is in P
- Does NP intersect coNP = P? Unknown, but widely conjectured yes
3.5 Exponential Time Hypothesis (ETH)
ETH (Impagliazzo & Paturi, 2001):
3-SAT has no 2^o(n) time algorithm (n = number of variables).
That is: solving 3-SAT requires exponential time in the worst case; only the base of the exponential might be less than 2.
Current best algorithms:
- Brute force: O(2^n)
- Randomized PPSZ (Paturi, Pudlak, Saks, Zane, 2005): O(1.308^n) for 3-SAT
- Schoning 1999: O((2(k-1)/k)^n) for k-SAT
Significance of ETH: It provides finer lower bounds. If ETH holds:
- No O(2^o(n)) algorithm for 3-SAT
- No O(n^o(k)) algorithm for k-Clique
- No 2^o(n) algorithm for TSP
These "fine-grained complexity" results provide practical guidance for algorithm design.
Level 4 · Edge Cases and Interview Problems
4.1 Approximation Algorithm Ideas
When facing NP-Hard problems, give up exact optimal solutions and accept "near-optimal" ones.
Approximation Ratio: For a minimization problem, if algorithm outputs A(I) and optimal is OPT(I):
- Approximation ratio alpha >= 1 satisfies A(I) / OPT(I) <= alpha for all inputs I
Classic approximation algorithms:
# Vertex Cover 2-approximation
def vertex_cover_2approx(graph: dict) -> set:
"""Vertex Cover 2-approximation
Greedy: each time pick an edge, add both endpoints to cover
Approximation ratio: 2 (at most twice optimal)
Time: O(V + E)
Why 2-approximation?
We selected 2k vertices (k edges x 2)
Optimal needs at least k (each selected edge needs one endpoint covered)
So |A| / OPT <= 2k / k = 2
"""
cover = set()
remaining_edges = set()
for u in graph:
for v in graph[u]:
if (u, v) not in remaining_edges and (v, u) not in remaining_edges:
remaining_edges.add((u, v))
for u, v in remaining_edges:
if u not in cover and v not in cover:
cover.add(u)
cover.add(v)
return cover
# TSP 2-approximation (requires triangle inequality)
def tsp_2approx(dist_matrix: list) -> list:
"""TSP 2-approximation (metric TSP)
Algorithm:
1. Construct Minimum Spanning Tree (MST)
2. DFS on MST, record visit order
3. Skip already-visited nodes (take shortcuts)
Ratio: 2 (requires triangle inequality)
Why? MST <= OPT (deleting one TSP tour edge gives a spanning tree)
DFS traverses each MST edge twice = 2*MST <= 2*OPT
Shortcuts only make it shorter (triangle inequality)
Time: O(n^2 log n) (Prim + DFS)
"""
n = len(dist_matrix)
# 1. Prim's MST
import heapq
visited = [False] * n
mst_adj = [[] for _ in range(n)]
heap = [(0, 0, -1)]
while heap:
cost, u, parent = heapq.heappop(heap)
if visited[u]:
continue
visited[u] = True
if parent >= 0:
mst_adj[u].append(parent)
mst_adj[parent].append(u)
for v in range(n):
if not visited[v]:
heapq.heappush(heap, (dist_matrix[u][v], v, u))
# 2. DFS preorder traversal of MST
tour = []
visited = [False] * n
stack = [0]
while stack:
node = stack.pop()
if visited[node]:
continue
visited[node] = True
tour.append(node)
for neighbor in mst_adj[node]:
if not visited[neighbor]:
stack.append(neighbor)
tour.append(tour[0]) # return to start
return tour
Inapproximability:
Some NP-Hard problems have no good approximation algorithms (unless P=NP). Examples:
- General TSP (no triangle inequality): no constant approximation ratio exists
- MAX-3SAT: optimal ratio is 7/8 (Hastad, 1997) — random assignment already achieves this
- Set Cover: optimal ratio is Theta(log n) (Feige, 1998)
4.2 Heuristic Algorithms
When approximation ratios are insufficient or problem structure doesn't allow guarantees:
import random
import math
# Simulated Annealing for TSP
def simulated_annealing_tsp(dist_matrix: list,
initial_temp: float = 10000,
cooling_rate: float = 0.9999,
min_temp: float = 1e-8) -> tuple:
"""Simulated Annealing for TSP
No optimality guarantee, but works well in practice
Core idea:
- High temperature: accept more "worsening" moves, explore search space
- Low temperature: almost only accept improvements, converge to local optimum
"""
n = len(dist_matrix)
current = list(range(n))
random.shuffle(current)
def tour_cost(tour):
return sum(dist_matrix[tour[i]][tour[(i+1) % n]] for i in range(n))
current_cost = tour_cost(current)
best = current[:]
best_cost = current_cost
temp = initial_temp
while temp > min_temp:
# Generate neighbor: 2-opt (reverse a segment)
i = random.randint(0, n - 2)
j = random.randint(i + 1, n - 1)
new_tour = current[:i+1] + current[i+1:j+1][::-1] + current[j+1:]
new_cost = tour_cost(new_tour)
delta = new_cost - current_cost
# Metropolis criterion
if delta < 0 or random.random() < math.exp(-delta / temp):
current = new_tour
current_cost = new_cost
if current_cost < best_cost:
best = current[:]
best_cost = current_cost
temp *= cooling_rate
return best, best_cost
# Genetic Algorithm framework
def genetic_algorithm_framework(population_size: int = 100,
generations: int = 1000,
mutation_rate: float = 0.01):
"""Genetic Algorithm general framework
Applicable to various NP-Hard optimization problems
Core operations:
1. Selection: higher fitness individuals more likely to reproduce
2. Crossover: two parents combine to produce offspring
3. Mutation: random small changes to maintain diversity
"""
# Pseudocode framework
"""
population = initialize_random_population(population_size)
for gen in range(generations):
fitness = evaluate_all(population)
new_population = []
elite = select_top(population, fitness, k=2)
new_population.extend(elite)
while len(new_population) < population_size:
parent1 = tournament_select(population, fitness)
parent2 = tournament_select(population, fitness)
child = crossover(parent1, parent2)
if random.random() < mutation_rate:
child = mutate(child)
new_population.append(child)
population = new_population
return best_individual(population)
"""
pass
4.3 How to Answer "This Problem Is NP-Hard" in Interviews
The interviewer gives you an optimization problem that you believe is NP-Hard. How to respond?
Step 1: Identify and declare
"This problem appears to be NP-Hard. It's similar to [known NPC problem] because [shared structure]."
Example responses:
- "This looks like a variant of TSP / Set Cover / Graph Coloring..."
- "It requires enumerating all subsets/permutations, suggesting exponential complexity..."
Step 2: Verify understanding
"Let me confirm the constraints: [restate constraints]. If we relax certain constraints, are there special cases solvable efficiently?"
Possible simplifications:
- Small input (n <= 20) -> bitmask DP (O(2^n * n))
- Special structure (tree, DAG) -> might have DP solution
- Only approximate solution needed -> greedy / approximation algorithm
Step 3: Propose feasible approaches
# Common strategies for NP-Hard problems in interviews
def interview_np_hard_strategy(problem_type: str, n: int):
"""Approach for NP-Hard problems in interviews"""
if n <= 20:
return "Bitmask DP (O(2^n * n))"
elif n <= 30:
return "Meet in the Middle (O(2^(n/2)))"
elif n <= 50:
return "Branch and Bound"
else:
strategies = [
"Greedy heuristic + prove approximation ratio",
"Dynamic programming (if pseudo-polynomial exists)",
"Special case analysis (what structure does input have?)",
"Simulated annealing / genetic algorithm (engineering practice)",
"Known approximation algorithm"
]
return strategies
Practical interview example:
Interviewer: "Given n tasks and k machines, each task has a processing time. Assign tasks to machines to minimize the maximum completion time (makespan)."
Answer: "This is the P||Cmax problem, known to be NP-Hard (reducible from Partition).
For practical solutions:
- If n and k are small (n <= 20): bitmask DP
- Greedy approximation: LPT (Longest Processing Time first) — sort by processing time descending, assign each to the lightest-loaded machine. Ratio: 4/3 - 1/(3k) (Graham, 1969)
- For better approximation, a PTAS exists (Hochbaum & Shmoys, 1987) but is complex to implement"
def lpt_scheduling(tasks: list, k: int) -> list:
"""LPT (Longest Processing Time First) Scheduling
Assign n tasks to k machines, minimize makespan
Approximation ratio: 4/3 - 1/(3k)
Time: O(n log n)
"""
import heapq
sorted_tasks = sorted(enumerate(tasks), key=lambda x: -x[1])
machines = [(0, i, []) for i in range(k)]
heapq.heapify(machines)
for task_id, time in sorted_tasks:
load, machine_id, task_list = heapq.heappop(machines)
task_list.append(task_id)
heapq.heappush(machines, (load + time, machine_id, task_list))
return machines
4.4 Practical Impact of P vs NP
Impact on cryptography:
If P = NP is proved:
- RSA: Based on difficulty of factoring (not known to be NPC, but widely believed not in P)
- AES: Security not directly based on NPC, but P=NP would shake all hardness assumptions
- Zero-knowledge proofs: Their existence is equivalent to P != NP
If P != NP is proved:
- Cryptography gains theoretical foundation
- But specific problem hardness proofs still needed (NPC != "cryptographically secure")
Impact on machine learning:
- Many learning problems are NP-Hard (e.g., optimal neural network training)
- But in practice, SGD and other heuristics work well — suggesting "typical instances" are much easier than worst case
- Some PAC learning hardness results rely on P != NP assumption
4.5 Quantum Computing and Complexity
BQP (Bounded-Error Quantum Polynomial Time): Problems efficiently solvable by quantum computers.
Known relationships: P is a subset of BQP, which is a subset of PSPACE.
Key question: Is NP a subset of BQP? Almost all experts say no.
- Shor's algorithm (1994): Polynomial-time integer factorization (breaks RSA) — but factoring is not known to be NPC
- Grover's algorithm (1996): Unstructured search speedup from O(N) to O(sqrt(N)) — quadratic, not exponential
- If NP were a subset of BQP, quantum computers could solve all NPC problems — no evidence supports this
Intuition: Quantum computing provides "limited speedup" rather than a "universal key." It significantly accelerates certain structured problems (factoring, search) but is unlikely to solve general NPC problems.
4.6 Handling NP-Hard Problems in Real Systems
Industrial SAT Solvers (e.g., MiniSat, Z3, CaDiCaL):
Although SAT is NPC, modern SAT solvers can solve industrial instances with millions of variables in seconds. Why:
- DPLL + CDCL (Conflict-Driven Clause Learning)
- Heuristic variable selection (VSIDS)
- Clause database management
- Random restarts
This is not a contradiction — NPC is a worst-case notion; industrial instances have exploitable structure.
Integer Linear Programming (ILP) solvers (e.g., Gurobi, CPLEX):
ILP is NPC, but modern solvers combine:
- LP relaxation + branch and bound
- Cutting planes
- Preprocessing / Presolve
- Heuristics (large neighborhood search, etc.)
For practical-scale scheduling, routing, and resource allocation problems, they typically find optimal or near-optimal solutions in reasonable time.
4.7 Interview Summary: Complexity Theory Practical Checklist
| Scenario | Strategy |
|---|---|
| "What's this problem's complexity?" | First analyze if known polynomial algorithms apply |
| You notice similarity to NPC problem | Name the similar NPC problem, explain similarity |
| "Assume n is small" | Brute force / bitmask DP / backtracking with pruning |
| "Don't need optimal" | Greedy approximation, state the ratio |
| "Can you prove it's NP-Hard?" | Reduce from known NPC (full proof rarely required in interview) |
| "How is this handled in real systems?" | SAT solver / ILP solver / heuristics + practical experience |
Memory aid:
P = can solve it; NP = can verify it
NPC = hardest in NP; NP-Hard = at least as hard (or harder)
Reduction direction matters: from known-hard TO new problem (proves new is hard)
P != NP unproven, but almost everyone believes it
Facing NP-Hard: recognize -> declare -> propose alternatives