Chapter 37

Complexity Theory: P, NP and NP-Complete

Chapter 37: Complexity Theory — P, NP and NP-Complete

You are asked to find the shortest route visiting 100 cities (the Traveling Salesman Problem). You tried greedy, dynamic programming, various heuristics, but found no polynomial-time exact algorithm. You suspect "maybe this problem simply has no efficient algorithm." But how do you prove something "doesn't exist"?

This is the core question complexity theory aims to answer: how hard is a problem? Specifically — are some problems inherently without efficient algorithms, rather than "we just haven't found one yet"?

P = NP is the central open problem in computer science and one of the seven Millennium Prize Problems (Clay Mathematics Institute, $1 million prize). This chapter is not about solving this problem — it's about understanding what it says, why it matters, and how to respond in interviews when you encounter NP-hard problems.

Level 1 · What You Need to Know

1.1 Class P

Definition: P (Polynomial time) is the set of decision problems that can be solved in polynomial time.

A "decision problem" has a YES or NO answer. For example, "Does this graph have a path from s to t?" rather than "What is the shortest path length?" (though the latter can usually be converted to the former via binary search).

"Polynomial time" means there exists an algorithm with running time O(n^k), where n is input size and k is some constant.

Typical problems in P:

Problem	Algorithm	Time
Sorting	Merge sort	O(n log n)
Shortest path	Dijkstra	O((V+E) log V)
Maximum flow	Ford-Fulkerson	O(VE^2)
Matching (bipartite)	Hungarian	O(V^3)
Primality testing	AKS	O(log^12 n)
Linear programming	Ellipsoid/Interior point	Polynomial
2-SAT	Strongly connected components	O(V+E)

Intuition: P problems are "tractable" — solvable exactly in reasonable time. Even O(n^5) may be slow in practice, but it at least remains "manageable" as hardware improves and inputs grow.

1.2 Class NP

Definition: NP (Nondeterministic Polynomial time) is the set of decision problems whose YES answers can be verified in polynomial time.

Specifically: if the answer is YES, there exists a "certificate" (witness) such that a polynomial-time verifier can check that this certificate indeed proves the answer is YES.

Key distinction:

P: "Can find the answer in polynomial time"
NP: "Can verify the answer in polynomial time"

Clearly P is a subset of NP (if you can find the answer quickly, you can certainly verify it quickly — find it, then verify). The question is whether P = NP or P is a strict subset of NP.

Typical problems in NP:

Problem	Certificate	Verification
Graph coloring (k colors)	A specific coloring	Check each edge has different-colored endpoints
Hamiltonian cycle	A specific cycle	Check it visits each vertex exactly once
Subset sum	A specific subset	Check the subset sums to target
SAT	A specific assignment	Substitute and check all clauses are true
TSP (decision version)	A specific tour	Check total distance <= target

Sudoku intuition:

Solving sudoku: might be very hard (generalized n x n sudoku is NP-Complete)
Verifying sudoku: given a filled grid, check each row/column/box is valid — O(n^2) suffices

1.3 NP-Complete (NPC)

Informal definition: NP-Complete problems are the "hardest" in NP — if any NPC problem has a polynomial-time algorithm, then ALL problems in NP have polynomial-time algorithms (i.e., P = NP).

Formal definition: Problem L is NP-Complete if and only if:

L is in NP (answers can be verified in polynomial time)
Every problem in NP can be polynomial-time reduced to L (L is at least as hard as anything in NP)

Classic NPC problems:

SAT (Boolean Satisfiability)
3-SAT
Graph Coloring (3+ colors)
Hamiltonian Cycle/Path
Subset Sum
Knapsack (decision version)
Traveling Salesman (decision version)
Vertex Cover
Set Cover
Independent Set

Intuition: NPC problems form an "equivalence class" — either they ALL have polynomial algorithms, or NONE of them do. This is because a polynomial algorithm for any one NPC problem can be transferred to all others via reductions.

1.4 NP-Hard

Definition: Problem H is NP-Hard if and only if every problem in NP can be polynomial-time reduced to H.

Note: NP-Hard problems are not necessarily in NP! NP-Complete = NP-Hard intersect NP.

Examples of NP-Hard but not in NP:

The Halting Problem (undecidable, not even in NP)
Optimization version of TSP ("what is the shortest tour?" rather than "is there a tour of length <= k?") — answer is not YES/NO, not a decision problem

Relationships:

P is a subset of NP
NPC = NP-Hard intersect NP (the hardest problems in NP)
NP-Hard contains NPC (problems possibly even harder than NP count too)

If P = NP, then P = NP = NPC (all NP problems are tractable)
If P != NP, then P is strict subset of NP, and NPC not subset of P

1.5 Intuitive Understanding: Finding vs Verifying

Analogy: Imagine finding your car in a massive parking lot.

P problem: The lot is well-organized (alphabetical order), you can find your car quickly
NP problem: The lot is chaotic, you don't know how to find your car quickly, but if someone tells you "your car is at E-37," you can immediately go verify
NP-Complete: If you invent a method to quickly find cars in chaotic lots, that method can solve ALL "search in chaotic environments" problems
NP-Hard: Not only is the lot chaotic, but your "car" might not even exist (verifying an answer may itself be difficult)

1.6 Common Misconceptions

Misconception 1: "NP means 'Non-Polynomial'"

Wrong! NP stands for "Nondeterministic Polynomial" — problems solvable in polynomial time on a nondeterministic Turing machine. Many NP problems do have polynomial algorithms (all P problems are in NP).

Misconception 2: "NP problems are always hard"

Wrong! P is a subset of NP, so sorting (O(n log n)) is also an NP problem. NP is just an upper bound — it contains problems ranging from "easy" to "hard." The truly "hard" ones are NPC problems.

Misconception 3: "If a problem is NP-Hard, it's completely hopeless"

Wrong! NP-Hard means no known polynomial-time exact algorithm, but:

Approximation algorithms (e.g., 2-approximation for TSP)
Heuristics (e.g., genetic algorithms, simulated annealing)
For small inputs, exponential algorithms are acceptable
Some special cases have polynomial algorithms (e.g., independent set on trees)

Misconception 4: "All instances of NP-Hard problems are hard"

Wrong! NP-Hard is a worst-case concept. Many NPC problems' "typical" instances may be easy. For example:

SAT in practice (industrial SAT solvers like MiniSat) can quickly solve most instances
Knapsack with DP is pseudo-polynomial, fast when values are not too large

Level 2 · How It Works Under the Hood

2.1 The Concept of Reduction

Core idea: If problem A can be "reduced" to problem B (written A <=_p B), it means: if we have an algorithm for B, we can use it to solve A.

Polynomial-time reduction (Karp reduction): There exists a polynomial-time computable function f such that for any input x:

x is a YES instance of A if and only if f(x) is a YES instance of B

# Conceptual example: Reducing Independent Set to Vertex Cover
#
# Independent Set: Does the graph have an independent set of size >= k?
# Vertex Cover: Does the graph have a vertex cover of size <= n-k?
#
# Reduction: S is an independent set iff V\S is a vertex cover
# So "is there independent set >= k" iff "is there vertex cover <= n-k"

def reduce_independent_set_to_vertex_cover(graph, k):
    """Reduce Independent Set to Vertex Cover
    
    Input: Graph G=(V,E), parameter k
    Output: Same graph G, parameter n-k
    
    Original: Does G have independent set of size >= k?
    Transformed: Does G have vertex cover of size <= |V|-k?
    """
    n = len(graph.vertices)
    return graph, n - k

# Correctness:
# If S is an independent set (no adjacent nodes in S),
# then every edge has at least one endpoint in V\S
# (otherwise both endpoints in S, contradiction)
# So V\S is a vertex cover. The reverse is similar.

Direction of reduction matters:

A <=_p B means B is at least as hard as A. If A is NPC and A <=_p B and B is in NP, then B is also NPC.

Intuition: Reduction says "the ability to solve B implies the ability to solve A." So if A is hard (no polynomial algorithm), then B is at least as hard.

Transitivity of reductions:

If A <=_p B and B <=_p C, then A <=_p C.

This is why we only need to show a known NPC problem reduces to a new problem to prove the new problem is NPC — by transitivity, all NP problems (through the known NPC problem as a bridge) reduce to the new problem.

2.2 The SAT Problem

SAT (Boolean Satisfiability): Given a Boolean formula (usually in Conjunctive Normal Form, CNF), is there a variable assignment that makes the formula true?

CNF form: (x_1 OR NOT x_2 OR x_3) AND (NOT x_1 OR x_2) AND (x_2 OR NOT x_3)

Each parenthesized group is a "clause" — a disjunction (OR) of literals
The entire formula is a conjunction (AND) of all clauses
Need to find an assignment making all clauses satisfied

SAT was the first problem proved NPC (Cook-Levin Theorem, 1971).

3-SAT: Special case where each clause has exactly 3 literals. Also NPC (SAT reduces to 3-SAT).

2-SAT: Each clause has exactly 2 literals. 2-SAT is in P! Solvable in O(V+E) using strongly connected components.

# 2-SAT solver (polynomial time!)
def solve_2sat(num_vars: int, clauses: list) -> list:
    """2-SAT Solver
    
    Each clause is (a, b), representing (a OR b)
    Variables are positive integers, negations are negative
    
    Time: O(V + E) where V = 2n, E = 2m
    
    Key insight: (a OR b) is equivalent to (NOT a -> b) AND (NOT b -> a)
    Build implication graph, check for contradictions using SCC
    """
    def var_idx(x):
        if x > 0:
            return 2 * (x - 1)
        else:
            return 2 * (-x - 1) + 1
    
    def neg_idx(idx):
        return idx ^ 1
    
    n = 2 * num_vars
    graph = [[] for _ in range(n)]
    reverse_graph = [[] for _ in range(n)]
    
    for a, b in clauses:
        na = neg_idx(var_idx(a))
        nb = neg_idx(var_idx(b))
        va = var_idx(a)
        vb = var_idx(b)
        
        graph[na].append(vb)  # NOT a -> b
        graph[nb].append(va)  # NOT b -> a
        reverse_graph[vb].append(na)
        reverse_graph[va].append(nb)
    
    # Kosaraju's algorithm for SCC
    visited = [False] * n
    order = []
    
    def dfs1(v):
        stack = [(v, False)]
        while stack:
            node, processed = stack.pop()
            if processed:
                order.append(node)
                continue
            if visited[node]:
                continue
            visited[node] = True
            stack.append((node, True))
            for u in graph[node]:
                if not visited[u]:
                    stack.append((u, False))
    
    for i in range(n):
        if not visited[i]:
            dfs1(i)
    
    comp = [-1] * n
    comp_id = 0
    visited = [False] * n
    
    def dfs2(v, c):
        stack = [v]
        while stack:
            node = stack.pop()
            if visited[node]:
                continue
            visited[node] = True
            comp[node] = c
            for u in reverse_graph[node]:
                if not visited[u]:
                    stack.append(u)
    
    for v in reversed(order):
        if not visited[v]:
            dfs2(v, comp_id)
            comp_id += 1
    
    # Check contradiction: x and NOT x in same SCC -> no solution
    for i in range(num_vars):
        if comp[2 * i] == comp[2 * i + 1]:
            return None
    
    # Assignment: comp[x] > comp[NOT x] -> x = True
    result = [False] * (num_vars + 1)
    for i in range(num_vars):
        result[i + 1] = comp[2 * i] > comp[2 * i + 1]
    
    return result

The "phase transition" between 2-SAT and 3-SAT:

2-SAT: Polynomial (in P)
3-SAT: NP-Complete
The only difference: 2 literals per clause vs 3

This seemingly minor difference causes a qualitative leap in complexity. Similar "phase transitions":

2-coloring vs 3-coloring (P vs NPC)
Matching vs 3-dimensional matching (P vs NPC)

2.3 Why P != NP Is a Millennium Problem

If P = NP (almost all experts disbelieve this):

All modern cryptography collapses (RSA, elliptic curves are based on NP-Hard problems)
All creative activity can be automated (verification equals creation)
Mathematical proofs can be automatically discovered
Optimization problems (logistics, scheduling, design) can be solved exactly and efficiently

Why proving P != NP is so hard:

Proving P != NP requires showing some NPC problem has no polynomial algorithm. This is a "non-existence" proof — you must rule out all possible algorithms, not construct one specific algorithm.

Known barriers:

Relativization barrier (Baker, Gill, Solovay, 1975): There exist oracles A with P^A = NP^A and B with P^B != NP^B. So diagonalization-style methods cannot resolve this.
Natural Proofs barrier (Razborov, Rudich, 1994): If one-way functions exist (a basic cryptographic assumption), then certain "natural" proof methods cannot prove P != NP.
Algebrization barrier (Aaronson, Wigderson, 2009): Even stronger — arithmetization methods are also insufficient.

These barriers don't say P != NP is unprovable — they say entirely new techniques are needed, beyond all known methods developed over decades of complexity theory.

2.4 Practical Examples of Reduction

Example: Proving Vertex Cover is NPC

Known: 3-SAT is NPC. Construct a reduction from 3-SAT to Vertex Cover.

Given a 3-SAT instance with n variables x_1,...,x_n and m clauses C_1,...,C_m:

Construct graph G:

For each variable x_i, create edge (x_i, NOT x_i) ("variable gadget")
For each clause C_j = (l_1 OR l_2 OR l_3), create a triangle (a, b, c) and edges (a,l_1), (b,l_2), (c,l_3) ("clause gadget")

Set k = n + 2m.

Proof: 3-SAT is satisfiable if and only if G has a vertex cover of size <= k.

(=>) Satisfying assignment -> for each variable gadget, select the TRUE literal's node (n total); for each clause gadget, select two triangle nodes not connected to the "satisfied literal" (2m total). Total: n+2m = k.
(<=) Cover of size <= k -> each variable gadget needs at least 1 (total n), each triangle needs at least 2 (total 2m), so exactly k. From the selection, extract a consistent satisfying assignment.

2.5 Recognizing NP-Complete Problems in Practice

In interviews or actual work, how do you recognize that a problem is NP-Hard?

Pattern recognition:

Problem feature	Likely NPC	P variant
Graph coloring	k >= 3 colors	2 colors (bipartiteness)
SAT	3-SAT and above	2-SAT, Horn-SAT
Path/cycle	Hamiltonian cycle	Eulerian cycle (O(V+E))
Subset selection	Subset sum, knapsack	Greedy-solvable variants
Scheduling	General scheduling	Special structure (e.g., single machine EDD)
Matching	3D matching	2D matching (Hungarian)

The key difference is often the "dimension" of constraints:

2 choices vs 3 choices
2D vs 3D
Special structure vs general structure

# In practice: quick heuristic check for NP-Hardness
def might_be_np_hard(problem_description: str) -> str:
    """Heuristic judgment (not rigorous)"""
    np_hard_keywords = [
        "all subsets", "all permutations", "all assignments",
        "optimal partition", "minimum colors", "longest path",
        "exact cover", "minimum set cover",
        "traveling salesman", "knapsack", "scheduling"
    ]
    
    p_keywords = [
        "shortest path", "maximum flow", "minimum spanning tree",
        "sorting", "binary search", "bipartite matching",
        "topological sort", "strongly connected components"
    ]
    
    for keyword in np_hard_keywords:
        if keyword in problem_description:
            return "Likely NP-Hard, consider approximation or heuristics"
    
    for keyword in p_keywords:
        if keyword in problem_description:
            return "Likely in P, look for polynomial algorithm"
    
    return "Needs further analysis"

2.6 Pseudo-polynomial Time and Strong NPC

Pseudo-polynomial time: Running time is polynomial in the numerical value of the input (not the encoding length).

Classic example: Knapsack DP solution, time O(nW). Here W is the knapsack capacity value. If W is encoded in b bits, then W can be up to 2^b, so O(nW) = O(n * 2^b) — exponential in input length.

Weak NPC vs Strong NPC:

Weakly NP-Complete: Has a pseudo-polynomial time algorithm. Examples: Subset Sum, Knapsack
Strongly NP-Complete: Even if all numerical values are unary-encoded (size only poly(n)), the problem remains NPC. Examples: 3-SAT, Graph Coloring, TSP

# Knapsack: Weakly NPC (has pseudo-polynomial algorithm)
def knapsack_dp(weights: list, values: list, capacity: int) -> int:
    """0/1 Knapsack - Dynamic Programming
    
    Time: O(n * W) — pseudo-polynomial
    Polynomial when W is poly(n), but W can be exponentially large
    """
    n = len(weights)
    dp = [0] * (capacity + 1)
    
    for i in range(n):
        for w in range(capacity, weights[i] - 1, -1):
            dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
    
    return dp[capacity]

# Contrast: 3-SAT has no known pseudo-polynomial algorithm (strongly NPC)

Level 3 · What the Specifications Say

3.1 Cook-Levin Theorem (1971)

Theorem (Stephen Cook, 1971; Leonid Levin, 1973 independently):

SAT is NP-Complete. That is, every problem in NP can be reduced to SAT in polynomial time.

Proof sketch (highly simplified):

Any NP problem L has a polynomial-time verifier V. Given input x and certificate y, V(x, y) decides to accept or reject in polynomial steps.

V is a Turing machine whose computation can be represented as a "computation tableau":

Rows = time steps t (at most p(|x|) steps)
Columns = tape positions i (at most p(|x|))
Each cell encodes the current state and tape content

Key insight: The Turing machine's transition function is local — each cell's value depends only on 3 adjacent cells in the previous row. Such "local constraints" can be encoded as CNF clauses.

Construction:

Introduce Boolean variables for each cell (encoding state/symbol)
Initial conditions: row 0 encodes input x
Transition rules: each cell's relationship to 3 cells above encoded in CNF
Termination: last row is in accept state

Total formula size is O(p(|x|)^2 * log|Sigma|) = polynomial.

Significance: This establishes a "benchmark" — once SAT is proved NPC, subsequent NPC proofs only need reductions from SAT (or other known NPC problems), without repeating this complex "Turing machine simulation" argument.

3.2 Karp's 21 NP-Complete Problems

Richard Karp's 1972 landmark paper "Reducibility Among Combinatorial Problems" proved 21 classic combinatorial optimization problems are NPC through a chain of reductions from SAT:

SAT
├── 3-SAT
│   ├── Graph Coloring
│   ├── Exact Cover
│   │   ├── 3D Matching
│   │   └── Steiner Tree
│   ├── Vertex Cover
│   │   ├── Independent Set
│   │   └── Clique
│   ├── Hamiltonian Cycle
│   │   └── TSP
│   └── Subset Sum
│       └── Knapsack
│       └── Job Scheduling
├── Integer Programming
└── ...

Karp's paper established the practical foundation of NPC theory. Before it, Cook-Levin only proved SAT is NPC — one problem. Karp demonstrated NPC problems pervade all areas of combinatorial optimization, suggesting "computational hardness" is a universal phenomenon, not a peculiarity of individual problems.

Complete reduction chain example:

Proving Independent Set is NPC:
1. Known: 3-SAT is NPC (Cook-Levin)
2. Construction: polynomial reduction from 3-SAT to Independent Set
   - For each clause, create a 3-node group (corresponding to 3 literals)
   - Nodes within same clause are interconnected (triangle)
   - Complementary literals connected by edges (x_i and NOT x_i)
   - Question: is there an independent set of size = m (number of clauses)?
3. Correctness:
   - 3-SAT satisfiable -> pick one TRUE literal per clause
   -> these nodes form size-m independent set
   - Size-m independent set exists -> exactly one node per clause group
   -> extract consistent satisfying assignment

3.3 The Turing Machine Model

Formal definition: A Turing machine M = (Q, Sigma, Gamma, delta, q_0, q_accept, q_reject) where:

Q: finite set of states
Sigma: input alphabet
Gamma: tape alphabet (Sigma subset of Gamma, Gamma includes blank symbol)
delta: Q x Gamma -> Q x Gamma x {L, R} (transition function)
q_0: initial state
q_accept, q_reject: accept/reject states

Nondeterministic Turing Machine (NTM): delta becomes Q x Gamma -> P(Q x Gamma x {L, R}) — multiple possible transitions per step. NTM accepts input x if and only if there exists some computation path reaching q_accept.

Equivalent definitions of NP:

Exists a polynomial-time verifier (the definition above)
Exists a nondeterministic Turing machine that accepts in polynomial time

Equivalence intuition: NTM's "guess" is the certificate, "verification" is the deterministic computation along the guessed path.

Why Turing machines instead of "real computers"?

The Church-Turing thesis tells us: all "reasonable" computational models (RAM model, lambda calculus, recursive functions, etc.) are equivalent at the computability level. At the polynomial-time level, the "Extended Church-Turing thesis" states: all reasonable models differ by at most polynomial factors. So defining complexity classes via Turing machines is universal.

3.4 coNP and Asymmetry of Evidence

coNP: The complement of NP — if the answer is NO, it can be verified in polynomial time.

Examples:

"Is this formula unsatisfiable?" (UNSAT) is in coNP
If the formula is unsatisfiable, what's the certificate? May require exponentially long proofs
But if satisfiable, a satisfying assignment is the certificate (that's NP)

NP intersect coNP: Problems where both YES and NO can be efficiently verified.

Primality testing was proved in P (AKS, 2002), hence also in NP intersect coNP
Linear programming is in P
Does NP intersect coNP = P? Unknown, but widely conjectured yes

3.5 Exponential Time Hypothesis (ETH)

ETH (Impagliazzo & Paturi, 2001):

3-SAT has no 2^o(n) time algorithm (n = number of variables).

That is: solving 3-SAT requires exponential time in the worst case; only the base of the exponential might be less than 2.

Current best algorithms:

Brute force: O(2^n)
Randomized PPSZ (Paturi, Pudlak, Saks, Zane, 2005): O(1.308^n) for 3-SAT
Schoning 1999: O((2(k-1)/k)^n) for k-SAT

Significance of ETH: It provides finer lower bounds. If ETH holds:

No O(2^o(n)) algorithm for 3-SAT
No O(n^o(k)) algorithm for k-Clique
No 2^o(n) algorithm for TSP

These "fine-grained complexity" results provide practical guidance for algorithm design.

Level 4 · Edge Cases and Interview Problems

4.1 Approximation Algorithm Ideas

When facing NP-Hard problems, give up exact optimal solutions and accept "near-optimal" ones.

Approximation Ratio: For a minimization problem, if algorithm outputs A(I) and optimal is OPT(I):

Approximation ratio alpha >= 1 satisfies A(I) / OPT(I) <= alpha for all inputs I

Classic approximation algorithms:

# Vertex Cover 2-approximation
def vertex_cover_2approx(graph: dict) -> set:
    """Vertex Cover 2-approximation
    
    Greedy: each time pick an edge, add both endpoints to cover
    Approximation ratio: 2 (at most twice optimal)
    Time: O(V + E)
    
    Why 2-approximation?
    We selected 2k vertices (k edges x 2)
    Optimal needs at least k (each selected edge needs one endpoint covered)
    So |A| / OPT <= 2k / k = 2
    """
    cover = set()
    remaining_edges = set()
    
    for u in graph:
        for v in graph[u]:
            if (u, v) not in remaining_edges and (v, u) not in remaining_edges:
                remaining_edges.add((u, v))
    
    for u, v in remaining_edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    
    return cover


# TSP 2-approximation (requires triangle inequality)
def tsp_2approx(dist_matrix: list) -> list:
    """TSP 2-approximation (metric TSP)
    
    Algorithm:
    1. Construct Minimum Spanning Tree (MST)
    2. DFS on MST, record visit order
    3. Skip already-visited nodes (take shortcuts)
    
    Ratio: 2 (requires triangle inequality)
    Why? MST <= OPT (deleting one TSP tour edge gives a spanning tree)
         DFS traverses each MST edge twice = 2*MST <= 2*OPT
         Shortcuts only make it shorter (triangle inequality)
    
    Time: O(n^2 log n) (Prim + DFS)
    """
    n = len(dist_matrix)
    
    # 1. Prim's MST
    import heapq
    visited = [False] * n
    mst_adj = [[] for _ in range(n)]
    heap = [(0, 0, -1)]
    
    while heap:
        cost, u, parent = heapq.heappop(heap)
        if visited[u]:
            continue
        visited[u] = True
        if parent >= 0:
            mst_adj[u].append(parent)
            mst_adj[parent].append(u)
        
        for v in range(n):
            if not visited[v]:
                heapq.heappush(heap, (dist_matrix[u][v], v, u))
    
    # 2. DFS preorder traversal of MST
    tour = []
    visited = [False] * n
    stack = [0]
    while stack:
        node = stack.pop()
        if visited[node]:
            continue
        visited[node] = True
        tour.append(node)
        for neighbor in mst_adj[node]:
            if not visited[neighbor]:
                stack.append(neighbor)
    
    tour.append(tour[0])  # return to start
    return tour

Inapproximability:

Some NP-Hard problems have no good approximation algorithms (unless P=NP). Examples:

General TSP (no triangle inequality): no constant approximation ratio exists
MAX-3SAT: optimal ratio is 7/8 (Hastad, 1997) — random assignment already achieves this
Set Cover: optimal ratio is Theta(log n) (Feige, 1998)

4.2 Heuristic Algorithms

When approximation ratios are insufficient or problem structure doesn't allow guarantees:

import random
import math

# Simulated Annealing for TSP
def simulated_annealing_tsp(dist_matrix: list,
                             initial_temp: float = 10000,
                             cooling_rate: float = 0.9999,
                             min_temp: float = 1e-8) -> tuple:
    """Simulated Annealing for TSP
    
    No optimality guarantee, but works well in practice
    
    Core idea:
    - High temperature: accept more "worsening" moves, explore search space
    - Low temperature: almost only accept improvements, converge to local optimum
    """
    n = len(dist_matrix)
    
    current = list(range(n))
    random.shuffle(current)
    
    def tour_cost(tour):
        return sum(dist_matrix[tour[i]][tour[(i+1) % n]] for i in range(n))
    
    current_cost = tour_cost(current)
    best = current[:]
    best_cost = current_cost
    temp = initial_temp
    
    while temp > min_temp:
        # Generate neighbor: 2-opt (reverse a segment)
        i = random.randint(0, n - 2)
        j = random.randint(i + 1, n - 1)
        
        new_tour = current[:i+1] + current[i+1:j+1][::-1] + current[j+1:]
        new_cost = tour_cost(new_tour)
        
        delta = new_cost - current_cost
        
        # Metropolis criterion
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = new_tour
            current_cost = new_cost
            
            if current_cost < best_cost:
                best = current[:]
                best_cost = current_cost
        
        temp *= cooling_rate
    
    return best, best_cost


# Genetic Algorithm framework
def genetic_algorithm_framework(population_size: int = 100,
                                 generations: int = 1000,
                                 mutation_rate: float = 0.01):
    """Genetic Algorithm general framework
    
    Applicable to various NP-Hard optimization problems
    
    Core operations:
    1. Selection: higher fitness individuals more likely to reproduce
    2. Crossover: two parents combine to produce offspring
    3. Mutation: random small changes to maintain diversity
    """
    # Pseudocode framework
    """
    population = initialize_random_population(population_size)
    
    for gen in range(generations):
        fitness = evaluate_all(population)
        
        new_population = []
        elite = select_top(population, fitness, k=2)
        new_population.extend(elite)
        
        while len(new_population) < population_size:
            parent1 = tournament_select(population, fitness)
            parent2 = tournament_select(population, fitness)
            child = crossover(parent1, parent2)
            if random.random() < mutation_rate:
                child = mutate(child)
            new_population.append(child)
        
        population = new_population
    
    return best_individual(population)
    """
    pass

4.3 How to Answer "This Problem Is NP-Hard" in Interviews

The interviewer gives you an optimization problem that you believe is NP-Hard. How to respond?

Step 1: Identify and declare

"This problem appears to be NP-Hard. It's similar to [known NPC problem] because [shared structure]."

Example responses:

"This looks like a variant of TSP / Set Cover / Graph Coloring..."
"It requires enumerating all subsets/permutations, suggesting exponential complexity..."

Step 2: Verify understanding

"Let me confirm the constraints: [restate constraints]. If we relax certain constraints, are there special cases solvable efficiently?"

Possible simplifications:

Small input (n <= 20) -> bitmask DP (O(2^n * n))
Special structure (tree, DAG) -> might have DP solution
Only approximate solution needed -> greedy / approximation algorithm

Step 3: Propose feasible approaches

# Common strategies for NP-Hard problems in interviews

def interview_np_hard_strategy(problem_type: str, n: int):
    """Approach for NP-Hard problems in interviews"""
    
    if n <= 20:
        return "Bitmask DP (O(2^n * n))"
    elif n <= 30:
        return "Meet in the Middle (O(2^(n/2)))"
    elif n <= 50:
        return "Branch and Bound"
    else:
        strategies = [
            "Greedy heuristic + prove approximation ratio",
            "Dynamic programming (if pseudo-polynomial exists)",
            "Special case analysis (what structure does input have?)",
            "Simulated annealing / genetic algorithm (engineering practice)",
            "Known approximation algorithm"
        ]
        return strategies

Practical interview example:

Interviewer: "Given n tasks and k machines, each task has a processing time. Assign tasks to machines to minimize the maximum completion time (makespan)."

Answer: "This is the P||Cmax problem, known to be NP-Hard (reducible from Partition).

For practical solutions:

If n and k are small (n <= 20): bitmask DP
Greedy approximation: LPT (Longest Processing Time first) — sort by processing time descending, assign each to the lightest-loaded machine. Ratio: 4/3 - 1/(3k) (Graham, 1969)
For better approximation, a PTAS exists (Hochbaum & Shmoys, 1987) but is complex to implement"

def lpt_scheduling(tasks: list, k: int) -> list:
    """LPT (Longest Processing Time First) Scheduling
    
    Assign n tasks to k machines, minimize makespan
    Approximation ratio: 4/3 - 1/(3k)
    Time: O(n log n)
    """
    import heapq
    
    sorted_tasks = sorted(enumerate(tasks), key=lambda x: -x[1])
    
    machines = [(0, i, []) for i in range(k)]
    heapq.heapify(machines)
    
    for task_id, time in sorted_tasks:
        load, machine_id, task_list = heapq.heappop(machines)
        task_list.append(task_id)
        heapq.heappush(machines, (load + time, machine_id, task_list))
    
    return machines

4.4 Practical Impact of P vs NP

Impact on cryptography:

If P = NP is proved:

RSA: Based on difficulty of factoring (not known to be NPC, but widely believed not in P)
AES: Security not directly based on NPC, but P=NP would shake all hardness assumptions
Zero-knowledge proofs: Their existence is equivalent to P != NP

If P != NP is proved:

Cryptography gains theoretical foundation
But specific problem hardness proofs still needed (NPC != "cryptographically secure")

Impact on machine learning:

Many learning problems are NP-Hard (e.g., optimal neural network training)
But in practice, SGD and other heuristics work well — suggesting "typical instances" are much easier than worst case
Some PAC learning hardness results rely on P != NP assumption

4.5 Quantum Computing and Complexity

BQP (Bounded-Error Quantum Polynomial Time): Problems efficiently solvable by quantum computers.

Known relationships: P is a subset of BQP, which is a subset of PSPACE.

Key question: Is NP a subset of BQP? Almost all experts say no.

Shor's algorithm (1994): Polynomial-time integer factorization (breaks RSA) — but factoring is not known to be NPC
Grover's algorithm (1996): Unstructured search speedup from O(N) to O(sqrt(N)) — quadratic, not exponential
If NP were a subset of BQP, quantum computers could solve all NPC problems — no evidence supports this

Intuition: Quantum computing provides "limited speedup" rather than a "universal key." It significantly accelerates certain structured problems (factoring, search) but is unlikely to solve general NPC problems.

4.6 Handling NP-Hard Problems in Real Systems

Industrial SAT Solvers (e.g., MiniSat, Z3, CaDiCaL):

Although SAT is NPC, modern SAT solvers can solve industrial instances with millions of variables in seconds. Why:

DPLL + CDCL (Conflict-Driven Clause Learning)
Heuristic variable selection (VSIDS)
Clause database management
Random restarts

This is not a contradiction — NPC is a worst-case notion; industrial instances have exploitable structure.

Integer Linear Programming (ILP) solvers (e.g., Gurobi, CPLEX):

ILP is NPC, but modern solvers combine:

LP relaxation + branch and bound
Cutting planes
Preprocessing / Presolve
Heuristics (large neighborhood search, etc.)

For practical-scale scheduling, routing, and resource allocation problems, they typically find optimal or near-optimal solutions in reasonable time.

4.7 Interview Summary: Complexity Theory Practical Checklist

Scenario	Strategy
"What's this problem's complexity?"	First analyze if known polynomial algorithms apply
You notice similarity to NPC problem	Name the similar NPC problem, explain similarity
"Assume n is small"	Brute force / bitmask DP / backtracking with pruning
"Don't need optimal"	Greedy approximation, state the ratio
"Can you prove it's NP-Hard?"	Reduce from known NPC (full proof rarely required in interview)
"How is this handled in real systems?"	SAT solver / ILP solver / heuristics + practical experience

Memory aid:

P = can solve it; NP = can verify it
NPC = hardest in NP; NP-Hard = at least as hard (or harder)
Reduction direction matters: from known-hard TO new problem (proves new is hard)
P != NP unproven, but almost everyone believes it
Facing NP-Hard: recognize -> declare -> propose alternatives

Rate this chapter

4.7 / 5 (3 ratings)