Backtracking: Elegant Brute Force
Chapter 25: Backtracking — Elegant Brute Force
You face a combinatorial search space. Perhaps it is all permutations, all subsets, all valid board configurations, or all possible password combinations. The naive approach enumerates every possibility, but when the search space is exponential or factorial, blind enumeration is infeasible in both time and space.
The core insight of backtracking is this: you do not need to enumerate all possibilities and then check which ones are valid — you can abandon a partial solution the moment you discover it cannot lead to a valid complete solution. It is like navigating a maze by turning back at dead ends rather than mapping the entire maze before deciding which path leads to the exit.
The elegance of backtracking lies in its remarkably uniform template. Whether the problem is permutations, combinations, subsets, N-Queens, Sudoku, or regex matching, the underlying framework is identical: choose → explore → unchoose. Once you internalize this framework, every backtracking problem becomes a fill-in-the-blank exercise: "In this problem, what is the choice? What is the constraint? What is the goal?"
This chapter begins with the fundamental template, systematically covers the three classical families (permutations, combinations, subsets), dives into constraint satisfaction problems like N-Queens and Sudoku, and ascends to the theoretical relationship between backtracking and NP-completeness.
Level 1 · What You Need to Know
1.1 The Core Idea of Backtracking
Backtracking is essentially a depth-first search over a decision tree. You maintain a path from the root to the current node (the "current sequence of choices"). At each node you face several branches. When you discover that a branch violates constraints, you "backtrack" to the parent node and try the next branch.
The Three-Step Dance:
- Choose: Pick an element from the candidate set and add it to the current solution.
- Explore: Recursively search the next level based on the current choice.
- Unchoose: Remove the element from the current solution, restoring the state.
Why must you "unchoose"? Because after the recursion returns, you need to try other branches at the same level. If you do not undo the choice, the current solution retains the previous selection, causing subsequent branches to build on a corrupted state.
Consider a concrete example: generating all permutations of {1, 2, 3}. The decision tree looks like:
[]
/ | \
[1] [2] [3]
/ \ / \ / \
[1,2] [1,3] [2,1] [2,3] [3,1] [3,2]
| | | | | |
[1,2,3][1,3,2][2,1,3][2,3,1][3,1,2][3,2,1]
Each level represents a "position" choice. Level 1 chooses the first number, level 2 the second, and so on. At any point, numbers already chosen cannot be chosen again (the permutation constraint).
1.2 The Universal Backtracking Template
def backtrack(candidates, path, result, **constraints):
"""Universal backtracking template.
Args:
candidates: The set of available choices.
path: The current sequence of choices made.
result: Accumulator for all valid solutions.
constraints: Problem-specific constraints.
"""
# Base case: current path forms a complete solution
if is_solution(path):
result.append(path[:]) # Must copy path!
return
for choice in candidates:
# Pruning: skip choices that violate constraints
if not is_valid(choice, path, constraints):
continue
# 1. Choose
path.append(choice)
# 2. Explore
backtrack(next_candidates(choice), path, result, **constraints)
# 3. Unchoose
path.pop()
Key points:
path[:]instead ofpath: In Python, lists are reference types. If you appendpathdirectly toresult, subsequent modifications topathaffect the saved result. You must make a copy (for 1D lists, slicing suffices).is_validis the heart of pruning: It determines the effective search space. Good pruning can reduce exponential to manageable.next_candidatesdistinguishes permutations from combinations: In permutation problems, the next level can still choose from all unused elements; in combination problems, only elements after the current position (to avoid duplicates).
1.3 Permutation Problems
1.3.1 Permutations (LeetCode #46)
Problem: Given an array nums of distinct integers, return all possible permutations.
Analysis:
- Search space: n! permutations
- Constraint: each number may only be used once
- Termination: path length equals array length
from typing import List
def permute(nums: List[int]) -> List[List[int]]:
"""Generate all permutations.
Time: O(n! × n) Generate n! permutations, each copy costs O(n).
Space: O(n) Recursion depth + used array.
"""
result = []
used = [False] * len(nums)
def backtrack(path: List[int]):
# Base case
if len(path) == len(nums):
result.append(path[:])
return
for i in range(len(nums)):
# Prune: skip used elements
if used[i]:
continue
# Choose
path.append(nums[i])
used[i] = True
# Explore
backtrack(path)
# Unchoose
path.pop()
used[i] = False
backtrack([])
return result
# Test
print(permute([1, 2, 3]))
# [[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]
Why use a used array rather than if nums[i] not in path?
Because in on a list is O(n), while used[i] is O(1). For large n, this optimization matters significantly. Furthermore, using indices rather than values to determine "already used" correctly handles arrays with duplicate values (though this problem guarantees distinctness).
1.3.2 Permutations II (LeetCode #47, with Duplicates)
Problem: Given a collection of numbers nums that might contain duplicates, return all unique permutations.
Analysis: Compared to #46, the challenge is deduplication. For example, nums = [1, 1, 2] — without deduplication, you get two copies of [1, 1, 2] (first 1 in front vs. second 1 in front).
Deduplication strategy: Sort first. Then, at the same recursion level, if the current element equals the previous one and the previous one was not used (meaning it was unchoosen at this level), skip the current element.
def permute_unique(nums: List[int]) -> List[List[int]]:
"""Permutations with duplicates.
Core dedup logic: after sorting, at the same level, only pick the
first occurrence of each value.
Time: O(n! × n) worst case (when all distinct).
Space: O(n).
"""
result = []
nums.sort() # Sorting is prerequisite for dedup
used = [False] * len(nums)
def backtrack(path: List[int]):
if len(path) == len(nums):
result.append(path[:])
return
for i in range(len(nums)):
# Basic prune: skip used
if used[i]:
continue
# Dedup prune: at the same level, skip duplicate values
# Interpretation: nums[i] == nums[i-1] means same value.
# not used[i-1] means nums[i-1] was unchoosen at this level.
# This means we're trying the same value at the same level -> skip.
if i > 0 and nums[i] == nums[i - 1] and not used[i - 1]:
continue
path.append(nums[i])
used[i] = True
backtrack(path)
path.pop()
used[i] = False
backtrack([])
return result
# Test
print(permute_unique([1, 1, 2]))
# [[1,1,2],[1,2,1],[2,1,1]]
Understanding the dedup condition not used[i-1]:
An alternative way to think about it: we impose the rule that among elements with the same value, they must be used in their sorted order. That is, if nums[i-1] and nums[i] are the same, then nums[i] can only be chosen if nums[i-1] is already in the path. This ensures no duplicate permutations are generated from identical-value elements.
1.4 Combination Problems
1.4.1 Combinations (LeetCode #77)
Problem: Given two integers n and k, return all possible combinations of k numbers chosen from the range [1, n].
Analysis:
- Key difference from permutations: combinations are unordered — [1,2] and [2,1] are the same combination.
- Implementation: restrict the starting position of choices. Each level only picks elements after the current position.
def combine(n: int, k: int) -> List[List[int]]:
"""Combinations.
The start parameter ensures we only pick forward, avoiding duplicates.
Time: O(C(n,k) × k) Generate C(n,k) combinations, each copy costs O(k).
Space: O(k) Recursion depth.
"""
result = []
def backtrack(start: int, path: List[int]):
# Base case: combination complete
if len(path) == k:
result.append(path[:])
return
# Pruning: if remaining elements can't fill k slots, abort
# Need k - len(path) more, but only n - start + 1 available
if n - start + 1 < k - len(path):
return
for i in range(start, n + 1):
path.append(i)
backtrack(i + 1, path) # Note: i+1, not start+1
path.pop()
backtrack(1, [])
return result
# Test
print(combine(4, 2))
# [[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Pruning effectiveness: The check if n - start + 1 < k - len(path): return looks simple but is remarkably effective. For n=20, k=10, without this check we visit about 1 million nodes; with it, only about 180,000 — over 80% reduction in wasted work.
1.4.2 Combination Sum (LeetCode #39)
Problem: Given an array of distinct positive integers candidates and a target integer target, return all unique combinations where the chosen numbers sum to target. Each number may be used unlimited times.
Analysis:
- Difference from #77: (1) termination condition is sum equals target; (2) the same element can be reused.
- Reusability means passing
i(noti+1) in the recursive call.
def combination_sum(candidates: List[int], target: int) -> List[List[int]]:
"""Combination Sum (elements reusable).
Time: O(n^(target/min)) worst case.
Space: O(target/min) recursion depth.
"""
result = []
candidates.sort() # Sort enables pruning
def backtrack(start: int, path: List[int], remaining: int):
# Found a valid combination
if remaining == 0:
result.append(path[:])
return
for i in range(start, len(candidates)):
# Pruning: current exceeds remaining, all subsequent are larger
if candidates[i] > remaining:
break
path.append(candidates[i])
# Pass i (not i+1) because the same number can be reused
backtrack(i, path, remaining - candidates[i])
path.pop()
backtrack(0, [], target)
return result
# Test
print(combination_sum([2, 3, 6, 7], 7))
# [[2,2,3],[7]]
The power of sort + break: If candidates = [2,3,6,7] and target = 7, when we try 6 with remaining = 1, since 6 > 1 we break immediately. Without sorting, we would recurse deeper until remaining goes negative — wasting substantial time.
1.5 Subset Problems
1.5.1 Subsets (LeetCode #78)
Problem: Given an integer array nums of unique elements, return all possible subsets.
Analysis: The key difference from combinations — there is no fixed-length termination condition; every intermediate state is itself a valid subset.
def subsets(nums: List[int]) -> List[List[int]]:
"""Subsets.
Strategy: collect result at every node (including empty set).
Time: O(2^n × n) 2^n subsets, each copy costs O(n).
Space: O(n) Recursion depth.
"""
result = []
def backtrack(start: int, path: List[int]):
# Every node is a valid subset — collect immediately
result.append(path[:])
for i in range(start, len(nums)):
path.append(nums[i])
backtrack(i + 1, path)
path.pop()
backtrack(0, [])
return result
# Test
print(subsets([1, 2, 3]))
# [[], [1], [1,2], [1,2,3], [1,3], [2], [2,3], [3]]
Alternative perspective — bitmask enumeration: For an array of n elements, each element is either included or not — 2^n possibilities. We can use each bit of an n-bit integer to represent inclusion:
def subsets_bitmask(nums: List[int]) -> List[List[int]]:
"""Subsets via bitmask."""
n = len(nums)
result = []
for mask in range(1 << n): # 0 to 2^n - 1
subset = []
for i in range(n):
if mask & (1 << i):
subset.append(nums[i])
result.append(subset)
return result
Both methods produce identical results, but backtracking is easier to augment with pruning conditions and is more efficient when the search space is large.
1.5.2 Subsets II (LeetCode #90, with Duplicates)
Problem: Given an integer array nums that may contain duplicates, return all possible unique subsets.
def subsets_with_dup(nums: List[int]) -> List[List[int]]:
"""Subsets with duplicates.
Dedup strategy: sort + skip same elements at the same level.
"""
result = []
nums.sort()
def backtrack(start: int, path: List[int]):
result.append(path[:])
for i in range(start, len(nums)):
# Dedup: skip duplicate elements at the same level
if i > start and nums[i] == nums[i - 1]:
continue
path.append(nums[i])
backtrack(i + 1, path)
path.pop()
backtrack(0, [])
return result
# Test
print(subsets_with_dup([1, 2, 2]))
# [[], [1], [1,2], [1,2,2], [2], [2,2]]
1.6 Permutations vs. Combinations vs. Subsets
| Dimension | Permutations | Combinations | Subsets |
|---|---|---|---|
| Order matters? | Yes ([1,2] != [2,1]) | No ([1,2] = [2,1]) | No |
| Result length | Fixed at n | Fixed at k | 0 to n |
| Avoid duplicates how | used array | start position | start position |
| When to collect | path length == n | path length == k | At every node |
| Search space | n! | C(n,k) | 2^n |
| Next-level candidates | All unused elements | Elements after start | Elements after start |
Intuition for the core differences:
- Permutations care about order: at each level you choose from all elements, using
usedto prevent reuse. - Combinations do not care about order: at each level you only choose from elements "after" the current one, using
startto avoid going backward. - Subsets generalize combinations: no fixed length required, every intermediate state is a valid solution.
Understanding the unity and differences among these three gives you 80% of all backtracking problems.
Level 2 · Deeper Understanding
2.1 The N-Queens Problem
The N-Queens problem is one of the most iconic applications of backtracking: place N queens on an N×N chessboard such that no two queens threaten each other (same row, column, or diagonal).
2.1.1 Basic Implementation
def solve_n_queens(n: int) -> List[List[str]]:
"""N-Queens solver.
Strategy: place one queen per row, top to bottom.
Constraint checks: column conflict, main diagonal, anti-diagonal.
Time: O(n!) with pruning, far less in practice.
Space: O(n).
"""
result = []
cols = set() # Occupied columns
diag1 = set() # Occupied main diagonals (row - col is constant)
diag2 = set() # Occupied anti-diagonals (row + col is constant)
board = [['.'] * n for _ in range(n)]
def backtrack(row: int):
# All queens placed
if row == n:
result.append([''.join(r) for r in board])
return
for col in range(n):
# Prune: check column and diagonal conflicts
if col in cols or (row - col) in diag1 or (row + col) in diag2:
continue
# Choose: place queen
board[row][col] = 'Q'
cols.add(col)
diag1.add(row - col)
diag2.add(row + col)
# Explore next row
backtrack(row + 1)
# Unchoose: remove queen
board[row][col] = '.'
cols.remove(col)
diag1.remove(row - col)
diag2.remove(row + col)
backtrack(0)
return result
# Test
solutions = solve_n_queens(4)
for sol in solutions:
for row in sol:
print(row)
print()
# .Q..
# ...Q
# Q...
# ..Q.
#
# ..Q.
# Q...
# ...Q
# .Q..
Why row - col and row + col represent diagonals?
On the board, all cells on the same main diagonal (top-left to bottom-right) share the same value of row - col. For example, (0,0), (1,1), (2,2) all have row - col = 0. Similarly, all cells on the same anti-diagonal (top-right to bottom-left) share the same row + col.
2.1.2 Bitwise Optimization
For larger N (e.g., N=15), set operations become a bottleneck. Bitwise operations compress three sets into integer bit operations, dramatically improving speed.
def total_n_queens(n: int) -> int:
"""N-Queens with bitwise optimization (count-only).
Core idea: use n-bit integers where each bit represents
whether that column/diagonal is occupied.
~10x faster than set-based version at N=15.
"""
count = 0
all_ones = (1 << n) - 1 # n bits all set to 1
def backtrack(cols: int, diag1: int, diag2: int):
nonlocal count
if cols == all_ones:
# All columns occupied = n queens placed
count += 1
return
# Available positions: bits that are 0 in all three masks
available = all_ones & ~(cols | diag1 | diag2)
while available:
# Extract lowest set bit (lowbit trick)
position = available & (-available)
available -= position
# Recurse:
# cols | position: mark this column
# (diag1 | position) << 1: main diagonal propagates left
# (diag2 | position) >> 1: anti-diagonal propagates right
backtrack(
cols | position,
(diag1 | position) << 1,
(diag2 | position) >> 1
)
backtrack(0, 0, 0)
return count
# Test
print(total_n_queens(8)) # 92
print(total_n_queens(12)) # 14200
The elegance of the bitwise version:
available & (-available)extracts the lowest set bit — a property of two's complement:-x = ~x + 1.- Diagonal propagation via shifts: the main diagonal shifts left by one each row down; the anti-diagonal shifts right. This perfectly models how diagonal attack ranges expand downward.
- No explicit "unchoose" needed — state is passed entirely through function parameters, so recursion return naturally restores state.
2.2 Sudoku Solver
Sudoku is another classic Constraint Satisfaction Problem (CSP).
def solve_sudoku(board: List[List[str]]) -> None:
"""Sudoku solver (modifies board in-place).
Strategy: fill cells one by one with 1-9, prune via row/col/box constraints.
Enhancement: prioritize cells with fewest candidates (MRV heuristic).
Time: Worst case O(9^81), far less with pruning in practice.
"""
rows = [set() for _ in range(9)]
cols = [set() for _ in range(9)]
boxes = [set() for _ in range(9)]
empty = []
# Initialize constraint sets
for i in range(9):
for j in range(9):
if board[i][j] != '.':
num = board[i][j]
rows[i].add(num)
cols[j].add(num)
boxes[(i // 3) * 3 + j // 3].add(num)
else:
empty.append((i, j))
def backtrack(idx: int) -> bool:
# All empty cells filled
if idx == len(empty):
return True
i, j = empty[idx]
box_id = (i // 3) * 3 + j // 3
for num in '123456789':
# Constraint check
if num in rows[i] or num in cols[j] or num in boxes[box_id]:
continue
# Choose
board[i][j] = num
rows[i].add(num)
cols[j].add(num)
boxes[box_id].add(num)
# Explore
if backtrack(idx + 1):
return True # Solution found
# Unchoose
board[i][j] = '.'
rows[i].remove(num)
cols[j].remove(num)
boxes[box_id].remove(num)
return False # No valid digit for this cell — backtrack
backtrack(0)
Sudoku vs. N-Queens comparison:
| Aspect | N-Queens | Sudoku |
|---|---|---|
| Variables | Column position of queen in each row | Digit in each empty cell |
| Domain | 0 to N-1 | 1 to 9 |
| Constraints | Column, two diagonals | Row, column, 3×3 box |
| Number of solutions | May have many | Valid Sudoku has exactly one |
| Search strategy | Row-by-row | Cell-by-cell (or MRV) |
2.3 Pruning Strategies
Pruning is the soul of backtracking. Good pruning makes exponential problems solvable in practice.
2.3.1 Feasibility Pruning
Check constraints at the point of choice, not at the termination condition. This is the most basic yet most effective pruning.
# Bad: check only at termination
def backtrack_bad(path):
if is_complete(path):
if is_valid(path): # Too late!
result.append(path[:])
return
for choice in candidates:
path.append(choice)
backtrack_bad(path)
path.pop()
# Good: check at each step
def backtrack_good(path):
if is_complete(path):
result.append(path[:])
return
for choice in candidates:
if not is_valid_choice(choice, path): # Early check
continue
path.append(choice)
backtrack_good(path)
path.pop()
2.3.2 Sort-Based Pruning
For "sum equals target" problems, sorting enables break (not just continue) when the current element exceeds the remaining needed. break is stronger than continue: it skips not just the current element but all subsequent larger elements.
def combination_sum2(candidates: List[int], target: int) -> List[List[int]]:
"""Combination Sum II (each element used at most once, with duplicates)."""
result = []
candidates.sort()
def backtrack(start: int, path: List[int], remaining: int):
if remaining == 0:
result.append(path[:])
return
for i in range(start, len(candidates)):
# Sort-based pruning: current exceeds remaining, all after are larger
if candidates[i] > remaining:
break # Note: break, not continue!
# Dedup: skip duplicates at same level
if i > start and candidates[i] == candidates[i - 1]:
continue
path.append(candidates[i])
backtrack(i + 1, path, remaining - candidates[i])
path.pop()
backtrack(0, [], target)
return result
2.3.3 Optimality Pruning
If the problem asks for the optimal solution (max/min), maintain the best known value. When a branch cannot possibly improve upon it, terminate early.
def min_cost_backtrack(graph, n, current_cost, best_cost, path, visited):
"""TSP via backtracking with optimality pruning."""
if len(path) == n:
total = current_cost + graph[path[-1]][path[0]]
return min(best_cost, total)
for next_city in range(n):
if visited[next_city]:
continue
new_cost = current_cost + graph[path[-1]][next_city]
# Optimality pruning: current cost already exceeds best known
if new_cost >= best_cost:
continue
visited[next_city] = True
path.append(next_city)
best_cost = min_cost_backtrack(graph, n, new_cost, best_cost, path, visited)
path.pop()
visited[next_city] = False
return best_cost
2.4 Backtracking vs. DFS
Backtracking and DFS (Depth-First Search) are often conflated. They are closely related but subtly different:
| Concept | DFS | Backtracking |
|---|---|---|
| Essence | Graph/tree traversal strategy | Problem-solving methodology |
| Focus | Visitation order | Decisions + undoing |
| State management | Usually no modification (or visited marks) | Explicit choose and unchoose |
| Use case | Visit all nodes | Search for solutions satisfying constraints |
| Search space | Explicit graph/tree | Implicit search tree (constructed by decisions) |
Analogy: DFS is "the way you traverse a map"; backtracking is "the method to find the exit in a maze." Navigating the maze uses DFS's directional strategy, augmented with backtracking's "dead end → turn back" logic.
From an implementation perspective, backtracking = DFS + state restoration. If your DFS traversal includes the pattern "choose → recurse → unchoose," it is backtracking.
2.5 Time Complexity Analysis
Backtracking algorithms typically have exponential or factorial time complexity. This is not an implementation flaw — it is inherent to the problem's search space.
Common problems and their search spaces:
| Problem | Search Space | Analysis |
|---|---|---|
| Permutations | O(n!) | Level 1: n choices, level 2: n-1, ... |
| Subsets | O(2^n) | Each element: include or exclude |
| Combinations C(n,k) | O(C(n,k)) | Binomial coefficient |
| N-Queens | O(n!) | Row 1: n cols, row 2: at most n-1, ... |
| Sudoku | O(9^m) | m = empty cells, each has at most 9 choices |
Why doesn't pruning change the asymptotic complexity?
Theoretically, in the worst case all branches require exploration, so pruning cannot improve the asymptotic upper bound. But in practice, pruning can reduce the average search size by several orders of magnitude. This is why N-Queens, theoretically O(n!), can be solved for N=15 in seconds.
Analysis technique — node counting:
Total time = number of search tree nodes × work per node. For combination problems, the node count equals the answer count (each leaf produces one solution, internal nodes are paths toward solutions). For permutations, the search tree has n! leaves and approximately e × n! internal nodes (from the exponential generating function).
Level 3 · Theory and History
3.1 History of Backtracking
The formalization of backtracking dates to the 1950s. Here are key milestones:
1850s: Manual solutions to N-Queens
The eight-queens problem was first posed by chess player Max Bezzel in 1848. Mathematician Franz Nauck found all 92 solutions in 1850. Without computers, solutions relied entirely on manual reasoning and mathematical symmetry.
1960s: Golomb & Baumert's systematization
Solomon Golomb and Leonard Baumert published "Backtrack Programming" (Journal of the ACM, 12(4):516-524, 1965), the first systematic description of backtracking as a general algorithm design paradigm. They showed it applies to all problems expressible as "extending partial solutions."
1970s: Constraint Satisfaction Problem (CSP) framework
With the rise of AI research, backtracking was incorporated into the unified CSP framework. A CSP consists of three elements: a set of variables, a domain for each variable, and constraints between variables. Backtracking is the most basic method for solving CSPs — assign values to variables one by one, checking constraints at each assignment.
1970-80s: Improvement techniques
- Forward Checking: After assignment, immediately check whether any unassigned variable's domain becomes empty.
- Arc Consistency: Propagate constraints to shrink domains.
- MRV (Minimum Remaining Values): Prioritize the variable with the smallest remaining domain (fail-first heuristic).
- Conflict-Directed Backjumping: Instead of backtracking to the immediately previous variable, jump directly to the variable that caused the conflict.
These techniques improved naive backtracking by orders of magnitude, making large-scale CSPs (scheduling, timetabling) tractable.
3.2 Backtracking and Branch-and-Bound
Branch and Bound (B&B) is backtracking's "optimization-focused" variant, designed specifically for optimization problems.
Core differences:
| Aspect | Backtracking | Branch and Bound |
|---|---|---|
| Goal | Find all solutions / any solution | Find the optimal solution |
| Search order | Typically DFS | BFS, best-first search |
| Pruning basis | Feasibility constraints | Optimality bounds |
| Typical applications | CSP, enumeration | Integer programming, combinatorial optimization |
The key idea of B&B — bounds:
For each search tree node, compute an "optimistic estimate" (upper or lower bound): if the best possible solution from this node cannot beat the current best known solution, prune the entire subtree.
def branch_and_bound_knapsack(weights, values, capacity):
"""0-1 Knapsack via Branch and Bound.
Uses greedy fractional relaxation to compute upper bounds.
"""
n = len(weights)
# Sort by value density (descending)
items = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
best_value = 0
def upper_bound(idx, current_weight, current_value):
"""Greedy upper bound: allow fractional items."""
bound = current_value
w = current_weight
for i in range(idx, n):
item = items[i]
if w + weights[item] <= capacity:
w += weights[item]
bound += values[item]
else:
bound += values[item] * (capacity - w) / weights[item]
break
return bound
def solve(idx, current_weight, current_value):
nonlocal best_value
if idx == n:
best_value = max(best_value, current_value)
return
item = items[idx]
# Branch 1: take current item
if current_weight + weights[item] <= capacity:
solve(idx + 1, current_weight + weights[item],
current_value + values[item])
# Branch 2: skip current item (check bound first)
if upper_bound(idx + 1, current_weight, current_value) > best_value:
solve(idx + 1, current_weight, current_value)
solve(0, 0, 0)
return best_value
B&B in industry: Virtually all commercial Integer Linear Programming (ILP) solvers — CPLEX, Gurobi, SCIP — use B&B as their core framework. They combine LP relaxation for computing bounds, cutting planes for tightening relaxations, and sophisticated branching strategies for variable selection.
3.3 Backtracking in Compilers — Regex Engines
You might not expect it, but one of the most pervasive everyday applications of backtracking is regular expression matching. Most programming languages (Python, JavaScript, Java, Ruby, PHP) implement regex engines as backtracking-based NFA (Nondeterministic Finite Automaton) simulators.
How regex backtracking works:
When the regex engine encounters a branch (e.g., a|b) or quantifier (e.g., a*), it picks one direction and attempts to match. If it fails, it backtracks to the choice point and tries the other direction.
def regex_match(pattern: str, text: str) -> bool:
"""Simplified regex engine (supports . and *).
This is fundamentally backtracking: try one matching strategy,
fail -> backtrack and try another.
"""
def match(p_idx: int, t_idx: int) -> bool:
# Pattern exhausted — check if text is too
if p_idx == len(pattern):
return t_idx == len(text)
# Does current character match?
first_match = (t_idx < len(text) and
(pattern[p_idx] == text[t_idx] or pattern[p_idx] == '.'))
# Handle * quantifier
if p_idx + 1 < len(pattern) and pattern[p_idx + 1] == '*':
# Two choices:
# 1. * matches 0 times: skip x* in pattern
# 2. * matches 1+ times: consume one char, keep pattern
return (match(p_idx + 2, t_idx) or # Match 0 times
(first_match and match(p_idx, t_idx + 1))) # Match 1+ times
# Ordinary match
if first_match:
return match(p_idx + 1, t_idx + 1)
return False
return match(0, 0)
Catastrophic Backtracking:
Backtracking-based regex engines have a critical vulnerability: for certain malicious patterns and inputs, backtracking may grow exponentially, hanging the program. This is known as ReDoS (Regular Expression Denial of Service).
Classic example: pattern (a+)+$ against string "a" * 25 + "b". The engine tries 2^n grouping strategies before determining no match exists.
import re
import time
# DANGER! The following may hang your program:
# pattern = r"(a+)+$"
# text = "a" * 25 + "b"
# re.match(pattern, text) # May take tens of seconds or more
# Safe alternative: use non-backtracking engines (Go's regexp, RE2)
# Or rewrite: r"a+$" is semantically equivalent but cannot catastrophically backtrack
Why not use a DFA? Thompson's NFA construction (1968) or DFA can match any regular expression in O(n) time with no catastrophic backtracking. However, DFAs cannot support backreferences — patterns like (.)\1 that reference previously captured content exceed the expressiveness of regular languages. Because backreferences are widely used in practice, most languages chose backtracking engines as a tradeoff. Go is one of the few languages that insists on RE2 (DFA implementation) at the cost of dropping backreference support.
3.4 NP-Complete Problems and Backtracking
Why can many NP problems only be solved by backtracking/brute force?
P vs NP is one of the most important open problems in computer science. NP-Complete problems are the "hardest" in NP — if any NP-Complete problem has a polynomial-time algorithm, then all NP problems do. To date, no one has found a polynomial algorithm for any NP-Complete problem (nor proven none exists).
The relationship between backtracking and NP-Completeness:
NP is defined as the class of problems whose solutions can be verified in polynomial time. This means:
- Verifying a solution is easy (polynomial time)
- Finding a solution may be hard (only exponential algorithms are known)
Backtracking fits this pattern perfectly: it searches for solutions and uses constraint checking for verification and pruning. For NP-Complete problems (TSP, graph coloring, Boolean satisfiability), backtracking with heuristics and pruning is the primary exact solving method.
Typical NP-Complete problems and backtracking approaches:
| Problem | Decision Version | Backtracking Strategy |
|---|---|---|
| SAT | Is a Boolean formula satisfiable? | DPLL (backtracking + unit propagation + pure literal elimination) |
| Graph Coloring | Can a graph be colored with k colors? | Assign colors vertex by vertex, check adjacent constraint |
| Hamiltonian Path | Path visiting all vertices exactly once? | Extend path step by step, check connectivity and uniqueness |
| Subset Sum | Subset summing to target? | Include/exclude each element, prune after sorting |
A key insight: While NP-Complete problems have no known polynomial exact algorithms, for many practical instances, backtracking with good pruning and heuristics is efficient enough. SAT solvers (MiniSat, CaDiCaL) can handle industrial instances with millions of variables — despite worst-case exponential behavior. This tells us: average-case and worst-case behavior can differ enormously.
Level 4 · Practice and Interviews
4.1 Correct Deduplication Patterns
Deduplication is the most error-prone aspect of backtracking problems. Let us systematically summarize the two approaches.
4.1.1 Method 1: Sort + Skip Same Elements
For combination/subset problems (skip at the same level):
# Inside the for loop:
if i > start and nums[i] == nums[i - 1]:
continue
For permutation problems (skip at same level + used check):
# Inside the for loop:
if i > 0 and nums[i] == nums[i - 1] and not used[i - 1]:
continue
Why does the permutation dedup condition include not used[i-1]?
Because the permutation loop always starts from 0 (unlike combinations which start from start), the i > start trick does not apply. We need another way to distinguish "duplicate at the same level" from "legitimate choice at a different level":
not used[i-1]is True: nums[i-1] was unchoosen at this level (same-level duplicate) → skip.not used[i-1]is False (i.e.,used[i-1]is True): nums[i-1] is in the path (different-level choice) → legitimate.
4.1.2 Method 2: Set to Record Values Chosen at Same Level
When the array cannot be sorted or sorting is too costly:
def permute_unique_set(nums: List[int]) -> List[List[int]]:
"""Dedup via set (no sorting required)."""
result = []
used = [False] * len(nums)
def backtrack(path):
if len(path) == len(nums):
result.append(path[:])
return
seen = set() # Values already tried at this level
for i in range(len(nums)):
if used[i]:
continue
if nums[i] in seen: # Same value tried at this level
continue
seen.add(nums[i])
path.append(nums[i])
used[i] = True
backtrack(path)
path.pop()
used[i] = False
backtrack([])
return result
Comparison of the two methods:
| Aspect | Sort + Skip | Set Dedup |
|---|---|---|
| Prerequisite | Sort needed O(n log n) | No sorting |
| Extra space | O(1) (excluding recursion stack) | O(n) per level |
| Best for | Sortable arrays | Arrays where order must be preserved |
| Constant factor | Smaller | Hash table overhead |
4.2 High-Frequency Interview Problems
4.2.1 Letter Combinations of a Phone Number (LeetCode #17)
Problem: Given a string containing digits 2-9, return all possible letter combinations it could represent.
def letter_combinations(digits: str) -> List[str]:
"""Letter combinations of a phone number.
Time: O(4^n × n) At most 4 letters per digit (7 and 9), n = len(digits).
Space: O(n) Recursion depth.
"""
if not digits:
return []
phone_map = {
'2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
'6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'
}
result = []
def backtrack(idx: int, path: List[str]):
if idx == len(digits):
result.append(''.join(path))
return
for char in phone_map[digits[idx]]:
path.append(char)
backtrack(idx + 1, path)
path.pop()
backtrack(0, [])
return result
# Test
print(letter_combinations("23"))
# ["ad","ae","af","bd","be","bf","cd","ce","cf"]
Interview insight: This problem appears simple but tests whether you understand that it is none of permutation/combination/subset — it is a "multi-layer choice" problem. Each layer's candidate set differs (determined by each digit), and there is no deduplication needed.
4.2.2 Palindrome Partitioning (LeetCode #131)
Problem: Given a string s, partition it such that every substring is a palindrome. Return all possible partitions.
def partition(s: str) -> List[List[str]]:
"""Palindrome Partitioning.
Strategy: enumerate cut position -> if prefix is palindrome -> recurse on remainder.
Optimization: precompute palindrome table to avoid repeated checks.
Time: O(n × 2^n) worst case (all substrings are palindromes, e.g., "aaa").
Space: O(n^2) for palindrome table.
"""
n = len(s)
# Precompute: is_palindrome[i][j] = True iff s[i:j+1] is a palindrome
is_palindrome = [[False] * n for _ in range(n)]
for i in range(n - 1, -1, -1):
for j in range(i, n):
if s[i] == s[j] and (j - i <= 2 or is_palindrome[i + 1][j - 1]):
is_palindrome[i][j] = True
result = []
def backtrack(start: int, path: List[str]):
if start == n:
result.append(path[:])
return
for end in range(start, n):
# Prune: only continue if prefix is palindrome
if not is_palindrome[start][end]:
continue
path.append(s[start:end + 1])
backtrack(end + 1, path)
path.pop()
backtrack(0, [])
return result
# Test
print(partition("aab"))
# [["a","a","b"],["aa","b"]]
Key optimization: The palindrome table costs O(n^2) to precompute but reduces each palindrome check from O(n) to O(1). In the worst case ("aaaa...a"), this improves total time from O(n^2 × 2^n) to O(n × 2^n).
4.2.3 Word Search (LeetCode #79)
Problem: Given an m×n character grid board and a string word, determine if word exists in the grid (adjacent cells connected horizontally or vertically, each cell used at most once).
def exist(board: List[List[str]], word: str) -> bool:
"""Word Search.
Strategy: from each possible starting cell, DFS + backtrack in 4 directions.
Optimization: modify board in-place for visited marking (avoid extra space).
Time: O(m × n × 3^L) L = len(word), at most 3 directions per step (exclude source).
Space: O(L) Recursion depth.
"""
m, n = len(board), len(board[0])
def backtrack(i: int, j: int, k: int) -> bool:
"""At position (i,j), attempt to match word[k:]."""
# All characters matched
if k == len(word):
return True
# Boundary check + character match
if (i < 0 or i >= m or j < 0 or j >= n or
board[i][j] != word[k]):
return False
# Choose: mark as visited
temp = board[i][j]
board[i][j] = '#' # In-place modification, O(1) space
# Explore four directions
found = (backtrack(i + 1, j, k + 1) or
backtrack(i - 1, j, k + 1) or
backtrack(i, j + 1, k + 1) or
backtrack(i, j - 1, k + 1))
# Unchoose: restore original character
board[i][j] = temp
return found
# Try each cell as starting point
for i in range(m):
for j in range(n):
if backtrack(i, j, 0):
return True
return False
# Test
board = [
['A','B','C','E'],
['S','F','C','S'],
['A','D','E','E']
]
print(exist(board, "ABCCED")) # True
print(exist(board, "SEE")) # True
print(exist(board, "ABCB")) # False
Interview follow-ups:
Q: Why modify in-place rather than using a visited matrix? A: A visited matrix requires O(m×n) extra space. In-place modification replaces the character with a sentinel, restored on backtrack, reducing space from O(m×n) to O(L) (recursion stack only).
Q: Can we optimize further? A: Pre-check character frequencies — if word contains more of some character than the board does, return False immediately. Also compare the frequency of word's first and last characters in the board; start searching from the rarer end.
4.2.4 Generate Parentheses (LeetCode #22)
Problem: Generate all combinations of well-formed parentheses given n pairs.
def generate_parenthesis(n: int) -> List[str]:
"""Generate Parentheses.
Constraint: in any prefix, count('(') >= count(')').
Choices: add '(' if budget remains, or ')' if left > right.
Time: O(4^n / sqrt(n)) The nth Catalan number.
Space: O(n).
"""
result = []
def backtrack(path: List[str], left: int, right: int):
"""
left: number of '(' used so far.
right: number of ')' used so far.
"""
# Termination
if len(path) == 2 * n:
result.append(''.join(path))
return
# Choice 1: add '(' if budget allows
if left < n:
path.append('(')
backtrack(path, left + 1, right)
path.pop()
# Choice 2: add ')' if it won't create invalid prefix
if right < left:
path.append(')')
backtrack(path, left, right + 1)
path.pop()
backtrack([], 0, 0)
return result
# Test
print(generate_parenthesis(3))
# ["((()))","(()())","(())()","()(())","()()()"]
Mathematical background — Catalan numbers:
The number of valid parenthesizations with n pairs equals the nth Catalan number: C_n = C(2n, n) / (n+1). First values: 1, 1, 2, 5, 14, 42, 132, 429, ...
Catalan numbers also count: distinct binary search trees with n nodes, lattice paths from (0,0) to (n,n) not crossing the diagonal, and distinct ways to parenthesize a product of n+1 matrices.
4.3 Time Complexity Estimation Techniques
Estimating backtracking complexity in interviews is often challenging. Here is a practical framework:
Method 1: Search tree counting
- Determine the tree depth d (usually the solution length).
- Determine the branching factor b per level (usually candidate set size).
- Total nodes ≈ b^d (upper bound).
Examples:
- Phone number #17: d = len(digits), b ≤ 4 → O(4^n).
- N-Queens: d = n, b decreases from n → O(n!).
- Parentheses: d = 2n, but constraints make effective nodes far fewer than 2^(2n) → Catalan O(4^n/sqrt(n)).
Method 2: Output size lower bound
If the problem requires outputting all solutions, time complexity is at least the output size:
- All permutations: at least O(n! × n) (n! solutions of length n).
- All subsets: at least O(2^n × n).
- This lower bound cannot be optimized — you must spend time generating each solution.
Method 3: Empirical counting with pruning
For problems with strong pruning, theoretical upper bounds and actual runtime may differ by orders of magnitude. Experimentally counting search tree nodes is more realistic.
def count_nodes_example():
"""Count search tree nodes for empirical complexity estimation."""
count = 0
def backtrack_with_counting(path, ...):
nonlocal count
count += 1
# ... normal logic
backtrack_with_counting([], ...)
print(f"Searched {count} nodes")
4.4 Backtracking vs. Dynamic Programming: How to Decide
One of the most common interview questions: should this problem use backtracking or DP?
Core distinction — overlapping subproblems:
- Overlapping subproblems exist → Dynamic Programming: the same subproblem is computed repeatedly (e.g., Fibonacci, knapsack).
- No overlapping subproblems → Backtracking: each search branch corresponds to a distinct state (e.g., permutations, N-Queens).
Determination method:
- Draw the search tree.
- Check if identical subtrees appear multiple times.
- If yes → memoization/DP; if no → backtracking.
The gray area — backtracking + memoization:
Some problems exhibit characteristics of both. For example, "Word Break II" (LeetCode #140) requires finding all valid segmentations — logically backtracking (enumerate all splits) — but subproblems overlap (the same suffix can reuse results), so memoization accelerates it.
def word_break_ii(s: str, word_dict: List[str]) -> List[str]:
"""Word Break II: a classic case of backtracking + memoization."""
word_set = set(word_dict)
memo = {}
def backtrack(start: int) -> List[str]:
if start in memo:
return memo[start]
if start == len(s):
return ['']
results = []
for end in range(start + 1, len(s) + 1):
word = s[start:end]
if word in word_set:
sub_sentences = backtrack(end)
for sub in sub_sentences:
if sub:
results.append(word + ' ' + sub)
else:
results.append(word)
memo[start] = results
return results
return backtrack(0)
Decision tree summary:
Does the problem ask for all solutions?
|-- Yes --> Overlapping subproblems?
| |-- Yes --> Backtracking + Memoization
| |-- No --> Pure Backtracking
|-- No (only count/optimal value) --> Optimal substructure?
|-- Yes --> Dynamic Programming
|-- No --> Backtracking / Greedy / Other
4.5 Comprehensive Practice: Quick-Reference Template
When facing a new backtracking problem, analyze it in these steps:
Step 1: Identify the problem type
- Permutation (order matters) → loop from 0, use
usedarray. - Combination (order irrelevant) → loop from
start. - Subset → collect at every node, loop from
start. - Search (e.g., word search, N-Queens) → customize based on problem structure.
Step 2: Determine the three elements
- Path: current choices made.
- Choice list: available choices at this step.
- Termination condition: when you have reached a leaf of the search tree.
Step 3: Consider pruning
- Does sorting enable early
break? - Are there duplicate elements requiring deduplication?
- Can constraints eliminate choices before making them?
Step 4: Confirm complexity
- Estimate using node count × work per node.
- Verify it is acceptable for the problem's input constraints.
# Ultimate template
def solve(problem_input):
result = []
# Preprocessing (sort, build tables, etc.)
candidates = preprocess(problem_input)
def backtrack(state, path):
# Termination
if is_goal(state):
result.append(format_solution(path))
return
for i, choice in enumerate(get_choices(state, candidates)):
# Pruning
if should_prune(choice, state):
continue # or break (if sorted)
# Dedup (if needed)
if is_duplicate(i, choice, state):
continue
# Choose
make_choice(state, choice, path)
# Explore
backtrack(next_state(state, choice), path)
# Unchoose
undo_choice(state, choice, path)
backtrack(initial_state(), [])
return result
Chapter Summary
Backtracking is one of the most fundamental algorithmic paradigms in computer science. Its elegance lies in applying a uniform framework (choose → explore → unchoose) to an enormous variety of search problems.
Key takeaways:
- Permutations/Combinations/Subsets are three foundational models. Understanding their differences (used array vs. start position) is essential.
- Pruning is the soul of backtracking — feasibility pruning, sort-based pruning, and optimality pruning form a progressive hierarchy.
- Deduplication is the most common source of bugs — sort + skip is the most reliable pattern.
- Backtracking is fundamentally DFS augmented with state management (choosing and unchoosing).
- NP-Complete problems have no known polynomial exact algorithms; backtracking + pruning is the primary exact solving approach.
- Backtracking vs. DP hinges on overlapping subproblems — if they exist, use memoization; if not, search directly.
When facing a new problem, do not rush to code. First sketch the search tree's first few levels, identify "what is the choice," "what is the constraint," and "when to collect results," then apply the template. Backtracking problems are not harder than DP — their templates are more uniform and their variations more regular. The only difficulties are recognizing "this is a backtracking problem" and designing effective pruning.