Chapter 25

Backtracking: Elegant Brute Force

Chapter 25: Backtracking — Elegant Brute Force

You face a combinatorial search space. Perhaps it is all permutations, all subsets, all valid board configurations, or all possible password combinations. The naive approach enumerates every possibility, but when the search space is exponential or factorial, blind enumeration is infeasible in both time and space.

The core insight of backtracking is this: you do not need to enumerate all possibilities and then check which ones are valid — you can abandon a partial solution the moment you discover it cannot lead to a valid complete solution. It is like navigating a maze by turning back at dead ends rather than mapping the entire maze before deciding which path leads to the exit.

The elegance of backtracking lies in its remarkably uniform template. Whether the problem is permutations, combinations, subsets, N-Queens, Sudoku, or regex matching, the underlying framework is identical: choose → explore → unchoose. Once you internalize this framework, every backtracking problem becomes a fill-in-the-blank exercise: "In this problem, what is the choice? What is the constraint? What is the goal?"

This chapter begins with the fundamental template, systematically covers the three classical families (permutations, combinations, subsets), dives into constraint satisfaction problems like N-Queens and Sudoku, and ascends to the theoretical relationship between backtracking and NP-completeness.

Level 1 · What You Need to Know

1.1 The Core Idea of Backtracking

Backtracking is essentially a depth-first search over a decision tree. You maintain a path from the root to the current node (the "current sequence of choices"). At each node you face several branches. When you discover that a branch violates constraints, you "backtrack" to the parent node and try the next branch.

The Three-Step Dance:

Choose: Pick an element from the candidate set and add it to the current solution.
Explore: Recursively search the next level based on the current choice.
Unchoose: Remove the element from the current solution, restoring the state.

Why must you "unchoose"? Because after the recursion returns, you need to try other branches at the same level. If you do not undo the choice, the current solution retains the previous selection, causing subsequent branches to build on a corrupted state.

Consider a concrete example: generating all permutations of {1, 2, 3}. The decision tree looks like:

                    []
           /        |        \
         [1]       [2]       [3]
        /   \     /   \     /   \
     [1,2] [1,3] [2,1] [2,3] [3,1] [3,2]
      |      |     |      |     |      |
  [1,2,3][1,3,2][2,1,3][2,3,1][3,1,2][3,2,1]

Each level represents a "position" choice. Level 1 chooses the first number, level 2 the second, and so on. At any point, numbers already chosen cannot be chosen again (the permutation constraint).

1.2 The Universal Backtracking Template

def backtrack(candidates, path, result, **constraints):
    """Universal backtracking template.
    
    Args:
        candidates: The set of available choices.
        path: The current sequence of choices made.
        result: Accumulator for all valid solutions.
        constraints: Problem-specific constraints.
    """
    # Base case: current path forms a complete solution
    if is_solution(path):
        result.append(path[:])  # Must copy path!
        return
    
    for choice in candidates:
        # Pruning: skip choices that violate constraints
        if not is_valid(choice, path, constraints):
            continue
        
        # 1. Choose
        path.append(choice)
        
        # 2. Explore
        backtrack(next_candidates(choice), path, result, **constraints)
        
        # 3. Unchoose
        path.pop()

Key points:

path[:] instead of path: In Python, lists are reference types. If you append path directly to result, subsequent modifications to path affect the saved result. You must make a copy (for 1D lists, slicing suffices).
is_valid is the heart of pruning: It determines the effective search space. Good pruning can reduce exponential to manageable.
next_candidates distinguishes permutations from combinations: In permutation problems, the next level can still choose from all unused elements; in combination problems, only elements after the current position (to avoid duplicates).

1.3 Permutation Problems

1.3.1 Permutations (LeetCode #46)

Problem: Given an array nums of distinct integers, return all possible permutations.

Analysis:

Search space: n! permutations
Constraint: each number may only be used once
Termination: path length equals array length

from typing import List

def permute(nums: List[int]) -> List[List[int]]:
    """Generate all permutations.
    
    Time: O(n! × n)  Generate n! permutations, each copy costs O(n).
    Space: O(n)  Recursion depth + used array.
    """
    result = []
    used = [False] * len(nums)
    
    def backtrack(path: List[int]):
        # Base case
        if len(path) == len(nums):
            result.append(path[:])
            return
        
        for i in range(len(nums)):
            # Prune: skip used elements
            if used[i]:
                continue
            
            # Choose
            path.append(nums[i])
            used[i] = True
            
            # Explore
            backtrack(path)
            
            # Unchoose
            path.pop()
            used[i] = False
    
    backtrack([])
    return result

# Test
print(permute([1, 2, 3]))
# [[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]

Why use a used array rather than if nums[i] not in path?

Because in on a list is O(n), while used[i] is O(1). For large n, this optimization matters significantly. Furthermore, using indices rather than values to determine "already used" correctly handles arrays with duplicate values (though this problem guarantees distinctness).

1.3.2 Permutations II (LeetCode #47, with Duplicates)

Problem: Given a collection of numbers nums that might contain duplicates, return all unique permutations.

Analysis: Compared to #46, the challenge is deduplication. For example, nums = [1, 1, 2] — without deduplication, you get two copies of [1, 1, 2] (first 1 in front vs. second 1 in front).

Deduplication strategy: Sort first. Then, at the same recursion level, if the current element equals the previous one and the previous one was not used (meaning it was unchoosen at this level), skip the current element.

def permute_unique(nums: List[int]) -> List[List[int]]:
    """Permutations with duplicates.
    
    Core dedup logic: after sorting, at the same level, only pick the
    first occurrence of each value.
    
    Time: O(n! × n) worst case (when all distinct).
    Space: O(n).
    """
    result = []
    nums.sort()  # Sorting is prerequisite for dedup
    used = [False] * len(nums)
    
    def backtrack(path: List[int]):
        if len(path) == len(nums):
            result.append(path[:])
            return
        
        for i in range(len(nums)):
            # Basic prune: skip used
            if used[i]:
                continue
            
            # Dedup prune: at the same level, skip duplicate values
            # Interpretation: nums[i] == nums[i-1] means same value.
            # not used[i-1] means nums[i-1] was unchoosen at this level.
            # This means we're trying the same value at the same level -> skip.
            if i > 0 and nums[i] == nums[i - 1] and not used[i - 1]:
                continue
            
            path.append(nums[i])
            used[i] = True
            backtrack(path)
            path.pop()
            used[i] = False
    
    backtrack([])
    return result

# Test
print(permute_unique([1, 1, 2]))
# [[1,1,2],[1,2,1],[2,1,1]]

Understanding the dedup condition not used[i-1]:

An alternative way to think about it: we impose the rule that among elements with the same value, they must be used in their sorted order. That is, if nums[i-1] and nums[i] are the same, then nums[i] can only be chosen if nums[i-1] is already in the path. This ensures no duplicate permutations are generated from identical-value elements.

1.4 Combination Problems

1.4.1 Combinations (LeetCode #77)

Problem: Given two integers n and k, return all possible combinations of k numbers chosen from the range [1, n].

Analysis:

Key difference from permutations: combinations are unordered — [1,2] and [2,1] are the same combination.
Implementation: restrict the starting position of choices. Each level only picks elements after the current position.

def combine(n: int, k: int) -> List[List[int]]:
    """Combinations.
    
    The start parameter ensures we only pick forward, avoiding duplicates.
    
    Time: O(C(n,k) × k)  Generate C(n,k) combinations, each copy costs O(k).
    Space: O(k)  Recursion depth.
    """
    result = []
    
    def backtrack(start: int, path: List[int]):
        # Base case: combination complete
        if len(path) == k:
            result.append(path[:])
            return
        
        # Pruning: if remaining elements can't fill k slots, abort
        # Need k - len(path) more, but only n - start + 1 available
        if n - start + 1 < k - len(path):
            return
        
        for i in range(start, n + 1):
            path.append(i)
            backtrack(i + 1, path)  # Note: i+1, not start+1
            path.pop()
    
    backtrack(1, [])
    return result

# Test
print(combine(4, 2))
# [[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]

Pruning effectiveness: The check if n - start + 1 < k - len(path): return looks simple but is remarkably effective. For n=20, k=10, without this check we visit about 1 million nodes; with it, only about 180,000 — over 80% reduction in wasted work.

1.4.2 Combination Sum (LeetCode #39)

Problem: Given an array of distinct positive integers candidates and a target integer target, return all unique combinations where the chosen numbers sum to target. Each number may be used unlimited times.

Analysis:

Difference from #77: (1) termination condition is sum equals target; (2) the same element can be reused.
Reusability means passing i (not i+1) in the recursive call.

def combination_sum(candidates: List[int], target: int) -> List[List[int]]:
    """Combination Sum (elements reusable).
    
    Time: O(n^(target/min)) worst case.
    Space: O(target/min) recursion depth.
    """
    result = []
    candidates.sort()  # Sort enables pruning
    
    def backtrack(start: int, path: List[int], remaining: int):
        # Found a valid combination
        if remaining == 0:
            result.append(path[:])
            return
        
        for i in range(start, len(candidates)):
            # Pruning: current exceeds remaining, all subsequent are larger
            if candidates[i] > remaining:
                break
            
            path.append(candidates[i])
            # Pass i (not i+1) because the same number can be reused
            backtrack(i, path, remaining - candidates[i])
            path.pop()
    
    backtrack(0, [], target)
    return result

# Test
print(combination_sum([2, 3, 6, 7], 7))
# [[2,2,3],[7]]

The power of sort + break: If candidates = [2,3,6,7] and target = 7, when we try 6 with remaining = 1, since 6 > 1 we break immediately. Without sorting, we would recurse deeper until remaining goes negative — wasting substantial time.

1.5 Subset Problems

1.5.1 Subsets (LeetCode #78)

Problem: Given an integer array nums of unique elements, return all possible subsets.

Analysis: The key difference from combinations — there is no fixed-length termination condition; every intermediate state is itself a valid subset.

def subsets(nums: List[int]) -> List[List[int]]:
    """Subsets.
    
    Strategy: collect result at every node (including empty set).
    
    Time: O(2^n × n)  2^n subsets, each copy costs O(n).
    Space: O(n)  Recursion depth.
    """
    result = []
    
    def backtrack(start: int, path: List[int]):
        # Every node is a valid subset — collect immediately
        result.append(path[:])
        
        for i in range(start, len(nums)):
            path.append(nums[i])
            backtrack(i + 1, path)
            path.pop()
    
    backtrack(0, [])
    return result

# Test
print(subsets([1, 2, 3]))
# [[], [1], [1,2], [1,2,3], [1,3], [2], [2,3], [3]]

Alternative perspective — bitmask enumeration: For an array of n elements, each element is either included or not — 2^n possibilities. We can use each bit of an n-bit integer to represent inclusion:

def subsets_bitmask(nums: List[int]) -> List[List[int]]:
    """Subsets via bitmask."""
    n = len(nums)
    result = []
    for mask in range(1 << n):  # 0 to 2^n - 1
        subset = []
        for i in range(n):
            if mask & (1 << i):
                subset.append(nums[i])
        result.append(subset)
    return result

Both methods produce identical results, but backtracking is easier to augment with pruning conditions and is more efficient when the search space is large.

1.5.2 Subsets II (LeetCode #90, with Duplicates)

Problem: Given an integer array nums that may contain duplicates, return all possible unique subsets.

def subsets_with_dup(nums: List[int]) -> List[List[int]]:
    """Subsets with duplicates.
    
    Dedup strategy: sort + skip same elements at the same level.
    """
    result = []
    nums.sort()
    
    def backtrack(start: int, path: List[int]):
        result.append(path[:])
        
        for i in range(start, len(nums)):
            # Dedup: skip duplicate elements at the same level
            if i > start and nums[i] == nums[i - 1]:
                continue
            
            path.append(nums[i])
            backtrack(i + 1, path)
            path.pop()
    
    backtrack(0, [])
    return result

# Test
print(subsets_with_dup([1, 2, 2]))
# [[], [1], [1,2], [1,2,2], [2], [2,2]]

1.6 Permutations vs. Combinations vs. Subsets

Dimension	Permutations	Combinations	Subsets
Order matters?	Yes ([1,2] != [2,1])	No ([1,2] = [2,1])	No
Result length	Fixed at n	Fixed at k	0 to n
Avoid duplicates how	used array	start position	start position
When to collect	path length == n	path length == k	At every node
Search space	n!	C(n,k)	2^n
Next-level candidates	All unused elements	Elements after start	Elements after start

Intuition for the core differences:

Permutations care about order: at each level you choose from all elements, using used to prevent reuse.
Combinations do not care about order: at each level you only choose from elements "after" the current one, using start to avoid going backward.
Subsets generalize combinations: no fixed length required, every intermediate state is a valid solution.

Understanding the unity and differences among these three gives you 80% of all backtracking problems.

Level 2 · Deeper Understanding

2.1 The N-Queens Problem

The N-Queens problem is one of the most iconic applications of backtracking: place N queens on an N×N chessboard such that no two queens threaten each other (same row, column, or diagonal).

2.1.1 Basic Implementation

def solve_n_queens(n: int) -> List[List[str]]:
    """N-Queens solver.
    
    Strategy: place one queen per row, top to bottom.
    Constraint checks: column conflict, main diagonal, anti-diagonal.
    
    Time: O(n!) with pruning, far less in practice.
    Space: O(n).
    """
    result = []
    cols = set()        # Occupied columns
    diag1 = set()       # Occupied main diagonals (row - col is constant)
    diag2 = set()       # Occupied anti-diagonals (row + col is constant)
    
    board = [['.'] * n for _ in range(n)]
    
    def backtrack(row: int):
        # All queens placed
        if row == n:
            result.append([''.join(r) for r in board])
            return
        
        for col in range(n):
            # Prune: check column and diagonal conflicts
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue
            
            # Choose: place queen
            board[row][col] = 'Q'
            cols.add(col)
            diag1.add(row - col)
            diag2.add(row + col)
            
            # Explore next row
            backtrack(row + 1)
            
            # Unchoose: remove queen
            board[row][col] = '.'
            cols.remove(col)
            diag1.remove(row - col)
            diag2.remove(row + col)
    
    backtrack(0)
    return result

# Test
solutions = solve_n_queens(4)
for sol in solutions:
    for row in sol:
        print(row)
    print()
# .Q..
# ...Q
# Q...
# ..Q.
#
# ..Q.
# Q...
# ...Q
# .Q..

Why row - col and row + col represent diagonals?

On the board, all cells on the same main diagonal (top-left to bottom-right) share the same value of row - col. For example, (0,0), (1,1), (2,2) all have row - col = 0. Similarly, all cells on the same anti-diagonal (top-right to bottom-left) share the same row + col.

2.1.2 Bitwise Optimization

For larger N (e.g., N=15), set operations become a bottleneck. Bitwise operations compress three sets into integer bit operations, dramatically improving speed.

def total_n_queens(n: int) -> int:
    """N-Queens with bitwise optimization (count-only).
    
    Core idea: use n-bit integers where each bit represents
    whether that column/diagonal is occupied.
    
    ~10x faster than set-based version at N=15.
    """
    count = 0
    all_ones = (1 << n) - 1  # n bits all set to 1
    
    def backtrack(cols: int, diag1: int, diag2: int):
        nonlocal count
        
        if cols == all_ones:
            # All columns occupied = n queens placed
            count += 1
            return
        
        # Available positions: bits that are 0 in all three masks
        available = all_ones & ~(cols | diag1 | diag2)
        
        while available:
            # Extract lowest set bit (lowbit trick)
            position = available & (-available)
            available -= position
            
            # Recurse:
            # cols | position: mark this column
            # (diag1 | position) << 1: main diagonal propagates left
            # (diag2 | position) >> 1: anti-diagonal propagates right
            backtrack(
                cols | position,
                (diag1 | position) << 1,
                (diag2 | position) >> 1
            )
    
    backtrack(0, 0, 0)
    return count

# Test
print(total_n_queens(8))   # 92
print(total_n_queens(12))  # 14200

The elegance of the bitwise version:

available & (-available) extracts the lowest set bit — a property of two's complement: -x = ~x + 1.
Diagonal propagation via shifts: the main diagonal shifts left by one each row down; the anti-diagonal shifts right. This perfectly models how diagonal attack ranges expand downward.
No explicit "unchoose" needed — state is passed entirely through function parameters, so recursion return naturally restores state.

2.2 Sudoku Solver

Sudoku is another classic Constraint Satisfaction Problem (CSP).

def solve_sudoku(board: List[List[str]]) -> None:
    """Sudoku solver (modifies board in-place).
    
    Strategy: fill cells one by one with 1-9, prune via row/col/box constraints.
    Enhancement: prioritize cells with fewest candidates (MRV heuristic).
    
    Time: Worst case O(9^81), far less with pruning in practice.
    """
    rows = [set() for _ in range(9)]
    cols = [set() for _ in range(9)]
    boxes = [set() for _ in range(9)]
    empty = []
    
    # Initialize constraint sets
    for i in range(9):
        for j in range(9):
            if board[i][j] != '.':
                num = board[i][j]
                rows[i].add(num)
                cols[j].add(num)
                boxes[(i // 3) * 3 + j // 3].add(num)
            else:
                empty.append((i, j))
    
    def backtrack(idx: int) -> bool:
        # All empty cells filled
        if idx == len(empty):
            return True
        
        i, j = empty[idx]
        box_id = (i // 3) * 3 + j // 3
        
        for num in '123456789':
            # Constraint check
            if num in rows[i] or num in cols[j] or num in boxes[box_id]:
                continue
            
            # Choose
            board[i][j] = num
            rows[i].add(num)
            cols[j].add(num)
            boxes[box_id].add(num)
            
            # Explore
            if backtrack(idx + 1):
                return True  # Solution found
            
            # Unchoose
            board[i][j] = '.'
            rows[i].remove(num)
            cols[j].remove(num)
            boxes[box_id].remove(num)
        
        return False  # No valid digit for this cell — backtrack
    
    backtrack(0)

Sudoku vs. N-Queens comparison:

Aspect	N-Queens	Sudoku
Variables	Column position of queen in each row	Digit in each empty cell
Domain	0 to N-1	1 to 9
Constraints	Column, two diagonals	Row, column, 3×3 box
Number of solutions	May have many	Valid Sudoku has exactly one
Search strategy	Row-by-row	Cell-by-cell (or MRV)

2.3 Pruning Strategies

Pruning is the soul of backtracking. Good pruning makes exponential problems solvable in practice.

2.3.1 Feasibility Pruning

Check constraints at the point of choice, not at the termination condition. This is the most basic yet most effective pruning.

# Bad: check only at termination
def backtrack_bad(path):
    if is_complete(path):
        if is_valid(path):  # Too late!
            result.append(path[:])
        return
    for choice in candidates:
        path.append(choice)
        backtrack_bad(path)
        path.pop()

# Good: check at each step
def backtrack_good(path):
    if is_complete(path):
        result.append(path[:])
        return
    for choice in candidates:
        if not is_valid_choice(choice, path):  # Early check
            continue
        path.append(choice)
        backtrack_good(path)
        path.pop()

2.3.2 Sort-Based Pruning

For "sum equals target" problems, sorting enables break (not just continue) when the current element exceeds the remaining needed. break is stronger than continue: it skips not just the current element but all subsequent larger elements.

def combination_sum2(candidates: List[int], target: int) -> List[List[int]]:
    """Combination Sum II (each element used at most once, with duplicates)."""
    result = []
    candidates.sort()
    
    def backtrack(start: int, path: List[int], remaining: int):
        if remaining == 0:
            result.append(path[:])
            return
        
        for i in range(start, len(candidates)):
            # Sort-based pruning: current exceeds remaining, all after are larger
            if candidates[i] > remaining:
                break  # Note: break, not continue!
            
            # Dedup: skip duplicates at same level
            if i > start and candidates[i] == candidates[i - 1]:
                continue
            
            path.append(candidates[i])
            backtrack(i + 1, path, remaining - candidates[i])
            path.pop()
    
    backtrack(0, [], target)
    return result

2.3.3 Optimality Pruning

If the problem asks for the optimal solution (max/min), maintain the best known value. When a branch cannot possibly improve upon it, terminate early.

def min_cost_backtrack(graph, n, current_cost, best_cost, path, visited):
    """TSP via backtracking with optimality pruning."""
    if len(path) == n:
        total = current_cost + graph[path[-1]][path[0]]
        return min(best_cost, total)
    
    for next_city in range(n):
        if visited[next_city]:
            continue
        
        new_cost = current_cost + graph[path[-1]][next_city]
        
        # Optimality pruning: current cost already exceeds best known
        if new_cost >= best_cost:
            continue
        
        visited[next_city] = True
        path.append(next_city)
        best_cost = min_cost_backtrack(graph, n, new_cost, best_cost, path, visited)
        path.pop()
        visited[next_city] = False
    
    return best_cost

2.4 Backtracking vs. DFS

Backtracking and DFS (Depth-First Search) are often conflated. They are closely related but subtly different:

Concept	DFS	Backtracking
Essence	Graph/tree traversal strategy	Problem-solving methodology
Focus	Visitation order	Decisions + undoing
State management	Usually no modification (or visited marks)	Explicit choose and unchoose
Use case	Visit all nodes	Search for solutions satisfying constraints
Search space	Explicit graph/tree	Implicit search tree (constructed by decisions)

Analogy: DFS is "the way you traverse a map"; backtracking is "the method to find the exit in a maze." Navigating the maze uses DFS's directional strategy, augmented with backtracking's "dead end → turn back" logic.

From an implementation perspective, backtracking = DFS + state restoration. If your DFS traversal includes the pattern "choose → recurse → unchoose," it is backtracking.

2.5 Time Complexity Analysis

Backtracking algorithms typically have exponential or factorial time complexity. This is not an implementation flaw — it is inherent to the problem's search space.

Common problems and their search spaces:

Problem	Search Space	Analysis
Permutations	O(n!)	Level 1: n choices, level 2: n-1, ...
Subsets	O(2^n)	Each element: include or exclude
Combinations C(n,k)	O(C(n,k))	Binomial coefficient
N-Queens	O(n!)	Row 1: n cols, row 2: at most n-1, ...
Sudoku	O(9^m)	m = empty cells, each has at most 9 choices

Why doesn't pruning change the asymptotic complexity?

Theoretically, in the worst case all branches require exploration, so pruning cannot improve the asymptotic upper bound. But in practice, pruning can reduce the average search size by several orders of magnitude. This is why N-Queens, theoretically O(n!), can be solved for N=15 in seconds.

Analysis technique — node counting:

Total time = number of search tree nodes × work per node. For combination problems, the node count equals the answer count (each leaf produces one solution, internal nodes are paths toward solutions). For permutations, the search tree has n! leaves and approximately e × n! internal nodes (from the exponential generating function).

Level 3 · Theory and History

3.1 History of Backtracking

The formalization of backtracking dates to the 1950s. Here are key milestones:

1850s: Manual solutions to N-Queens

The eight-queens problem was first posed by chess player Max Bezzel in 1848. Mathematician Franz Nauck found all 92 solutions in 1850. Without computers, solutions relied entirely on manual reasoning and mathematical symmetry.

1960s: Golomb & Baumert's systematization

Solomon Golomb and Leonard Baumert published "Backtrack Programming" (Journal of the ACM, 12(4):516-524, 1965), the first systematic description of backtracking as a general algorithm design paradigm. They showed it applies to all problems expressible as "extending partial solutions."

1970s: Constraint Satisfaction Problem (CSP) framework

With the rise of AI research, backtracking was incorporated into the unified CSP framework. A CSP consists of three elements: a set of variables, a domain for each variable, and constraints between variables. Backtracking is the most basic method for solving CSPs — assign values to variables one by one, checking constraints at each assignment.

1970-80s: Improvement techniques

Forward Checking: After assignment, immediately check whether any unassigned variable's domain becomes empty.
Arc Consistency: Propagate constraints to shrink domains.
MRV (Minimum Remaining Values): Prioritize the variable with the smallest remaining domain (fail-first heuristic).
Conflict-Directed Backjumping: Instead of backtracking to the immediately previous variable, jump directly to the variable that caused the conflict.

These techniques improved naive backtracking by orders of magnitude, making large-scale CSPs (scheduling, timetabling) tractable.

3.2 Backtracking and Branch-and-Bound

Branch and Bound (B&B) is backtracking's "optimization-focused" variant, designed specifically for optimization problems.

Core differences:

Aspect	Backtracking	Branch and Bound
Goal	Find all solutions / any solution	Find the optimal solution
Search order	Typically DFS	BFS, best-first search
Pruning basis	Feasibility constraints	Optimality bounds
Typical applications	CSP, enumeration	Integer programming, combinatorial optimization

The key idea of B&B — bounds:

For each search tree node, compute an "optimistic estimate" (upper or lower bound): if the best possible solution from this node cannot beat the current best known solution, prune the entire subtree.

def branch_and_bound_knapsack(weights, values, capacity):
    """0-1 Knapsack via Branch and Bound.
    
    Uses greedy fractional relaxation to compute upper bounds.
    """
    n = len(weights)
    # Sort by value density (descending)
    items = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    best_value = 0
    
    def upper_bound(idx, current_weight, current_value):
        """Greedy upper bound: allow fractional items."""
        bound = current_value
        w = current_weight
        for i in range(idx, n):
            item = items[i]
            if w + weights[item] <= capacity:
                w += weights[item]
                bound += values[item]
            else:
                bound += values[item] * (capacity - w) / weights[item]
                break
        return bound
    
    def solve(idx, current_weight, current_value):
        nonlocal best_value
        
        if idx == n:
            best_value = max(best_value, current_value)
            return
        
        item = items[idx]
        
        # Branch 1: take current item
        if current_weight + weights[item] <= capacity:
            solve(idx + 1, current_weight + weights[item],
                  current_value + values[item])
        
        # Branch 2: skip current item (check bound first)
        if upper_bound(idx + 1, current_weight, current_value) > best_value:
            solve(idx + 1, current_weight, current_value)
    
    solve(0, 0, 0)
    return best_value

B&B in industry: Virtually all commercial Integer Linear Programming (ILP) solvers — CPLEX, Gurobi, SCIP — use B&B as their core framework. They combine LP relaxation for computing bounds, cutting planes for tightening relaxations, and sophisticated branching strategies for variable selection.

3.3 Backtracking in Compilers — Regex Engines

You might not expect it, but one of the most pervasive everyday applications of backtracking is regular expression matching. Most programming languages (Python, JavaScript, Java, Ruby, PHP) implement regex engines as backtracking-based NFA (Nondeterministic Finite Automaton) simulators.

How regex backtracking works:

When the regex engine encounters a branch (e.g., a|b) or quantifier (e.g., a*), it picks one direction and attempts to match. If it fails, it backtracks to the choice point and tries the other direction.

def regex_match(pattern: str, text: str) -> bool:
    """Simplified regex engine (supports . and *).
    
    This is fundamentally backtracking: try one matching strategy,
    fail -> backtrack and try another.
    """
    def match(p_idx: int, t_idx: int) -> bool:
        # Pattern exhausted — check if text is too
        if p_idx == len(pattern):
            return t_idx == len(text)
        
        # Does current character match?
        first_match = (t_idx < len(text) and
                       (pattern[p_idx] == text[t_idx] or pattern[p_idx] == '.'))
        
        # Handle * quantifier
        if p_idx + 1 < len(pattern) and pattern[p_idx + 1] == '*':
            # Two choices:
            # 1. * matches 0 times: skip x* in pattern
            # 2. * matches 1+ times: consume one char, keep pattern
            return (match(p_idx + 2, t_idx) or           # Match 0 times
                    (first_match and match(p_idx, t_idx + 1)))  # Match 1+ times
        
        # Ordinary match
        if first_match:
            return match(p_idx + 1, t_idx + 1)
        
        return False
    
    return match(0, 0)

Catastrophic Backtracking:

Backtracking-based regex engines have a critical vulnerability: for certain malicious patterns and inputs, backtracking may grow exponentially, hanging the program. This is known as ReDoS (Regular Expression Denial of Service).

Classic example: pattern (a+)+$ against string "a" * 25 + "b". The engine tries 2^n grouping strategies before determining no match exists.

import re
import time

# DANGER! The following may hang your program:
# pattern = r"(a+)+$"
# text = "a" * 25 + "b"
# re.match(pattern, text)  # May take tens of seconds or more

# Safe alternative: use non-backtracking engines (Go's regexp, RE2)
# Or rewrite: r"a+$" is semantically equivalent but cannot catastrophically backtrack

Why not use a DFA? Thompson's NFA construction (1968) or DFA can match any regular expression in O(n) time with no catastrophic backtracking. However, DFAs cannot support backreferences — patterns like (.)\1 that reference previously captured content exceed the expressiveness of regular languages. Because backreferences are widely used in practice, most languages chose backtracking engines as a tradeoff. Go is one of the few languages that insists on RE2 (DFA implementation) at the cost of dropping backreference support.

3.4 NP-Complete Problems and Backtracking

Why can many NP problems only be solved by backtracking/brute force?

P vs NP is one of the most important open problems in computer science. NP-Complete problems are the "hardest" in NP — if any NP-Complete problem has a polynomial-time algorithm, then all NP problems do. To date, no one has found a polynomial algorithm for any NP-Complete problem (nor proven none exists).

The relationship between backtracking and NP-Completeness:

NP is defined as the class of problems whose solutions can be verified in polynomial time. This means:

Verifying a solution is easy (polynomial time)
Finding a solution may be hard (only exponential algorithms are known)

Backtracking fits this pattern perfectly: it searches for solutions and uses constraint checking for verification and pruning. For NP-Complete problems (TSP, graph coloring, Boolean satisfiability), backtracking with heuristics and pruning is the primary exact solving method.

Typical NP-Complete problems and backtracking approaches:

Problem	Decision Version	Backtracking Strategy
SAT	Is a Boolean formula satisfiable?	DPLL (backtracking + unit propagation + pure literal elimination)
Graph Coloring	Can a graph be colored with k colors?	Assign colors vertex by vertex, check adjacent constraint
Hamiltonian Path	Path visiting all vertices exactly once?	Extend path step by step, check connectivity and uniqueness
Subset Sum	Subset summing to target?	Include/exclude each element, prune after sorting

A key insight: While NP-Complete problems have no known polynomial exact algorithms, for many practical instances, backtracking with good pruning and heuristics is efficient enough. SAT solvers (MiniSat, CaDiCaL) can handle industrial instances with millions of variables — despite worst-case exponential behavior. This tells us: average-case and worst-case behavior can differ enormously.

Level 4 · Practice and Interviews

4.1 Correct Deduplication Patterns

Deduplication is the most error-prone aspect of backtracking problems. Let us systematically summarize the two approaches.

4.1.1 Method 1: Sort + Skip Same Elements

For combination/subset problems (skip at the same level):

# Inside the for loop:
if i > start and nums[i] == nums[i - 1]:
    continue

For permutation problems (skip at same level + used check):

# Inside the for loop:
if i > 0 and nums[i] == nums[i - 1] and not used[i - 1]:
    continue

Why does the permutation dedup condition include not used[i-1]?

Because the permutation loop always starts from 0 (unlike combinations which start from start), the i > start trick does not apply. We need another way to distinguish "duplicate at the same level" from "legitimate choice at a different level":

not used[i-1] is True: nums[i-1] was unchoosen at this level (same-level duplicate) → skip.
not used[i-1] is False (i.e., used[i-1] is True): nums[i-1] is in the path (different-level choice) → legitimate.

4.1.2 Method 2: Set to Record Values Chosen at Same Level

When the array cannot be sorted or sorting is too costly:

def permute_unique_set(nums: List[int]) -> List[List[int]]:
    """Dedup via set (no sorting required)."""
    result = []
    used = [False] * len(nums)
    
    def backtrack(path):
        if len(path) == len(nums):
            result.append(path[:])
            return
        
        seen = set()  # Values already tried at this level
        for i in range(len(nums)):
            if used[i]:
                continue
            if nums[i] in seen:  # Same value tried at this level
                continue
            seen.add(nums[i])
            
            path.append(nums[i])
            used[i] = True
            backtrack(path)
            path.pop()
            used[i] = False
    
    backtrack([])
    return result

Comparison of the two methods:

Aspect	Sort + Skip	Set Dedup
Prerequisite	Sort needed O(n log n)	No sorting
Extra space	O(1) (excluding recursion stack)	O(n) per level
Best for	Sortable arrays	Arrays where order must be preserved
Constant factor	Smaller	Hash table overhead

4.2 High-Frequency Interview Problems

4.2.1 Letter Combinations of a Phone Number (LeetCode #17)

Problem: Given a string containing digits 2-9, return all possible letter combinations it could represent.

def letter_combinations(digits: str) -> List[str]:
    """Letter combinations of a phone number.
    
    Time: O(4^n × n)  At most 4 letters per digit (7 and 9), n = len(digits).
    Space: O(n)  Recursion depth.
    """
    if not digits:
        return []
    
    phone_map = {
        '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
        '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'
    }
    
    result = []
    
    def backtrack(idx: int, path: List[str]):
        if idx == len(digits):
            result.append(''.join(path))
            return
        
        for char in phone_map[digits[idx]]:
            path.append(char)
            backtrack(idx + 1, path)
            path.pop()
    
    backtrack(0, [])
    return result

# Test
print(letter_combinations("23"))
# ["ad","ae","af","bd","be","bf","cd","ce","cf"]

Interview insight: This problem appears simple but tests whether you understand that it is none of permutation/combination/subset — it is a "multi-layer choice" problem. Each layer's candidate set differs (determined by each digit), and there is no deduplication needed.

4.2.2 Palindrome Partitioning (LeetCode #131)

Problem: Given a string s, partition it such that every substring is a palindrome. Return all possible partitions.

def partition(s: str) -> List[List[str]]:
    """Palindrome Partitioning.
    
    Strategy: enumerate cut position -> if prefix is palindrome -> recurse on remainder.
    Optimization: precompute palindrome table to avoid repeated checks.
    
    Time: O(n × 2^n) worst case (all substrings are palindromes, e.g., "aaa").
    Space: O(n^2) for palindrome table.
    """
    n = len(s)
    
    # Precompute: is_palindrome[i][j] = True iff s[i:j+1] is a palindrome
    is_palindrome = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            if s[i] == s[j] and (j - i <= 2 or is_palindrome[i + 1][j - 1]):
                is_palindrome[i][j] = True
    
    result = []
    
    def backtrack(start: int, path: List[str]):
        if start == n:
            result.append(path[:])
            return
        
        for end in range(start, n):
            # Prune: only continue if prefix is palindrome
            if not is_palindrome[start][end]:
                continue
            
            path.append(s[start:end + 1])
            backtrack(end + 1, path)
            path.pop()
    
    backtrack(0, [])
    return result

# Test
print(partition("aab"))
# [["a","a","b"],["aa","b"]]

Key optimization: The palindrome table costs O(n^2) to precompute but reduces each palindrome check from O(n) to O(1). In the worst case ("aaaa...a"), this improves total time from O(n^2 × 2^n) to O(n × 2^n).

4.2.3 Word Search (LeetCode #79)

Problem: Given an m×n character grid board and a string word, determine if word exists in the grid (adjacent cells connected horizontally or vertically, each cell used at most once).

def exist(board: List[List[str]], word: str) -> bool:
    """Word Search.
    
    Strategy: from each possible starting cell, DFS + backtrack in 4 directions.
    Optimization: modify board in-place for visited marking (avoid extra space).
    
    Time: O(m × n × 3^L)  L = len(word), at most 3 directions per step (exclude source).
    Space: O(L)  Recursion depth.
    """
    m, n = len(board), len(board[0])
    
    def backtrack(i: int, j: int, k: int) -> bool:
        """At position (i,j), attempt to match word[k:]."""
        # All characters matched
        if k == len(word):
            return True
        
        # Boundary check + character match
        if (i < 0 or i >= m or j < 0 or j >= n or
            board[i][j] != word[k]):
            return False
        
        # Choose: mark as visited
        temp = board[i][j]
        board[i][j] = '#'  # In-place modification, O(1) space
        
        # Explore four directions
        found = (backtrack(i + 1, j, k + 1) or
                 backtrack(i - 1, j, k + 1) or
                 backtrack(i, j + 1, k + 1) or
                 backtrack(i, j - 1, k + 1))
        
        # Unchoose: restore original character
        board[i][j] = temp
        
        return found
    
    # Try each cell as starting point
    for i in range(m):
        for j in range(n):
            if backtrack(i, j, 0):
                return True
    return False

# Test
board = [
    ['A','B','C','E'],
    ['S','F','C','S'],
    ['A','D','E','E']
]
print(exist(board, "ABCCED"))  # True
print(exist(board, "SEE"))     # True
print(exist(board, "ABCB"))    # False

Interview follow-ups:

Q: Why modify in-place rather than using a visited matrix? A: A visited matrix requires O(m×n) extra space. In-place modification replaces the character with a sentinel, restored on backtrack, reducing space from O(m×n) to O(L) (recursion stack only).

Q: Can we optimize further? A: Pre-check character frequencies — if word contains more of some character than the board does, return False immediately. Also compare the frequency of word's first and last characters in the board; start searching from the rarer end.

4.2.4 Generate Parentheses (LeetCode #22)

Problem: Generate all combinations of well-formed parentheses given n pairs.

def generate_parenthesis(n: int) -> List[str]:
    """Generate Parentheses.
    
    Constraint: in any prefix, count('(') >= count(')').
    Choices: add '(' if budget remains, or ')' if left > right.
    
    Time: O(4^n / sqrt(n))  The nth Catalan number.
    Space: O(n).
    """
    result = []
    
    def backtrack(path: List[str], left: int, right: int):
        """
        left: number of '(' used so far.
        right: number of ')' used so far.
        """
        # Termination
        if len(path) == 2 * n:
            result.append(''.join(path))
            return
        
        # Choice 1: add '(' if budget allows
        if left < n:
            path.append('(')
            backtrack(path, left + 1, right)
            path.pop()
        
        # Choice 2: add ')' if it won't create invalid prefix
        if right < left:
            path.append(')')
            backtrack(path, left, right + 1)
            path.pop()
    
    backtrack([], 0, 0)
    return result

# Test
print(generate_parenthesis(3))
# ["((()))","(()())","(())()","()(())","()()()"]

Mathematical background — Catalan numbers:

The number of valid parenthesizations with n pairs equals the nth Catalan number: C_n = C(2n, n) / (n+1). First values: 1, 1, 2, 5, 14, 42, 132, 429, ...

Catalan numbers also count: distinct binary search trees with n nodes, lattice paths from (0,0) to (n,n) not crossing the diagonal, and distinct ways to parenthesize a product of n+1 matrices.

4.3 Time Complexity Estimation Techniques

Estimating backtracking complexity in interviews is often challenging. Here is a practical framework:

Method 1: Search tree counting

Determine the tree depth d (usually the solution length).
Determine the branching factor b per level (usually candidate set size).
Total nodes ≈ b^d (upper bound).

Examples:

Phone number #17: d = len(digits), b ≤ 4 → O(4^n).
N-Queens: d = n, b decreases from n → O(n!).
Parentheses: d = 2n, but constraints make effective nodes far fewer than 2^(2n) → Catalan O(4^n/sqrt(n)).

Method 2: Output size lower bound

If the problem requires outputting all solutions, time complexity is at least the output size:

All permutations: at least O(n! × n) (n! solutions of length n).
All subsets: at least O(2^n × n).
This lower bound cannot be optimized — you must spend time generating each solution.

Method 3: Empirical counting with pruning

For problems with strong pruning, theoretical upper bounds and actual runtime may differ by orders of magnitude. Experimentally counting search tree nodes is more realistic.

def count_nodes_example():
    """Count search tree nodes for empirical complexity estimation."""
    count = 0
    
    def backtrack_with_counting(path, ...):
        nonlocal count
        count += 1
        # ... normal logic
    
    backtrack_with_counting([], ...)
    print(f"Searched {count} nodes")

4.4 Backtracking vs. Dynamic Programming: How to Decide

One of the most common interview questions: should this problem use backtracking or DP?

Core distinction — overlapping subproblems:

Overlapping subproblems exist → Dynamic Programming: the same subproblem is computed repeatedly (e.g., Fibonacci, knapsack).
No overlapping subproblems → Backtracking: each search branch corresponds to a distinct state (e.g., permutations, N-Queens).

Determination method:

Draw the search tree.
Check if identical subtrees appear multiple times.
If yes → memoization/DP; if no → backtracking.

The gray area — backtracking + memoization:

Some problems exhibit characteristics of both. For example, "Word Break II" (LeetCode #140) requires finding all valid segmentations — logically backtracking (enumerate all splits) — but subproblems overlap (the same suffix can reuse results), so memoization accelerates it.

def word_break_ii(s: str, word_dict: List[str]) -> List[str]:
    """Word Break II: a classic case of backtracking + memoization."""
    word_set = set(word_dict)
    memo = {}
    
    def backtrack(start: int) -> List[str]:
        if start in memo:
            return memo[start]
        
        if start == len(s):
            return ['']
        
        results = []
        for end in range(start + 1, len(s) + 1):
            word = s[start:end]
            if word in word_set:
                sub_sentences = backtrack(end)
                for sub in sub_sentences:
                    if sub:
                        results.append(word + ' ' + sub)
                    else:
                        results.append(word)
        
        memo[start] = results
        return results
    
    return backtrack(0)

Decision tree summary:

Does the problem ask for all solutions?
  |-- Yes --> Overlapping subproblems?
  |     |-- Yes --> Backtracking + Memoization
  |     |-- No  --> Pure Backtracking
  |-- No (only count/optimal value) --> Optimal substructure?
        |-- Yes --> Dynamic Programming
        |-- No  --> Backtracking / Greedy / Other

4.5 Comprehensive Practice: Quick-Reference Template

When facing a new backtracking problem, analyze it in these steps:

Step 1: Identify the problem type

Permutation (order matters) → loop from 0, use used array.
Combination (order irrelevant) → loop from start.
Subset → collect at every node, loop from start.
Search (e.g., word search, N-Queens) → customize based on problem structure.

Step 2: Determine the three elements

Path: current choices made.
Choice list: available choices at this step.
Termination condition: when you have reached a leaf of the search tree.

Step 3: Consider pruning

Does sorting enable early break?
Are there duplicate elements requiring deduplication?
Can constraints eliminate choices before making them?

Step 4: Confirm complexity

Estimate using node count × work per node.
Verify it is acceptable for the problem's input constraints.

# Ultimate template
def solve(problem_input):
    result = []
    
    # Preprocessing (sort, build tables, etc.)
    candidates = preprocess(problem_input)
    
    def backtrack(state, path):
        # Termination
        if is_goal(state):
            result.append(format_solution(path))
            return
        
        for i, choice in enumerate(get_choices(state, candidates)):
            # Pruning
            if should_prune(choice, state):
                continue  # or break (if sorted)
            
            # Dedup (if needed)
            if is_duplicate(i, choice, state):
                continue
            
            # Choose
            make_choice(state, choice, path)
            
            # Explore
            backtrack(next_state(state, choice), path)
            
            # Unchoose
            undo_choice(state, choice, path)
    
    backtrack(initial_state(), [])
    return result

Chapter Summary

Backtracking is one of the most fundamental algorithmic paradigms in computer science. Its elegance lies in applying a uniform framework (choose → explore → unchoose) to an enormous variety of search problems.

Key takeaways:

Permutations/Combinations/Subsets are three foundational models. Understanding their differences (used array vs. start position) is essential.
Pruning is the soul of backtracking — feasibility pruning, sort-based pruning, and optimality pruning form a progressive hierarchy.
Deduplication is the most common source of bugs — sort + skip is the most reliable pattern.
Backtracking is fundamentally DFS augmented with state management (choosing and unchoosing).
NP-Complete problems have no known polynomial exact algorithms; backtracking + pruning is the primary exact solving approach.
Backtracking vs. DP hinges on overlapping subproblems — if they exist, use memoization; if not, search directly.

When facing a new problem, do not rush to code. First sketch the search tree's first few levels, identify "what is the choice," "what is the constraint," and "when to collect results," then apply the template. Backtracking problems are not harder than DP — their templates are more uniform and their variations more regular. The only difficulties are recognizing "this is a backtracking problem" and designing effective pruning.

Rate this chapter

4.7 / 5 (5 ratings)

Backtracking: Elegant Brute Force

Chapter 25: Backtracking — Elegant Brute Force

Level 1 · What You Need to Know

1.1 The Core Idea of Backtracking

1.2 The Universal Backtracking Template

1.3 Permutation Problems

1.3.1 Permutations (LeetCode #46)

1.3.2 Permutations II (LeetCode #47, with Duplicates)

1.4 Combination Problems

1.4.1 Combinations (LeetCode #77)

1.4.2 Combination Sum (LeetCode #39)

1.5 Subset Problems

1.5.1 Subsets (LeetCode #78)

1.5.2 Subsets II (LeetCode #90, with Duplicates)

1.6 Permutations vs. Combinations vs. Subsets

Level 2 · Deeper Understanding

2.1 The N-Queens Problem

2.1.1 Basic Implementation

2.1.2 Bitwise Optimization

2.2 Sudoku Solver

2.3 Pruning Strategies

2.3.1 Feasibility Pruning

2.3.2 Sort-Based Pruning

2.3.3 Optimality Pruning

2.4 Backtracking vs. DFS

2.5 Time Complexity Analysis

Level 3 · Theory and History

3.1 History of Backtracking

3.2 Backtracking and Branch-and-Bound

3.3 Backtracking in Compilers — Regex Engines

3.4 NP-Complete Problems and Backtracking

Level 4 · Practice and Interviews

4.1 Correct Deduplication Patterns

4.1.1 Method 1: Sort + Skip Same Elements

4.1.2 Method 2: Set to Record Values Chosen at Same Level

4.2 High-Frequency Interview Problems

4.2.1 Letter Combinations of a Phone Number (LeetCode #17)

4.2.2 Palindrome Partitioning (LeetCode #131)

4.2.3 Word Search (LeetCode #79)

4.2.4 Generate Parentheses (LeetCode #22)

4.3 Time Complexity Estimation Techniques

4.4 Backtracking vs. Dynamic Programming: How to Decide

4.5 Comprehensive Practice: Quick-Reference Template

Chapter Summary

💬 Comments