Chapter 1

Complexity: The Ruler for Code Speed

Chapter 1: Complexity — The Ruler for Code Speed

You wrote two solutions to the same problem. One runs in 0.3 seconds, the other in 45 seconds. The first one is "faster," right? But what if the input grows from 10,000 to 100 million? The 0.3-second solution might become 30 seconds, while the 45-second one might become 45.001 seconds. Which is faster now?

This is why we need complexity analysis. It doesn't answer "how long does this code take" — it answers "how does the running time grow as input grows."

Level 1 · What You Need to Know

1.1 The Intuition Behind Big-O

Big-O notation describes growth trends, not exact times. When we say an algorithm is O(n), it means: when the input doubles, the running time roughly doubles too.

The most common complexities, from fastest to slowest:

Complexity	Name	n=10	n=100	n=1000	n=10⁶	Intuition
O(1)	Constant	1	1	1	1	Time doesn't change with input size
O(log n)	Logarithmic	3	7	10	20	Eliminate half each step
O(n)	Linear	10	100	1000	10⁶	Look at every element once
O(n log n)	Linearithmic	33	664	9966	2×10⁷	Good sorting algorithms
O(n²)	Quadratic	100	10⁴	10⁶	10¹²	Two nested loops
O(2ⁿ)	Exponential	1024	10³⁰	—	—	Enumerate all subsets
O(n!)	Factorial	3.6×10⁶	—	—	—	Enumerate all permutations

Key insight: The gap between O(n²) and O(n) is not 2x — it's nx. When n = 10⁶, that gap is one million times.

1.2 How to Quickly Determine Complexity

Rule 1: Count nested loop levels

# O(n) — single loop
for i in range(n):
    do_something()

# O(n²) — two nested loops
for i in range(n):
    for j in range(n):
        do_something()

# O(n³) — three nested loops
for i in range(n):
    for j in range(n):
        for k in range(n):
            do_something()

Rule 2: Watch how the loop variable changes

# O(log n) — doubles or halves each step
i = 1
while i < n:
    i *= 2  # doubles

# O(sqrt(n)) — loops up to the square root
for i in range(int(n**0.5) + 1):
    do_something()

Rule 3: Sequential → take max; Nested → multiply

# O(n) + O(n²) = O(n²)  ← take the max
for i in range(n):       # O(n)
    pass
for i in range(n):       # O(n²)
    for j in range(n):
        pass

# O(n) × O(m) = O(nm)  ← multiply
for i in range(n):       # outer: n times
    for j in range(m):   # inner: m times
        pass

Rule 4: For recursion, draw the recursion tree

# O(2ⁿ) — each call spawns two, size decreases by 1
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)

# O(n log n) — each call spawns two, size halves
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])    # T(n/2)
    right = merge_sort(arr[mid:])   # T(n/2)
    return merge(left, right)       # O(n)

1.3 Space Complexity

Space complexity measures how much extra memory an algorithm uses as input grows. "Extra" means we exclude the input itself.

# Space O(1) — only a few variables
def sum_array(arr):
    total = 0
    for x in arr:
        total += x
    return total

# Space O(n) — creates a new array
def double_array(arr):
    result = []
    for x in arr:
        result.append(x * 2)
    return result

# Space O(n) — recursive call stack
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)  # recursion depth is n

1.4 Common Mistakes

Mistake 1: Hidden loops

# Looks like O(n), actually O(n²)
for i in range(n):
    if i in some_list:  # `in` on list is O(n)!
        pass
# Fix: use a set
some_set = set(some_list)
for i in range(n):
    if i in some_set:   # `in` on set is O(1)
        pass

Mistake 2: String concatenation trap

# O(n²)! Each += creates a new string
s = ""
for i in range(n):
    s += str(i)  # copies the entire string

# O(n): use join
s = "".join(str(i) for i in range(n))

Mistake 3: Forgetting sort's complexity

# Looks simple, but sort is O(n log n)
arr.sort()
# So overall complexity can't be lower than O(n log n)

Mistake 4: Calling O(n) operations O(1)

# list.insert(0, x) is NOT O(1), it's O(n)
# because every element must shift right by one
arr.insert(0, 42)  # O(n)
arr.append(42)      # O(1) amortized

1.5 Python Operation Complexity Cheat Sheet

Operation	list	dict/set	Notes
Index `a[i]`	O(1)	O(1)
Append `append`	O(1) amortized	—	Occasional O(n) resize
Insert at front `insert(0,x)`	O(n)	—	Shifts all elements
Delete `del a[i]`	O(n)	O(1)	list needs to shift
Search `x in a`	O(n)	O(1)	Most common perf trap
Sort `sort()`	O(n log n)	—	TimSort
`len()`	O(1)	O(1)	Pre-stored length
Slice `a[i:j]`	O(j-i)	—	Copies elements

Level 2 · How It Works

2.1 The Mathematical Definition of Big-O

Big-O is not a rough estimate. It's a rigorous mathematical definition.

Definition: If there exist positive constants c and n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀, then f(n) = O(g(n)).

In plain English: once n is large enough, f(n) never exceeds a constant multiple of g(n).

Example: Is f(n) = 3n² + 7n + 42 equal to O(n²)?

Choose c = 5, n₀ = 10. When n ≥ 10:

7n ≤ n² and 42 ≤ n², so 3n² + 7n + 42 ≤ 3n² + n² + n² = 5n²
The condition holds, so f(n) = O(n²). ✓

Big-Ω (lower bound): If there exist positive constants c and n₀ such that f(n) ≥ c·g(n) for all n ≥ n₀, then f(n) = Ω(g(n)).

Big-Θ (tight bound): If f(n) = O(g(n)) and f(n) = Ω(g(n)), then f(n) = Θ(g(n)).

In other words, Θ means the growth rate is exactly the same as g(n), up to constant factors.

Notation	Meaning	Analogy
f(n) = O(g(n))	f grows no faster than g	f ≤ g (ignoring constants)
f(n) = Ω(g(n))	f grows no slower than g	f ≥ g (ignoring constants)
f(n) = Θ(g(n))	f grows at the same rate as g	f = g (ignoring constants)

Common misuse in interviews: People say "the complexity is O(n)" when they really mean Θ(n). O(n) is just an upper bound — an O(1) algorithm is also O(n) (since O(1) ≤ O(n)), but calling it Θ(n) would be wrong. In interviews this distinction is usually not pressed, but in papers and rigorous analysis, it matters.

2.2 Best, Worst, and Average Case

The same algorithm can have different running times on different inputs.

Take linear search: find element x in an array of length n.

def linear_search(arr, x):
    for i, val in enumerate(arr):
        if val == x:
            return i
    return -1

Best case: x is at position 0 → O(1)
Worst case: x is not in the array → O(n)
Average case: assuming x is equally likely at any position → O(n/2) = O(n)

Convention: When we say "complexity is O(n)" without qualification, we usually mean the worst case. It's a conservative but safe estimate.

Quicksort is a notable exception:

Worst case: O(n²) (when the pivot is always the smallest or largest element)
Average case: O(n log n) (with random pivot selection)
In practice, we typically say quicksort is O(n log n) because the average case is more representative

2.3 Amortized Analysis

Some operations are fast most of the time but occasionally expensive. Amortized analysis spreads the occasional high cost across many cheap operations.

Classic example: Python list.append()

Python's list is backed by a dynamic array. When the array is full, append must:

Allocate a larger array (typically ~1.125x the current capacity)
Copy all elements to the new array
Free the old array

The copy is O(n). But this resize doesn't happen every append — most appends only cost O(1).

Suppose we perform n appends. How many total copies happen?

Resizing occurs at capacities 1, 2, 4, 8, 16, ..., n (assuming doubling), copying 1, 2, 4, 8, ..., n elements respectively.

Total copies = 1 + 2 + 4 + 8 + ... + n = 2n - 1 ≈ 2n

So n appends cost O(n) total, and each append costs O(n) / n = O(1) amortized.

Intuition: You save $1 a day for 99 days. On day 100, you pay $100 rent. Averaged out, you spend $2 a day. That's amortized analysis.

2.4 The Master Theorem

Many divide-and-conquer recurrences can be solved instantly with the Master Theorem.

Theorem: If T(n) = aT(n/b) + O(nᵈ), where a ≥ 1, b > 1, d ≥ 0, then:

Condition	Result
d > log_b(a)	T(n) = O(nᵈ)
d = log_b(a)	T(n) = O(nᵈ log n)
d < log_b(a)	T(n) = O(n^(log_b(a)))

Examples:

Merge sort: T(n) = 2T(n/2) + O(n)
  a=2, b=2, d=1 → log₂2 = 1 = d → O(n log n) ✓

Binary search: T(n) = T(n/2) + O(1)
  a=1, b=2, d=0 → log₂1 = 0 = d → O(log n) ✓

Karatsuba multiplication: T(n) = 3T(n/2) + O(n)
  a=3, b=2, d=1 → log₂3 ≈ 1.585 > d → O(n^1.585) ✓

Strassen matrix multiplication: T(n) = 7T(n/2) + O(n²)
  a=7, b=2, d=2 → log₂7 ≈ 2.807 > d → O(n^2.807) ✓

Memory aid: Compare d to log_b(a). If d is larger, the "merge work dominates" (result is nᵈ). If log_b(a) is larger, the "number of subproblems dominates" (result is n^(log_b(a))). If equal, multiply by log.

2.5 Recursion Tree Analysis

The Master Theorem only works for a specific form of recurrence. For irregular recursion, we draw a recursion tree.

Take the naive Fibonacci recursion:

                    fib(5)
                   /      \
              fib(4)        fib(3)
             /     \        /    \
         fib(3)   fib(2)  fib(2) fib(1)
         /   \     / \     / \
     fib(2) fib(1) ...   ...
      / \
  fib(1) fib(0)

Each level has roughly twice the nodes of the level above (not exact, since the left subtree is deeper than the right). Total nodes ≈ 2⁰ + 2¹ + ... + 2ⁿ ≈ 2ⁿ.

So naive recursive Fibonacci is O(2ⁿ). More precisely, it's O(φⁿ) where φ = (1 + √5) / 2 ≈ 1.618, the golden ratio (because the tree isn't a perfect binary tree).

Compare: Computing Fibonacci with dynamic programming (bottom-up loop) takes only O(n):

def fib_dp(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

From O(1.618ⁿ) to O(n) — that's the power of algorithm design. At n = 50, naive recursion needs ~10¹⁰ operations (tens of seconds); DP needs just 48 additions (microseconds).

Level 3 · What the Spec Says

3.1 History of Bachmann-Landau Notation

Big-O notation was not invented by computer scientists.

In 1894, German mathematician Paul Bachmann first used the O notation in his book Die Analytische Zahlentheorie (Analytic Number Theory) to describe upper bounds on function growth. He wrote O(n) to mean "a quantity whose order does not exceed n."

In 1909, Edmund Landau systematized this notation in his textbooks and introduced little-o notation (strict upper bound). The notation system is therefore called Bachmann-Landau notation.

In 1976, Donald Knuth published the paper Big Omicron and Big Omega and Big Theta in SIGACT News, proposing the Θ notation and arguing why we need to distinguish between upper bounds (O), lower bounds (Ω), and tight bounds (Θ). He wrote:

"Many people have been using O-notation in a sloppy way... it is important to have a notation for exact order of growth."

Knuth's paper established the trio we use today: O, Ω, and Θ.

3.2 Three Methods of Amortized Analysis

Amortized analysis was systematized by Robert Tarjan in his 1985 paper Amortized Computational Complexity (published in SIAM Journal on Algebraic and Discrete Methods). He proposed three equivalent methods:

Method 1: Aggregate Method

Compute the total cost T(n) over n operations, then amortized cost = T(n) / n.

For dynamic array resizing:

Starting at capacity 1, doubling when full
Among n appends, resizing happens at the 1st, 2nd, 4th, 8th, ..., 2^k th append (where 2^k ≤ n)
Each resize copies i elements
Total copies = 1 + 2 + 4 + ... + 2^k ≤ 2n
Adding n regular writes: T(n) ≤ 3n
Amortized cost = 3n / n = O(1)

Method 2: Accounting Method

Assign each operation a "charge" upfront. If an operation's actual cost is below the charge, the surplus becomes "credit." If the actual cost exceeds the charge, use accumulated credit. As long as credit never goes negative, the charge is a valid amortized cost.

For dynamic arrays:

Charge 3 coins per append (instead of the actual 1)
1 coin pays for the current write
2 coins are saved as a "moving fund" for future resizing
When resizing occurs, n/2 "old elements" need copying (the other n/2 just joined and are already in place)
These n/2 old elements each have 2 coins saved, totaling n coins — exactly enough to pay for the copy
Credit is always ≥ 0, so amortized cost O(3) = O(1)

Method 3: Potential Method

Define a "potential function" Φ mapping the data structure's state to a non-negative real number. Amortized cost = actual cost + ΔΦ (change in potential).

For dynamic arrays:

Define Φ(D) = 2 × size - capacity
Regular append: actual cost = 1, Φ increases by 2 (size +1), amortized = 1 + 2 = 3
Resize append: actual cost = 1 + old_cap (copy), ΔΦ = (2 - old_cap) - old_cap = 2 - 2×old_cap
Amortized = (1 + old_cap) + (2 - 2×old_cap) = 3 - old_cap ≤ 3

All three methods yield the same conclusion: dynamic array append has O(1) amortized complexity.

3.3 Hash Table Amortized Analysis Pitfalls

Python's dict triggers a rehash when the load factor exceeds 2/3, redistributing all key-value pairs into a larger table. Like dynamic arrays, rehashing is O(n), but amortized over many insertions, each insert is still O(1).

But there's a catch: hash collisions in the worst case are not O(1).

If all keys map to the same bucket (e.g., crafted adversarial input), lookup degrades to chain traversal at O(n). CPython 3.6+ uses open addressing with perturbation to mitigate this, but the theoretical worst case remains O(n).

Java's HashMap made an interesting optimization in Java 8+: when a single bucket's chain exceeds 8 elements, it automatically converts to a red-black tree, improving worst-case from O(n) to O(log n). This is an engineering trade-off — linked lists are faster in the common case (saving the overhead of tree node space and rotations), and only "upgrade" to trees in extreme cases.

3.4 Little-o and Little-ω

Beyond Big-O/Ω/Θ, there are two less common but theoretically important notations:

Little-o: f(n) = o(g(n)) means f(n) grows strictly slower than g(n). Formally: lim(n→∞) f(n)/g(n) = 0.
- Example: n = o(n²), but n ≠ o(n).
Little-ω: f(n) = ω(g(n)) means f(n) grows strictly faster than g(n). Formally: lim(n→∞) f(n)/g(n) = ∞.

Notation	Inequality Analogy
O	≤
o	<
Ω	≥
ω	>
Θ	=

These five notations form the complete Bachmann-Landau system.

Level 4 · Edge Cases and Pitfalls

4.1 Why O(n) Can Be Faster Than O(log n)

Big-O discards constant factors. When the constant factor difference is large enough, the "better" complexity can actually be slower.

Case: Binary search vs. linear search on small arrays

import time

# Binary search O(log n)
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# Linear search O(n)
def linear_search(arr, target):
    for i, v in enumerate(arr):
        if v == target:
            return i
    return -1

# On arrays of size 10, linear search is often faster
arr = list(range(10))
# Linear search has a simpler loop, better branch prediction
# Binary search has more comparisons, mid calculation, conditional jumps

For n < 64, linear search frequently beats binary search. This is why many sorting algorithms (TimSort, introsort) switch to insertion sort (O(n²)) on small subarrays — for small arrays, O(n²) with a tiny constant beats O(n log n) with a larger one.

Case: Cache locality

n = 10000
matrix = [[0] * n for _ in range(n)]

# Row-major traversal (cache-friendly) — fast
for i in range(n):
    for j in range(n):
        matrix[i][j] = 1

# Column-major traversal (cache-unfriendly) — 5-10x slower
for j in range(n):
    for i in range(n):
        matrix[i][j] = 1

Both are O(n²), but row-major can be 5-10x faster. CPUs load memory in cache lines (typically 64 bytes). Row-major access hits consecutive memory, so one cache line load satisfies multiple accesses. Column-major access jumps between cache lines, causing nearly every access to miss cache.

This is why complexity analysis isn't everything. It tells you the growth trend but not the constant factor or cache behavior. In practice, these two factors often matter more than the O analysis.

4.2 Hidden Complexity in Python

Many Python operations look "lightweight" but have surprising complexity:

Operation	You'd think	Actually	Why
`x in list`	O(1)	O(n)	Linear scan
`list.insert(0, x)`	O(1)	O(n)	All elements shift right
`list.pop(0)`	O(1)	O(n)	All elements shift left
`list + list`	O(1)	O(n+m)	Creates new list and copies
`list * k`	O(1)	O(nk)	Copies k times
`str += str`	O(1)	O(n)	Creates new string
`min(list)` / `max(list)`	O(1)	O(n)	Linear scan
`list.count(x)`	O(1)	O(n)	Linear scan
`sorted(list)`	O(n)	O(n log n)	TimSort

The most dangerous is using these O(n) operations inside loops, creating O(n²):

# Disaster: O(n²)
result = []
for item in items:
    if item not in result:  # O(n) lookup
        result.append(item)

# Correct: O(n)
seen = set()
result = []
for item in items:
    if item not in seen:    # O(1) lookup
        seen.add(item)
        result.append(item)

4.3 The Stack Space Trap in Recursion

Recursive space complexity is easy to overlook. Each recursive call allocates a stack frame (local variables, return address, etc.) on the call stack.

# Space O(n) — recursion depth is n
def sum_recursive(arr, i=0):
    if i == len(arr):
        return 0
    return arr[i] + sum_recursive(arr, i + 1)

# Space O(1) — iterative
def sum_iterative(arr):
    total = 0
    for x in arr:
        total += x
    return total

Python's default recursion limit is 1000 (sys.getrecursionlimit()). Exceeding this depth raises RecursionError. For algorithms that might face large inputs, prefer loops over recursion.

But some algorithms are naturally recursive (tree traversal, divide-and-conquer). In those cases, recursion depth is typically O(log n) (the height of a balanced tree), which won't hit the limit. Only chain-like structures (linked list traversal via recursion, unbalanced binary trees) reach O(n) depth.

4.4 Interview Classic: Time vs. Space Trade-off

Many problems can be optimized by trading space for time:

# Two Sum — brute force: O(n²) time, O(1) space
def two_sum_brute(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]

# Two Sum — hash map: O(n) time, O(n) space
def two_sum_hash(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i

How to answer in interviews: Give the brute-force solution first and analyze its complexity. Then say "we can achieve O(n) time with O(n) space by using a hash map," showing you understand the trade-off.

4.5 Competitive Programming Rules of Thumb

In contests, solutions typically need to finish within 1-2 seconds. Here's a rough guide for estimating acceptable complexity:

n range	Acceptable complexity	Typical algorithms
n ≤ 20	O(2ⁿ) or O(n!)	Brute force, bitmask DP
n ≤ 500	O(n³)	Floyd, interval DP
n ≤ 5000	O(n²)	Simple DP, brute enumeration
n ≤ 10⁶	O(n log n)	Sorting, binary search, segment trees
n ≤ 10⁸	O(n)	Two pointers, prefix sums, linear DP
n ≤ 10¹⁸	O(log n) or O(√n)	Binary search, fast exponentiation, number theory

This table is extremely valuable: When you see n ≤ 10⁶, you immediately know you need O(n log n) or better. This helps you quickly eliminate infeasible approaches in interviews.

4.6 Complexity Is Not the Only Metric

In production engineering, the following factors can matter more than O analysis:

Cache hit rate: Contiguous memory access (arrays) is far faster than random access (linked lists, trees)
Branch prediction: Simple loops are more CPU-friendly than complex conditionals
Memory allocation: Frequent malloc/free overhead can exceed the computation itself
Parallelism: A serial O(n) algorithm may be slower than a parallelizable O(n log n) one
I/O bottlenecks: When data is on disk, reducing random reads matters more than reducing computation (this is why B+Trees exist)

We'll revisit these factors throughout the book. Chapter 2 (arrays and cache friendliness), Chapter 20 (B+Trees and disk I/O), and Chapter 40 (data structure selection guide) will go deeper.

Chapter Summary

Concept	Key Point
Big-O	Upper bound on growth, drops constants and lower-order terms
Big-Ω	Lower bound on growth
Big-Θ	Tight bound on growth
Amortized analysis	Spread occasional high costs across many cheap operations
Master Theorem	Quick solution for T(n) = aT(n/b) + O(nᵈ)
Constant factors	What O discards can be decisive in practice
Cache effects	Memory access patterns can matter more than algorithmic complexity

Exercises:

Analyze the time and space complexity of:

def mystery(n):
    if n <= 0:
        return 1
    return mystery(n - 1) + mystery(n - 1)

Is collections.deque.appendleft O(1) or O(n)? Why can deque achieve O(1) for front insertion while list's insert(0, x) is O(n)?
Given n = 10⁷, which algorithms finish within 1 second (assuming 10⁸ operations/sec)?
- O(n) → 10⁷ ops → ✓
- O(n log n) → ~2.3 × 10⁸ ops → borderline
- O(n√n) → ~3.16 × 10¹⁰ ops → ✗
- O(n²) → 10¹⁴ ops → ✗

Next chapter: Chapter 2 dives into arrays — how contiguous memory makes simple traversal 10x faster than linked lists, the growth strategy inside Python's list, and why "cache friendliness" is the first rule of high-performance code.

Rate this chapter

4.7 / 5 (123 ratings)