Sorting: From Bubble Sort to TimSort
Chapter 21: Sorting — From Bubble Sort to TimSort
Open a Python interpreter, type sorted([3, 1, 4, 1, 5, 9]), and you get [1, 1, 3, 4, 5, 9] in microseconds. This seems trivial, but behind it lies over 60 years of computer science wisdom. Sorting is one of the most thoroughly studied problems in computing — from John von Neumann's merge sort implemented on EDVAC in 1945, to Tim Peters' TimSort designed for CPython in 2002, each generation of algorithms pushes closer to "perfect sorting."
Why does sorting matter so much? Because it is the foundation for other algorithms. Binary search requires sorted arrays; database indexes are essentially sorted mappings; OS process scheduling needs priority ordering; search engine results need relevance ordering. Donald Knuth dedicated an entire volume of "The Art of Computer Programming" (Volume 3) to sorting and searching, noting that "a computer spends roughly 25% of its time sorting."
In this chapter, we start from the simplest bubble sort and work our way up to industrial-grade TimSort and introsort, covering everything you need to know about sorting.
Level 1 · What You Need to Know
21.1 Overview of Eight Classic Sorting Algorithms
Before diving into each algorithm, let's see the big picture. The following table summarizes the core properties of eight classic sorting algorithms:
| Algorithm | Average Time | Worst Time | Space | Stable | Use Case |
|---|---|---|---|---|---|
| Bubble Sort | O(n²) | O(n²) | O(1) | Yes | Teaching only |
| Selection Sort | O(n²) | O(n²) | O(1) | No | Fewest swaps |
| Insertion Sort | O(n²) | O(n²) | O(1) | Yes | Small/nearly-sorted arrays |
| Merge Sort | O(n log n) | O(n log n) | O(n) | Yes | Stability needed, linked lists |
| Quick Sort | O(n log n) | O(n²) | O(log n) | No | General purpose, fastest in practice |
| Heap Sort | O(n log n) | O(n log n) | O(1) | No | Worst-case guarantee |
| Counting Sort | O(n + k) | O(n + k) | O(k) | Yes | Integers with small range |
| Radix Sort | O(d·(n+k)) | O(d·(n+k)) | O(n+k) | Yes | Fixed-width integers |
Here n is the number of elements, k is the value range, and d is the maximum number of digits.
A critical distinction: the first six are comparison sorts — they determine order only by pairwise comparisons. The last two (plus bucket sort) are non-comparison sorts — they exploit the numerical properties of elements to determine positions. This distinction matters enormously: comparison sorts have a theoretical lower bound of O(n log n), while non-comparison sorts can break through it.
21.2 Bubble Sort
Bubble sort is the most intuitive sorting algorithm: repeatedly traverse the array, compare adjacent elements, and swap them if they're in the wrong order. Each pass "bubbles" at least one largest element to its correct position.
def bubble_sort(arr):
"""Bubble sort: compare and swap adjacent elements"""
n = len(arr)
for i in range(n):
swapped = False
for j in range(n - 1 - i):
if arr[j] > arr[j + 1]:
arr[j], arr[j + 1] = arr[j + 1], arr[j]
swapped = True
if not swapped:
break # Optimization: if no swaps in a pass, array is sorted
return arr
Analysis:
- Best case: Array already sorted, only one pass needed, O(n)
- Worst case: Array in reverse order, n(n-1)/2 comparisons and swaps, O(n²)
- Average case: O(n²)
- Stability: Stable (equal elements are never swapped)
- Space: O(1), in-place
The only virtues of bubble sort are simplicity and stability. In practice, there is almost never a reason to use it — even among O(n²) algorithms, insertion sort is far superior.
21.3 Selection Sort
Selection sort repeatedly finds the minimum element from the unsorted portion and places it at the end of the sorted portion.
def selection_sort(arr):
"""Selection sort: find minimum, place it at front"""
n = len(arr)
for i in range(n):
min_idx = i
for j in range(i + 1, n):
if arr[j] < arr[min_idx]:
min_idx = j
arr[i], arr[min_idx] = arr[min_idx], arr[i]
return arr
Analysis:
- Comparisons are always n(n-1)/2 regardless of input, so best/worst/average are all O(n²)
- At most n-1 swaps — this is selection sort's only advantage
- Unstable: e.g.,
[5a, 5b, 3], first pass swaps 5a with 3, putting 5a after 5b - Space: O(1)
21.4 Insertion Sort
Insertion sort works like sorting a hand of cards: take a new card and find its correct position among the cards already in hand.
def insertion_sort(arr):
"""Insertion sort: insert each element into its correct position"""
for i in range(1, len(arr)):
key = arr[i]
j = i - 1
while j >= 0 and arr[j] > key:
arr[j + 1] = arr[j]
j -= 1
arr[j + 1] = key
return arr
Analysis:
- Best case: Already sorted, inner loop never executes, O(n)
- Worst case: Reverse sorted, O(n²)
- Average case: O(n²)
- Stable: equal elements don't change relative order
- Space: O(1)
Insertion sort is the fastest O(n²) algorithm in practice, because:
- The inner loop exits early (unlike bubble and selection which always complete)
- Degrades to O(n) for nearly-sorted data
- Cache-friendly: always accesses adjacent memory
- Small constant factor: each comparison does one assignment (not a swap)
These properties make insertion sort faster than quicksort for small arrays (n < 16~64). Both TimSort and introsort switch to insertion sort for small subarrays.
21.5 Merge Sort
Merge sort is the classic embodiment of divide and conquer:
- Divide: Split the array in half
- Conquer: Recursively sort each half
- Combine: Merge the two sorted halves into one sorted array
def merge_sort(arr):
"""Merge sort: divide and conquer + merge"""
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid])
right = merge_sort(arr[mid:])
return merge(left, right)
def merge(left, right):
"""Merge two sorted arrays"""
result = []
i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]: # <= ensures stability
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result.extend(left[i:])
result.extend(right[j:])
return result
Analysis:
- Time: O(n log n) regardless of input. Recurrence T(n) = 2T(n/2) + O(n), by Master Theorem gives O(n log n)
- Space: O(n) — needs extra array for merging
- Stable: using
<=instead of<in merge guarantees stability - Recursion depth: O(log n)
In-place merge sort exists but is complex (Kronrod 1969), rarely used in practice. Merge sort's main disadvantage is the O(n) extra space.
Where merge sort shines:
- Linked list sorting: lists don't support random access, so quicksort's partition degrades, but merge sort only needs pointer manipulation
- External sorting: when data doesn't fit in memory, multi-way merge is the standard approach
- When stability is required: Python's TimSort is an enhancement of merge sort
21.6 Quick Sort
Quick sort was invented by Tony Hoare in 1959 and remains one of the most widely used sorting algorithms. The core idea:
- Choose an element as the pivot
- Partition: rearrange so elements less than pivot come before it, greater after
- Recursively sort the two partitions
def quicksort(arr, lo=0, hi=None):
"""Quick sort (Lomuto partition scheme)"""
if hi is None:
hi = len(arr) - 1
if lo < hi:
pivot_idx = partition_lomuto(arr, lo, hi)
quicksort(arr, lo, pivot_idx - 1)
quicksort(arr, pivot_idx + 1, hi)
return arr
def partition_lomuto(arr, lo, hi):
"""Lomuto partition: use last element as pivot"""
pivot = arr[hi]
i = lo # i points to next position in "less than" zone
for j in range(lo, hi):
if arr[j] <= pivot:
arr[i], arr[j] = arr[j], arr[i]
i += 1
arr[i], arr[hi] = arr[hi], arr[i]
return i
Lomuto vs Hoare Partition
Lomuto partition (above):
- Simple and intuitive, one pointer i maintains the "less than" boundary
- Drawback: degrades with many duplicate elements, more swaps
Hoare partition:
- Two pointers scan from both ends toward the middle
- On average, half as many swaps
- Slightly more complex code
def partition_hoare(arr, lo, hi):
"""Hoare partition: scan from both ends"""
pivot = arr[lo]
i = lo - 1
j = hi + 1
while True:
i += 1
while arr[i] < pivot:
i += 1
j -= 1
while arr[j] > pivot:
j -= 1
if i >= j:
return j
arr[i], arr[j] = arr[j], arr[i]
Randomized Pivot
The worst case O(n²) occurs when the pivot is always the maximum or minimum element. Solution: randomly choose the pivot.
import random
def quicksort_random(arr, lo=0, hi=None):
"""Randomized quicksort"""
if hi is None:
hi = len(arr) - 1
if lo < hi:
rand_idx = random.randint(lo, hi)
arr[rand_idx], arr[hi] = arr[hi], arr[rand_idx]
pivot_idx = partition_lomuto(arr, lo, hi)
quicksort_random(arr, lo, pivot_idx - 1)
quicksort_random(arr, pivot_idx + 1, hi)
return arr
With randomization, the probability of hitting worst-case O(n²) is negligible. Expected time complexity is O(n log n).
Quicksort analysis:
- Average time: O(n log n), with smaller constant factor than merge sort
- Worst time: O(n²) (virtually impossible with randomization)
- Space: O(log n) — recursion stack depth (average case)
- Unstable: partition swaps change relative order of equal elements
21.7 Heap Sort Overview
Heap sort leverages the max-heap property: the root is always the largest element.
- Build heap: O(n)
- Repeatedly extract the root (maximum) to the end of the array, reduce heap size by 1, and sift down
def heap_sort(arr):
"""Heap sort: build heap + repeatedly extract max"""
n = len(arr)
# Build max heap (start from last non-leaf node)
for i in range(n // 2 - 1, -1, -1):
sift_down(arr, i, n)
# Extract max one by one
for i in range(n - 1, 0, -1):
arr[0], arr[i] = arr[i], arr[0]
sift_down(arr, 0, i)
return arr
def sift_down(arr, root, size):
"""Sift down operation"""
while True:
largest = root
left = 2 * root + 1
right = 2 * root + 2
if left < size and arr[left] > arr[largest]:
largest = left
if right < size and arr[right] > arr[largest]:
largest = right
if largest == root:
break
arr[root], arr[largest] = arr[largest], arr[root]
root = largest
Analysis:
- Time: O(n log n) regardless of input
- Space: O(1), in-place
- Unstable
See Chapter 12 (Heaps and Priority Queues) for detailed discussion. The key point here: heap sort guarantees worst-case O(n log n), but is slower than quicksort in practice (cache-unfriendly — jumping access patterns).
21.8 Counting Sort
When elements are integers with a limited range, counting sort breaks through the O(n log n) barrier.
def counting_sort(arr, max_val=None):
"""Counting sort: for non-negative integers with bounded range"""
if not arr:
return arr
if max_val is None:
max_val = max(arr)
# Count occurrences
count = [0] * (max_val + 1)
for x in arr:
count[x] += 1
# Prefix sums (ensures stability)
for i in range(1, len(count)):
count[i] += count[i - 1]
# Place elements from right to left (ensures stability)
result = [0] * len(arr)
for x in reversed(arr):
count[x] -= 1
result[count[x]] = x
return result
Analysis:
- Time: O(n + k), where k is the value range
- Space: O(n + k)
- Stable
Counting sort's limitation: k cannot be too large. Sorting 1 million integers in range 0100 is extremely efficient; sorting 100 integers in range 01 billion would require a billion-element count array.
21.9 Bucket Sort
Bucket sort generalizes counting sort: distribute elements into "buckets," sort each bucket internally (usually with insertion sort), then concatenate all buckets in order.
def bucket_sort(arr, num_buckets=10):
"""Bucket sort: good for uniformly distributed floats"""
if not arr:
return arr
min_val, max_val = min(arr), max(arr)
if min_val == max_val:
return arr
bucket_range = (max_val - min_val) / num_buckets
buckets = [[] for _ in range(num_buckets)]
for x in arr:
idx = int((x - min_val) / bucket_range)
if idx == num_buckets:
idx -= 1
buckets[idx].append(x)
# Sort each bucket (using insertion sort)
result = []
for bucket in buckets:
insertion_sort(bucket)
result.extend(bucket)
return result
Analysis:
- Average time: O(n + k) (when data is uniformly distributed, each bucket has O(1) elements)
- Worst time: O(n²) (all elements fall into one bucket)
- Space: O(n + k)
- Stability depends on whether the internal sort is stable
21.10 Radix Sort
Radix sort processes digits one at a time, from least significant digit (LSD) or most significant digit (MSD), using a stable sort (like counting sort) at each digit position.
def radix_sort(arr):
"""Radix sort (LSD, least significant digit first)"""
if not arr:
return arr
max_val = max(arr)
exp = 1 # Current digit position: 1, 10, 100, ...
while max_val // exp > 0:
counting_sort_by_digit(arr, exp)
exp *= 10
return arr
def counting_sort_by_digit(arr, exp):
"""Counting sort by a specific digit"""
n = len(arr)
output = [0] * n
count = [0] * 10 # digits 0-9
for x in arr:
digit = (x // exp) % 10
count[digit] += 1
for i in range(1, 10):
count[i] += count[i - 1]
for i in range(n - 1, -1, -1):
digit = (arr[i] // exp) % 10
count[digit] -= 1
output[count[digit]] = arr[i]
for i in range(n):
arr[i] = output[i]
Analysis:
- Time: O(d·(n+k)), where d is the max number of digits, k is the radix (10 here)
- Space: O(n + k)
- Stable
Radix sort excels for fixed-width integers (phone numbers, IDs). When d is constant, time is O(n).
Level 2 · Deeper Understanding
21.11 TimSort: The Choice of Python and Java
TimSort is the sorting algorithm designed by Tim Peters in 2002 for CPython. Python's list.sort() and sorted() both use TimSort underneath, and Java's Arrays.sort() uses TimSort for object arrays. It combines merge sort's stability with insertion sort's efficiency on small or nearly-sorted data.
Core Insight: Exploit Existing Order
Real-world data is rarely completely random. A log file is roughly ordered by timestamp; a user list is roughly ordered by registration date. TimSort's key insight is: first identify naturally ordered segments (runs) in the data, then merge them together using merge sort.
TimSort Execution Flow
Step 1: Identify Runs
Scan left to right, finding natural ascending or descending sequences. Descending sequences are reversed to ascending.
def find_run(arr, lo, hi):
"""Find the natural sorted run starting at lo"""
if lo + 1 > hi:
return lo + 1
run_hi = lo + 1
if arr[run_hi] < arr[lo]:
# Strictly descending
while run_hi <= hi and arr[run_hi] < arr[run_hi - 1]:
run_hi += 1
# Reverse to ascending
reverse(arr, lo, run_hi - 1)
else:
# Non-descending
while run_hi <= hi and arr[run_hi] >= arr[run_hi - 1]:
run_hi += 1
return run_hi
Step 2: Ensure Minimum Run Length (minrun)
If a natural run is too short (shorter than minrun), extend it to minrun length using insertion sort. minrun is typically between 32 and 64, determined by the total array length.
def compute_minrun(n):
"""Compute minrun: take the top 6 bits of n, add 1 if remaining bits are non-zero"""
r = 0
while n >= 64:
r |= n & 1
n >>= 1
return n + r
This formula ensures that n/minrun is either a power of 2 or slightly less — making the final merge phase most efficient.
Step 3: Merge Runs
A stack manages the runs. Each time a new run is found, it's pushed onto the stack, then the stack is checked for merge conditions. Tim Peters defined an invariant to ensure merge efficiency:
For the top three runs with lengths A, B, C:
C > A + BB > A
If the invariant is violated, the smaller adjacent runs are merged.
Step 4: Galloping Mode (Accelerated Merge)
During a merge of two runs, if one side "wins" many times consecutively (e.g., elements from the left run are consecutively smaller than those from the right), there are likely more consecutive wins. At this point, switch to binary search mode (galloping) to find the entire winning range at once, then bulk-copy.
def gallop_left(key, arr, lo, hi):
"""Find insertion position of key in arr[lo:hi] (leftmost).
Uses galloping: exponential jumps, then binary search."""
last_ofs = 0
ofs = 1
while ofs < hi - lo and arr[lo + ofs] < key:
last_ofs = ofs
ofs = (ofs << 1) + 1 # Exponential growth: 1, 3, 7, 15, ...
ofs = min(ofs, hi - lo)
# Binary search within [last_ofs, ofs)
last_ofs += lo
ofs += lo
while last_ofs < ofs:
mid = last_ofs + (ofs - last_ofs) // 2
if arr[mid] < key:
last_ofs = mid + 1
else:
ofs = mid
return ofs
If galloping doesn't yield sufficient benefit (consecutive wins below threshold), it falls back to element-by-element comparison.
TimSort Complexity
- Best case: O(n) — data is already sorted, one run is identified and we're done
- Average case: O(n log n)
- Worst case: O(n log n)
- Space: O(n) — merge requires temporary array
- Stable
TimSort's magic lies in its performance on real-world data: existing order in the data is exploited rather than disrupted and rebuilt.
21.12 Three-Way Partition (Dutch National Flag)
Standard quicksort degrades when facing many duplicate elements. For example, sorting 1 million elements where 99% are the same value: Lomuto partition puts all equal elements on one side, causing severely unbalanced recursion.
Three-way partition divides the array into three parts: less than pivot, equal to pivot, greater than pivot. The "equal" portion is excluded from further recursion.
def quicksort_3way(arr, lo=0, hi=None):
"""Three-way quicksort (Dutch National Flag)"""
if hi is None:
hi = len(arr) - 1
if lo >= hi:
return
# Three-way partition
pivot = arr[lo]
lt = lo # arr[lo..lt-1] < pivot
gt = hi # arr[gt+1..hi] > pivot
i = lo + 1 # arr[lt..i-1] == pivot
while i <= gt:
if arr[i] < pivot:
arr[lt], arr[i] = arr[i], arr[lt]
lt += 1
i += 1
elif arr[i] > pivot:
arr[gt], arr[i] = arr[i], arr[gt]
gt -= 1
# Don't increment i: swapped element hasn't been examined
else:
i += 1
# arr[lt..gt] all equal to pivot, no further sorting needed
quicksort_3way(arr, lo, lt - 1)
quicksort_3way(arr, gt + 1, hi)
This is Dijkstra's famous "Dutch National Flag Problem" — posed in 1976, where the Dutch flag's three colors (red, white, blue) correspond to the three partitions.
21.13 Why Quicksort Is Fastest in Practice
Asymptotically, merge sort and heap sort are both O(n log n), while quicksort's worst case is O(n²). Yet quicksort is usually fastest in practice. Here's why:
1. Cache Friendliness (Cache Locality)
Quicksort's partition phase scans the array sequentially, which is extremely cache-friendly. Merge sort jumps between two arrays during merging; heap sort's parent-child node accesses are scattered, destroying spatial locality.
On modern CPUs, L1/L2/L3 cache hit rates have enormous performance impact. Quicksort's sequential access pattern results in very few cache misses.
2. Small Constant Factor
Quicksort's inner loop is extremely tight: one comparison + one pointer increment. Merge sort's merge phase requires extra arrays and copy operations.
3. In-Place Sorting
Quicksort needs only O(log n) stack space, while merge sort needs O(n) extra space. Less memory allocation = fewer system calls = faster.
4. Tail Recursion Optimization
Quicksort can apply tail recursion on the longer subarray, guaranteeing stack depth never exceeds O(log n):
def quicksort_tail_optimized(arr, lo, hi):
"""Tail-recursive optimization: always recurse on the shorter half"""
while lo < hi:
pivot_idx = partition_lomuto(arr, lo, hi)
if pivot_idx - lo < hi - pivot_idx:
quicksort_tail_optimized(arr, lo, pivot_idx - 1)
lo = pivot_idx + 1
else:
quicksort_tail_optimized(arr, pivot_idx + 1, hi)
hi = pivot_idx - 1
21.14 External Sort
When data exceeds memory capacity (e.g., 100GB of data with only 4GB of RAM), external sorting is needed. The classic approach is multi-way merge sort:
Phase 1: Generate Sorted Runs
- Split data into memory-sized chunks (e.g., 4GB each)
- Sort each chunk in memory using quicksort/TimSort, write back to disk as a sorted file
Phase 2: Multi-Way Merge
- With k sorted files, maintain a min-heap of size k
- Repeatedly extract the minimum from the heap and write to output
- Read the next element from the file that produced the minimum, push to heap
import heapq
def external_merge_sort(sorted_files, output_file):
"""Multi-way merge: merge k sorted files"""
file_handles = [open(f) for f in sorted_files]
# Use heap for k-way merge
heap = []
for i, fh in enumerate(file_handles):
line = fh.readline()
if line:
heapq.heappush(heap, (int(line.strip()), i))
with open(output_file, 'w') as out:
while heap:
val, file_idx = heapq.heappop(heap)
out.write(f"{val}\n")
next_line = file_handles[file_idx].readline()
if next_line:
heapq.heappush(heap, (int(next_line.strip()), file_idx))
for fh in file_handles:
fh.close()
Optimizations:
- Replacement selection to generate initial runs: on average produces runs 2x memory size
- Read/write buffers: reduce disk I/O operations
- Balance the number of merge ways: too many ways means tiny buffers per way, too few means more merge passes
The critical metric for external sorting is not comparisons but disk I/O operations. Modern implementations also consider SSD random read/write characteristics.
Level 3 · Theory and History
21.15 Proof of the O(n log n) Lower Bound for Comparison Sorting
Theorem: Any comparison-based sorting algorithm requires at least Omega(n log n) comparisons in the worst case.
Proof (Decision Tree Model):
Abstract any sorting algorithm as a binary decision tree:
- Each internal node is a comparison
a[i] vs a[j] - Left subtree corresponds to "less than," right to "greater than or equal"
- Each leaf corresponds to one permutation result
There are n! possible permutations of n elements, so the decision tree has at least n! leaves. A binary tree of height h has at most 2^h leaves, therefore:
2^h >= n!
h >= log2(n!)
By Stirling's approximation n! ~ (n/e)^n:
h >= log2((n/e)^n) = n * log2(n/e) = n * log2(n) - n * log2(e)
= Omega(n log n)
Information-Theoretic Interpretation:
Sorting is essentially determining which of n! permutations the input represents. Each comparison provides 1 bit of information. To distinguish among n! possibilities requires at least log2(n!) = Omega(n log n) bits.
This is why counting sort and radix sort can break through this bound — they are not purely comparison-based; they exploit numerical properties of elements (additional information sources).
21.16 History and Inventors of Sorting Algorithms
The history of sorting algorithms nearly parallels the development of computer science itself:
| Year | Algorithm | Inventor | Context |
|---|---|---|---|
| 1945 | Merge Sort | John von Neumann | Described in EDVAC report, one of the earliest computer algorithms |
| 1956 | Radix Sort | Harold H. Seward | MIT Master's thesis |
| 1959 | Quick Sort | Tony Hoare | Invented while working on a translation project at Moscow State University |
| 1964 | Heap Sort | J.W.J. Williams | Exploiting the heap data structure |
| 1964 | Heap Sort improvement | Robert W. Floyd | Optimized heap building to O(n) |
| 1969 | Shell Sort analysis | Knuth | Analyzed Shell's (1959) algorithm |
| 1976 | Dutch National Flag | Edsger Dijkstra | Three-way partition problem |
| 1993 | Introsort | David Musser | Hybrid quicksort + heapsort + insertion sort |
| 1997 | Introsort paper | David Musser | Published formal paper |
| 2002 | TimSort | Tim Peters | Designed for CPython |
| 2009 | Java TimSort | Joshua Bloch | Ported TimSort to Java |
Tony Hoare was only 25 when he invented quicksort. He later recalled: "I was working on machine translation from Russian to English and needed to sort words. I first thought of bubble sort but it was too slow. I thought for a while and discovered the partition idea."
Tim Peters submitted TimSort to the CPython mailing list with the subject "[Python-Dev] Sorting," accompanied by a 27-page design document (listsort.txt) explaining every design decision in detail. This document remains part of the CPython source tree to this day.
21.17 Strict Definition of Stability and Why It Matters
Definition: A sorting algorithm is stable if whenever a[i] == a[j] and i < j before sorting, a[i] still appears before a[j] after sorting.
In other words: equal elements maintain their original relative order.
Why does stability matter?
- Multi-key sorting: Sort by name first, then by age. If the sort is stable, people of the same age remain in name order.
students = [
("Alice", 20), ("Bob", 22), ("Charlie", 20), ("Diana", 22)
]
# Sort by name
students.sort(key=lambda x: x[0])
# [("Alice", 20), ("Bob", 22), ("Charlie", 20), ("Diana", 22)]
# Sort by age (stable sort)
students.sort(key=lambda x: x[1])
# [("Alice", 20), ("Charlie", 20), ("Bob", 22), ("Diana", 22)]
# Same-age: Alice still before Charlie, Bob still before Diana
-
Database queries: SQL's ORDER BY with multiple columns relies on stability.
-
UI sorting: When users click column headers to sort, they expect rows with equal values to maintain their previous order.
-
Correctness: In some applications, the relative order of equal elements has semantic meaning (e.g., events with the same timestamp ordered by arrival time).
Python's sort guarantees stability — this is part of the language specification, not merely an implementation detail.
21.18 introsort: The Hybrid Strategy of C++ std::sort
introsort (introspective sort) is a hybrid sorting algorithm proposed by David Musser in 1997. It is the typical implementation of C++ standard library's std::sort. It combines the strengths of three algorithms:
- Quicksort: Fastest in the average case
- Heapsort: Guarantees worst-case O(n log n)
- Insertion sort: Fastest for small arrays
Strategy:
- Start with quicksort
- If recursion depth exceeds 2*log2(n), indicating potential O(n²) degradation, switch to heapsort
- When subarray size falls below threshold (typically 16), switch to insertion sort
import math
def introsort(arr):
"""introsort: hybrid of quicksort + heapsort + insertion sort"""
max_depth = 2 * int(math.log2(len(arr))) if arr else 0
_introsort_helper(arr, 0, len(arr) - 1, max_depth)
return arr
def _introsort_helper(arr, lo, hi, depth_limit):
SIZE_THRESHOLD = 16
while hi - lo + 1 > SIZE_THRESHOLD:
if depth_limit == 0:
# Too deep, switch to heapsort
heap_sort_range(arr, lo, hi)
return
depth_limit -= 1
# Median-of-three pivot selection
mid = (lo + hi) // 2
pivot_idx = median_of_three(arr, lo, mid, hi)
arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
p = partition_lomuto(arr, lo, hi)
_introsort_helper(arr, p + 1, hi, depth_limit)
hi = p - 1 # Tail recursion optimization
# Small arrays: insertion sort
insertion_sort_range(arr, lo, hi)
def median_of_three(arr, a, b, c):
"""Median of three: return index of median among arr[a], arr[b], arr[c]"""
if arr[a] > arr[b]:
a, b = b, a
if arr[b] > arr[c]:
b, c = c, b
if arr[a] > arr[b]:
a, b = b, a
return b
introsort elegantly combines three algorithms' strengths:
- Most of the time it follows the quicksort path, enjoying quicksort's cache advantage and low constants
- If pathological input causes recursion to go too deep, it automatically switches to heapsort, guaranteeing O(n log n)
- Small subarrays use insertion sort, avoiding recursion overhead
Thus introsort is: O(n log n) worst-case, O(n log n) average, in-place, unstable.
Level 4 · Practice and Interviews
21.19 Python Sorting API Deep Dive
sort() vs sorted()
# list.sort() — in-place, returns None, only works on lists
nums = [3, 1, 4, 1, 5]
nums.sort()
# nums = [1, 1, 3, 4, 5]
# sorted() — returns new list, works on any iterable
result = sorted((3, 1, 4, 1, 5)) # tuples work too
# result = [1, 1, 3, 4, 5]
# sorted accepts generators, dict views, etc.
sorted({"b": 2, "a": 1}.items(), key=lambda x: x[1])
# [('a', 1), ('b', 2)]
Key differences:
sort()is in-place, doesn't consume extra O(n) memory (except TimSort's merge temporary space)sorted()always creates a new listsort()is a list method only;sorted()is a builtin that accepts any iterable
The key Parameter
# lambda
words = ["banana", "apple", "cherry"]
words.sort(key=lambda w: len(w))
# ['apple', 'banana', 'cherry']
# operator.itemgetter — faster than lambda (C implementation)
from operator import itemgetter
students = [("Alice", 85), ("Bob", 92), ("Charlie", 78)]
students.sort(key=itemgetter(1)) # sort by score
# [('Charlie', 78), ('Alice', 85), ('Bob', 92)]
# operator.attrgetter — sort by attribute
from operator import attrgetter
from dataclasses import dataclass
@dataclass
class Student:
name: str
gpa: float
students = [Student("Alice", 3.8), Student("Bob", 3.9)]
students.sort(key=attrgetter('gpa'))
# Multi-level sort: GPA descending, then name ascending
students.sort(key=lambda s: (-s.gpa, s.name))
functools.cmp_to_key
Python 3 removed the cmp parameter from sort, but some comparison logic is hard to express with key (e.g., custom comparison functions). Use cmp_to_key:
from functools import cmp_to_key
def compare_versions(v1, v2):
"""Compare version strings: '1.2.3' vs '1.2.10'"""
parts1 = list(map(int, v1.split('.')))
parts2 = list(map(int, v2.split('.')))
for p1, p2 in zip(parts1, parts2):
if p1 < p2:
return -1
elif p1 > p2:
return 1
return len(parts1) - len(parts2)
versions = ["1.2.3", "1.2.10", "1.1.0", "2.0.0"]
versions.sort(key=cmp_to_key(compare_versions))
# ['1.1.0', '1.2.3', '1.2.10', '2.0.0']
Classic interview application — Largest Number (LeetCode 179):
def largest_number(nums):
"""Given integers, concatenate them to form the largest number"""
strs = [str(n) for n in nums]
def compare(a, b):
if a + b > b + a:
return -1
elif a + b < b + a:
return 1
return 0
strs.sort(key=cmp_to_key(compare))
result = ''.join(strs)
return '0' if result[0] == '0' else result
21.20 High-Frequency Interview Sorting Problems
Problem 1: Merge Intervals (LeetCode 56)
Given a collection of intervals [start, end], merge all overlapping intervals.
def merge_intervals(intervals):
"""
Sort then single-pass merge.
Time O(n log n), Space O(n)
"""
if not intervals:
return []
# Sort by start point
intervals.sort(key=lambda x: x[0])
merged = [intervals[0]]
for start, end in intervals[1:]:
if start <= merged[-1][1]:
# Overlapping: merge (take larger endpoint)
merged[-1] = [merged[-1][0], max(merged[-1][1], end)]
else:
merged.append([start, end])
return merged
# Test
print(merge_intervals([[1,3],[2,6],[8,10],[15,18]]))
# [[1, 6], [8, 10], [15, 18]]
Insight: After sorting, overlapping intervals are necessarily adjacent. So sort + linear scan solves the problem.
Problem 2: Maximum Gap (LeetCode 164)
Given an unsorted array, find the maximum difference between successive elements in sorted form. Required: O(n) time and space.
def maximum_gap(nums):
"""
Bucket sort insight: max gap >= ceil((max-min)/(n-1))
Set bucket size to floor((max-min)/(n-1)), answer must be between buckets (not within)
"""
n = len(nums)
if n < 2:
return 0
min_val, max_val = min(nums), max(nums)
if min_val == max_val:
return 0
# Bucket size and count
bucket_size = max(1, (max_val - min_val) // (n - 1))
bucket_count = (max_val - min_val) // bucket_size + 1
# Each bucket stores only min and max
buckets_min = [float('inf')] * bucket_count
buckets_max = [float('-inf')] * bucket_count
for num in nums:
idx = (num - min_val) // bucket_size
buckets_min[idx] = min(buckets_min[idx], num)
buckets_max[idx] = max(buckets_max[idx], num)
# Max gap = some bucket's min - previous non-empty bucket's max
max_gap = 0
prev_max = buckets_max[0]
for i in range(1, bucket_count):
if buckets_min[i] == float('inf'):
continue # Skip empty buckets
max_gap = max(max_gap, buckets_min[i] - prev_max)
prev_max = buckets_max[i]
return max_gap
Key insight: Pigeonhole principle — if n numbers are uniformly spread over [min, max], the average gap between adjacent elements is (max-min)/(n-1). The maximum gap must be >= this average. Setting bucket size to this average ensures the maximum gap cannot occur within a bucket, only between adjacent buckets.
Problem 3: Sort Colors (LeetCode 75, Dutch National Flag)
Array contains only 0, 1, 2. Sort in-place with a single pass.
def sort_colors(nums):
"""
Dutch National Flag: three-way partition
lo: boundary of 0s (right side)
hi: boundary of 2s (left side)
i: current pointer
"""
lo, i, hi = 0, 0, len(nums) - 1
while i <= hi:
if nums[i] == 0:
nums[lo], nums[i] = nums[i], nums[lo]
lo += 1
i += 1
elif nums[i] == 2:
nums[hi], nums[i] = nums[i], nums[hi]
hi -= 1
# Don't increment i: swapped value hasn't been examined
else: # nums[i] == 1
i += 1
# Test
arr = [2, 0, 2, 1, 1, 0]
sort_colors(arr)
print(arr) # [0, 0, 1, 1, 2, 2]
This is a direct interview application of Dijkstra's Dutch National Flag algorithm. Note it's the same idea as quicksort's three-way partition.
21.21 When to Use Which Sort
In practice, sorting algorithm choice depends on:
| Scenario | Recommended | Reason |
|---|---|---|
| General purpose | TimSort (Python/Java default) | Best for real-world data |
| Small arrays (n < 64) | Insertion sort | Small constant, cache-friendly |
| Large arrays, stability not needed | Quicksort/introsort | Fastest on average |
| Integers, small range | Counting sort | O(n + k), linear time |
| Integers, fixed width | Radix sort | O(d*n), linear time |
| Nearly sorted data | TimSort or insertion sort | Exploits existing order |
| Stability required | Merge sort/TimSort | Quicksort is unstable |
| Memory constrained | Heap sort | O(1) extra space |
| Data doesn't fit in memory | External merge sort | Minimizes disk I/O |
| Linked list sorting | Merge sort | No random access needed |
| Worst-case guarantee needed | Heap sort/introsort | Worst-case O(n log n) |
Practical Python advice:
- 99% of the time, just use
sorted()or.sort()— they use TimSort, which is excellent - If elements are small-range integers and performance is critical, hand-write counting sort
- If data exceeds memory, use external sorting (or tools like databases/Spark)
21.22 Parallelization Potential of Sorting Algorithms
Different sorting algorithms have vastly different parallelization potential:
Merge Sort — Naturally Parallel
The divide-and-conquer structure means both halves can be sorted independently, and the merge phase can be partially parallelized:
from concurrent.futures import ThreadPoolExecutor
def parallel_merge_sort(arr, depth=0, max_depth=3):
"""Parallel merge sort (using thread pool)"""
if len(arr) <= 1:
return arr
mid = len(arr) // 2
if depth < max_depth:
with ThreadPoolExecutor(max_workers=2) as executor:
left_future = executor.submit(parallel_merge_sort, arr[:mid], depth+1, max_depth)
right_future = executor.submit(parallel_merge_sort, arr[mid:], depth+1, max_depth)
left = left_future.result()
right = right_future.result()
else:
left = parallel_merge_sort(arr[:mid], depth+1, max_depth)
right = parallel_merge_sort(arr[mid:], depth+1, max_depth)
return merge(left, right)
Note: Python's GIL limits multi-threaded parallelism. For true parallel sorting, use multiprocessing or C extensions.
Quicksort — Also Parallelizable
After partition, both halves are independent, but partition itself is serial. Quicksort's parallel efficiency depends on partition balance.
Sorting Networks — Best for Hardware Parallelism
Sorting networks are fixed compare-and-swap sequences where all comparisons are determined at compile time, making them ideal for GPU and FPGA parallel implementations. Batcher's bitonic merge sort is the classic parallel sorting network.
Parallel Sorting in Practice:
- C++
std::execution::parpolicy - Java's
Arrays.parallelSort()— based on parallel merge sort - Python:
multiprocessingor NumPy (C-level parallelism underneath) - Big data: MapReduce/Spark distributed sorting
Chapter Summary
Sorting is computer science's "litmus test" — seemingly simple, yet involving core algorithmic ideas: divide and conquer, recursion, randomization, amortized analysis, cache optimization, and parallelization.
Key takeaways:
- Comparison sorting has a theoretical lower bound of O(n log n), but non-comparison sorts can achieve O(n) under specific conditions
- Quicksort is fastest in practice due to cache friendliness and small constants — but needs randomized pivot to avoid worst case
- TimSort is the optimal general-purpose sort for real-world data, exploiting existing order; it's the default in Python/Java
- introsort is C++'s choice: quicksort's speed + heapsort's worst-case guarantee + insertion sort's small-array efficiency
- Stability is a practical requirement: multi-key sorting, database queries, and UI sorting all need it
- Choose the right algorithm based on data characteristics: size, value range, distribution, stability requirements, memory constraints
In the next chapter, we'll discuss search and matching algorithms — close relatives of sorting. Many search problems begin with a sort step.