Chapter 21

Sorting: From Bubble Sort to TimSort

Chapter 21: Sorting — From Bubble Sort to TimSort

Open a Python interpreter, type sorted([3, 1, 4, 1, 5, 9]), and you get [1, 1, 3, 4, 5, 9] in microseconds. This seems trivial, but behind it lies over 60 years of computer science wisdom. Sorting is one of the most thoroughly studied problems in computing — from John von Neumann's merge sort implemented on EDVAC in 1945, to Tim Peters' TimSort designed for CPython in 2002, each generation of algorithms pushes closer to "perfect sorting."

Why does sorting matter so much? Because it is the foundation for other algorithms. Binary search requires sorted arrays; database indexes are essentially sorted mappings; OS process scheduling needs priority ordering; search engine results need relevance ordering. Donald Knuth dedicated an entire volume of "The Art of Computer Programming" (Volume 3) to sorting and searching, noting that "a computer spends roughly 25% of its time sorting."

In this chapter, we start from the simplest bubble sort and work our way up to industrial-grade TimSort and introsort, covering everything you need to know about sorting.

Level 1 · What You Need to Know

21.1 Overview of Eight Classic Sorting Algorithms

Before diving into each algorithm, let's see the big picture. The following table summarizes the core properties of eight classic sorting algorithms:

Algorithm	Average Time	Worst Time	Space	Stable	Use Case
Bubble Sort	O(n²)	O(n²)	O(1)	Yes	Teaching only
Selection Sort	O(n²)	O(n²)	O(1)	No	Fewest swaps
Insertion Sort	O(n²)	O(n²)	O(1)	Yes	Small/nearly-sorted arrays
Merge Sort	O(n log n)	O(n log n)	O(n)	Yes	Stability needed, linked lists
Quick Sort	O(n log n)	O(n²)	O(log n)	No	General purpose, fastest in practice
Heap Sort	O(n log n)	O(n log n)	O(1)	No	Worst-case guarantee
Counting Sort	O(n + k)	O(n + k)	O(k)	Yes	Integers with small range
Radix Sort	O(d·(n+k))	O(d·(n+k))	O(n+k)	Yes	Fixed-width integers

Here n is the number of elements, k is the value range, and d is the maximum number of digits.

A critical distinction: the first six are comparison sorts — they determine order only by pairwise comparisons. The last two (plus bucket sort) are non-comparison sorts — they exploit the numerical properties of elements to determine positions. This distinction matters enormously: comparison sorts have a theoretical lower bound of O(n log n), while non-comparison sorts can break through it.

21.2 Bubble Sort

Bubble sort is the most intuitive sorting algorithm: repeatedly traverse the array, compare adjacent elements, and swap them if they're in the wrong order. Each pass "bubbles" at least one largest element to its correct position.

def bubble_sort(arr):
    """Bubble sort: compare and swap adjacent elements"""
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break  # Optimization: if no swaps in a pass, array is sorted
    return arr

Analysis:

Best case: Array already sorted, only one pass needed, O(n)
Worst case: Array in reverse order, n(n-1)/2 comparisons and swaps, O(n²)
Average case: O(n²)
Stability: Stable (equal elements are never swapped)
Space: O(1), in-place

The only virtues of bubble sort are simplicity and stability. In practice, there is almost never a reason to use it — even among O(n²) algorithms, insertion sort is far superior.

21.3 Selection Sort

Selection sort repeatedly finds the minimum element from the unsorted portion and places it at the end of the sorted portion.

def selection_sort(arr):
    """Selection sort: find minimum, place it at front"""
    n = len(arr)
    for i in range(n):
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr

Analysis:

Comparisons are always n(n-1)/2 regardless of input, so best/worst/average are all O(n²)
At most n-1 swaps — this is selection sort's only advantage
Unstable: e.g., [5a, 5b, 3], first pass swaps 5a with 3, putting 5a after 5b
Space: O(1)

21.4 Insertion Sort

Insertion sort works like sorting a hand of cards: take a new card and find its correct position among the cards already in hand.

def insertion_sort(arr):
    """Insertion sort: insert each element into its correct position"""
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

Analysis:

Best case: Already sorted, inner loop never executes, O(n)
Worst case: Reverse sorted, O(n²)
Average case: O(n²)
Stable: equal elements don't change relative order
Space: O(1)

Insertion sort is the fastest O(n²) algorithm in practice, because:

The inner loop exits early (unlike bubble and selection which always complete)
Degrades to O(n) for nearly-sorted data
Cache-friendly: always accesses adjacent memory
Small constant factor: each comparison does one assignment (not a swap)

These properties make insertion sort faster than quicksort for small arrays (n < 16~64). Both TimSort and introsort switch to insertion sort for small subarrays.

21.5 Merge Sort

Merge sort is the classic embodiment of divide and conquer:

Divide: Split the array in half
Conquer: Recursively sort each half
Combine: Merge the two sorted halves into one sorted array

def merge_sort(arr):
    """Merge sort: divide and conquer + merge"""
    if len(arr) <= 1:
        return arr
    
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    
    return merge(left, right)


def merge(left, right):
    """Merge two sorted arrays"""
    result = []
    i = j = 0
    
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= ensures stability
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    
    result.extend(left[i:])
    result.extend(right[j:])
    return result

Analysis:

Time: O(n log n) regardless of input. Recurrence T(n) = 2T(n/2) + O(n), by Master Theorem gives O(n log n)
Space: O(n) — needs extra array for merging
Stable: using <= instead of < in merge guarantees stability
Recursion depth: O(log n)

In-place merge sort exists but is complex (Kronrod 1969), rarely used in practice. Merge sort's main disadvantage is the O(n) extra space.

Where merge sort shines:

Linked list sorting: lists don't support random access, so quicksort's partition degrades, but merge sort only needs pointer manipulation
External sorting: when data doesn't fit in memory, multi-way merge is the standard approach
When stability is required: Python's TimSort is an enhancement of merge sort

21.6 Quick Sort

Quick sort was invented by Tony Hoare in 1959 and remains one of the most widely used sorting algorithms. The core idea:

Choose an element as the pivot
Partition: rearrange so elements less than pivot come before it, greater after
Recursively sort the two partitions

def quicksort(arr, lo=0, hi=None):
    """Quick sort (Lomuto partition scheme)"""
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        pivot_idx = partition_lomuto(arr, lo, hi)
        quicksort(arr, lo, pivot_idx - 1)
        quicksort(arr, pivot_idx + 1, hi)
    return arr


def partition_lomuto(arr, lo, hi):
    """Lomuto partition: use last element as pivot"""
    pivot = arr[hi]
    i = lo  # i points to next position in "less than" zone
    for j in range(lo, hi):
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i

Lomuto vs Hoare Partition

Lomuto partition (above):

Simple and intuitive, one pointer i maintains the "less than" boundary
Drawback: degrades with many duplicate elements, more swaps

Hoare partition:

Two pointers scan from both ends toward the middle
On average, half as many swaps
Slightly more complex code

def partition_hoare(arr, lo, hi):
    """Hoare partition: scan from both ends"""
    pivot = arr[lo]
    i = lo - 1
    j = hi + 1
    while True:
        i += 1
        while arr[i] < pivot:
            i += 1
        j -= 1
        while arr[j] > pivot:
            j -= 1
        if i >= j:
            return j
        arr[i], arr[j] = arr[j], arr[i]

Randomized Pivot

The worst case O(n²) occurs when the pivot is always the maximum or minimum element. Solution: randomly choose the pivot.

import random

def quicksort_random(arr, lo=0, hi=None):
    """Randomized quicksort"""
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        rand_idx = random.randint(lo, hi)
        arr[rand_idx], arr[hi] = arr[hi], arr[rand_idx]
        pivot_idx = partition_lomuto(arr, lo, hi)
        quicksort_random(arr, lo, pivot_idx - 1)
        quicksort_random(arr, pivot_idx + 1, hi)
    return arr

With randomization, the probability of hitting worst-case O(n²) is negligible. Expected time complexity is O(n log n).

Quicksort analysis:

Average time: O(n log n), with smaller constant factor than merge sort
Worst time: O(n²) (virtually impossible with randomization)
Space: O(log n) — recursion stack depth (average case)
Unstable: partition swaps change relative order of equal elements

21.7 Heap Sort Overview

Heap sort leverages the max-heap property: the root is always the largest element.

Build heap: O(n)
Repeatedly extract the root (maximum) to the end of the array, reduce heap size by 1, and sift down

def heap_sort(arr):
    """Heap sort: build heap + repeatedly extract max"""
    n = len(arr)
    
    # Build max heap (start from last non-leaf node)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(arr, i, n)
    
    # Extract max one by one
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]
        sift_down(arr, 0, i)
    
    return arr


def sift_down(arr, root, size):
    """Sift down operation"""
    while True:
        largest = root
        left = 2 * root + 1
        right = 2 * root + 2
        
        if left < size and arr[left] > arr[largest]:
            largest = left
        if right < size and arr[right] > arr[largest]:
            largest = right
        
        if largest == root:
            break
        arr[root], arr[largest] = arr[largest], arr[root]
        root = largest

Analysis:

Time: O(n log n) regardless of input
Space: O(1), in-place
Unstable

See Chapter 12 (Heaps and Priority Queues) for detailed discussion. The key point here: heap sort guarantees worst-case O(n log n), but is slower than quicksort in practice (cache-unfriendly — jumping access patterns).

21.8 Counting Sort

When elements are integers with a limited range, counting sort breaks through the O(n log n) barrier.

def counting_sort(arr, max_val=None):
    """Counting sort: for non-negative integers with bounded range"""
    if not arr:
        return arr
    
    if max_val is None:
        max_val = max(arr)
    
    # Count occurrences
    count = [0] * (max_val + 1)
    for x in arr:
        count[x] += 1
    
    # Prefix sums (ensures stability)
    for i in range(1, len(count)):
        count[i] += count[i - 1]
    
    # Place elements from right to left (ensures stability)
    result = [0] * len(arr)
    for x in reversed(arr):
        count[x] -= 1
        result[count[x]] = x
    
    return result

Analysis:

Time: O(n + k), where k is the value range
Space: O(n + k)
Stable

Counting sort's limitation: k cannot be too large. Sorting 1 million integers in range 0~~100 is extremely efficient; sorting 100 integers in range 0~~1 billion would require a billion-element count array.

21.9 Bucket Sort

Bucket sort generalizes counting sort: distribute elements into "buckets," sort each bucket internally (usually with insertion sort), then concatenate all buckets in order.

def bucket_sort(arr, num_buckets=10):
    """Bucket sort: good for uniformly distributed floats"""
    if not arr:
        return arr
    
    min_val, max_val = min(arr), max(arr)
    if min_val == max_val:
        return arr
    
    bucket_range = (max_val - min_val) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    
    for x in arr:
        idx = int((x - min_val) / bucket_range)
        if idx == num_buckets:
            idx -= 1
        buckets[idx].append(x)
    
    # Sort each bucket (using insertion sort)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)
        result.extend(bucket)
    
    return result

Analysis:

Average time: O(n + k) (when data is uniformly distributed, each bucket has O(1) elements)
Worst time: O(n²) (all elements fall into one bucket)
Space: O(n + k)
Stability depends on whether the internal sort is stable

21.10 Radix Sort

Radix sort processes digits one at a time, from least significant digit (LSD) or most significant digit (MSD), using a stable sort (like counting sort) at each digit position.

def radix_sort(arr):
    """Radix sort (LSD, least significant digit first)"""
    if not arr:
        return arr
    
    max_val = max(arr)
    exp = 1  # Current digit position: 1, 10, 100, ...
    
    while max_val // exp > 0:
        counting_sort_by_digit(arr, exp)
        exp *= 10
    
    return arr


def counting_sort_by_digit(arr, exp):
    """Counting sort by a specific digit"""
    n = len(arr)
    output = [0] * n
    count = [0] * 10  # digits 0-9
    
    for x in arr:
        digit = (x // exp) % 10
        count[digit] += 1
    
    for i in range(1, 10):
        count[i] += count[i - 1]
    
    for i in range(n - 1, -1, -1):
        digit = (arr[i] // exp) % 10
        count[digit] -= 1
        output[count[digit]] = arr[i]
    
    for i in range(n):
        arr[i] = output[i]

Analysis:

Time: O(d·(n+k)), where d is the max number of digits, k is the radix (10 here)
Space: O(n + k)
Stable

Radix sort excels for fixed-width integers (phone numbers, IDs). When d is constant, time is O(n).

Level 2 · Deeper Understanding

21.11 TimSort: The Choice of Python and Java

TimSort is the sorting algorithm designed by Tim Peters in 2002 for CPython. Python's list.sort() and sorted() both use TimSort underneath, and Java's Arrays.sort() uses TimSort for object arrays. It combines merge sort's stability with insertion sort's efficiency on small or nearly-sorted data.

Core Insight: Exploit Existing Order

Real-world data is rarely completely random. A log file is roughly ordered by timestamp; a user list is roughly ordered by registration date. TimSort's key insight is: first identify naturally ordered segments (runs) in the data, then merge them together using merge sort.

TimSort Execution Flow

Step 1: Identify Runs

Scan left to right, finding natural ascending or descending sequences. Descending sequences are reversed to ascending.

def find_run(arr, lo, hi):
    """Find the natural sorted run starting at lo"""
    if lo + 1 > hi:
        return lo + 1
    
    run_hi = lo + 1
    if arr[run_hi] < arr[lo]:
        # Strictly descending
        while run_hi <= hi and arr[run_hi] < arr[run_hi - 1]:
            run_hi += 1
        # Reverse to ascending
        reverse(arr, lo, run_hi - 1)
    else:
        # Non-descending
        while run_hi <= hi and arr[run_hi] >= arr[run_hi - 1]:
            run_hi += 1
    
    return run_hi

Step 2: Ensure Minimum Run Length (minrun)

If a natural run is too short (shorter than minrun), extend it to minrun length using insertion sort. minrun is typically between 32 and 64, determined by the total array length.

def compute_minrun(n):
    """Compute minrun: take the top 6 bits of n, add 1 if remaining bits are non-zero"""
    r = 0
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r

This formula ensures that n/minrun is either a power of 2 or slightly less — making the final merge phase most efficient.

Step 3: Merge Runs

A stack manages the runs. Each time a new run is found, it's pushed onto the stack, then the stack is checked for merge conditions. Tim Peters defined an invariant to ensure merge efficiency:

For the top three runs with lengths A, B, C:

C > A + B
B > A

If the invariant is violated, the smaller adjacent runs are merged.

Step 4: Galloping Mode (Accelerated Merge)

During a merge of two runs, if one side "wins" many times consecutively (e.g., elements from the left run are consecutively smaller than those from the right), there are likely more consecutive wins. At this point, switch to binary search mode (galloping) to find the entire winning range at once, then bulk-copy.

def gallop_left(key, arr, lo, hi):
    """Find insertion position of key in arr[lo:hi] (leftmost).
    Uses galloping: exponential jumps, then binary search."""
    last_ofs = 0
    ofs = 1
    
    while ofs < hi - lo and arr[lo + ofs] < key:
        last_ofs = ofs
        ofs = (ofs << 1) + 1  # Exponential growth: 1, 3, 7, 15, ...
    
    ofs = min(ofs, hi - lo)
    
    # Binary search within [last_ofs, ofs)
    last_ofs += lo
    ofs += lo
    while last_ofs < ofs:
        mid = last_ofs + (ofs - last_ofs) // 2
        if arr[mid] < key:
            last_ofs = mid + 1
        else:
            ofs = mid
    
    return ofs

If galloping doesn't yield sufficient benefit (consecutive wins below threshold), it falls back to element-by-element comparison.

TimSort Complexity

Best case: O(n) — data is already sorted, one run is identified and we're done
Average case: O(n log n)
Worst case: O(n log n)
Space: O(n) — merge requires temporary array
Stable

TimSort's magic lies in its performance on real-world data: existing order in the data is exploited rather than disrupted and rebuilt.

21.12 Three-Way Partition (Dutch National Flag)

Standard quicksort degrades when facing many duplicate elements. For example, sorting 1 million elements where 99% are the same value: Lomuto partition puts all equal elements on one side, causing severely unbalanced recursion.

Three-way partition divides the array into three parts: less than pivot, equal to pivot, greater than pivot. The "equal" portion is excluded from further recursion.

def quicksort_3way(arr, lo=0, hi=None):
    """Three-way quicksort (Dutch National Flag)"""
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return
    
    # Three-way partition
    pivot = arr[lo]
    lt = lo      # arr[lo..lt-1] < pivot
    gt = hi      # arr[gt+1..hi] > pivot
    i = lo + 1   # arr[lt..i-1] == pivot
    
    while i <= gt:
        if arr[i] < pivot:
            arr[lt], arr[i] = arr[i], arr[lt]
            lt += 1
            i += 1
        elif arr[i] > pivot:
            arr[gt], arr[i] = arr[i], arr[gt]
            gt -= 1
            # Don't increment i: swapped element hasn't been examined
        else:
            i += 1
    
    # arr[lt..gt] all equal to pivot, no further sorting needed
    quicksort_3way(arr, lo, lt - 1)
    quicksort_3way(arr, gt + 1, hi)

This is Dijkstra's famous "Dutch National Flag Problem" — posed in 1976, where the Dutch flag's three colors (red, white, blue) correspond to the three partitions.

21.13 Why Quicksort Is Fastest in Practice

Asymptotically, merge sort and heap sort are both O(n log n), while quicksort's worst case is O(n²). Yet quicksort is usually fastest in practice. Here's why:

1. Cache Friendliness (Cache Locality)

Quicksort's partition phase scans the array sequentially, which is extremely cache-friendly. Merge sort jumps between two arrays during merging; heap sort's parent-child node accesses are scattered, destroying spatial locality.

On modern CPUs, L1/L2/L3 cache hit rates have enormous performance impact. Quicksort's sequential access pattern results in very few cache misses.

2. Small Constant Factor

Quicksort's inner loop is extremely tight: one comparison + one pointer increment. Merge sort's merge phase requires extra arrays and copy operations.

3. In-Place Sorting

Quicksort needs only O(log n) stack space, while merge sort needs O(n) extra space. Less memory allocation = fewer system calls = faster.

4. Tail Recursion Optimization

Quicksort can apply tail recursion on the longer subarray, guaranteeing stack depth never exceeds O(log n):

def quicksort_tail_optimized(arr, lo, hi):
    """Tail-recursive optimization: always recurse on the shorter half"""
    while lo < hi:
        pivot_idx = partition_lomuto(arr, lo, hi)
        if pivot_idx - lo < hi - pivot_idx:
            quicksort_tail_optimized(arr, lo, pivot_idx - 1)
            lo = pivot_idx + 1
        else:
            quicksort_tail_optimized(arr, pivot_idx + 1, hi)
            hi = pivot_idx - 1

21.14 External Sort

When data exceeds memory capacity (e.g., 100GB of data with only 4GB of RAM), external sorting is needed. The classic approach is multi-way merge sort:

Phase 1: Generate Sorted Runs

Split data into memory-sized chunks (e.g., 4GB each)
Sort each chunk in memory using quicksort/TimSort, write back to disk as a sorted file

Phase 2: Multi-Way Merge

With k sorted files, maintain a min-heap of size k
Repeatedly extract the minimum from the heap and write to output
Read the next element from the file that produced the minimum, push to heap

import heapq

def external_merge_sort(sorted_files, output_file):
    """Multi-way merge: merge k sorted files"""
    file_handles = [open(f) for f in sorted_files]
    
    # Use heap for k-way merge
    heap = []
    for i, fh in enumerate(file_handles):
        line = fh.readline()
        if line:
            heapq.heappush(heap, (int(line.strip()), i))
    
    with open(output_file, 'w') as out:
        while heap:
            val, file_idx = heapq.heappop(heap)
            out.write(f"{val}\n")
            
            next_line = file_handles[file_idx].readline()
            if next_line:
                heapq.heappush(heap, (int(next_line.strip()), file_idx))
    
    for fh in file_handles:
        fh.close()

Optimizations:

Replacement selection to generate initial runs: on average produces runs 2x memory size
Read/write buffers: reduce disk I/O operations
Balance the number of merge ways: too many ways means tiny buffers per way, too few means more merge passes

The critical metric for external sorting is not comparisons but disk I/O operations. Modern implementations also consider SSD random read/write characteristics.

Level 3 · Theory and History

21.15 Proof of the O(n log n) Lower Bound for Comparison Sorting

Theorem: Any comparison-based sorting algorithm requires at least Omega(n log n) comparisons in the worst case.

Proof (Decision Tree Model):

Abstract any sorting algorithm as a binary decision tree:

Each internal node is a comparison a[i] vs a[j]
Left subtree corresponds to "less than," right to "greater than or equal"
Each leaf corresponds to one permutation result

There are n! possible permutations of n elements, so the decision tree has at least n! leaves. A binary tree of height h has at most 2^h leaves, therefore:

2^h >= n!
h >= log2(n!)

By Stirling's approximation n! ~ (n/e)^n:

h >= log2((n/e)^n) = n * log2(n/e) = n * log2(n) - n * log2(e)
  = Omega(n log n)

Information-Theoretic Interpretation:

Sorting is essentially determining which of n! permutations the input represents. Each comparison provides 1 bit of information. To distinguish among n! possibilities requires at least log2(n!) = Omega(n log n) bits.

This is why counting sort and radix sort can break through this bound — they are not purely comparison-based; they exploit numerical properties of elements (additional information sources).

21.16 History and Inventors of Sorting Algorithms

The history of sorting algorithms nearly parallels the development of computer science itself:

Year	Algorithm	Inventor	Context
1945	Merge Sort	John von Neumann	Described in EDVAC report, one of the earliest computer algorithms
1956	Radix Sort	Harold H. Seward	MIT Master's thesis
1959	Quick Sort	Tony Hoare	Invented while working on a translation project at Moscow State University
1964	Heap Sort	J.W.J. Williams	Exploiting the heap data structure
1964	Heap Sort improvement	Robert W. Floyd	Optimized heap building to O(n)
1969	Shell Sort analysis	Knuth	Analyzed Shell's (1959) algorithm
1976	Dutch National Flag	Edsger Dijkstra	Three-way partition problem
1993	Introsort	David Musser	Hybrid quicksort + heapsort + insertion sort
1997	Introsort paper	David Musser	Published formal paper
2002	TimSort	Tim Peters	Designed for CPython
2009	Java TimSort	Joshua Bloch	Ported TimSort to Java

Tony Hoare was only 25 when he invented quicksort. He later recalled: "I was working on machine translation from Russian to English and needed to sort words. I first thought of bubble sort but it was too slow. I thought for a while and discovered the partition idea."

Tim Peters submitted TimSort to the CPython mailing list with the subject "[Python-Dev] Sorting," accompanied by a 27-page design document (listsort.txt) explaining every design decision in detail. This document remains part of the CPython source tree to this day.

21.17 Strict Definition of Stability and Why It Matters

Definition: A sorting algorithm is stable if whenever a[i] == a[j] and i < j before sorting, a[i] still appears before a[j] after sorting.

In other words: equal elements maintain their original relative order.

Why does stability matter?

Multi-key sorting: Sort by name first, then by age. If the sort is stable, people of the same age remain in name order.

students = [
    ("Alice", 20), ("Bob", 22), ("Charlie", 20), ("Diana", 22)
]

# Sort by name
students.sort(key=lambda x: x[0])
# [("Alice", 20), ("Bob", 22), ("Charlie", 20), ("Diana", 22)]

# Sort by age (stable sort)
students.sort(key=lambda x: x[1])
# [("Alice", 20), ("Charlie", 20), ("Bob", 22), ("Diana", 22)]
# Same-age: Alice still before Charlie, Bob still before Diana

Database queries: SQL's ORDER BY with multiple columns relies on stability.
UI sorting: When users click column headers to sort, they expect rows with equal values to maintain their previous order.
Correctness: In some applications, the relative order of equal elements has semantic meaning (e.g., events with the same timestamp ordered by arrival time).

Python's sort guarantees stability — this is part of the language specification, not merely an implementation detail.

21.18 introsort: The Hybrid Strategy of C++ std::sort

introsort (introspective sort) is a hybrid sorting algorithm proposed by David Musser in 1997. It is the typical implementation of C++ standard library's std::sort. It combines the strengths of three algorithms:

Quicksort: Fastest in the average case
Heapsort: Guarantees worst-case O(n log n)
Insertion sort: Fastest for small arrays

Strategy:

Start with quicksort
If recursion depth exceeds 2*log2(n), indicating potential O(n²) degradation, switch to heapsort
When subarray size falls below threshold (typically 16), switch to insertion sort

import math

def introsort(arr):
    """introsort: hybrid of quicksort + heapsort + insertion sort"""
    max_depth = 2 * int(math.log2(len(arr))) if arr else 0
    _introsort_helper(arr, 0, len(arr) - 1, max_depth)
    return arr


def _introsort_helper(arr, lo, hi, depth_limit):
    SIZE_THRESHOLD = 16
    
    while hi - lo + 1 > SIZE_THRESHOLD:
        if depth_limit == 0:
            # Too deep, switch to heapsort
            heap_sort_range(arr, lo, hi)
            return
        
        depth_limit -= 1
        # Median-of-three pivot selection
        mid = (lo + hi) // 2
        pivot_idx = median_of_three(arr, lo, mid, hi)
        arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
        
        p = partition_lomuto(arr, lo, hi)
        _introsort_helper(arr, p + 1, hi, depth_limit)
        hi = p - 1  # Tail recursion optimization
    
    # Small arrays: insertion sort
    insertion_sort_range(arr, lo, hi)


def median_of_three(arr, a, b, c):
    """Median of three: return index of median among arr[a], arr[b], arr[c]"""
    if arr[a] > arr[b]:
        a, b = b, a
    if arr[b] > arr[c]:
        b, c = c, b
    if arr[a] > arr[b]:
        a, b = b, a
    return b

introsort elegantly combines three algorithms' strengths:

Most of the time it follows the quicksort path, enjoying quicksort's cache advantage and low constants
If pathological input causes recursion to go too deep, it automatically switches to heapsort, guaranteeing O(n log n)
Small subarrays use insertion sort, avoiding recursion overhead

Thus introsort is: O(n log n) worst-case, O(n log n) average, in-place, unstable.

Level 4 · Practice and Interviews

21.19 Python Sorting API Deep Dive

sort() vs sorted()

# list.sort() — in-place, returns None, only works on lists
nums = [3, 1, 4, 1, 5]
nums.sort()
# nums = [1, 1, 3, 4, 5]

# sorted() — returns new list, works on any iterable
result = sorted((3, 1, 4, 1, 5))  # tuples work too
# result = [1, 1, 3, 4, 5]

# sorted accepts generators, dict views, etc.
sorted({"b": 2, "a": 1}.items(), key=lambda x: x[1])
# [('a', 1), ('b', 2)]

Key differences:

sort() is in-place, doesn't consume extra O(n) memory (except TimSort's merge temporary space)
sorted() always creates a new list
sort() is a list method only; sorted() is a builtin that accepts any iterable

The key Parameter

# lambda
words = ["banana", "apple", "cherry"]
words.sort(key=lambda w: len(w))
# ['apple', 'banana', 'cherry']

# operator.itemgetter — faster than lambda (C implementation)
from operator import itemgetter
students = [("Alice", 85), ("Bob", 92), ("Charlie", 78)]
students.sort(key=itemgetter(1))  # sort by score
# [('Charlie', 78), ('Alice', 85), ('Bob', 92)]

# operator.attrgetter — sort by attribute
from operator import attrgetter
from dataclasses import dataclass

@dataclass
class Student:
    name: str
    gpa: float

students = [Student("Alice", 3.8), Student("Bob", 3.9)]
students.sort(key=attrgetter('gpa'))

# Multi-level sort: GPA descending, then name ascending
students.sort(key=lambda s: (-s.gpa, s.name))

functools.cmp_to_key

Python 3 removed the cmp parameter from sort, but some comparison logic is hard to express with key (e.g., custom comparison functions). Use cmp_to_key:

from functools import cmp_to_key

def compare_versions(v1, v2):
    """Compare version strings: '1.2.3' vs '1.2.10'"""
    parts1 = list(map(int, v1.split('.')))
    parts2 = list(map(int, v2.split('.')))
    for p1, p2 in zip(parts1, parts2):
        if p1 < p2:
            return -1
        elif p1 > p2:
            return 1
    return len(parts1) - len(parts2)

versions = ["1.2.3", "1.2.10", "1.1.0", "2.0.0"]
versions.sort(key=cmp_to_key(compare_versions))
# ['1.1.0', '1.2.3', '1.2.10', '2.0.0']

Classic interview application — Largest Number (LeetCode 179):

def largest_number(nums):
    """Given integers, concatenate them to form the largest number"""
    strs = [str(n) for n in nums]
    
    def compare(a, b):
        if a + b > b + a:
            return -1
        elif a + b < b + a:
            return 1
        return 0
    
    strs.sort(key=cmp_to_key(compare))
    result = ''.join(strs)
    return '0' if result[0] == '0' else result

21.20 High-Frequency Interview Sorting Problems

Problem 1: Merge Intervals (LeetCode 56)

Given a collection of intervals [start, end], merge all overlapping intervals.

def merge_intervals(intervals):
    """
    Sort then single-pass merge.
    Time O(n log n), Space O(n)
    """
    if not intervals:
        return []
    
    # Sort by start point
    intervals.sort(key=lambda x: x[0])
    
    merged = [intervals[0]]
    for start, end in intervals[1:]:
        if start <= merged[-1][1]:
            # Overlapping: merge (take larger endpoint)
            merged[-1] = [merged[-1][0], max(merged[-1][1], end)]
        else:
            merged.append([start, end])
    
    return merged

# Test
print(merge_intervals([[1,3],[2,6],[8,10],[15,18]]))
# [[1, 6], [8, 10], [15, 18]]

Insight: After sorting, overlapping intervals are necessarily adjacent. So sort + linear scan solves the problem.

Problem 2: Maximum Gap (LeetCode 164)

Given an unsorted array, find the maximum difference between successive elements in sorted form. Required: O(n) time and space.

def maximum_gap(nums):
    """
    Bucket sort insight: max gap >= ceil((max-min)/(n-1))
    Set bucket size to floor((max-min)/(n-1)), answer must be between buckets (not within)
    """
    n = len(nums)
    if n < 2:
        return 0
    
    min_val, max_val = min(nums), max(nums)
    if min_val == max_val:
        return 0
    
    # Bucket size and count
    bucket_size = max(1, (max_val - min_val) // (n - 1))
    bucket_count = (max_val - min_val) // bucket_size + 1
    
    # Each bucket stores only min and max
    buckets_min = [float('inf')] * bucket_count
    buckets_max = [float('-inf')] * bucket_count
    
    for num in nums:
        idx = (num - min_val) // bucket_size
        buckets_min[idx] = min(buckets_min[idx], num)
        buckets_max[idx] = max(buckets_max[idx], num)
    
    # Max gap = some bucket's min - previous non-empty bucket's max
    max_gap = 0
    prev_max = buckets_max[0]
    for i in range(1, bucket_count):
        if buckets_min[i] == float('inf'):
            continue  # Skip empty buckets
        max_gap = max(max_gap, buckets_min[i] - prev_max)
        prev_max = buckets_max[i]
    
    return max_gap

Key insight: Pigeonhole principle — if n numbers are uniformly spread over [min, max], the average gap between adjacent elements is (max-min)/(n-1). The maximum gap must be >= this average. Setting bucket size to this average ensures the maximum gap cannot occur within a bucket, only between adjacent buckets.

Problem 3: Sort Colors (LeetCode 75, Dutch National Flag)

Array contains only 0, 1, 2. Sort in-place with a single pass.

def sort_colors(nums):
    """
    Dutch National Flag: three-way partition
    lo: boundary of 0s (right side)
    hi: boundary of 2s (left side)
    i: current pointer
    """
    lo, i, hi = 0, 0, len(nums) - 1
    
    while i <= hi:
        if nums[i] == 0:
            nums[lo], nums[i] = nums[i], nums[lo]
            lo += 1
            i += 1
        elif nums[i] == 2:
            nums[hi], nums[i] = nums[i], nums[hi]
            hi -= 1
            # Don't increment i: swapped value hasn't been examined
        else:  # nums[i] == 1
            i += 1

# Test
arr = [2, 0, 2, 1, 1, 0]
sort_colors(arr)
print(arr)  # [0, 0, 1, 1, 2, 2]

This is a direct interview application of Dijkstra's Dutch National Flag algorithm. Note it's the same idea as quicksort's three-way partition.

21.21 When to Use Which Sort

In practice, sorting algorithm choice depends on:

Scenario	Recommended	Reason
General purpose	TimSort (Python/Java default)	Best for real-world data
Small arrays (n < 64)	Insertion sort	Small constant, cache-friendly
Large arrays, stability not needed	Quicksort/introsort	Fastest on average
Integers, small range	Counting sort	O(n + k), linear time
Integers, fixed width	Radix sort	O(d*n), linear time
Nearly sorted data	TimSort or insertion sort	Exploits existing order
Stability required	Merge sort/TimSort	Quicksort is unstable
Memory constrained	Heap sort	O(1) extra space
Data doesn't fit in memory	External merge sort	Minimizes disk I/O
Linked list sorting	Merge sort	No random access needed
Worst-case guarantee needed	Heap sort/introsort	Worst-case O(n log n)

Practical Python advice:

99% of the time, just use sorted() or .sort() — they use TimSort, which is excellent
If elements are small-range integers and performance is critical, hand-write counting sort
If data exceeds memory, use external sorting (or tools like databases/Spark)

21.22 Parallelization Potential of Sorting Algorithms

Different sorting algorithms have vastly different parallelization potential:

Merge Sort — Naturally Parallel

The divide-and-conquer structure means both halves can be sorted independently, and the merge phase can be partially parallelized:

from concurrent.futures import ThreadPoolExecutor

def parallel_merge_sort(arr, depth=0, max_depth=3):
    """Parallel merge sort (using thread pool)"""
    if len(arr) <= 1:
        return arr
    
    mid = len(arr) // 2
    
    if depth < max_depth:
        with ThreadPoolExecutor(max_workers=2) as executor:
            left_future = executor.submit(parallel_merge_sort, arr[:mid], depth+1, max_depth)
            right_future = executor.submit(parallel_merge_sort, arr[mid:], depth+1, max_depth)
            left = left_future.result()
            right = right_future.result()
    else:
        left = parallel_merge_sort(arr[:mid], depth+1, max_depth)
        right = parallel_merge_sort(arr[mid:], depth+1, max_depth)
    
    return merge(left, right)

Note: Python's GIL limits multi-threaded parallelism. For true parallel sorting, use multiprocessing or C extensions.

Quicksort — Also Parallelizable

After partition, both halves are independent, but partition itself is serial. Quicksort's parallel efficiency depends on partition balance.

Sorting Networks — Best for Hardware Parallelism

Sorting networks are fixed compare-and-swap sequences where all comparisons are determined at compile time, making them ideal for GPU and FPGA parallel implementations. Batcher's bitonic merge sort is the classic parallel sorting network.

Parallel Sorting in Practice:

C++ std::execution::par policy
Java's Arrays.parallelSort() — based on parallel merge sort
Python: multiprocessing or NumPy (C-level parallelism underneath)
Big data: MapReduce/Spark distributed sorting

Chapter Summary

Sorting is computer science's "litmus test" — seemingly simple, yet involving core algorithmic ideas: divide and conquer, recursion, randomization, amortized analysis, cache optimization, and parallelization.

Key takeaways:

Comparison sorting has a theoretical lower bound of O(n log n), but non-comparison sorts can achieve O(n) under specific conditions
Quicksort is fastest in practice due to cache friendliness and small constants — but needs randomized pivot to avoid worst case
TimSort is the optimal general-purpose sort for real-world data, exploiting existing order; it's the default in Python/Java
introsort is C++'s choice: quicksort's speed + heapsort's worst-case guarantee + insertion sort's small-array efficiency
Stability is a practical requirement: multi-key sorting, database queries, and UI sorting all need it
Choose the right algorithm based on data characteristics: size, value range, distribution, stability requirements, memory constraints

In the next chapter, we'll discuss search and matching algorithms — close relatives of sorting. Many search problems begin with a sort step.

Rate this chapter

4.7 / 5 (9 ratings)

Sorting: From Bubble Sort to TimSort

Chapter 21: Sorting — From Bubble Sort to TimSort

Level 1 · What You Need to Know

21.1 Overview of Eight Classic Sorting Algorithms

21.2 Bubble Sort

21.3 Selection Sort

21.4 Insertion Sort

21.5 Merge Sort

21.6 Quick Sort

Lomuto vs Hoare Partition

Randomized Pivot

21.7 Heap Sort Overview

21.8 Counting Sort

21.9 Bucket Sort

21.10 Radix Sort

Level 2 · Deeper Understanding

21.11 TimSort: The Choice of Python and Java

Core Insight: Exploit Existing Order

TimSort Execution Flow

TimSort Complexity

21.12 Three-Way Partition (Dutch National Flag)

21.13 Why Quicksort Is Fastest in Practice

21.14 External Sort

Level 3 · Theory and History

21.15 Proof of the O(n log n) Lower Bound for Comparison Sorting

21.16 History and Inventors of Sorting Algorithms

21.17 Strict Definition of Stability and Why It Matters

21.18 introsort: The Hybrid Strategy of C++ std::sort

Level 4 · Practice and Interviews

21.19 Python Sorting API Deep Dive

sort() vs sorted()

The key Parameter

functools.cmp_to_key

21.20 High-Frequency Interview Sorting Problems

Problem 1: Merge Intervals (LeetCode 56)

Problem 2: Maximum Gap (LeetCode 164)

Problem 3: Sort Colors (LeetCode 75, Dutch National Flag)

21.21 When to Use Which Sort

21.22 Parallelization Potential of Sorting Algorithms

Chapter Summary

💬 Comments