Chapter 22

Binary Search: Simple Enough to Get Wrong

Chapter 22: Binary Search — Simple Enough to Get Wrong

"I can guarantee you that 90% of computer scientists cannot write a correct binary search in two hours." — Jon Bentley, Programming Pearls

This is not hyperbole. The idea behind binary search is something a child can understand: cut the search space in half every time. But turning it into code that works correctly in all edge cases — that took from 1946, when it was first proposed, until 1962, when someone finally published a bug-free implementation. Sixteen years of brilliant people tripping over off-by-one errors.

It gets worse. In 2006, Joshua Bloch (author of core Java libraries) discovered that the binary search in Java's standard library had harbored an integer overflow bug for 20 years. The same bug existed in Jon Bentley's own code from Programming Pearls.

This chapter will take binary search apart down to the bone. You will truly understand why it is so easy to get wrong, and how to use a unified framework to write binary search that is always correct.

Level 1 · What You Need to Know

1.1 The Most Basic Binary Search

Problem: Given a sorted (ascending) array nums and a target value target, return its index if it exists, otherwise return -1.

def binary_search(nums: list[int], target: int) -> int:
    lo, hi = 0, len(nums) - 1  # closed interval [lo, hi]
    while lo <= hi:
        mid = lo + (hi - lo) // 2  # prevents integer overflow
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

Only 10 lines, but every single one is precise:

Initialization: lo = 0, hi = len(nums) - 1. The search interval is [lo, hi], inclusive on both ends.
Loop condition: while lo <= hi. When lo > hi, the interval is empty — exit.
Midpoint: mid = lo + (hi - lo) // 2. Floor division, guaranteed not to overflow.
Shrink interval: Found it — return. Less than target — search right half [mid+1, hi]. Greater than target — search left half [lo, mid-1].

Complexity: Time O(log n), Space O(1). Each iteration halves the interval. For n elements, at most ⌊log₂n⌋ + 1 comparisons.

1.2 Two Styles: Half-Open vs Closed

There are two mainstream styles for binary search, differing in how the search interval is defined.

Style 1: Closed interval [lo, hi]

def search_closed(nums: list[int], target: int) -> int:
    lo, hi = 0, len(nums) - 1
    while lo <= hi:             # interval non-empty condition
        mid = lo + (hi - lo) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1        # mid excluded, move left boundary right
        else:
            hi = mid - 1        # mid excluded, move right boundary left
    return -1

Style 2: Half-open interval [lo, hi)

def search_half_open(nums: list[int], target: int) -> int:
    lo, hi = 0, len(nums)      # note: hi = len(nums)
    while lo < hi:              # different non-empty condition
        mid = lo + (hi - lo) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1        # mid excluded
        else:
            hi = mid            # note: NOT mid - 1
    return -1

Key differences:

Aspect	[lo, hi]	[lo, hi)
Initialize hi	`len(nums) - 1`	`len(nums)`
Loop condition	`lo <= hi`	`lo < hi`
Move right boundary left	`hi = mid - 1`	`hi = mid`
Interval empty when	`lo > hi`	`lo == hi`

Which is better? No absolute answer. But half-open is more natural when finding "the first position satisfying a condition," because upon exit lo == hi is the answer. Python's bisect module uses half-open. This chapter uses both — you need to master both.

1.3 The bisect Module

Python's standard library bisect module provides efficient binary search tools:

import bisect

# bisect_left: find leftmost insertion point for target
# Equivalent to: first position where element >= target
pos = bisect.bisect_left([1, 3, 5, 7, 9], 5)   # returns 2

# bisect_right (a.k.a. bisect): find rightmost insertion point
# Equivalent to: first position where element > target
pos = bisect.bisect_right([1, 3, 5, 7, 9], 5)  # returns 3

# insort_left: insert while maintaining sort order
a = [1, 3, 5, 7, 9]
bisect.insort_left(a, 5)   # a = [1, 3, 5, 5, 7, 9]

# insort_right: insert to the right of equal elements
a = [1, 3, 5, 7, 9]
bisect.insort_right(a, 5)  # a = [1, 3, 5, 5, 7, 9], new 5 after existing 5

Difference between bisect_left and bisect_right:

When the array contains multiple elements equal to target:

bisect_left returns the index of the leftmost one (first position >= target)
bisect_right returns one past the rightmost one (first position > target)

arr = [1, 3, 5, 5, 5, 7, 9]
bisect.bisect_left(arr, 5)   # 2 (position of first 5)
bisect.bisect_right(arr, 5)  # 5 (one past last 5)

Exact search using bisect:

def binary_search_bisect(nums: list[int], target: int) -> int:
    i = bisect.bisect_left(nums, target)
    if i < len(nums) and nums[i] == target:
        return i
    return -1

1.4 Common Variants

The most practical use of binary search isn't "find an exact value" — it's these four variants:

Variant 1: Find first position >= target (lower bound)

This is exactly what bisect_left does.

def lower_bound(nums: list[int], target: int) -> int:
    """Return index of first element >= target. Returns len(nums) if none."""
    lo, hi = 0, len(nums)  # [lo, hi)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

Variant 2: Find first position > target (upper bound)

This is exactly what bisect_right does.

def upper_bound(nums: list[int], target: int) -> int:
    """Return index of first element > target. Returns len(nums) if none."""
    lo, hi = 0, len(nums)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] <= target:  # note: <= instead of <
            lo = mid + 1
        else:
            hi = mid
    return lo

Variant 3: Find last position <= target

def last_le(nums: list[int], target: int) -> int:
    """Return index of last element <= target. Returns -1 if none."""
    return upper_bound(nums, target) - 1

Variant 4: Find last position < target

def last_lt(nums: list[int], target: int) -> int:
    """Return index of last element < target. Returns -1 if none."""
    return lower_bound(nums, target) - 1

Cheat sheet: All variants compose from lower_bound and upper_bound:

Goal	Implementation
First >= target	`lower_bound(target)`
First > target	`upper_bound(target)`
Last <= target	`upper_bound(target) - 1`
Last < target	`lower_bound(target) - 1`
Does target exist?	Value at `lower_bound(target)` equals target
Count of target	`upper_bound(target) - lower_bound(target)`

1.5 Searching a Rotated Sorted Array

Problem: A sorted array has been rotated at some pivot (e.g., [4,5,6,7,0,1,2]). Search for target.

Key observation: when you split the array at the midpoint, at least one half is sorted.

def search_rotated(nums: list[int], target: int) -> int:
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] == target:
            return mid
        
        # determine which half is sorted
        if nums[lo] <= nums[mid]:
            # left half is sorted
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            # right half is sorted
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1

Logic:

If nums[lo] <= nums[mid], then [lo, mid] is sorted.
Since the left half is sorted, if target is in [nums[lo], nums[mid]), search left; otherwise search right.
Conversely, if the right half is sorted, apply the same logic symmetrically.

Level 2 · Principles and Practice

2.1 Why Binary Search Is So Easy to Get Wrong

Jon Bentley opens Chapter 1 of Programming Pearls with binary search: he asked senior programmers at Bell Labs to write it on the spot. 90% of them produced buggy code.

Common error types:

Error 1: Wrong loop condition

# Bug: using [lo, hi] but loop condition is lo < hi
lo, hi = 0, len(nums) - 1
while lo < hi:  # BUG! When lo == hi, one element left unchecked
    ...

Error 2: Wrong boundary update

# Bug: using [lo, hi) but updating hi = mid - 1
lo, hi = 0, len(nums)
while lo < hi:
    mid = lo + (hi - lo) // 2
    if nums[mid] > target:
        hi = mid - 1  # BUG! Should be hi = mid
    ...

Error 3: Infinite loop

# Bug: when lo + 1 == hi, mid = lo, then lo = mid changes nothing
lo, hi = 0, len(nums) - 1
while lo < hi:
    mid = lo + (hi - lo) // 2  # floor division
    if some_condition:
        lo = mid  # BUG! May cause infinite loop
    else:
        hi = mid - 1

Why is it so error-prone? Because binary search involves three tightly coupled decisions:

Definition of the search interval (open/closed)
Loop continuation condition
Which side the midpoint belongs to

These three decisions must be strictly consistent. Any mismatch produces a bug. And these bugs often only trigger in extreme cases (array length 1, target doesn't exist, target at first/last position) — normal testing won't catch them.

Defense mantra: Pick one style (recommended: half-open), memorize all its details, never mix and match.

2.2 bisect Module Source Analysis

Python's bisect module is written in pure Python (CPython also has a C-accelerated version). Let's examine bisect_left:

def bisect_left(a, x, lo=0, hi=None, *, key=None):
    """Return the index where to insert item x in list a, assuming a is sorted.

    The return value is such that all e in a[:i] have e < x, and all e in
    a[i:] have e >= x.  So if x already appears in the list, a.insert(i, x)
    will insert just before the leftmost x already there.
    """
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    if key is None:
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid
    else:
        while lo < hi:
            mid = (lo + hi) // 2
            if key(a[mid]) < x:
                lo = mid + 1
            else:
                hi = mid
    return lo

Key observations:

Uses half-open [lo, hi): hi starts at len(a), loop condition is lo < hi.
Midpoint uses (lo + hi) // 2: In Python there's no integer overflow (Python ints are arbitrary precision), but in C/Java you'd need lo + (hi - lo) // 2.
Only two branches: a[mid] < x → lo = mid + 1, otherwise hi = mid. No special handling for == x! This is the essence of bisect_left — even after finding an element equal to x, keep searching left.
Python 3.10+ supports key parameter: Allows transforming elements before comparison, like sorted(key=...).

The only difference in bisect_right:

# bisect_left:
if a[mid] < x:   # strict less-than
    lo = mid + 1
else:
    hi = mid

# bisect_right:
if a[mid] <= x:  # less-than-or-equal (added equals)
    lo = mid + 1
else:
    hi = mid

Just one = sign difference determines whether you find the "leftmost insertion point" or "rightmost insertion point."

2.3 Binary Search in Python's sorted containers

Python's standard library lacks a balanced BST (like C++'s std::set), but the popular third-party library sortedcontainers uses "segmented lists + binary search" to achieve O(log n) insertion and lookup:

from sortedcontainers import SortedList

sl = SortedList([5, 1, 3, 7, 9])
# Automatically sorted: [1, 3, 5, 7, 9]

sl.add(4)           # O(log n) insertion
sl.remove(3)        # O(log n) removal
sl.bisect_left(5)   # binary search
sl.irange(3, 7)     # range query [3, 7]

SortedList internal structure:

Data divided into multiple "segments" (~1000 elements each)
An index list records the maximum of each segment
Lookup: binary search the index to find the segment, then binary search within the segment
Insert/delete may require splitting or merging segments

This design is much faster than red-black trees in practice because it is CPU cache-friendly — sequential access over contiguous memory is far faster than random access across tree nodes.

2.4 Floating-Point Binary Search

Binary search works not only on discrete arrays but also on continuous intervals. Here we don't worry about off-by-one errors — we worry about precision.

Example 1: Square Root

def sqrt_binary_search(x: float, eps: float = 1e-10) -> float:
    """Compute sqrt(x) using binary search."""
    if x < 0:
        raise ValueError("Cannot compute sqrt of negative number")
    if x == 0:
        return 0.0
    
    lo, hi = 0.0, max(1.0, x)  # when x < 1, sqrt(x) > x
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if mid * mid < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

Example 2: Finding a Root of an Equation

For a monotonic function f(x), if f(a) and f(b) have opposite signs, there's a root in [a, b]:

def find_zero(f, a: float, b: float, eps: float = 1e-10) -> float:
    """Find a root of f in [a, b]. Precondition: f(a) and f(b) have opposite signs."""
    assert f(a) * f(b) <= 0, "f(a) and f(b) must have different signs"
    
    for _ in range(100):  # 100 iterations is enough for 10^-30 precision
        mid = (a + b) / 2
        if f(mid) * f(a) <= 0:
            b = mid
        else:
            a = mid
    return (a + b) / 2

# Usage: find root of x³ - 2x - 5 = 0 in [2, 3]
root = find_zero(lambda x: x**3 - 2*x - 5, 2, 3)
# root ≈ 2.0945514815

Notes on floating-point binary search:

Don't use while lo < hi (floats may never be equal). Use while hi - lo > eps or a fixed iteration count.
Fixed iteration count is more robust: 64 iterations achieves maximum precision within double-precision range.
Choice of eps depends on problem requirements. Competitive programming typically requires 10⁻⁶ or 10⁻⁹.

2.5 Binary Search on the Answer

The most powerful application of binary search isn't finding elements in an array — it's "binary search on the answer." When the answer has monotonicity, guess it and verify.

Pattern: If answer x is feasible and x+1 is also feasible, there's monotonicity — binary search applies.

Classic Problem: Minimize the Maximum

Problem: Split an array into k contiguous subarrays such that the maximum subarray sum is minimized.

def split_array(nums: list[int], k: int) -> int:
    """LeetCode 410: Split Array Largest Sum"""
    
    def can_split(max_sum: int) -> bool:
        """If each subarray sum <= max_sum, can we split into <= k parts?"""
        count = 1
        current = 0
        for num in nums:
            if current + num > max_sum:
                count += 1
                current = num
            else:
                current += num
        return count <= k
    
    # Answer range: [max(nums), sum(nums)]
    lo, hi = max(nums), sum(nums)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if can_split(mid):
            hi = mid       # mid works, try smaller
        else:
            lo = mid + 1   # mid doesn't work, answer is larger
    return lo

Intuition:

The answer (maximum subarray sum) is in [max(nums), sum(nums)]
For a candidate answer mid, greedily check if the array can be split into at most k subarrays
If yes, mid might be too large — try smaller
If no, mid is too small — answer is to the right

Classic Problem: Maximize the Minimum

Problem: Place n cows on a road of length L, maximize the minimum distance between adjacent cows.

def aggressive_cows(positions: list[int], n: int) -> int:
    """Maximize minimum distance."""
    positions.sort()
    
    def can_place(min_dist: int) -> bool:
        """If minimum distance between cows >= min_dist, can we place n cows?"""
        count = 1
        last = positions[0]
        for pos in positions[1:]:
            if pos - last >= min_dist:
                count += 1
                last = pos
        return count >= n
    
    lo, hi = 1, positions[-1] - positions[0]
    while lo < hi:
        mid = lo + (hi - lo + 1) // 2  # ceiling division!
        if can_place(mid):
            lo = mid       # mid works, try larger
        else:
            hi = mid - 1   # mid doesn't work, answer is smaller
    return lo

Note the ceiling division: When searching for "the largest feasible value" with lo = mid, you must use ceiling: mid = lo + (hi - lo + 1) // 2. Otherwise when hi = lo + 1, mid = lo and lo = mid = lo — infinite loop.

Level 3 · Deep Understanding

3.1 The History of Binary Search

The history of binary search is more tortuous than you might expect:

Year	Event
1946	John Mauchly proposes binary search in Moore School Lectures
1957	First published binary search algorithm (works for 2ⁿ-1 elements)
1960	Derrick Henry Lehmer publishes a version for arbitrary n (has bugs)
1962	First proven-correct binary search published (Herman Bottenbruch)
1986	Jon Bentley describes this saga in Programming Pearls
2006	Joshua Bloch discovers integer overflow bug in Java/C standard libraries

From "idea proposed" to "correct implementation" took 16 years. From "widely deployed" to "bug discovered" took another 20 years. The deep lesson: a simple idea does not imply a simple implementation.

3.2 The Integer Overflow Bug

This is one of the most famous bugs in computer science history.

// Classic binary search in Java (JDK 1.2 through JDK 5)
int mid = (low + high) / 2;  // BUG!

When low + high exceeds Integer.MAX_VALUE (2³¹ - 1 ≈ 2.1 billion), integer overflow produces a negative number, causing an array index out of bounds.

The fix:

int mid = low + (high - low) / 2;       // correct
// or
int mid = (low + high) >>> 1;           // unsigned right shift, also correct

Do Python programmers need to worry? In pure Python, no — Python ints are arbitrary precision. But if you:

Use numpy (fixed-precision integers)
Use ctypes to call C libraries
Implement in C++/Java during interviews

you must remember this bug.

Fun fact: This bug lurked in Java's Arrays.binarySearch for 9 years (1997-2006). The reason: it only triggers when the array exceeds 1 billion elements, which was rare before 2006.

3.3 Binary Search vs Interpolation Search vs Exponential Search

Binary Search:

Always picks the midpoint
Time O(log n), worst case O(log n)
Makes no assumptions about data distribution

Interpolation Search:

Estimates position based on target's value: mid = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
If data is uniformly distributed, average O(log log n)
If distribution is highly skewed, worst case O(n)

def interpolation_search(nums: list[int], target: int) -> int:
    lo, hi = 0, len(nums) - 1
    while lo <= hi and nums[lo] <= target <= nums[hi]:
        if nums[lo] == nums[hi]:
            if nums[lo] == target:
                return lo
            break
        
        # estimate target's position in the interval
        mid = lo + (target - nums[lo]) * (hi - lo) // (nums[hi] - nums[lo])
        
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

Exponential Search:

Exponentially expand to find a range containing target, then binary search within it
Ideal for unbounded or very large arrays, and when target is near the beginning
Time O(log k), where k is target's actual position

def exponential_search(nums: list[int], target: int) -> int:
    n = len(nums)
    if n == 0:
        return -1
    if nums[0] == target:
        return 0
    
    # find a range containing target
    bound = 1
    while bound < n and nums[bound] <= target:
        bound *= 2
    
    # binary search within [bound//2, min(bound, n-1)]
    lo, hi = bound // 2, min(bound, n - 1)
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

Comparison:

Algorithm	Average Time	Worst Time	Best For
Binary Search	O(log n)	O(log n)	General purpose, safest
Interpolation Search	O(log log n)	O(n)	Uniformly distributed data
Exponential Search	O(log k)	O(log n)	Unbounded arrays / target near start

3.4 A Unified Framework: Lower Bound and Upper Bound

All binary search problems reduce to a single framework: finding the boundary.

Imagine a boolean array [F, F, F, ..., F, T, T, T, ..., T] — all False on the left, all True on the right. We want to find the first True.

def first_true(lo: int, hi: int, predicate) -> int:
    """Find the first index in [lo, hi) where predicate returns True.
    Precondition: there exists k such that predicate(i) is False for i < k
    and True for i >= k. Returns hi if all are False."""
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if predicate(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

All variants are special cases of this framework:

# bisect_left: first position >= target
bisect_left = first_true(0, len(nums), lambda i: nums[i] >= target)

# bisect_right: first position > target
bisect_right = first_true(0, len(nums), lambda i: nums[i] > target)

# last position <= target: one before first_true
last_le = first_true(0, len(nums), lambda i: nums[i] > target) - 1

# find minimum in rotated array
min_idx = first_true(0, len(nums), lambda i: nums[i] <= nums[-1])

The framework's significance: You don't need to memorize 6 different binary search templates. Just remember first_true and construct the appropriate predicate for each problem.

For "find the last True":

def last_true(lo: int, hi: int, predicate) -> int:
    """Find the last index in [lo, hi] where predicate returns True.
    Returns lo - 1 if all are False."""
    while lo < hi:
        mid = lo + (hi - lo + 1) // 2  # ceiling!
        if predicate(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo if predicate(lo) else lo - 1

Why does last_true need ceiling division? Because when lo + 1 == hi, floor division gives mid = lo. If predicate(lo) is True, then lo = mid = lo — nothing changes — infinite loop. Ceiling ensures mid = hi, avoiding this.

This is one of the subtlest points in binary search. Memorize the rule:

Find first True → floor division + hi = mid
Find last True → ceiling division + lo = mid

Level 4 · Expert Perspective

4.1 Classic Off-by-One Errors and Infinite Loops

Let us systematically analyze every way binary search can go wrong.

Case Study 1: Infinite Loop

# Intent: find last value satisfying condition in [lo, hi]
lo, hi = 0, n - 1
while lo < hi:
    mid = (lo + hi) // 2      # floor division
    if condition(mid):
        lo = mid              # BUG: may infinite-loop
    else:
        hi = mid - 1

# When lo = 3, hi = 4:
# mid = (3+4)//2 = 3
# If condition(3) is True, lo = 3, interval unchanged → infinite loop!

Fix: mid = (lo + hi + 1) // 2 (ceiling).

Case Study 2: Missing Element

# Intent: find target in [lo, hi]
lo, hi = 0, len(nums) - 1
while lo < hi:         # BUG: when lo == hi, doesn't check!
    mid = (lo + hi) // 2
    if nums[mid] == target:
        return mid
    elif nums[mid] < target:
        lo = mid + 1
    else:
        hi = mid - 1
# If target is exactly at position lo == hi, it's missed

Fix: while lo <= hi (but must match other parts).

Case Study 3: Array Out of Bounds

# Intent: find target using [lo, hi)
lo, hi = 0, len(nums)
while lo < hi:
    mid = (lo + hi) // 2
    if nums[mid] < target:
        lo = mid + 1
    elif nums[mid] > target:
        hi = mid
    else:
        return mid
return lo  # lo might be len(nums)!

# If caller accesses nums[lo] without checking lo < len(nums), out of bounds

Bug prevention checklist:

Check	Description
Empty array	`if not nums: return -1`
Interval consistency	Init, loop condition, boundary update must match
Infinite loop	When `lo = mid`, must use ceiling
Return value bounds	Returned index might be `len(nums)` — check
Duplicate elements	How `==` is handled affects the result

4.2 High-Frequency Interview Problems

4.2.1 Search in Rotated Sorted Array (LeetCode #33)

def search(nums: list[int], target: int) -> int:
    """
    Search in rotated sorted array.
    Key: at least one half is always sorted.
    """
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] == target:
            return mid
        
        if nums[lo] <= nums[mid]:
            # [lo, mid] is sorted
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            # [mid, hi] is sorted
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1

Gotcha: The <= in nums[lo] <= nums[mid]. When lo == mid (only two elements left), the left half has one element and is trivially sorted. Without =, this case would be misclassified as "right half is sorted," causing errors.

Time complexity: O(log n).

4.2.2 Find Peak Element (LeetCode #162)

def find_peak_element(nums: list[int]) -> int:
    """
    Peak: element greater than both neighbors.
    nums[-1] = nums[n] = -infinity.
    
    Key insight: if nums[mid] < nums[mid+1],
    then [mid+1, hi] must contain a peak (either keeps rising
    to the end — which is -inf — so the last rising point is a peak).
    """
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] < nums[mid + 1]:
            lo = mid + 1   # peak is to the right
        else:
            hi = mid       # peak is to the left (including mid)
    return lo

Why lo < hi instead of lo <= hi? Because we're not looking for a specific value — we're narrowing the range. When lo == hi, that's the answer.

Why no infinite loop? lo = mid + 1 always advances. hi = mid — since mid < hi (floor division), hi always decreases. When lo + 1 == hi: mid = lo. If we go lo = mid + 1 = hi, loop ends. If we go hi = mid = lo, loop ends too.

Time complexity: O(log n).

4.2.3 Find First and Last Position of Element in Sorted Array (LeetCode #34)

def search_range(nums: list[int], target: int) -> list[int]:
    """Find first and last position of target."""
    
    def bisect_left_custom(nums, target):
        lo, hi = 0, len(nums)
        while lo < hi:
            mid = lo + (hi - lo) // 2
            if nums[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        return lo
    
    def bisect_right_custom(nums, target):
        lo, hi = 0, len(nums)
        while lo < hi:
            mid = lo + (hi - lo) // 2
            if nums[mid] <= target:
                lo = mid + 1
            else:
                hi = mid
        return lo
    
    left = bisect_left_custom(nums, target)
    right = bisect_right_custom(nums, target) - 1
    
    if left <= right and left < len(nums) and nums[left] == target:
        return [left, right]
    return [-1, -1]

Essence: Just bisect_left and bisect_right - 1. This problem tests your understanding of binary search variants.

Time complexity: O(log n), two binary searches.

4.2.4 Valid Perfect Square (LeetCode #367)

def is_perfect_square(num: int) -> bool:
    """Determine if num is a perfect square."""
    lo, hi = 1, num
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        square = mid * mid
        if square == num:
            return True
        elif square < num:
            lo = mid + 1
        else:
            hi = mid - 1
    return False

Optimization: Upper bound can be min(num, 46341) (since 46341² > 2³¹ - 1). Or for large numbers, hi = num // 2 + 1 (since for num >= 4, sqrt(num) < num/2).

Note: Python doesn't worry about mid * mid overflow, but in C++/Java you'd need long long or reformulate as mid == num / mid && num % mid == 0.

4.3 When NOT to Use Binary Search

Binary search has clear preconditions. When violated, it doesn't apply:

1. Unsorted data

Binary search's core assumption is "eliminate half." If data has no ordering (or monotonicity of any kind), you cannot safely eliminate either half.

Exception: Find Peak Element (#162) doesn't require global ordering but exploits local monotonicity.

2. Linked lists

Linked lists cannot access the middle element in O(1). Binary search on a linked list takes O(n log n) (finding midpoint is O(n) each time) — worse than linear scan's O(n).

3. Frequent insertions/deletions

If data changes frequently, maintaining sorted order (O(n) insertion each time) may cost more than binary search saves. Use a balanced BST (like SortedList) or skip list instead.

4. Very small data

When n < 32, linear scan may be faster than binary search because:

No branch misprediction penalty
Cache-friendly sequential access
Simpler code, better CPU pipeline utilization

In practice, many sorting algorithms fall back to insertion sort for small arrays. Similarly, for small lookups, just scan.

5. Need all matching elements

If you need all elements equal to target, finding one via binary search still requires expanding in both directions — worst case O(n). Better to use bisect_left + bisect_right to determine the range directly.

4.4 Binary Search on the Answer: Interview Applications

4.4.1 Split Array Largest Sum (LeetCode #410)

The complete solution was given in Level 2. Let's analyze why it's correct.

Monotonicity proof: Let f(x) = "can the array be split into <= k subarrays, each with sum <= x?"

If f(x) is True, then for all y > x, f(y) is also True. (Relaxing the constraint makes it easier.)
Therefore f transitions from False to True at some point and stays True.
We want the first x making f(x) True.

This is exactly the first_true framework!

def split_array(nums: list[int], k: int) -> int:
    def feasible(max_sum: int) -> bool:
        count = 1
        current = 0
        for num in nums:
            if current + num > max_sum:
                count += 1
                current = num
                if count > k:
                    return False
            else:
                current += num
        return True
    
    lo, hi = max(nums), sum(nums)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

Time complexity: O(n log S), where S = sum(nums) - max(nums).

4.4.2 Koko Eating Bananas (LeetCode #875)

Problem: There are n piles of bananas. A guard leaves for h hours. Koko eats k bananas per hour (finishes one pile before moving to the next). Find the minimum k.

import math

def min_eating_speed(piles: list[int], h: int) -> int:
    """
    Binary search on the answer: guess speed k, verify if all bananas
    can be eaten within h hours.
    """
    
    def can_finish(k: int) -> bool:
        """At speed k, can all bananas be eaten within h hours?"""
        hours = sum(math.ceil(pile / k) for pile in piles)
        return hours <= h
    
    lo, hi = 1, max(piles)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if can_finish(mid):
            hi = mid       # can go slower
        else:
            lo = mid + 1   # too slow, need faster
    return lo

Analysis:

Answer range: [1, max(piles)]. Slowest: 1 per hour. Fastest: max(piles) per hour.
Monotonicity: larger k → less total time needed.
can_finish(k) is O(n), binary search range is O(log(max(piles))).
Total time: O(n log M), M = max(piles).

Gotcha: math.ceil(pile / k) — beware floating-point issues. In Python, prefer (pile + k - 1) // k or math.ceil(pile / k). The former avoids floating-point imprecision.

4.4.3 Recognizing "Binary Search on the Answer"

When you encounter a complex optimization problem in an interview, check these conditions:

The problem asks for "minimum of a maximum" or "maximum of a minimum" → almost certainly binary search on the answer.
The answer has a clear range → you can determine lo and hi.
"Given answer x, is it feasible?" is much simpler than the original problem → the verification function is usually greedy.
Feasibility is monotonic → as x increases, feasibility doesn't flip from True to False (or vice versa).

More classic "binary search on the answer" problems:

Problem	Binary search what?	Verification
LeetCode 410 Split Array	max subarray sum	greedy partition, count groups
LeetCode 875 Koko Bananas	eating speed	compute total hours
LeetCode 1011 Ship Packages in D Days	ship capacity	greedy loading, count days
LeetCode 668 Kth Smallest in Multiplication Table	the value	count numbers <= it
LeetCode 774 Minimize Max Distance to Gas Station	distance	count stations needed
LeetCode 1482 Min Days to Make Bouquets	days	simulate blooming + greedy picking

Interview tips:

First confirm monotonicity — communicate your approach to the interviewer.
Clearly define what lo, hi represent and their initial values.
Write the verification function as a separate function to keep the main logic clean.
Watch ceiling vs floor division (depends on whether you seek first True or last True).

Summary

Binary search is the quintessential example of "simple enough to get wrong" in the algorithm world. Its idea fits in one sentence: eliminate half each time. But getting it right requires extremely precise understanding of boundary conditions.

Core takeaways:

Pick one template. Stick with it. Recommended: half-open + first_true framework.
Interval definition determines everything. Initialization, loop condition, and boundary update must be consistent.
bisect_left and bisect_right solve 95% of problems. Master their semantics: first position >= target vs first position > target.
Binary search on the answer is the most powerful application. When the answer has monotonicity: guess + verify = O(verification time x log range).
Ceiling prevents infinite loops. When using lo = mid, you must use mid = lo + (hi - lo + 1) // 2.

Finally, remember Jon Bentley's lesson. After writing binary search, test at minimum with these cases:

Empty array
Single element (equals / doesn't equal target)
Target at first position
Target at last position
Target doesn't exist (smaller than all / larger than all / between two elements)
All elements identical

Rate this chapter

4.5 / 5 (8 ratings)