BSTs and Balanced Trees
Chapter 11: BSTs and Balanced Trees
In the previous chapter, we learned binary tree traversals and recursive thinking. But a plain binary tree has no "ordering" constraintโyou cannot efficiently search for a value. A Binary Search Tree (BST) adds one simple rule: left < root < right, and suddenly the tree becomes a powerful search structure.
The catch is that BST performance depends entirely on tree height. If data arrives in sorted order, the BST degenerates into a linked list and search becomes O(n). To solve this, computer scientists invented various self-balancing BSTs: AVL trees, Red-Black trees, Treaps, Splay trees... Each uses a different strategy to guarantee O(log n) height at all times.
This chapter covers a lot of ground, but the core ideas are just two: (1) how BST's ordering property enables efficient operations; (2) how rotations restore balance.
Level 1 ยท What You Need to Know
11.1 BST Definition and Properties
A Binary Search Tree is a binary tree satisfying the following property:
- All nodes in the left subtree have values less than the root's value
- All nodes in the right subtree have values greater than the root's value
- Both left and right subtrees are themselves BSTs
Note: this refers to all nodes, not just direct children. A common mistake is checking only the immediate left child < root while ignoring deeper nodes that also must be less than the root.
class TreeNode:
def __init__(self, val=0, left=None, right=None):
self.val = val
self.left = left
self.right = right
The most important BST property: An inorder traversal (left โ root โ right) of a BST produces a strictly increasing sequence.
def inorder(root):
"""Inorder traversal of a BST yields sorted output"""
if not root:
return []
return inorder(root.left) + [root.val] + inorder(root.right)
# Example BST:
# 5
# / \
# 3 7
# / \ \
# 2 4 8
# Inorder result: [2, 3, 4, 5, 7, 8]
This property is the soul of BSTs. Many interview problems (like "Kth Smallest Element in a BST") essentially exploit the sorted nature of inorder traversal.
11.2 Basic BST Operations
Search: O(h)
def search_bst(root, target):
"""Search for target in BST, return the node or None"""
if not root:
return None
if target == root.val:
return root
elif target < root.val:
return search_bst(root.left, target)
else:
return search_bst(root.right, target)
Each comparison eliminates half the tree, similar to binary search. Time complexity is O(h) where h is the tree height.
Iterative version saves stack space:
def search_bst_iterative(root, target):
node = root
while node:
if target == node.val:
return node
elif target < node.val:
node = node.left
else:
node = node.right
return None
Insertion: O(h)
BST insertion always happens at a leaf position:
def insert_bst(root, val):
"""Insert a value into BST, return root"""
if not root:
return TreeNode(val)
if val < root.val:
root.left = insert_bst(root.left, val)
elif val > root.val:
root.right = insert_bst(root.right, val)
# val == root.val: don't insert (no duplicates)
return root
Deletion: Three Cases
Deletion is the most complex BST operation, with three cases:
- Deleting a leaf node: Simply remove it
- Deleting a node with one child: Replace with the child
- Deleting a node with two children: Replace with the inorder successor (minimum of right subtree) or inorder predecessor (maximum of left subtree), then recursively delete the replacement
def delete_bst(root, key):
"""Delete node with value key from BST"""
if not root:
return None
if key < root.val:
root.left = delete_bst(root.left, key)
elif key > root.val:
root.right = delete_bst(root.right, key)
else:
# Found the node to delete
# Cases 1 & 2: no children or one child
if not root.left:
return root.right
if not root.right:
return root.left
# Case 3: two children
# Find inorder successor (minimum of right subtree)
successor = root.right
while successor.left:
successor = successor.left
# Replace current node's value with successor's
root.val = successor.val
# Recursively delete the successor
root.right = delete_bst(root.right, successor.val)
return root
Why the inorder successor? The inorder successor is the smallest node in the right subtree. It's larger than the current node and smaller than all other nodes in the right subtree. Replacing the current node with it preserves the BST property.
11.3 The Degeneration Problem
What happens if you insert [1, 2, 3, 4, 5] into a BST in order?
1
\
2
\
3
\
4
\
5
A BST that has degenerated into a linked list! Search time degrades from O(log n) to O(n).
| Tree Shape | Height h | Search/Insert/Delete |
|---|---|---|
| Perfectly balanced | O(log n) | O(log n) |
| Random insertion (average) | O(log n) | O(log n) |
| Sorted insertion (worst) | O(n) | O(n) |
This is why we need self-balancing BSTs: they use rotations to guarantee O(log n) height at all times.
11.4 AVL Tree Overview
The AVL tree is the first self-balancing binary search tree, proposed by Soviet mathematicians Adelson-Velsky and Landis in 1962.
Core idea: Each node maintains a balance factor:
Balance Factor = height(left subtree) - height(right subtree)
An AVL tree requires every node's balance factor to be -1, 0, or 1. If an insertion or deletion causes any node's balance factor to become -2 or 2, rotations are needed to restore balance.
Four types of rotations:
| Imbalance Type | Pattern | Fix |
|---|---|---|
| LL (Left-Left) | Inserted into left child's left subtree | Right rotation |
| RR (Right-Right) | Inserted into right child's right subtree | Left rotation |
| LR (Left-Right) | Inserted into left child's right subtree | Left rotation then right rotation |
| RL (Right-Left) | Inserted into right child's left subtree | Right rotation then left rotation |
Right rotation (fixing LL imbalance):
z y
/ \ / \
y T4 โ x z
/ \ / \ / \
x T3 T1 T2 T3 T4
/ \
T1 T2
Left rotation (fixing RR imbalance):
z y
/ \ / \
T1 y โ z x
/ \ / \ / \
T2 x T1 T2 T3 T4
/ \
T3 T4
LR double rotation (left rotate y, then right rotate z):
z z x
/ \ / \ / \
y T4 โ x T4 โ y z
/ \ / \ / \ / \
T1 x y T3 T1 T2 T3 T4
/ \ / \
T2 T3 T1 T2
RL double rotation (right rotate y, then left rotate z):
z z x
/ \ / \ / \
T1 y โ T1 x โ z y
/ \ / \ / \ / \
x T4 T2 y T1 T2 T3 T4
/ \ / \
T2 T3 T3 T4
11.5 Practical BSTs in Python
Python's standard library has no built-in BST. In practice and competitive programming, the most common alternative is the sortedcontainers library:
from sortedcontainers import SortedList, SortedDict, SortedSet
# SortedList โ automatically maintains sorted order
sl = SortedList([5, 1, 3, 7, 2])
print(sl) # SortedList([1, 2, 3, 5, 7])
sl.add(4) # O(log n) insertion
sl.remove(3) # O(log n) deletion
print(sl[2]) # O(log n) indexed access
# Range queries
print(list(sl.irange(2, 5))) # All elements between 2 and 5
# SortedDict โ dictionary ordered by key
sd = SortedDict({'b': 2, 'a': 1, 'c': 3})
print(sd.keys()) # SortedKeysView(['a', 'b', 'c'])
print(sd.peekitem(0)) # ('a', 1) โ minimum key
print(sd.peekitem(-1)) # ('c', 3) โ maximum key
Under the hood, sortedcontainers doesn't use trees but "segmented lists" (similar to B-tree concepts). It provides the same O(log n) guarantees as a balanced BST and is typically faster in practice than pure tree implementations (exploiting CPU cache locality).
In interviews, if the problem requires a sorted data structure and library usage is permitted, SortedList is the Python programmer's first choice.
Level 2 ยท Deep Mastery
11.6 Complete AVL Tree Implementation
Here is a complete AVL tree implementation in Python with insertion, deletion, and rotations:
class AVLNode:
def __init__(self, val):
self.val = val
self.left = None
self.right = None
self.height = 1 # New node has height 1
class AVLTree:
def get_height(self, node):
if not node:
return 0
return node.height
def get_balance(self, node):
"""Balance factor = left height - right height"""
if not node:
return 0
return self.get_height(node.left) - self.get_height(node.right)
def update_height(self, node):
"""Update node height"""
node.height = 1 + max(self.get_height(node.left),
self.get_height(node.right))
def right_rotate(self, z):
"""Right rotation: fixes LL imbalance"""
y = z.left
T3 = y.right
# Perform rotation
y.right = z
z.left = T3
# Update heights (z first, then y, since y is now parent)
self.update_height(z)
self.update_height(y)
return y # y becomes new root
def left_rotate(self, z):
"""Left rotation: fixes RR imbalance"""
y = z.right
T2 = y.left
# Perform rotation
y.left = z
z.right = T2
# Update heights
self.update_height(z)
self.update_height(y)
return y # y becomes new root
def insert(self, root, val):
"""Insert node and maintain AVL balance"""
# 1. Standard BST insertion
if not root:
return AVLNode(val)
if val < root.val:
root.left = self.insert(root.left, val)
elif val > root.val:
root.right = self.insert(root.right, val)
else:
return root # No duplicate values
# 2. Update height of current node
self.update_height(root)
# 3. Get balance factor
balance = self.get_balance(root)
# 4. If unbalanced, four cases
# LL: Left subtree's left subtree is too tall
if balance > 1 and val < root.left.val:
return self.right_rotate(root)
# RR: Right subtree's right subtree is too tall
if balance < -1 and val > root.right.val:
return self.left_rotate(root)
# LR: Left subtree's right subtree is too tall
if balance > 1 and val > root.left.val:
root.left = self.left_rotate(root.left)
return self.right_rotate(root)
# RL: Right subtree's left subtree is too tall
if balance < -1 and val < root.right.val:
root.right = self.right_rotate(root.right)
return self.left_rotate(root)
return root
def get_min_node(self, node):
"""Get the minimum node in the subtree rooted at node"""
current = node
while current.left:
current = current.left
return current
def delete(self, root, val):
"""Delete node and maintain AVL balance"""
# 1. Standard BST deletion
if not root:
return None
if val < root.val:
root.left = self.delete(root.left, val)
elif val > root.val:
root.right = self.delete(root.right, val)
else:
# Found the node to delete
if not root.left:
return root.right
elif not root.right:
return root.left
else:
# Two children: use inorder successor
successor = self.get_min_node(root.right)
root.val = successor.val
root.right = self.delete(root.right, successor.val)
# 2. Update height
self.update_height(root)
# 3. Get balance factor
balance = self.get_balance(root)
# 4. Rebalance (deletion requires checking child's balance)
# LL
if balance > 1 and self.get_balance(root.left) >= 0:
return self.right_rotate(root)
# LR
if balance > 1 and self.get_balance(root.left) < 0:
root.left = self.left_rotate(root.left)
return self.right_rotate(root)
# RR
if balance < -1 and self.get_balance(root.right) <= 0:
return self.left_rotate(root)
# RL
if balance < -1 and self.get_balance(root.right) > 0:
root.right = self.right_rotate(root.right)
return self.left_rotate(root)
return root
def inorder(self, root):
"""Inorder traversal to verify BST property"""
if not root:
return []
return self.inorder(root.left) + [root.val] + self.inorder(root.right)
# Usage example
avl = AVLTree()
root = None
# Inserting 1-7 in order โ would degenerate in a plain BST
for val in [1, 2, 3, 4, 5, 6, 7]:
root = avl.insert(root, val)
print(avl.inorder(root)) # [1, 2, 3, 4, 5, 6, 7]
print(root.val) # 4 (root node โ balanced!)
print(avl.get_height(root) - 1) # Height = 2 (close to log2(7) โ 2.8)
Key difference between insertion and deletion rebalancing:
- During insertion, you determine the rotation type by comparing the new value against the child node's value (which direction was it inserted?)
- During deletion, you must check the child's balance factor to determine the rotation type (since deletion can happen anywhere)
11.7 Red-Black Trees
Red-Black trees are the most widely used self-balancing BST in engineering practice. They don't pursue the strict balance of AVL trees, instead allowing some imbalance to reduce the number of rotations.
The Five Properties of Red-Black Trees
- Every node is either red or black
- The root is black
- Every leaf (NIL/null node) is black
- Both children of a red node are black (no consecutive red nodes)
- For any node, all paths from that node to descendant leaves contain the same number of black nodes (equal black-height)
Why do these five properties guarantee O(log n)?
The key lies in the combination of properties 4 and 5:
- Property 5 ensures all paths have the same number of black nodes; call it B
- Property 4 ensures red nodes are never adjacent, so the maximum number of red nodes on any path is B
- Therefore: longest path โค 2B, shortest path โฅ B
- The longest path is at most twice the shortest
This means a Red-Black tree's height is at most 2 * logโ(n+1), giving O(log n) for all operations.
Red-Black Trees vs AVL Trees
| Dimension | AVL Tree | Red-Black Tree |
|---|---|---|
| Balance strictness | Height difference โค 1 | Longest path โค 2 ร shortest |
| Search performance | Slightly better (shorter) | Slightly worse (can be taller) |
| Insert/delete performance | More rotations | Fewer rotations (at most 3) |
| Best for | Read-heavy (database indices) | Balanced read/write (general containers) |
| Used in | Windows NT kernel | Linux kernel, Java TreeMap |
Core intuition: AVL trees are more "perfect"; Red-Black trees are more "practical." Red-Black trees trade slight imbalance for fewer rotation operations, yielding better overall performance under frequent insertions and deletions.
Red-Black Tree Insertion Fix-up
A newly inserted node is always red (inserting a black node would violate property 5). If the parent is also red, property 4 is violated and we need to fix it.
Three fix-up cases (assuming parent P is the left child of grandparent G):
Case 1: Uncle U is red
- Recolor P and U to black, G to red
- Move up: treat G as the new "problem node"
G(B) G(R) โ might still violate
/ \ / \
P(R) U(R) โ P(B) U(B)
/ /
X(R) X(R)
Case 2: Uncle U is black, X is P's right child
- Left-rotate P, transforming into Case 3
G(B) G(B)
/ \ / \
P(R) U(B) โ X(R) U(B)
\ /
X(R) P(R)
Case 3: Uncle U is black, X is P's left child
- Right-rotate G, then swap colors of P and G
G(B) P(B)
/ \ / \
P(R) U(B) โ X(R) G(R)
/ \
X(R) U(B)
If P is G's right child, the cases are symmetric.
Red-Black Tree Deletion Fix-up
Deletion is more complex than insertion, primarily dealing with the "double-black" problem (after deleting a black node, its replacement carries an "extra black"). We won't present the full code here because Red-Black tree deletion fix-up has 8 cases (4 + 4 symmetric). In practice, you always use a standard library implementation.
11.8 Treap (Tree + Heap)
A Treap is a randomized balanced BST where each node has two attributes:
- key: satisfies BST property (inorder traversal is sorted)
- priority: satisfies heap property (parent priority โฅ child priority)
Since priorities are randomly assigned, the expected height of a Treap is O(log n).
import random
class TreapNode:
def __init__(self, key):
self.key = key
self.priority = random.random() # Random priority
self.left = None
self.right = None
class Treap:
def _right_rotate(self, node):
left = node.left
node.left = left.right
left.right = node
return left
def _left_rotate(self, node):
right = node.right
node.right = right.left
right.left = node
return right
def insert(self, root, key):
"""Insert and maintain heap property via rotations"""
if not root:
return TreapNode(key)
if key < root.key:
root.left = self.insert(root.left, key)
# If left child has higher priority, rotate right
if root.left.priority > root.priority:
root = self._right_rotate(root)
elif key > root.key:
root.right = self.insert(root.right, key)
# If right child has higher priority, rotate left
if root.right.priority > root.priority:
root = self._left_rotate(root)
return root
def delete(self, root, key):
"""Delete: rotate node down to leaf position, then remove"""
if not root:
return None
if key < root.key:
root.left = self.delete(root.left, key)
elif key > root.key:
root.right = self.delete(root.right, key)
else:
# Found the node to delete
if not root.left and not root.right:
return None # Leaf, just remove
elif not root.left:
root = self._left_rotate(root)
root.left = self.delete(root.left, key)
elif not root.right:
root = self._right_rotate(root)
root.right = self.delete(root.right, key)
else:
# Both children exist: rotate the higher-priority child up
if root.left.priority > root.right.priority:
root = self._right_rotate(root)
root.right = self.delete(root.right, key)
else:
root = self._left_rotate(root)
root.left = self.delete(root.left, key)
return root
# Usage example
treap = Treap()
root = None
for val in [1, 2, 3, 4, 5, 6, 7]: # Sequential insertion won't degenerate!
root = treap.insert(root, val)
Treap advantages:
- Much simpler to implement than Red-Black trees (only two rotations)
- Expected O(log n) time complexity with small constants
- Supports efficient split and merge operations
- Very popular in competitive programming
Treap disadvantages:
- Worst case is O(n) (extremely unlikely, but exists)
- Non-deterministic (depends on random number generator)
11.9 B-Tree Introduction
A B-tree is a multi-way search tree where each node can have multiple keys and multiple children. It's the core data structure for database and filesystem indexes.
B-tree characteristics:
- Each node has at most m children (for an order-m B-tree)
- Non-root, non-leaf nodes have at least โm/2โ children
- All leaf nodes are at the same level
Why do databases use B-trees instead of Red-Black trees? Because disk I/O is vastly more expensive than memory access. B-trees increase node "width" to reduce tree "height," thereby minimizing disk accesses.
We will discuss B-trees and B+ trees in detail in Chapter 20.
Level 3 ยท History and Engineering
11.10 History of AVL Trees
AVL trees were proposed by Soviet mathematicians Georgy Adelson-Velsky and Evgenii Landis in their 1962 paper "An algorithm for the organization of information." This was the first self-balancing binary search tree in human history.
Adelson-Velsky (1922-2014) was a legendary figure. Beyond inventing the AVL tree, he was a pioneer in computer chessโhis Kaissa program won the first World Computer Chess Championship in 1974.
The AVL paper was published when the entire field of computer science was still in its infancy. The work's importance lies in proving a crucial possibility: one can spend O(log n) extra work after each insertion/deletion to maintain balance, thereby guaranteeing O(log n) worst-case time complexity for all operations.
11.11 History of Red-Black Trees
The history of Red-Black trees is more winding than many people realize:
-
1972: Rudolf Bayer invented "Symmetric Binary B-trees." This is essentially the Red-Black tree, but without the "red-black" name.
-
1978: Leo Guibas and Robert Sedgewick coined the name "Red-Black tree" in their paper "A Dichromatic Framework for Balanced Trees." According to Sedgewick's later interviews, they chose red and black because the laser printer at Xerox PARC could print in red and black, making the paper's diagrams look beautiful.
-
1993: Arne Andersson proposed the "Left-Leaning Red-Black Tree" (LLRB), simplifying the implementation.
-
2008: Robert Sedgewick further popularized LLRB in his Algorithms textbook, implementing a complete Red-Black tree in about 50 lines of code.
Red-Black trees are the most widely used balanced BST in both textbooks and industry. The reason is simple: they perform well under various workloads, and while the implementation is complex, it's deterministic and well-understood.
11.12 Red-Black Trees in the Linux CFS Scheduler
The Linux kernel's CFS (Completely Fair Scheduler) was developed by Ingo Molnar in 2007 and introduced in Linux 2.6.23. It's a classic application of Red-Black trees.
Problem: The operating system needs to decide which process/thread gets CPU time. The goal is "fairness"โeach process should receive CPU time proportional to its weight.
CFS design:
- Each runnable process is represented by a node
- The node's key is the process's "virtual runtime" (vruntime)โCPU time used so far, normalized by weight
- All runnable processes are organized in a Red-Black tree by vruntime
- Scheduling decision: always pick the process with the smallest vruntime (leftmost node in the Red-Black tree)
- After a process runs, its vruntime increases and it's reinserted into the tree
Why a Red-Black tree?
- Insert and delete are O(log n), where n is the number of runnable processes
- Finding minimum is O(1) (cache the leftmost node)
- Red-Black trees are more efficient than AVL trees under frequent insert/delete
- Cannot use a heap (need efficient deletion of arbitrary nodes)
Starting from Linux 6.6 (October 2023), CFS was replaced by the EEVDF (Earliest Eligible Virtual Deadline First) scheduler, but the underlying data structure is still a Red-Black tree.
11.13 Java TreeMap/TreeSet
Java's standard library TreeMap and TreeSet are implemented using Red-Black trees:
// Key source code of Java TreeMap (simplified)
public class TreeMap<K,V> {
private transient Entry<K,V> root;
static final class Entry<K,V> {
K key;
V value;
Entry<K,V> left;
Entry<K,V> right;
Entry<K,V> parent;
boolean color = BLACK; // Red-Black tree color
}
// Fix Red-Black tree properties after insertion
private void fixAfterInsertion(Entry<K,V> x) {
x.color = RED;
while (x != null && x != root && x.parent.color == RED) {
// ... fix-up logic for three cases ...
}
root.color = BLACK;
}
}
Comparison of ordered structures in Java Collections:
| Data Structure | Implementation | Search | Insert | Ordered Traversal |
|---|---|---|---|---|
| HashMap | Hash table + Red-Black tree | O(1) avg | O(1) avg | Not supported |
| TreeMap | Red-Black tree | O(log n) | O(log n) | Supported |
| LinkedHashMap | Hash table + doubly-linked list | O(1) avg | O(1) avg | Insertion order |
Note that since Java 8, HashMap converts a bucket's linked list to a Red-Black tree when the list length exceeds 8. This is an optimization against worst-case hash collisions.
11.14 Splay Trees
Splay trees were proposed by Daniel Sleator and Robert Tarjan in 1985. The core idea is self-adjustment: after every access to a node, a series of rotations ("splaying") moves that node to the root.
Key characteristics:
- No explicit balance condition (no stored height or color)
- Single operation worst case is O(n)
- But m operations have amortized cost of O(m log n)
- Has the "working set" property: recently accessed elements are found faster
class SplayNode:
def __init__(self, key):
self.key = key
self.left = None
self.right = None
class SplayTree:
def _right_rotate(self, x):
y = x.left
x.left = y.right
y.right = x
return y
def _left_rotate(self, x):
y = x.right
x.right = y.left
y.left = x
return y
def splay(self, root, key):
"""Splay (rotate) the node with given key to root"""
if not root or root.key == key:
return root
if key < root.key:
if not root.left:
return root
# Zig-Zig (LL)
if key < root.left.key:
root.left.left = self.splay(root.left.left, key)
root = self._right_rotate(root)
# Zig-Zag (LR)
elif key > root.left.key:
root.left.right = self.splay(root.left.right, key)
if root.left.right:
root.left = self._left_rotate(root.left)
return self._right_rotate(root) if root.left else root
else: # key > root.key
if not root.right:
return root
# Zag-Zig (RL)
if key < root.right.key:
root.right.left = self.splay(root.right.left, key)
if root.right.left:
root.right = self._right_rotate(root.right)
# Zag-Zag (RR)
elif key > root.right.key:
root.right.right = self.splay(root.right.right, key)
root = self._left_rotate(root)
return self._left_rotate(root) if root.right else root
def search(self, root, key):
"""Search and splay the node to root"""
root = self.splay(root, key)
if root and root.key == key:
return root
return None
Splay tree applications:
- Cache systems (recently accessed data is near the root, faster to access)
- Network routing tables
- Memory management in garbage collectors
Splay trees vs Red-Black trees:
- Splay trees are faster under access patterns with locality
- Red-Black trees are more stable under uniform random access
- Splay trees need no extra space for balance information
Level 4 ยท Practical Pitfalls and Interview Prep
11.15 Common Pitfalls in BST Deletion
When using predecessor/successor replacement in BST deletion, several mistakes are easy to make:
Pitfall 1: Forgetting the case where successor IS the right child
# Buggy code: assumes successor always has a parent and isn't the direct right child
def delete_wrong(root, key):
# ... find the node ...
# Find successor
successor = node.right
parent = node
while successor.left:
parent = successor
successor = successor.left
# If successor == node.right,
# parent.left = successor.right is WRONG!
parent.left = successor.right # BUG when successor is node.right
Pitfall 2: Value-replacement losing node references
In our Level 1 implementation, we used root.val = successor.val (value replacement). This is acceptable in interviews. But in production code, if external references point to the replaced node, value replacement causes problems. Production implementations should do "pointer replacement"โactually moving the successor node to the deleted node's position.
Pitfall 3: Forgetting to update balance information after deletion
In AVL trees, after deletion you must update heights and balance factors from the deletion point up to the root, and multiple rotations might be needed (unlike insertion, which requires at most one rotation to fix).
11.16 Red-Black Tree Interview Strategy
Red-Black trees in interviews are typically NOT about writing the full implementation (that's 200+ lines of code). Instead, interviewers test your understanding of properties and application scenarios.
Common interview questions and reference answers:
Q: Why are Red-Black trees more commonly used than AVL trees?
A: Red-Black trees allow looser balance (longest path โค 2ร shortest), meaning fewer rotations during insertion and deletion. AVL's strict balance has advantages in read-heavy scenarios (like database indices), but for general containers (Java TreeMap, C++ std::map) where reads and writes are balanced, Red-Black trees offer better overall performance.
Q: When does HashMap use a Red-Black tree?
A: Java 8's HashMap converts a bucket's linked list to a Red-Black tree when the list length exceeds 8. This defends against hash-collision attacksโan attacker can craft many keys with the same hash, degrading HashMap to O(n) lookup. After conversion to a Red-Black tree, worst case becomes O(log n).
Q: What's the height upper bound of a Red-Black tree?
A: 2 * logโ(n+1). Proof sketch: a Red-Black tree with black-height h_b has at least 2^h_b - 1 internal nodes; since red nodes are never adjacent, total height is at most 2 * h_b.
11.17 High-Frequency Interview Problems
LeetCode 98: Validate Binary Search Tree
Problem: Given a binary tree, determine if it is a valid BST.
Key pitfall: You cannot just compare each node with its immediate children. You must ensure ALL nodes in the left subtree are less than the root, and ALL nodes in the right subtree are greater.
def isValidBST(root):
"""
Approach 1: Recursion with upper/lower bounds
Time O(n), Space O(h)
"""
def validate(node, low, high):
if not node:
return True
if node.val <= low or node.val >= high:
return False
return (validate(node.left, low, node.val) and
validate(node.right, node.val, high))
return validate(root, float('-inf'), float('inf'))
def isValidBST_inorder(root):
"""
Approach 2: Inorder traversal must be strictly increasing
Time O(n), Space O(h)
"""
prev = float('-inf')
def inorder(node):
nonlocal prev
if not node:
return True
if not inorder(node.left):
return False
if node.val <= prev:
return False
prev = node.val
return inorder(node.right)
return inorder(root)
Interview tip: Approach 1 is the most intuitive and easiest to write correctly. Use float('-inf') and float('inf') for bounds, not concrete integer values (node.val might equal INT_MIN or INT_MAX).
LeetCode 230: Kth Smallest Element in a BST
Problem: Given the root of a BST and integer k, return the kth smallest value.
def kthSmallest(root, k):
"""
Inorder traversal, stop at the kth element
Time O(H + k), Space O(H), H = tree height
"""
stack = []
node = root
count = 0
while stack or node:
# Go all the way left
while node:
stack.append(node)
node = node.left
node = stack.pop()
count += 1
if count == k:
return node.val
node = node.right
return -1 # Should never reach here
Follow-up: If the BST is frequently modified (insertions/deletions) and kth-smallest queries are frequent, you can augment each node with "left subtree size," making kth-smallest queries O(log n). This is called an Order-Statistic Tree.
LeetCode 108: Convert Sorted Array to Balanced BST
Problem: Given an ascending sorted integer array, convert it to a height-balanced BST.
def sortedArrayToBST(nums):
"""
Pick middle element as root, recursively build left and right subtrees
Time O(n), Space O(log n) (recursion stack)
"""
def build(left, right):
if left > right:
return None
mid = (left + right) // 2
node = TreeNode(nums[mid])
node.left = build(left, mid - 1)
node.right = build(mid + 1, right)
return node
return build(0, len(nums) - 1)
Why the middle element? Using the middle element as root ensures roughly equal elements on both sides, producing a balanced tree. This is divide-and-conquer in action.
LeetCode 450: Delete Node in a BST
This is exactly the deletion operation from Section 11.2. The key is handling three cases correctly, especially using the inorder successor when the node has two children.
def deleteNode(root, key):
if not root:
return None
if key < root.val:
root.left = deleteNode(root.left, key)
elif key > root.val:
root.right = deleteNode(root.right, key)
else:
# Cases 1 & 2
if not root.left:
return root.right
if not root.right:
return root.left
# Case 3: find inorder successor
successor = root.right
while successor.left:
successor = successor.left
root.val = successor.val
root.right = deleteNode(root.right, successor.val)
return root
11.18 BST vs Hash Table: When to Choose Which
This is a high-frequency system design interview question.
| Requirement | BST (TreeMap) | Hash Table (HashMap) |
|---|---|---|
| Exact lookup | O(log n) | O(1) average |
| Range query (all keys in [a, b]) | O(log n + k) | Not supported |
| Find min/max | O(log n) or O(1) | O(n) |
| Ordered traversal | O(n) | O(n log n) (need sorting) |
| Find predecessor/successor | O(log n) | Not supported |
| Space efficiency | Worse (pointer overhead) | Better |
| Worst case | O(log n) (balanced) | O(n) (hash collisions) |
Simple decision rule:
- If you only need "existence queries" or "exact lookup," use a hash table
- If you need "ordering" (range queries, ranking, predecessor/successor), use a BST/TreeMap
- If you need both, consider maintaining both structures or using
sortedcontainers
Classic examples:
- "Design a data structure supporting insert/delete/getRandom" โ Hash table + array
- "Design a data structure supporting insert/delete/getMin" โ Balanced BST or heap
- "Design a data structure supporting range queries" โ Balanced BST
Chapter Summary
| Data Structure | Search | Insert | Delete | Characteristics |
|---|---|---|---|---|
| BST (unbalanced) | O(h) | O(h) | O(h) | h can degenerate to n |
| AVL Tree | O(log n) | O(log n) | O(log n) | Strict balance, good for read-heavy |
| Red-Black Tree | O(log n) | O(log n) | O(log n) | Relaxed balance, engineering favorite |
| Treap | O(log n) expected | O(log n) expected | O(log n) expected | Simple implementation, competition favorite |
| Splay Tree | O(log n) amortized | O(log n) amortized | O(log n) amortized | Great with locality |
| B-Tree | O(log n) | O(log n) | O(log n) | Disk I/O friendly, database indexes |
Key takeaways:
- BST inorder traversal is sortedโthis is the foundation of all its properties
- Rotation is the core operation of all balanced treesโit changes shape while preserving BST property
- AVL trees measure balance by height difference; Red-Black trees by color rules
- Red-Black trees are more common in engineering because they require fewer rotations under mixed read/write workloads
- In interviews, BST validation, search, and deletion are high-frequency topics; Red-Black trees require understanding properties not hand-writing code
Next chapter preview: In Chapter 12, we'll study Heaps and Priority Queuesโa tree structure that only requires partial ordering, trading global order for faster min/max operations.