Chapter 19

Locks, Deadlocks and Race Conditions

Locks, Deadlocks and Race Conditions

Two people walk into a bank at the same time. Person A wants to transfer $1,000 to B. Person B simultaneously wants to transfer $500 to A. If the banking system is poorly designed, these two operations can interfere: one thread reads a balance just before the other writes it, double-deductions happen, and money either vanishes or multiplies out of thin air. This is a race condition โ€” the result of the program depends on the unpredictable order in which threads happen to be scheduled.

Concurrency bugs are harder to catch than ordinary bugs. They often appear only under specific timing conditions, pass all local tests, and then crash production systems once every few weeks. They even have a name: Heisenbugs โ€” the act of observing them (by adding logging) changes the timing and makes them disappear.

Core Concepts

Race Conditions: The Bank Transfer Story

# Dangerous transfer function โ€” no locks
accounts = {'A': 1000, 'B': 500}

def transfer(from_acc, to_acc, amount):
    if accounts[from_acc] >= amount:
        # โ† imagine Thread 1 is preempted RIGHT HERE
        accounts[from_acc] -= amount   # step 2
        accounts[to_acc]   += amount   # step 3

# Scenario: Aโ†’B transfers $1000, Bโ†’A transfers $500, simultaneously
# Thread 1: check A >= 1000 โœ“
# Thread 2: check B >= 500  โœ“      (B is still 500 at this point)
# Thread 1: A = 1000 - 1000 = 0
# Thread 1: B = 500  + 1000 = 1500
# Thread 2: B = 1500 - 500  = 1000  โ† deducting from already-updated B
# Thread 2: A = 0    + 500  = 500
# Result: A=500, B=1000, total=1500. $500 was created out of nothing.

The problem: "check balance" and "execute debit" are not atomic. The moment between those two steps, the world can change.

Mutex: Putting a Lock on the Operation

A mutex (mutual exclusion lock) ensures that only one thread at a time can enter a critical section โ€” the code that touches shared data:

Critical section = code that accesses shared resources

Thread 1:  lock()  โ”€โ”€โ”€ critical section โ”€โ”€โ”€  unlock()
                          โ†‘
Thread 2:  lock() ร—       โ”‚                 lock() โœ“  โ”€โ”€โ”€ critical section โ”€โ”€โ”€ unlock()
           blocked  โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
import threading

lock = threading.Lock()
accounts = {'A': 1000, 'B': 500}

def safe_transfer(from_acc, to_acc, amount):
    with lock:    # automatically calls acquire() and release()
        if accounts[from_acc] >= amount:
            accounts[from_acc] -= amount
            accounts[to_acc]   += amount

# Now two threads cannot execute the 'with lock' block simultaneously

The cost of locking: serialization. Operations that could run in parallel are now forced to queue up, giving away some of the concurrency advantage. Coarse-grained locks (one big lock for lots of data) create heavy contention. Fine-grained locks (one lock per data item) are more scalable but harder to manage correctly.

Semaphores: Controlling Concurrency Level

A semaphore is a lock with a counter. It allows up to N threads in simultaneously:

Semaphore(3): at most 3 threads may use the resource at once
              (e.g., a pool of 3 database connections)

counter = 3
Thread 1 enters: counter = 2
Thread 2 enters: counter = 1
Thread 3 enters: counter = 0
Thread 4 tries:  counter = 0 โ†’ BLOCKED
Thread 1 exits:  counter = 1 โ†’ Thread 4 is woken up and enters

A binary semaphore (counter can only be 0 or 1) is equivalent to a mutex. Semaphores are also ideal for producer-consumer synchronization: the producer calls sem.release() after creating each item; the consumer calls sem.acquire() before consuming one, and automatically waits if nothing is available yet.

Deadlock: Four Necessary Conditions

Deadlock occurs when two or more threads are each waiting for a resource held by the other, and neither can ever proceed.

Classic scenario:

Thread 1 holds Lock A, waiting for Lock B
         โ†“                            โ†‘
Thread 2 holds Lock B, waiting for Lock A

โ†’ Circular wait. Both threads are frozen forever.

A deadlock requires all four of these conditions to hold simultaneously:

1. Mutual Exclusion
   A resource can be held by only one thread at a time.
   (Locks satisfy this by definition.)

2. Hold and Wait
   A thread holds at least one resource while waiting
   to acquire additional resources held by others.

3. No Preemption
   A held resource cannot be forcibly taken away.
   It must be released voluntarily by the holder.

4. Circular Wait
   Thread A waits for B, B waits for C, C waits for A.
   A cycle of dependencies with no exit.

Breaking any one condition prevents deadlock:

Condition to break Strategy Example
Hold and Wait Request all resources at once Database transaction locks all tables upfront
No Preemption Release held resources on failure trylock(): if it fails, release everything and retry
Circular Wait Enforce a global lock ordering Always acquire Lock A before Lock B, everywhere

Lock-Free Programming: CAS Atomic Operations

Locks have overhead and introduce the risk of deadlock. Can we avoid them? Yes โ€” using atomic operations provided by the hardware.

CAS (Compare-And-Swap) is an atomic CPU instruction:

CAS(address, expected, new_value):
  if value at address == expected:
    atomically write new_value
    return true
  else:
    return false  (someone else already changed it)
# Python's counter += 1 is NOT atomic. It compiles to:
tmp = counter     # load
tmp = tmp + 1     # increment
counter = tmp     # store
# Any thread can be preempted between these three steps

# In Java (pseudocode showing the CAS idea):
# AtomicInteger counter = new AtomicInteger(0);
# counter.incrementAndGet();  // atomic, no lock needed

# In Python, use threading.Lock for thread safety,
# or rely on the GIL only where you fully understand its guarantees.

CAS pitfalls: the ABA problem (a value changes from A to B and back to A โ€” CAS sees A and thinks nothing changed) and spin-waiting (a thread loops calling CAS until it succeeds, burning CPU). For high-frequency, low-contention counters, lock-free outperforms locking by a significant margin.

Hands-On Verification

# Demonstrate a race condition (GIL in CPython may mask it; use C/Java for a clear example)
import threading

counter = 0

def unsafe_increment():
    global counter
    for _ in range(100_000):
        counter += 1

threads = [threading.Thread(target=unsafe_increment) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(f"Without lock: {counter} (expected 1_000_000)")  # May be correct due to GIL

# Locked version โ€” always correct
counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(f"With lock: {counter} (expected 1_000_000)")  # Guaranteed correct
# Demonstrate deadlock โ€” the program will hang forever
import threading, time

lock_a = threading.Lock()
lock_b = threading.Lock()

def thread1():
    with lock_a:
        print("Thread 1 acquired Lock A")
        time.sleep(0.1)
        print("Thread 1 waiting for Lock B...")
        with lock_b:   # โ† deadlock โ€” Thread 2 has Lock B
            print("Thread 1 acquired Lock B (you will never see this)")

def thread2():
    with lock_b:
        print("Thread 2 acquired Lock B")
        time.sleep(0.1)
        print("Thread 2 waiting for Lock A...")
        with lock_a:   # โ† deadlock โ€” Thread 1 has Lock A
            print("Thread 2 acquired Lock A (you will never see this)")

t1 = threading.Thread(target=thread1)
t2 = threading.Thread(target=thread2)
t1.start(); t2.start()
# Program hangs. Press Ctrl+C to exit.

๐Ÿ”ฌ Going Deeper

Memory Visibility: More Subtle Than You Think

On a multi-core CPU, each core has its own L1 and L2 cache. When Thread 1 on Core 0 modifies variable x, Thread 2 on Core 1 may not immediately see the new value โ€” the write could still be sitting in Core 0's cache, not yet flushed to main memory. This is the memory visibility problem, entirely separate from atomicity. Languages address it with memory barriers: Java's volatile, C++'s std::atomic, and Go's channel operations all implicitly insert the necessary memory fences to enforce visibility ordering.

Lock-Free Data Structures in the Real World

Many high-performance systems rely heavily on lock-free structures. The Linux kernel's RCU (Read-Copy-Update) mechanism lets readers access shared data without any lock at all; writers make a private copy, modify it, then atomically swap in the pointer. The Disruptor (used by the LMAX currency exchange) is a lock-free ring buffer that outperforms Java's BlockingQueue by an order of magnitude. These designs all rest on a deep understanding of CAS and CPU memory models.

Recommended Reading:

Rate this chapter
4.6  / 5  (10 ratings)

๐Ÿ’ฌ Comments