Chapter 19

Locks, Deadlocks and Race Conditions

Two people walk into a bank at the same time. Person A wants to transfer $1,000 to B. Person B simultaneously wants to transfer $500 to A. If the banking system is poorly designed, these two operations can interfere: one thread reads a balance just before the other writes it, double-deductions happen, and money either vanishes or multiplies out of thin air. This is a race condition — the result of the program depends on the unpredictable order in which threads happen to be scheduled.

Concurrency bugs are harder to catch than ordinary bugs. They often appear only under specific timing conditions, pass all local tests, and then crash production systems once every few weeks. They even have a name: Heisenbugs — the act of observing them (by adding logging) changes the timing and makes them disappear.

Core Concepts

Race Conditions: The Bank Transfer Story

# Dangerous transfer function — no locks
accounts = {'A': 1000, 'B': 500}

def transfer(from_acc, to_acc, amount):
    if accounts[from_acc] >= amount:
        # ← imagine Thread 1 is preempted RIGHT HERE
        accounts[from_acc] -= amount   # step 2
        accounts[to_acc]   += amount   # step 3

# Scenario: A→B transfers $1000, B→A transfers $500, simultaneously
# Thread 1: check A >= 1000 ✓
# Thread 2: check B >= 500  ✓      (B is still 500 at this point)
# Thread 1: A = 1000 - 1000 = 0
# Thread 1: B = 500  + 1000 = 1500
# Thread 2: B = 1500 - 500  = 1000  ← deducting from already-updated B
# Thread 2: A = 0    + 500  = 500
# Result: A=500, B=1000, total=1500. $500 was created out of nothing.

The problem: "check balance" and "execute debit" are not atomic. The moment between those two steps, the world can change.

Mutex: Putting a Lock on the Operation

A mutex (mutual exclusion lock) ensures that only one thread at a time can enter a critical section — the code that touches shared data:

Critical section = code that accesses shared resources

Thread 1:  lock()  ─── critical section ───  unlock()
                          ↑
Thread 2:  lock() ×       │                 lock() ✓  ─── critical section ─── unlock()
           blocked  ──────┘

import threading

lock = threading.Lock()
accounts = {'A': 1000, 'B': 500}

def safe_transfer(from_acc, to_acc, amount):
    with lock:    # automatically calls acquire() and release()
        if accounts[from_acc] >= amount:
            accounts[from_acc] -= amount
            accounts[to_acc]   += amount

# Now two threads cannot execute the 'with lock' block simultaneously

The cost of locking: serialization. Operations that could run in parallel are now forced to queue up, giving away some of the concurrency advantage. Coarse-grained locks (one big lock for lots of data) create heavy contention. Fine-grained locks (one lock per data item) are more scalable but harder to manage correctly.

Semaphores: Controlling Concurrency Level

A semaphore is a lock with a counter. It allows up to N threads in simultaneously:

Semaphore(3): at most 3 threads may use the resource at once
              (e.g., a pool of 3 database connections)

counter = 3
Thread 1 enters: counter = 2
Thread 2 enters: counter = 1
Thread 3 enters: counter = 0
Thread 4 tries:  counter = 0 → BLOCKED
Thread 1 exits:  counter = 1 → Thread 4 is woken up and enters

A binary semaphore (counter can only be 0 or 1) is equivalent to a mutex. Semaphores are also ideal for producer-consumer synchronization: the producer calls sem.release() after creating each item; the consumer calls sem.acquire() before consuming one, and automatically waits if nothing is available yet.

Deadlock: Four Necessary Conditions

Deadlock occurs when two or more threads are each waiting for a resource held by the other, and neither can ever proceed.

Classic scenario:

Thread 1 holds Lock A, waiting for Lock B
         ↓                            ↑
Thread 2 holds Lock B, waiting for Lock A

→ Circular wait. Both threads are frozen forever.

A deadlock requires all four of these conditions to hold simultaneously:

1. Mutual Exclusion
   A resource can be held by only one thread at a time.
   (Locks satisfy this by definition.)

2. Hold and Wait
   A thread holds at least one resource while waiting
   to acquire additional resources held by others.

3. No Preemption
   A held resource cannot be forcibly taken away.
   It must be released voluntarily by the holder.

4. Circular Wait
   Thread A waits for B, B waits for C, C waits for A.
   A cycle of dependencies with no exit.

Breaking any one condition prevents deadlock:

Condition to break	Strategy	Example
Hold and Wait	Request all resources at once	Database transaction locks all tables upfront
No Preemption	Release held resources on failure	`trylock()`: if it fails, release everything and retry
Circular Wait	Enforce a global lock ordering	Always acquire Lock A before Lock B, everywhere

Lock-Free Programming: CAS Atomic Operations

Locks have overhead and introduce the risk of deadlock. Can we avoid them? Yes — using atomic operations provided by the hardware.

CAS (Compare-And-Swap) is an atomic CPU instruction:

CAS(address, expected, new_value):
  if value at address == expected:
    atomically write new_value
    return true
  else:
    return false  (someone else already changed it)

# Python's counter += 1 is NOT atomic. It compiles to:
tmp = counter     # load
tmp = tmp + 1     # increment
counter = tmp     # store
# Any thread can be preempted between these three steps

# In Java (pseudocode showing the CAS idea):
# AtomicInteger counter = new AtomicInteger(0);
# counter.incrementAndGet();  // atomic, no lock needed

# In Python, use threading.Lock for thread safety,
# or rely on the GIL only where you fully understand its guarantees.

CAS pitfalls: the ABA problem (a value changes from A to B and back to A — CAS sees A and thinks nothing changed) and spin-waiting (a thread loops calling CAS until it succeeds, burning CPU). For high-frequency, low-contention counters, lock-free outperforms locking by a significant margin.

Hands-On Verification

# Demonstrate a race condition (GIL in CPython may mask it; use C/Java for a clear example)
import threading

counter = 0

def unsafe_increment():
    global counter
    for _ in range(100_000):
        counter += 1

threads = [threading.Thread(target=unsafe_increment) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(f"Without lock: {counter} (expected 1_000_000)")  # May be correct due to GIL

# Locked version — always correct
counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(f"With lock: {counter} (expected 1_000_000)")  # Guaranteed correct

# Demonstrate deadlock — the program will hang forever
import threading, time

lock_a = threading.Lock()
lock_b = threading.Lock()

def thread1():
    with lock_a:
        print("Thread 1 acquired Lock A")
        time.sleep(0.1)
        print("Thread 1 waiting for Lock B...")
        with lock_b:   # ← deadlock — Thread 2 has Lock B
            print("Thread 1 acquired Lock B (you will never see this)")

def thread2():
    with lock_b:
        print("Thread 2 acquired Lock B")
        time.sleep(0.1)
        print("Thread 2 waiting for Lock A...")
        with lock_a:   # ← deadlock — Thread 1 has Lock A
            print("Thread 2 acquired Lock A (you will never see this)")

t1 = threading.Thread(target=thread1)
t2 = threading.Thread(target=thread2)
t1.start(); t2.start()
# Program hangs. Press Ctrl+C to exit.

🔬 Going Deeper

Memory Visibility: More Subtle Than You Think

On a multi-core CPU, each core has its own L1 and L2 cache. When Thread 1 on Core 0 modifies variable x, Thread 2 on Core 1 may not immediately see the new value — the write could still be sitting in Core 0's cache, not yet flushed to main memory. This is the memory visibility problem, entirely separate from atomicity. Languages address it with memory barriers: Java's volatile, C++'s std::atomic, and Go's channel operations all implicitly insert the necessary memory fences to enforce visibility ordering.

Lock-Free Data Structures in the Real World

Many high-performance systems rely heavily on lock-free structures. The Linux kernel's RCU (Read-Copy-Update) mechanism lets readers access shared data without any lock at all; writers make a private copy, modify it, then atomically swap in the pointer. The Disruptor (used by the LMAX currency exchange) is a lock-free ring buffer that outperforms Java's BlockingQueue by an order of magnitude. These designs all rest on a deep understanding of CAS and CPU memory models.

Recommended Reading:

Java Concurrency in Practice by Brian Goetz — Chapters 10–13 (deadlock, livelock, performance) are the industry standard reference for concurrent programming. The examples are concrete and the analysis is rigorous.
Operating Systems: Three Easy Pieces (OSTEP) — Chapters 26–34 cover the full concurrency stack: locks, condition variables, semaphores, and common concurrency bugs.
Martin Thompson's Mechanical Sympathy blog series — Essential reading for understanding how CPU caches, cache lines, false sharing, and hardware atomics affect real-world concurrent performance.

Rate this chapter

4.6 / 5 (10 ratings)

Locks, Deadlocks and Race Conditions

Locks, Deadlocks and Race Conditions

Core Concepts

Race Conditions: The Bank Transfer Story

Mutex: Putting a Lock on the Operation

Semaphores: Controlling Concurrency Level

Deadlock: Four Necessary Conditions

Lock-Free Programming: CAS Atomic Operations

Hands-On Verification

🔬 Going Deeper

💬 Comments