Locks, Deadlocks and Race Conditions
Locks, Deadlocks and Race Conditions
Two people walk into a bank at the same time. Person A wants to transfer $1,000 to B. Person B simultaneously wants to transfer $500 to A. If the banking system is poorly designed, these two operations can interfere: one thread reads a balance just before the other writes it, double-deductions happen, and money either vanishes or multiplies out of thin air. This is a race condition — the result of the program depends on the unpredictable order in which threads happen to be scheduled.
Concurrency bugs are harder to catch than ordinary bugs. They often appear only under specific timing conditions, pass all local tests, and then crash production systems once every few weeks. They even have a name: Heisenbugs — the act of observing them (by adding logging) changes the timing and makes them disappear.
Core Concepts
Race Conditions: The Bank Transfer Story
# Dangerous transfer function — no locks
accounts = {'A': 1000, 'B': 500}
def transfer(from_acc, to_acc, amount):
if accounts[from_acc] >= amount:
# ← imagine Thread 1 is preempted RIGHT HERE
accounts[from_acc] -= amount # step 2
accounts[to_acc] += amount # step 3
# Scenario: A→B transfers $1000, B→A transfers $500, simultaneously
# Thread 1: check A >= 1000 ✓
# Thread 2: check B >= 500 ✓ (B is still 500 at this point)
# Thread 1: A = 1000 - 1000 = 0
# Thread 1: B = 500 + 1000 = 1500
# Thread 2: B = 1500 - 500 = 1000 ← deducting from already-updated B
# Thread 2: A = 0 + 500 = 500
# Result: A=500, B=1000, total=1500. $500 was created out of nothing.
The problem: "check balance" and "execute debit" are not atomic. The moment between those two steps, the world can change.
Mutex: Putting a Lock on the Operation
A mutex (mutual exclusion lock) ensures that only one thread at a time can enter a critical section — the code that touches shared data:
Critical section = code that accesses shared resources
Thread 1: lock() ─── critical section ─── unlock()
↑
Thread 2: lock() × │ lock() ✓ ─── critical section ─── unlock()
blocked ──────┘
import threading
lock = threading.Lock()
accounts = {'A': 1000, 'B': 500}
def safe_transfer(from_acc, to_acc, amount):
with lock: # automatically calls acquire() and release()
if accounts[from_acc] >= amount:
accounts[from_acc] -= amount
accounts[to_acc] += amount
# Now two threads cannot execute the 'with lock' block simultaneously
The cost of locking: serialization. Operations that could run in parallel are now forced to queue up, giving away some of the concurrency advantage. Coarse-grained locks (one big lock for lots of data) create heavy contention. Fine-grained locks (one lock per data item) are more scalable but harder to manage correctly.
Semaphores: Controlling Concurrency Level
A semaphore is a lock with a counter. It allows up to N threads in simultaneously:
Semaphore(3): at most 3 threads may use the resource at once
(e.g., a pool of 3 database connections)
counter = 3
Thread 1 enters: counter = 2
Thread 2 enters: counter = 1
Thread 3 enters: counter = 0
Thread 4 tries: counter = 0 → BLOCKED
Thread 1 exits: counter = 1 → Thread 4 is woken up and enters
A binary semaphore (counter can only be 0 or 1) is equivalent to a mutex. Semaphores are also ideal for producer-consumer synchronization: the producer calls sem.release() after creating each item; the consumer calls sem.acquire() before consuming one, and automatically waits if nothing is available yet.
Deadlock: Four Necessary Conditions
Deadlock occurs when two or more threads are each waiting for a resource held by the other, and neither can ever proceed.
Classic scenario:
Thread 1 holds Lock A, waiting for Lock B
↓ ↑
Thread 2 holds Lock B, waiting for Lock A
→ Circular wait. Both threads are frozen forever.
A deadlock requires all four of these conditions to hold simultaneously:
1. Mutual Exclusion
A resource can be held by only one thread at a time.
(Locks satisfy this by definition.)
2. Hold and Wait
A thread holds at least one resource while waiting
to acquire additional resources held by others.
3. No Preemption
A held resource cannot be forcibly taken away.
It must be released voluntarily by the holder.
4. Circular Wait
Thread A waits for B, B waits for C, C waits for A.
A cycle of dependencies with no exit.
Breaking any one condition prevents deadlock:
| Condition to break | Strategy | Example |
|---|---|---|
| Hold and Wait | Request all resources at once | Database transaction locks all tables upfront |
| No Preemption | Release held resources on failure | trylock(): if it fails, release everything and retry |
| Circular Wait | Enforce a global lock ordering | Always acquire Lock A before Lock B, everywhere |
Lock-Free Programming: CAS Atomic Operations
Locks have overhead and introduce the risk of deadlock. Can we avoid them? Yes — using atomic operations provided by the hardware.
CAS (Compare-And-Swap) is an atomic CPU instruction:
CAS(address, expected, new_value):
if value at address == expected:
atomically write new_value
return true
else:
return false (someone else already changed it)
# Python's counter += 1 is NOT atomic. It compiles to:
tmp = counter # load
tmp = tmp + 1 # increment
counter = tmp # store
# Any thread can be preempted between these three steps
# In Java (pseudocode showing the CAS idea):
# AtomicInteger counter = new AtomicInteger(0);
# counter.incrementAndGet(); // atomic, no lock needed
# In Python, use threading.Lock for thread safety,
# or rely on the GIL only where you fully understand its guarantees.
CAS pitfalls: the ABA problem (a value changes from A to B and back to A — CAS sees A and thinks nothing changed) and spin-waiting (a thread loops calling CAS until it succeeds, burning CPU). For high-frequency, low-contention counters, lock-free outperforms locking by a significant margin.
Hands-On Verification
# Demonstrate a race condition (GIL in CPython may mask it; use C/Java for a clear example)
import threading
counter = 0
def unsafe_increment():
global counter
for _ in range(100_000):
counter += 1
threads = [threading.Thread(target=unsafe_increment) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(f"Without lock: {counter} (expected 1_000_000)") # May be correct due to GIL
# Locked version — always correct
counter = 0
lock = threading.Lock()
def safe_increment():
global counter
for _ in range(100_000):
with lock:
counter += 1
threads = [threading.Thread(target=safe_increment) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(f"With lock: {counter} (expected 1_000_000)") # Guaranteed correct
# Demonstrate deadlock — the program will hang forever
import threading, time
lock_a = threading.Lock()
lock_b = threading.Lock()
def thread1():
with lock_a:
print("Thread 1 acquired Lock A")
time.sleep(0.1)
print("Thread 1 waiting for Lock B...")
with lock_b: # ← deadlock — Thread 2 has Lock B
print("Thread 1 acquired Lock B (you will never see this)")
def thread2():
with lock_b:
print("Thread 2 acquired Lock B")
time.sleep(0.1)
print("Thread 2 waiting for Lock A...")
with lock_a: # ← deadlock — Thread 1 has Lock A
print("Thread 2 acquired Lock A (you will never see this)")
t1 = threading.Thread(target=thread1)
t2 = threading.Thread(target=thread2)
t1.start(); t2.start()
# Program hangs. Press Ctrl+C to exit.
🔬 Going Deeper
Memory Visibility: More Subtle Than You Think
On a multi-core CPU, each core has its own L1 and L2 cache. When Thread 1 on Core 0 modifies variable x, Thread 2 on Core 1 may not immediately see the new value — the write could still be sitting in Core 0's cache, not yet flushed to main memory. This is the memory visibility problem, entirely separate from atomicity. Languages address it with memory barriers: Java's volatile, C++'s std::atomic, and Go's channel operations all implicitly insert the necessary memory fences to enforce visibility ordering.
Lock-Free Data Structures in the Real World
Many high-performance systems rely heavily on lock-free structures. The Linux kernel's RCU (Read-Copy-Update) mechanism lets readers access shared data without any lock at all; writers make a private copy, modify it, then atomically swap in the pointer. The Disruptor (used by the LMAX currency exchange) is a lock-free ring buffer that outperforms Java's BlockingQueue by an order of magnitude. These designs all rest on a deep understanding of CAS and CPU memory models.
Recommended Reading:
- Java Concurrency in Practice by Brian Goetz — Chapters 10–13 (deadlock, livelock, performance) are the industry standard reference for concurrent programming. The examples are concrete and the analysis is rigorous.
- Operating Systems: Three Easy Pieces (OSTEP) — Chapters 26–34 cover the full concurrency stack: locks, condition variables, semaphores, and common concurrency bugs.
- Martin Thompson's Mechanical Sympathy blog series — Essential reading for understanding how CPU caches, cache lines, false sharing, and hardware atomics affect real-world concurrent performance.