Chapter 35

Distributed Locks: From SETNX to Redisson

Chapter 35: Complete Guide to Distributed Locks — From SETNX to Redisson

Distributed locks enforce mutual exclusion across multiple nodes accessing shared resources. Redis-based locks are widely adopted for their performance and operational simplicity, but subtle correctness issues lurk beneath the surface. This chapter traces the evolution from naive SETNX to a production-correct implementation, dissects Redisson's industrial-grade design including the WatchDog renewal mechanism, and examines the Redlock algorithm alongside its formal critique.

35.1 Starting with SETNX

35.1.1 Broken Patterns: Non-Atomic Two-Step Lock

The earliest Redis lock implementations used two separate commands:

# Pattern 1: SETNX + EXPIRE — not atomic
SETNX lock:order 1       # Step 1: set the lock
EXPIRE lock:order 30     # Step 2: set expiry
# If the process crashes between steps 1 and 2, the lock has no TTL → permanent deadlock

# Pattern 2: GET + SET race condition
GET lock:order           # Check existence
SET lock:order 1         # Set (another thread can win between GET and SET)
EXPIRE lock:order 30

If the process crashes after SETNX succeeds but before EXPIRE executes, the lock key persists indefinitely — a deadlock requiring manual intervention to resolve.

35.1.2 Correct Pattern: Atomic SET NX PX

Since Redis 2.6.12, the SET command supports NX (only set if not exists) and PX (expire in milliseconds) as a single atomic operation:

SET lock:order:1001 {token} NX PX 30000
# NX: fail if key already exists (atomic check-and-set)
# PX 30000: auto-expire after 30,000 milliseconds
# {token}: unique identifier (UUID) for ownership verification

import uuid

def acquire_lock(r, lock_name: str, expire_ms: int = 30000) -> str | None:
    """
    Attempt to acquire a distributed lock.
    Returns the token (lock acquired) or None (lock not acquired).
    """
    token = str(uuid.uuid4())
    acquired = r.set(
        f'lock:{lock_name}',
        token,
        nx=True,
        px=expire_ms
    )
    return token if acquired else None

Why a unique token? The token identifies the lock owner. Without it, a process cannot distinguish "this lock belongs to me" from "this lock belongs to another process." The release step requires this identity check, as we will see next.

35.2 Safe Lock Release: Lua Atomic CAS

35.2.1 Unsafe Release Patterns

# Wrong: no ownership check — may delete another process's lock
r.delete(f'lock:{lock_name}')

Scenario A: Thread A holds the lock. The TTL expires (business took too long). Thread B acquires the lock. Thread A's code resumes and calls DELETE — it deletes Thread B's lock. Thread C then acquires the same lock. Both B and C now hold the lock simultaneously.

# Wrong: GET and DEL are not atomic
token = r.get(f'lock:{lock_name}')
if token == my_token:
    # Between this GET and the DEL below, another thread may have
    # replaced the key with a new lock
    r.delete(f'lock:{lock_name}')  # may delete the new owner's lock

35.2.2 Correct Release: Lua Script for Atomic Compare-and-Delete

A Lua script executes atomically on the Redis server — no other command can interleave between the GET and DEL:

-- KEYS[1] = lock key
-- ARGV[1] = token acquired at lock time
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])  -- atomic ownership-verified delete
else
    return 0  -- token mismatch: not our lock, do nothing
end

_UNLOCK_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def release_lock(r, lock_name: str, token: str) -> bool:
    """
    Atomically release the lock only if the token matches.
    Returns True on successful release, False if token mismatch.
    """
    result = r.eval(_UNLOCK_SCRIPT, 1, f'lock:{lock_name}', token)
    return result == 1

# Context manager for automatic release
import contextlib, time

@contextlib.contextmanager
def redis_lock(r, name: str, expire_ms: int = 30000, retry: int = 3):
    token = None
    for attempt in range(retry):
        token = acquire_lock(r, name, expire_ms)
        if token:
            break
        time.sleep(0.1 * (2 ** attempt))  # exponential backoff
    if not token:
        raise TimeoutError(f"Could not acquire lock: {name}")
    try:
        yield token
    finally:
        release_lock(r, name, token)

# Usage
with redis_lock(r, 'order:1001', expire_ms=15000):
    process_order(1001)

var releaseLockScript = redis.NewScript(`
    if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
    end
    return 0
`)

func AcquireLock(ctx context.Context, rdb *redis.Client, name string, ttl time.Duration) (string, error) {
    token := uuid.New().String()
    ok, err := rdb.SetNX(ctx, "lock:"+name, token, ttl).Result()
    if err != nil {
        return "", fmt.Errorf("acquire lock %s: %w", name, err)
    }
    if !ok {
        return "", nil // lock not acquired
    }
    return token, nil
}

func ReleaseLock(ctx context.Context, rdb *redis.Client, name, token string) (bool, error) {
    n, err := releaseLockScript.Run(ctx, rdb, []string{"lock:" + name}, token).Int()
    if err != nil {
        return false, err
    }
    return n == 1, nil
}

// Node.js / ioredis
const UNLOCK_SCRIPT = `
  if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
  end
  return 0
`;

async function acquireLock(redis, name, ttlMs = 30000) {
    const token = crypto.randomUUID();
    const result = await redis.set(`lock:${name}`, token, 'NX', 'PX', ttlMs);
    return result === 'OK' ? token : null;
}

async function releaseLock(redis, name, token) {
    const n = await redis.eval(UNLOCK_SCRIPT, 1, `lock:${name}`, token);
    return n === 1;
}

// Auto-release helper
async function withLock(redis, name, ttlMs, fn) {
    const token = await acquireLock(redis, name, ttlMs);
    if (!token) throw new Error(`Could not acquire lock: ${name}`);
    try {
        return await fn();
    } finally {
        await releaseLock(redis, name, token);
    }
}

35.3 Lock TTL vs. Business Execution Time

35.3.1 The Fundamental Tension

Every distributed lock must have a TTL to prevent deadlocks — but business logic execution time is non-deterministic:

TTL = 10 s, business takes 15 s → lock expires mid-execution; another node acquires the lock → concurrent modification
TTL = 120 s, process crashes → lock holds for up to 2 minutes → system availability impaired

This creates a genuine dilemma: short TTL risks premature expiry; long TTL risks prolonged deadlock after crashes.

35.3.2 Redisson WatchDog: Automatic Renewal

Redisson resolves this by separating the lock lifetime concept from any fixed TTL:

Call lock.lock() with no leaseTime argument.
Redisson acquires the lock with leaseTime = 30 s (default, configurable).
Simultaneously, a Netty HashedWheelTimer task is registered to run every leaseTime / 3 = 10 s.
Each timer tick executes a Lua script: PEXPIRE lockKey leaseTime, resetting TTL to 30 s.
On lock.unlock(): the timer task is cancelled, then the lock is deleted via Lua CAS.
On JVM crash: the timer never fires → lock expires naturally after 30 s → no deadlock.

WatchDog Lua renewal script (Redisson uses a Hash to support reentrance):

-- KEYS[1] = lock key (Redis Hash)
-- ARGV[1] = leaseTime in milliseconds
-- ARGV[2] = "{uuid}:{threadId}" — lock holder identity
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then
    redis.call('pexpire', KEYS[1], ARGV[1])
    return 1
end
return 0   -- lock no longer held by this thread; stop renewal

35.3.3 Reentrant Lock Mechanism

Redisson implements full reentrance (like Java's ReentrantLock). The lock value is a Redis Hash:

Key:   "lock:order:1001"   (Hash type)
Field: "{uuid}:{threadId}" (unique per thread across JVMs)
Value: 2                   (reentry count)

RLock lock = redisson.getLock("lock:order:1001");

lock.lock();   // Hash field = 1; WatchDog starts
try {
    lock.lock();   // Same thread, same lock: Hash field = 2 (reentrant)
    try {
        processOrder();
    } finally {
        lock.unlock();  // Hash field = 1; WatchDog continues
    }
} finally {
    lock.unlock();  // Hash field = 0 → key deleted; WatchDog stops
}

Lock acquisition Lua script (simplified):

-- KEYS[1] = lock key
-- ARGV[1] = leaseTime (ms)
-- ARGV[2] = thread identity
if (redis.call('exists', KEYS[1]) == 0) then
    -- Lock free: create hash, set TTL
    redis.call('hset', KEYS[1], ARGV[2], 1)
    redis.call('pexpire', KEYS[1], ARGV[1])
    return nil  -- success
end
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then
    -- Reentrant: increment counter
    redis.call('hincrby', KEYS[1], ARGV[2], 1)
    redis.call('pexpire', KEYS[1], ARGV[1])
    return nil  -- success
end
-- Lock held by another thread: return remaining TTL (milliseconds)
return redis.call('pttl', KEYS[1])

35.4 The Redlock Algorithm

35.4.1 Motivation: Single-Node Redis Weaknesses

Single-node Redis locks have an Achilles' heel: the primary can fail. Because replication is asynchronous, a lock acquired on the primary may not have propagated to replicas before the crash. After failover, the new primary has no record of the lock — another client acquires it. Two clients now simultaneously hold the "same" lock.

Redlock, proposed by Salvatore Sanfilippo (Redis creator, antirez), uses N independent Redis instances to tolerate up to N/2 node failures.

35.4.2 Algorithm (N = 5 independent instances)

Prerequisites: 5 independent Redis nodes (not replicas of each other).
               Any 3 surviving nodes constitute a quorum.

1. Record current time T₁ (milliseconds)
2. For each of the 5 nodes:
     SET lock:resource {token} NX PX {lockTTL}
     Request timeout = lockTTL / 10  (don't block on failed nodes)
3. Record current time T₂
4. elapsed = T₂ - T₁
5. Lock acquired IF:
     (a) acquired_count >= 3  (quorum = N/2 + 1)
     (b) elapsed < lockTTL    (meaningful validity time remains)
6. If acquired:
     valid_time = lockTTL - elapsed - clock_drift_tolerance
     Execute business logic within valid_time
7. If not acquired (or after business completes):
     Send DEL to ALL 5 nodes (regardless of whether they returned success)

import uuid, time

class Redlock:
    CLOCK_DRIFT_FACTOR = 0.01  # 1% of TTL

    def __init__(self, nodes):
        self.nodes  = nodes
        self.quorum = len(nodes) // 2 + 1

    def acquire(self, resource: str, ttl_ms: int) -> dict | None:
        token      = str(uuid.uuid4())
        start_ms   = int(time.time() * 1000)
        n_acquired = 0

        for node in self.nodes:
            try:
                if node.set(resource, token, nx=True, px=ttl_ms):
                    n_acquired += 1
            except Exception:
                pass  # node unreachable

        elapsed_ms  = int(time.time() * 1000) - start_ms
        drift_ms    = int(ttl_ms * self.CLOCK_DRIFT_FACTOR) + 2
        validity_ms = ttl_ms - elapsed_ms - drift_ms

        if n_acquired >= self.quorum and validity_ms > 0:
            return {'token': token, 'validity_ms': validity_ms}

        self.release(resource, token)  # failed — release what we have
        return None

    def release(self, resource: str, token: str) -> None:
        script = """
            if redis.call('get', KEYS[1]) == ARGV[1] then
                return redis.call('del', KEYS[1])
            end
            return 0
        """
        for node in self.nodes:
            try:
                node.eval(script, 1, resource, token)
            except Exception:
                pass

# Usage
import redis as r_lib
nodes   = [r_lib.Redis(host=f'redis-{i}') for i in range(1, 6)]
redlock = Redlock(nodes)

result = redlock.acquire('lock:critical-section', 10_000)  # 10 s TTL
if result:
    try:
        do_critical_work()
    finally:
        redlock.release('lock:critical-section', result['token'])
else:
    raise RuntimeError("Could not acquire distributed lock")

35.4.3 Martin Kleppmann's Critique

In 2016, distributed systems researcher Martin Kleppmann published a detailed critique of Redlock, arguing it is unsafe under two realistic scenarios:

Scenario 1: Clock skew:

1. Client A acquires Redlock across 5 nodes. Valid for 10 s.
2. NTP adjusts the clock on node 1 forward by 11 seconds.
3. Node 1's lock TTL reaches zero immediately — lock evicted.
4. Client B acquires lock on node 1 (and possibly others).
5. Clients A and B now both believe they hold the lock.

Scenario 2: GC stop-the-world pause:

1. Client A acquires Redlock. Valid for 10 s.
2. JVM GC pause lasts 15 s.
3. Lock expires across all nodes during the pause.
4. Client B acquires the lock and begins mutating shared state.
5. GC completes. Client A resumes, unaware the lock expired.
   Both A and B now modify the shared resource concurrently.

Kleppmann's conclusion: Redlock relies on timing assumptions that distributed systems cannot guarantee. It is more complex than a single-node lock without providing the safety guarantees of a true consensus-based lock service (ZooKeeper, etcd).

35.4.4 Fencing Tokens: The Correct Solution

Kleppmann proposed fencing tokens as the fundamental fix:

1. Lock service returns a monotonically increasing integer token on each acquisition.
   (e.g., 100 → 101 → 102, never decreasing)
2. Clients pass this token to the storage layer with every mutation.
3. Storage layer tracks the highest token seen; rejects any request with a lower token.

# Lock service: generate monotonic tokens via Redis INCR
def acquire_fencing_lock(r, resource: str, ttl_ms: int):
    token    = r.incr('fencing:token:global')  # monotonically increasing
    acquired = r.set(f'lock:{resource}', token, nx=True, px=ttl_ms)
    return token if acquired else None

# Storage layer: verify token monotonicity before applying write
def update_resource(db, resource_id: int, data: dict, fencing_token: int) -> bool:
    # Only update if this token is strictly greater than any previously applied token
    affected = db.execute("""
        UPDATE resources
        SET    data               = %s,
               last_fencing_token = %s
        WHERE  id                 = %s
          AND  last_fencing_token  < %s
    """, (data, fencing_token, resource_id, fencing_token))
    # affected.rowcount == 0 means a newer token already wrote → this write is rejected
    return affected.rowcount > 0

Why fencing tokens work: even if Client A pauses for 60 seconds due to GC, its stale fencing token (e.g., 100) will be rejected by the storage layer because Client B already wrote with token 101. The correctness guarantee moves from the lock protocol to the storage layer — which is where it belongs.

35.4.5 antirez's Response

Salvatore Sanfilippo responded to Kleppmann's critique with several points:

GC pauses of the magnitude required (>TTL duration) are extremely rare in practice.
NTP should be configured to use slewing (gradual clock adjustment) rather than step jumps, which limits the clock skew rate to ~0.5 ms/s.
Redlock aims to improve safety relative to single-node locks, not to provide the formal guarantees of Raft/Paxos.

The debate remains unresolved in the literature. The practical takeaway: if the operation is genuinely non-idempotent and the cost of concurrent execution is severe, use a consensus-based lock service or database transactions.

35.5 Choosing the Right Lock

35.5.1 Decision Matrix

Scenario	Recommended approach	Rationale
Idempotent deduplication	Single-node `SET NX`	Occasional concurrent execution harmless
General business mutex	Redisson (single or Sentinel)	WatchDog + reentrance; operational simplicity
Financial / inventory operations	Redisson + DB optimistic lock	Dual safeguard
Strong-consistency distributed lock	ZooKeeper / etcd	Raft consensus, clock-independent
Maximum safety (banking core)	Database row lock + transaction	ACID guarantees

35.5.2 Redis vs. ZooKeeper Distributed Locks

Dimension	Redis Lock	ZooKeeper Lock
Consistency protocol	None (single node) / clock-dependent (Redlock)	Zab (Paxos variant), strong consistency
Throughput	100,000+ ops/s	Tens of thousands ops/s
Auto-renewal	Redisson WatchDog	Ephemeral nodes (deleted on session loss)
Fencing token	Manual implementation	Built-in (zxid monotonically increasing)
Operational complexity	Low	High (odd-numbered cluster, Zab tuning)
Best for	High-frequency short locks	Low-frequency long locks, strong consistency

35.5.3 Database Row Lock as the Ground Truth

For the most critical operations, bypass Redis entirely:

-- Serializable mutual exclusion via database row lock
BEGIN;
SELECT * FROM orders WHERE id = 1001 FOR UPDATE NOWAIT;
-- NOWAIT: fail immediately if locked (no blocking wait)
UPDATE orders SET status = 'processing', updated_at = NOW() WHERE id = 1001;
INSERT INTO order_events (order_id, event) VALUES (1001, 'started');
COMMIT;

// Go: database transaction with row lock
func processOrderTx(db *sql.DB, orderID int64) error {
    return doInTx(db, func(tx *sql.Tx) error {
        var status string
        err := tx.QueryRow(
            "SELECT status FROM orders WHERE id = $1 FOR UPDATE NOWAIT",
            orderID,
        ).Scan(&status)
        if err != nil {
            return fmt.Errorf("lock order %d: %w", orderID, err)
        }
        if status != "pending" {
            return fmt.Errorf("order %d already in status %s", orderID, status)
        }
        _, err = tx.Exec(
            "UPDATE orders SET status = 'processing' WHERE id = $1",
            orderID,
        )
        return err
    })
}

35.5.4 etcd for Cross-Service Locks

When ZooKeeper is too heavy but stronger guarantees than Redis are needed:

import (
    clientv3 "go.etcd.io/etcd/client/v3"
    "go.etcd.io/etcd/client/v3/concurrency"
)

client, err := clientv3.New(clientv3.Config{
    Endpoints:   []string{"localhost:2379"},
    DialTimeout: 5 * time.Second,
})

// Session TTL = 10 s (renewed automatically by etcd client)
session, err := concurrency.NewSession(client, concurrency.WithTTL(10))
defer session.Close()

mutex := concurrency.NewMutex(session, "/lock/order/1001")

if err := mutex.Lock(ctx); err != nil {
    return fmt.Errorf("could not acquire etcd lock: %w", err)
}
defer mutex.Unlock(ctx)

// etcd uses Raft consensus: lock is guaranteed consistent
// across the cluster regardless of clock skew
processOrder(1001)

35.6 Production Operations

Monitor lock keys:

# Count active locks
redis-cli --scan --pattern 'lock:*' | wc -l

# Inspect a specific lock
redis-cli TTL 'lock:order:1001'
redis-cli TYPE 'lock:order:1001'      # string or hash (Redisson)
redis-cli HGETALL 'lock:order:1001'  # Redisson reentrant lock contents

# TTL = -1 means no expiry set → potential deadlock
redis-cli TTL 'lock:order:1001'  # should never return -1

Automated leak detection:

import redis

r = redis.Redis()

def scan_for_leaked_locks(pattern: str = 'lock:*') -> list[str]:
    """Find lock keys with no TTL (potential deadlocks)."""
    leaked = []
    cursor = 0
    while True:
        cursor, keys = r.scan(cursor, match=pattern, count=500)
        for key in keys:
            if r.ttl(key) == -1:
                leaked.append(key.decode())
        if cursor == 0:
            break
    return leaked

leaked = scan_for_leaked_locks()
if leaked:
    alert(f"Leaked locks detected (no TTL): {leaked[:20]}")

Application-level metrics:

import time, prometheus_client

lock_wait_histogram = prometheus_client.Histogram(
    'distributed_lock_wait_seconds',
    'Time spent waiting to acquire distributed lock',
    ['resource'],
    buckets=[.001, .005, .01, .05, .1, .5, 1.0, 5.0]
)

lock_failure_counter = prometheus_client.Counter(
    'distributed_lock_failures_total',
    'Number of failed lock acquisition attempts',
    ['resource']
)

@contextlib.contextmanager
def instrumented_lock(r, name, expire_ms=30000, retry=3):
    start = time.perf_counter()
    token = None
    for attempt in range(retry):
        token = acquire_lock(r, name, expire_ms)
        if token:
            break
        time.sleep(0.1 * (2 ** attempt))
    
    wait_seconds = time.perf_counter() - start
    lock_wait_histogram.labels(resource=name).observe(wait_seconds)
    
    if not token:
        lock_failure_counter.labels(resource=name).inc()
        raise TimeoutError(f"Lock acquisition failed: {name}")
    
    try:
        yield token
    finally:
        release_lock(r, name, token)

Grafana alert rule: if distributed_lock_wait_seconds{quantile="0.99"} exceeds 1 second, investigate lock contention — either the critical section is too slow, the lock granularity is too coarse, or there is a stuck lock holder.

Rate this chapter

4.8 / 5 (3 ratings)