Chapter 6

listpack and ziplist: The Cascade Update Disaster and Fix

Chapter 6: listpack and ziplist — Cascading Update Disaster and the Fix

6.1 ziplist: Complete Byte Layout

ziplist was Redis's compact encoding for small Hash, List, and ZSet objects. The design goal: store all data in one contiguous memory block to eliminate pointer overhead.

6.1.1 Overall Structure

+----------+----------+--------+---------+-----+---------+--------+
| zlbytes  |  zltail  | zllen  | entry1  | ... | entryN  | zlend  |
|  4 bytes |  4 bytes | 2 bytes|         |     |         | 1 byte |
+----------+----------+--------+---------+-----+---------+--------+
Field Size Description
zlbytes 4B Total bytes of the entire ziplist (including zlend)
zltail 4B Byte offset of the last entry from start (enables O(1) tail access)
zllen 2B Number of entries. If > 65534, fixed at 65535; full traversal required
entries variable Actual data entries
zlend 1B Fixed 0xFF sentinel marking end

Access macros from ziplist.c:

#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))
#define ZIPLIST_ENTRY_HEAD(zl)  ((zl)+ZIPLIST_HEADER_SIZE)
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))

6.1.2 Entry Internal Structure

Each entry follows this layout:

+-------------+----------+------+
| prevrawlen  | encoding | data |
+-------------+----------+------+

prevrawlen — Length of the previous entry in bytes:

If previous entry length < 254:
  prevrawlen = 1 byte, stores the length directly

If previous entry length >= 254:
  prevrawlen = 5 bytes:
    byte[0] = 0xFE (254, the sentinel value)
    byte[1..4] = actual length in little-endian uint32

This design allows backward traversal: to find the previous entry, read prevrawlen and subtract that many bytes from the current pointer.

encoding field — distinguishes integers from strings and stores metadata:

String encodings:
  00xxxxxx             — String of length 0..63 (6-bit length), 1-byte header
  01xxxxxx xxxxxxxx    — String of length 0..16383 (14-bit), 2-byte header
  10000000 [4B length] — String of length 0..2^32-1, 5-byte header

Integer encodings:
  11000000             — int16_t,  2 bytes of data
  11010000             — int32_t,  4 bytes of data
  11100000             — int64_t,  8 bytes of data
  11110000             — int24,    3 bytes of data
  11111110             — int8_t,   1 byte of data
  1111xxxx (0001-1101) — immediate integer 0..12 stored in low nibble

A concrete entry storing integer 12345:

Memory bytes:  [00] [C0] [39 30]
               │    │    └─ 12345 in little-endian (int16)
               │    └─ encoding = 0xC0 → int16
               └─ prevrawlen = 0 (first entry)

6.1.3 Full Memory Layout Example

Hash {"name": "Alice", "age": 30} stored as ziplist:

Offset  Bytes                            Meaning
──────────────────────────────────────────────────────────
0x00    1A 00 00 00                      zlbytes = 26
0x04    12 00 00 00                      zltail  = 18 (offset to "age" key entry)
0x08    04 00                            zllen   = 4 entries (name, Alice, age, 30)
0x0A    00 04 6E 61 6D 65                entry1: prevlen=0, enc=0x04(4-byte str), "name"
0x10    06 05 41 6C 69 63 65             entry2: prevlen=6, enc=0x05(5-byte str), "Alice"
0x17    07 03 61 67 65                   entry3: prevlen=7, enc=0x03(3-byte str), "age"
0x1C    05 F1 1E 00                      entry4: prevlen=5, enc=0xF1(int8? no,int16), 30
0x20    FF                               zlend

6.2 Cascading Update Disaster

6.2.1 Root Cause

The prevrawlen field is ziplist's most dangerous design element. Consider this arrangement:

[entry1: 253 bytes] → [entry2: ...] → [entry3: ...] → ...

entry2's prevrawlen stores 253 using 1 byte (since 253 < 254). Now expand entry1 to 254 bytes:

entry1 grows: 253 → 254 bytes
  ↓
entry2's prevrawlen must grow: 1 byte → 5 bytes (needs 0xFE prefix)
  ↓
entry2's total size grows by 4 bytes
  ↓
entry3's prevrawlen must update (entry2 changed size)
  ↓
entry3's size grows by 4 bytes
  ↓
entry4 must update ... (cascade continues)

6.2.2 Worst Case Analysis

Scenario: 1000 entries, each exactly 253 bytes.
          Modify entry1 to be 254 bytes.

Step 1: entry2.prevrawlen: 1B → 5B  (+4B), entry2 now = 257 bytes
Step 2: entry3.prevrawlen must store 257 (>253): 1B → 5B (+4B), entry3 now = 257 bytes
Step 3: entry4.prevrawlen must store 257: 1B → 5B (+4B) ...
...
Step N: once an entry's prevrawlen is already 5B, cascade stops.

If all entries start at exactly 253 bytes and grow by 4 bytes each time,
the cascade runs through ALL 1000 entries.

The implementation in ziplist.c:

unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl));
    size_t rawlen, rawlensize, offset, noffset, extra;
    unsigned char *np;
    zlentry cur, next;

    while (p[0] != ZIP_END) {
        zipEntry(p, &cur);
        rawlen = cur.headersize + cur.len;
        rawlensize = zipStorePrevEntryLength(NULL, rawlen);

        /* If next entry's prevrawlen already matches, stop. */
        if (p[rawlen] == ZIP_END) break;
        zipEntry(p + rawlen, &next);
        if (next.prevrawlen == rawlen) break;

        if (next.prevrawlensize < rawlensize) {
            /* Need to expand prevrawlen from 1 byte to 5 bytes */
            offset = p - zl;
            extra = rawlensize - next.prevrawlensize;
            zl = ziplistResize(zl, curlen + extra);  /* realloc entire block! */
            p = zl + offset;

            np = p + rawlen;
            noffset = np - zl;
            if ((zl + intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))) != np)
                ziplistSetTailOffset(zl, intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)) + extra);

            /* Shift data to make room */
            memmove(np + rawlensize, np + next.prevrawlensize,
                    curlen - noffset - next.prevrawlensize - 1);
            zipStorePrevEntryLength(np, rawlen);
            p += rawlen;
            curlen += extra;
        } else {
            /* Shrink or equal size — can do in-place, then stop */
            if (next.prevrawlensize > rawlensize) {
                zipStorePrevEntryLengthLarge(p + rawlen, rawlen);
            } else {
                zipStorePrevEntryLength(p + rawlen, rawlen);
            }
            break;
        }
    }
    return zl;
}

6.2.3 Real-World Performance Impact

Benchmark: 1000 entries × 253 bytes each, trigger cascading update
─────────────────────────────────────────────────────────────────
Without cascade:    ~0.01ms per operation
With cascade:       ~2.3ms (Redis single-threaded: 2.3ms stall)

P99 latency impact in production:
  Normal:           0.3ms
  During cascade:   3-8ms (10-25x spike)
  Memory: up to 1000 realloc() calls on a shrinking data block

The cascade is also asymmetric: shrinking an entry never requires cascading (shrinking prevrawlen from 5B to 1B is done by in-place padding with a special marker, not actual shrinkage).

6.3 listpack: Eliminating Cascading Updates Entirely

6.3.1 Design Philosophy

Redis 5.0 introduced listpack. Redis 7.0 made it the universal replacement. The single conceptual change:

ziplist entry:  [prevrawlen][encoding][data]      depends on predecessor
listpack entry: [encoding][data][backlen]         self-contained

backlen is placed at the end of each element and stores the total byte count of that element's encoding + data. This enables backward traversal without any cross-element dependency.

Why this eliminates cascading updates:

When you modify an element in listpack:

  1. The element itself changes size.
  2. Its own backlen is updated to reflect the new size.
  3. No other element is affected. Neighbors don't store any information about their neighbor's size.

6.3.2 listpack Overall Layout

+─────────────+───────────────+──────────+─────+──────────+─────+
│ total-bytes │ num-elements  │ element1 │ ... │ elementN │ end │
│   4 bytes   │   2 bytes     │          │     │          │0xFF │
+─────────────+───────────────+──────────+─────+──────────+─────+

Header is 6 bytes (vs ziplist's 10 bytes — saves 4 bytes per structure).

6.3.3 Element Internal Layout

+──────────────────────+──────────────+─────────+
│ encoding (1-9 bytes) │ data (0-4GB) │ backlen │
+──────────────────────+──────────────+─────────+

Encoding bit patterns (elegant reuse of high bits):

Pattern          Range/Type              Data bytes
─────────────────────────────────────────────────────
0xxxxxxx         7-bit uint (0-127)      0  (value in encoding)
10xxxxxx         6-bit string length     0-63
110xxxxx xxxxxxxx  13-bit signed int     0  (value in encoding)
11110001         int16                   2
11110010         int24                   3
11110011         int32                   4
11110100         int64                   8
11110000         12-bit string length    needs 1 extra byte (total 12 bits)
11111110         int8                    1
1111xxxx         4-bit uint (0-15)       0  (value in low nibble)
11110001 (reused)  32-bit string length  needs 4 extra bytes

backlen variable-length encoding (little-endian, MSB flag = "more bytes follow"):

Element total bytes ≤ 127:     backlen = 1 byte
Element total bytes ≤ 16383:   backlen = 2 bytes
Element total bytes ≤ 2097151: backlen = 3 bytes
...

This is a reversed LEB128 encoding where the continuation bit is in the most significant position of each byte.

6.3.4 Core listpack Source Code

/* lp.c — Create empty listpack */
unsigned char *lpNew(size_t capacity) {
    unsigned char *lp = lp_malloc(capacity > LP_HDR_SIZE+1 ?
                                   capacity : LP_HDR_SIZE+1);
    if (lp == NULL) return NULL;
    lpSetTotalBytes(lp, LP_HDR_SIZE+1);  /* 6 + 1 (end byte) */
    lpSetNumElements(lp, 0);
    lp[LP_HDR_SIZE] = LP_EOF;  /* 0xFF */
    return lp;
}

/* Decode backlen — read backward from the byte before current position */
static inline uint64_t lpDecodeBacklen(unsigned char *p) {
    uint64_t val = 0;
    uint64_t shift = 0;
    do {
        val |= (uint64_t)(p[0] & 127) << shift;
        if (!(p[0] & 128)) break;  /* MSB=0: last byte */
        shift += 7;
        p--;
    } while (shift < 28);
    return val;
}

/* Forward traversal */
unsigned char *lpNext(unsigned char *lp, unsigned char *p) {
    assert(p);
    p = lpSkip(p);           /* advance past encoding+data+backlen */
    if (p[0] == LP_EOF) return NULL;
    return p;
}

/* Backward traversal — purely self-contained, no cross-element reads */
unsigned char *lpPrev(unsigned char *lp, unsigned char *p) {
    if (p - lp == LP_HDR_SIZE) return NULL;  /* already at first element */
    p--;  /* point to last byte of previous element's backlen */
    uint64_t prevlen = lpDecodeBacklen(p);
    prevlen += lpEncodeBacklen(NULL, prevlen);  /* add backlen field size */
    return p - prevlen + 1;
}

6.4 listpack vs ziplist Memory Comparison

6.4.1 100 "hello:world" Key-Value Pairs

Test data: 100 key-value pairs
  key:   "hello:XXXX" (10 bytes)
  value: "world:XXXX" (10 bytes)

ziplist calculation:
  Header:      4 + 4 + 2 = 10 bytes
  Per entry:   prevrawlen(1) + encoding(1, 6-bit str) + data(10) = 12 bytes
  200 entries: 200 × 12 = 2400 bytes
  zlend:       1 byte
  Total:       2411 bytes

listpack calculation:
  Header:      4 + 2 = 6 bytes
  Per element: encoding(1) + data(10) + backlen(1) = 12 bytes
  200 elements: 200 × 12 = 2400 bytes
  end:         1 byte
  Total:       2407 bytes

Memory difference: 4 bytes (listpack smaller due to compact header)
Primary advantage: elimination of cascading update risk, not raw memory savings

6.4.2 Integer-Heavy Workload

100 integers in range 0-127:

ziplist per entry:
  prevrawlen(1) + encoding(1, 1111xxxx immediate) + data(0) = 2 bytes
  Total: 200 bytes + 10 bytes header + 1 end = 211 bytes

listpack per element:
  encoding(1, 0xxxxxxx 7-bit uint) + backlen(1) = 2 bytes
  Total: 200 bytes + 6 bytes header + 1 end = 207 bytes

Savings: 4 bytes — identical per-element cost, smaller header

6.5 Encoding Upgrade Thresholds

Redis 7.0+ uses listpack for small objects, upgrading to heavier structures when thresholds are exceeded:

Hash:
  hash-max-listpack-entries  128   (field count)
  hash-max-listpack-value    64    (max field/value bytes)
  → upgrades to: hashtable (dict with chaining)

ZSet:
  zset-max-listpack-entries  128
  zset-max-listpack-value    64
  → upgrades to: skiplist + hashtable (dual structure)

List:
  list-max-listpack-size     -2    (8KB per quicklist node)
  → upgrades to: quicklist (linked list of listpacks)

Upgrade is one-way: Redis never automatically downgrades from hashtable back to listpack, even if enough elements are deleted. To force a downgrade, you must recreate the key.

6.6 Traversal Algorithm Details

6.6.1 Forward Traversal

/* Calculate total bytes used by one element at p */
static inline unsigned long lpCurrentEncodedSizeBytes(unsigned char *p) {
    if (LP_ENCODING_IS_7BIT_UINT(p[0])) return 1;
    if (LP_ENCODING_IS_6BIT_STR(p[0])) return 1 + LP_ENCODING_6BIT_STR_LEN(p);
    if (LP_ENCODING_IS_13BIT_INT(p[0])) return 2;
    if (LP_ENCODING_IS_16BIT_INT(p[0])) return 3;
    // ... etc.
}

unsigned char *lpSkip(unsigned char *p) {
    unsigned long entrylen = lpCurrentEncodedSizeBytes(p);   /* encoding+data */
    entrylen += lpEncodeBacklen(NULL, entrylen);              /* +backlen bytes */
    p += entrylen;
    return p;
}

6.6.2 Backward Traversal — The Key Difference

ziplist backward:  read prevrawlen at start of current entry
                   → jump backward by prevrawlen bytes
                   Problem: this value was written by the PREVIOUS entry's modifier

listpack backward: read backlen at END of the element just before p
                   → jump backward by (backlen + backlen_field_size) bytes
                   Each element fully describes ITSELF, no cross-element coupling

This single architectural change — moving size metadata from "about my predecessor" to "about myself" — eliminates the entire class of cascading update bugs.

6.7 RDB and AOF Format Changes in Redis 7.0

Redis 7.0 added new RDB type codes to distinguish listpack from ziplist:

/* rdb.h */
#define RDB_TYPE_HASH_LISTPACK    16   /* was: RDB_TYPE_HASH_ZIPLIST = 13 */
#define RDB_TYPE_ZSET_LISTPACK    17   /* was: RDB_TYPE_ZSET_ZIPLIST = 12 */
#define RDB_TYPE_LIST_LISTPACK    18   /* was: RDB_TYPE_LIST_QUICKLIST = 10 */

Loading an older RDB file with ziplist-encoded objects triggers automatic in-memory conversion:

/* rdb.c — loading old ziplist-encoded hash */
case RDB_TYPE_HASH_ZIPLIST: {
    unsigned char *encoded = rdbLoadRawString(rdb);
    o = createHashObject();
    /* Convert from ziplist to listpack on load */
    o->ptr = ziplistToListpack(encoded);
    o->encoding = OBJ_ENCODING_LISTPACK;
    zfree(encoded);
    break;
}

6.8 Summary

Property ziplist listpack
Forward traversal O(n) O(n)
Backward traversal O(n), reads prevrawlen O(n), reads backlen
Cascading update risk Yes — worst case O(n²) None — eliminated
Self-contained elements No (depends on predecessor) Yes
Header size 10 bytes 6 bytes
Introduced Redis 2.x Redis 5.0 (universal: 7.0)
Current status Deprecated Active, all small objects

The design insight of listpack is deceptively simple: store information about yourself at the end of your own record, not information about your predecessor at the beginning of your record. This single change eliminates an entire category of worst-case behavior that existed in Redis for a decade.

Rate this chapter
4.7  / 5  (64 ratings)

💬 Comments