listpack and ziplist: The Cascade Update Disaster and Fix
Chapter 6: listpack and ziplist โ Cascading Update Disaster and the Fix
6.1 ziplist: Complete Byte Layout
ziplist was Redis's compact encoding for small Hash, List, and ZSet objects. The design goal: store all data in one contiguous memory block to eliminate pointer overhead.
6.1.1 Overall Structure
+----------+----------+--------+---------+-----+---------+--------+
| zlbytes | zltail | zllen | entry1 | ... | entryN | zlend |
| 4 bytes | 4 bytes | 2 bytes| | | | 1 byte |
+----------+----------+--------+---------+-----+---------+--------+
| Field | Size | Description |
|---|---|---|
zlbytes |
4B | Total bytes of the entire ziplist (including zlend) |
zltail |
4B | Byte offset of the last entry from start (enables O(1) tail access) |
zllen |
2B | Number of entries. If > 65534, fixed at 65535; full traversal required |
entries |
variable | Actual data entries |
zlend |
1B | Fixed 0xFF sentinel marking end |
Access macros from ziplist.c:
#define ZIPLIST_BYTES(zl) (*((uint32_t*)(zl)))
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))
#define ZIPLIST_LENGTH(zl) (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))
#define ZIPLIST_HEADER_SIZE (sizeof(uint32_t)*2+sizeof(uint16_t))
#define ZIPLIST_END_SIZE (sizeof(uint8_t))
#define ZIPLIST_ENTRY_HEAD(zl) ((zl)+ZIPLIST_HEADER_SIZE)
#define ZIPLIST_ENTRY_TAIL(zl) ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))
6.1.2 Entry Internal Structure
Each entry follows this layout:
+-------------+----------+------+
| prevrawlen | encoding | data |
+-------------+----------+------+
prevrawlen โ Length of the previous entry in bytes:
If previous entry length < 254:
prevrawlen = 1 byte, stores the length directly
If previous entry length >= 254:
prevrawlen = 5 bytes:
byte[0] = 0xFE (254, the sentinel value)
byte[1..4] = actual length in little-endian uint32
This design allows backward traversal: to find the previous entry, read prevrawlen and subtract that many bytes from the current pointer.
encoding field โ distinguishes integers from strings and stores metadata:
String encodings:
00xxxxxx โ String of length 0..63 (6-bit length), 1-byte header
01xxxxxx xxxxxxxx โ String of length 0..16383 (14-bit), 2-byte header
10000000 [4B length] โ String of length 0..2^32-1, 5-byte header
Integer encodings:
11000000 โ int16_t, 2 bytes of data
11010000 โ int32_t, 4 bytes of data
11100000 โ int64_t, 8 bytes of data
11110000 โ int24, 3 bytes of data
11111110 โ int8_t, 1 byte of data
1111xxxx (0001-1101) โ immediate integer 0..12 stored in low nibble
A concrete entry storing integer 12345:
Memory bytes: [00] [C0] [39 30]
โ โ โโ 12345 in little-endian (int16)
โ โโ encoding = 0xC0 โ int16
โโ prevrawlen = 0 (first entry)
6.1.3 Full Memory Layout Example
Hash {"name": "Alice", "age": 30} stored as ziplist:
Offset Bytes Meaning
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
0x00 1A 00 00 00 zlbytes = 26
0x04 12 00 00 00 zltail = 18 (offset to "age" key entry)
0x08 04 00 zllen = 4 entries (name, Alice, age, 30)
0x0A 00 04 6E 61 6D 65 entry1: prevlen=0, enc=0x04(4-byte str), "name"
0x10 06 05 41 6C 69 63 65 entry2: prevlen=6, enc=0x05(5-byte str), "Alice"
0x17 07 03 61 67 65 entry3: prevlen=7, enc=0x03(3-byte str), "age"
0x1C 05 F1 1E 00 entry4: prevlen=5, enc=0xF1(int8? no,int16), 30
0x20 FF zlend
6.2 Cascading Update Disaster
6.2.1 Root Cause
The prevrawlen field is ziplist's most dangerous design element. Consider this arrangement:
[entry1: 253 bytes] โ [entry2: ...] โ [entry3: ...] โ ...
entry2's prevrawlen stores 253 using 1 byte (since 253 < 254). Now expand entry1 to 254 bytes:
entry1 grows: 253 โ 254 bytes
โ
entry2's prevrawlen must grow: 1 byte โ 5 bytes (needs 0xFE prefix)
โ
entry2's total size grows by 4 bytes
โ
entry3's prevrawlen must update (entry2 changed size)
โ
entry3's size grows by 4 bytes
โ
entry4 must update ... (cascade continues)
6.2.2 Worst Case Analysis
Scenario: 1000 entries, each exactly 253 bytes.
Modify entry1 to be 254 bytes.
Step 1: entry2.prevrawlen: 1B โ 5B (+4B), entry2 now = 257 bytes
Step 2: entry3.prevrawlen must store 257 (>253): 1B โ 5B (+4B), entry3 now = 257 bytes
Step 3: entry4.prevrawlen must store 257: 1B โ 5B (+4B) ...
...
Step N: once an entry's prevrawlen is already 5B, cascade stops.
If all entries start at exactly 253 bytes and grow by 4 bytes each time,
the cascade runs through ALL 1000 entries.
The implementation in ziplist.c:
unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) {
size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl));
size_t rawlen, rawlensize, offset, noffset, extra;
unsigned char *np;
zlentry cur, next;
while (p[0] != ZIP_END) {
zipEntry(p, &cur);
rawlen = cur.headersize + cur.len;
rawlensize = zipStorePrevEntryLength(NULL, rawlen);
/* If next entry's prevrawlen already matches, stop. */
if (p[rawlen] == ZIP_END) break;
zipEntry(p + rawlen, &next);
if (next.prevrawlen == rawlen) break;
if (next.prevrawlensize < rawlensize) {
/* Need to expand prevrawlen from 1 byte to 5 bytes */
offset = p - zl;
extra = rawlensize - next.prevrawlensize;
zl = ziplistResize(zl, curlen + extra); /* realloc entire block! */
p = zl + offset;
np = p + rawlen;
noffset = np - zl;
if ((zl + intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))) != np)
ziplistSetTailOffset(zl, intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)) + extra);
/* Shift data to make room */
memmove(np + rawlensize, np + next.prevrawlensize,
curlen - noffset - next.prevrawlensize - 1);
zipStorePrevEntryLength(np, rawlen);
p += rawlen;
curlen += extra;
} else {
/* Shrink or equal size โ can do in-place, then stop */
if (next.prevrawlensize > rawlensize) {
zipStorePrevEntryLengthLarge(p + rawlen, rawlen);
} else {
zipStorePrevEntryLength(p + rawlen, rawlen);
}
break;
}
}
return zl;
}
6.2.3 Real-World Performance Impact
Benchmark: 1000 entries ร 253 bytes each, trigger cascading update
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Without cascade: ~0.01ms per operation
With cascade: ~2.3ms (Redis single-threaded: 2.3ms stall)
P99 latency impact in production:
Normal: 0.3ms
During cascade: 3-8ms (10-25x spike)
Memory: up to 1000 realloc() calls on a shrinking data block
The cascade is also asymmetric: shrinking an entry never requires cascading (shrinking prevrawlen from 5B to 1B is done by in-place padding with a special marker, not actual shrinkage).
6.3 listpack: Eliminating Cascading Updates Entirely
6.3.1 Design Philosophy
Redis 5.0 introduced listpack. Redis 7.0 made it the universal replacement. The single conceptual change:
ziplist entry: [prevrawlen][encoding][data] depends on predecessor
listpack entry: [encoding][data][backlen] self-contained
backlen is placed at the end of each element and stores the total byte count of that element's encoding + data. This enables backward traversal without any cross-element dependency.
Why this eliminates cascading updates:
When you modify an element in listpack:
- The element itself changes size.
- Its own
backlenis updated to reflect the new size. - No other element is affected. Neighbors don't store any information about their neighbor's size.
6.3.2 listpack Overall Layout
+โโโโโโโโโโโโโ+โโโโโโโโโโโโโโโ+โโโโโโโโโโ+โโโโโ+โโโโโโโโโโ+โโโโโ+
โ total-bytes โ num-elements โ element1 โ ... โ elementN โ end โ
โ 4 bytes โ 2 bytes โ โ โ โ0xFF โ
+โโโโโโโโโโโโโ+โโโโโโโโโโโโโโโ+โโโโโโโโโโ+โโโโโ+โโโโโโโโโโ+โโโโโ+
Header is 6 bytes (vs ziplist's 10 bytes โ saves 4 bytes per structure).
6.3.3 Element Internal Layout
+โโโโโโโโโโโโโโโโโโโโโโ+โโโโโโโโโโโโโโ+โโโโโโโโโ+
โ encoding (1-9 bytes) โ data (0-4GB) โ backlen โ
+โโโโโโโโโโโโโโโโโโโโโโ+โโโโโโโโโโโโโโ+โโโโโโโโโ+
Encoding bit patterns (elegant reuse of high bits):
Pattern Range/Type Data bytes
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
0xxxxxxx 7-bit uint (0-127) 0 (value in encoding)
10xxxxxx 6-bit string length 0-63
110xxxxx xxxxxxxx 13-bit signed int 0 (value in encoding)
11110001 int16 2
11110010 int24 3
11110011 int32 4
11110100 int64 8
11110000 12-bit string length needs 1 extra byte (total 12 bits)
11111110 int8 1
1111xxxx 4-bit uint (0-15) 0 (value in low nibble)
11110001 (reused) 32-bit string length needs 4 extra bytes
backlen variable-length encoding (little-endian, MSB flag = "more bytes follow"):
Element total bytes โค 127: backlen = 1 byte
Element total bytes โค 16383: backlen = 2 bytes
Element total bytes โค 2097151: backlen = 3 bytes
...
This is a reversed LEB128 encoding where the continuation bit is in the most significant position of each byte.
6.3.4 Core listpack Source Code
/* lp.c โ Create empty listpack */
unsigned char *lpNew(size_t capacity) {
unsigned char *lp = lp_malloc(capacity > LP_HDR_SIZE+1 ?
capacity : LP_HDR_SIZE+1);
if (lp == NULL) return NULL;
lpSetTotalBytes(lp, LP_HDR_SIZE+1); /* 6 + 1 (end byte) */
lpSetNumElements(lp, 0);
lp[LP_HDR_SIZE] = LP_EOF; /* 0xFF */
return lp;
}
/* Decode backlen โ read backward from the byte before current position */
static inline uint64_t lpDecodeBacklen(unsigned char *p) {
uint64_t val = 0;
uint64_t shift = 0;
do {
val |= (uint64_t)(p[0] & 127) << shift;
if (!(p[0] & 128)) break; /* MSB=0: last byte */
shift += 7;
p--;
} while (shift < 28);
return val;
}
/* Forward traversal */
unsigned char *lpNext(unsigned char *lp, unsigned char *p) {
assert(p);
p = lpSkip(p); /* advance past encoding+data+backlen */
if (p[0] == LP_EOF) return NULL;
return p;
}
/* Backward traversal โ purely self-contained, no cross-element reads */
unsigned char *lpPrev(unsigned char *lp, unsigned char *p) {
if (p - lp == LP_HDR_SIZE) return NULL; /* already at first element */
p--; /* point to last byte of previous element's backlen */
uint64_t prevlen = lpDecodeBacklen(p);
prevlen += lpEncodeBacklen(NULL, prevlen); /* add backlen field size */
return p - prevlen + 1;
}
6.4 listpack vs ziplist Memory Comparison
6.4.1 100 "hello:world" Key-Value Pairs
Test data: 100 key-value pairs
key: "hello:XXXX" (10 bytes)
value: "world:XXXX" (10 bytes)
ziplist calculation:
Header: 4 + 4 + 2 = 10 bytes
Per entry: prevrawlen(1) + encoding(1, 6-bit str) + data(10) = 12 bytes
200 entries: 200 ร 12 = 2400 bytes
zlend: 1 byte
Total: 2411 bytes
listpack calculation:
Header: 4 + 2 = 6 bytes
Per element: encoding(1) + data(10) + backlen(1) = 12 bytes
200 elements: 200 ร 12 = 2400 bytes
end: 1 byte
Total: 2407 bytes
Memory difference: 4 bytes (listpack smaller due to compact header)
Primary advantage: elimination of cascading update risk, not raw memory savings
6.4.2 Integer-Heavy Workload
100 integers in range 0-127:
ziplist per entry:
prevrawlen(1) + encoding(1, 1111xxxx immediate) + data(0) = 2 bytes
Total: 200 bytes + 10 bytes header + 1 end = 211 bytes
listpack per element:
encoding(1, 0xxxxxxx 7-bit uint) + backlen(1) = 2 bytes
Total: 200 bytes + 6 bytes header + 1 end = 207 bytes
Savings: 4 bytes โ identical per-element cost, smaller header
6.5 Encoding Upgrade Thresholds
Redis 7.0+ uses listpack for small objects, upgrading to heavier structures when thresholds are exceeded:
Hash:
hash-max-listpack-entries 128 (field count)
hash-max-listpack-value 64 (max field/value bytes)
โ upgrades to: hashtable (dict with chaining)
ZSet:
zset-max-listpack-entries 128
zset-max-listpack-value 64
โ upgrades to: skiplist + hashtable (dual structure)
List:
list-max-listpack-size -2 (8KB per quicklist node)
โ upgrades to: quicklist (linked list of listpacks)
Upgrade is one-way: Redis never automatically downgrades from hashtable back to listpack, even if enough elements are deleted. To force a downgrade, you must recreate the key.
6.6 Traversal Algorithm Details
6.6.1 Forward Traversal
/* Calculate total bytes used by one element at p */
static inline unsigned long lpCurrentEncodedSizeBytes(unsigned char *p) {
if (LP_ENCODING_IS_7BIT_UINT(p[0])) return 1;
if (LP_ENCODING_IS_6BIT_STR(p[0])) return 1 + LP_ENCODING_6BIT_STR_LEN(p);
if (LP_ENCODING_IS_13BIT_INT(p[0])) return 2;
if (LP_ENCODING_IS_16BIT_INT(p[0])) return 3;
// ... etc.
}
unsigned char *lpSkip(unsigned char *p) {
unsigned long entrylen = lpCurrentEncodedSizeBytes(p); /* encoding+data */
entrylen += lpEncodeBacklen(NULL, entrylen); /* +backlen bytes */
p += entrylen;
return p;
}
6.6.2 Backward Traversal โ The Key Difference
ziplist backward: read prevrawlen at start of current entry
โ jump backward by prevrawlen bytes
Problem: this value was written by the PREVIOUS entry's modifier
listpack backward: read backlen at END of the element just before p
โ jump backward by (backlen + backlen_field_size) bytes
Each element fully describes ITSELF, no cross-element coupling
This single architectural change โ moving size metadata from "about my predecessor" to "about myself" โ eliminates the entire class of cascading update bugs.
6.7 RDB and AOF Format Changes in Redis 7.0
Redis 7.0 added new RDB type codes to distinguish listpack from ziplist:
/* rdb.h */
#define RDB_TYPE_HASH_LISTPACK 16 /* was: RDB_TYPE_HASH_ZIPLIST = 13 */
#define RDB_TYPE_ZSET_LISTPACK 17 /* was: RDB_TYPE_ZSET_ZIPLIST = 12 */
#define RDB_TYPE_LIST_LISTPACK 18 /* was: RDB_TYPE_LIST_QUICKLIST = 10 */
Loading an older RDB file with ziplist-encoded objects triggers automatic in-memory conversion:
/* rdb.c โ loading old ziplist-encoded hash */
case RDB_TYPE_HASH_ZIPLIST: {
unsigned char *encoded = rdbLoadRawString(rdb);
o = createHashObject();
/* Convert from ziplist to listpack on load */
o->ptr = ziplistToListpack(encoded);
o->encoding = OBJ_ENCODING_LISTPACK;
zfree(encoded);
break;
}
6.8 Summary
| Property | ziplist | listpack |
|---|---|---|
| Forward traversal | O(n) | O(n) |
| Backward traversal | O(n), reads prevrawlen | O(n), reads backlen |
| Cascading update risk | Yes โ worst case O(nยฒ) | None โ eliminated |
| Self-contained elements | No (depends on predecessor) | Yes |
| Header size | 10 bytes | 6 bytes |
| Introduced | Redis 2.x | Redis 5.0 (universal: 7.0) |
| Current status | Deprecated | Active, all small objects |
The design insight of listpack is deceptively simple: store information about yourself at the end of your own record, not information about your predecessor at the beginning of your record. This single change eliminates an entire category of worst-case behavior that existed in Redis for a decade.