Chapter 5

SDS Dynamic String: Why Not C String

Chapter 5: SDS โ€” Simple Dynamic String: Why Not C Strings?

5.1 The Five Fundamental Flaws of C Strings

Before examining SDS, we must understand precisely why C strings fail as a foundation for a high-performance data store. C strings are null-terminated char arrays โ€” a design that made sense in 1970s Unix, but introduces five structural problems for any system that processes untrusted data at high throughput.

5.1.1 O(n) Length Retrieval

The C standard library's strlen() scans every byte until it finds a null terminator:

/* Standard strlen implementation */
size_t strlen(const char *s) {
    const char *p = s;
    while (*p != '\0') p++;
    return p - s;
}

For a 10 MB string, this requires 10 million memory reads. If strlen() is called in a loop โ€” during string concatenation, pattern matching, or log formatting โ€” the cost accumulates rapidly. On modern CPUs with L1 cache lines of 64 bytes, scanning a cold string causes a cache miss every 64 bytes, making this particularly expensive for large strings.

SDS solution: The len field is stored directly in the header. sdslen() is a single memory read โ€” O(1) regardless of string size:

/* src/sds.h */
static inline size_t sdslen(const sds s) {
    unsigned char flags = s[-1];  /* flags is at offset -1 from buf */
    switch (flags & SDS_TYPE_MASK) {
    case SDS_TYPE_5:
        return SDS_TYPE_5_LEN(flags);  /* high 5 bits of flags = length */
    case SDS_TYPE_8:
        return SDS_HDR(8, s)->len;     /* direct field read */
    case SDS_TYPE_16:
        return SDS_HDR(16, s)->len;
    case SDS_TYPE_32:
        return SDS_HDR(32, s)->len;
    case SDS_TYPE_64:
        return SDS_HDR(64, s)->len;
    }
    return 0;
}

5.1.2 Buffer Overflow

Standard C string functions do not check destination buffer capacity:

char s1[6] = "Hello";    /* 6 bytes: 5 chars + '\0' */
char s2[] = " World";    /* 7 bytes */
strcat(s1, s2);          /* UNDEFINED BEHAVIOR: writes 13 bytes into 6-byte buffer */
/* This overwrites adjacent stack/heap memory */
/* CVE-2015-3416, CVE-2018-16395, and hundreds of others trace back to this */

SDS solution: Every write operation calls sdsMakeRoomFor() before modifying the buffer, guaranteeing sufficient capacity:

/* src/sds.c */
sds sdsMakeRoomFor(sds s, size_t addlen) {
    size_t avail = sdsavail(s);    /* avail = alloc - len */
    if (avail >= addlen) return s; /* Already enough space, no-op */

    size_t len = sdslen(s);
    size_t newlen = len + addlen;

    /* Space preallocation policy */
    if (newlen < SDS_MAX_PREALLOC)    /* SDS_MAX_PREALLOC = 1MB (1 << 20) */
        newlen *= 2;                   /* Double for small strings */
    else
        newlen += SDS_MAX_PREALLOC;   /* Add 1MB for large strings */

    /* Type may change (e.g., sdshdr8 โ†’ sdshdr16 if newlen > 255) */
    return _sdsMakeRoomFor(s, addlen, 1);
}

The buffer is always valid before any write; the SDS contract guarantees len โ‰ค alloc at all times.

5.1.3 Frequent Reallocation

Appending to a C string requires realloc on every operation:

/* Naive C string append (must realloc every time) */
char *s = strdup("hello");
for (int i = 0; i < 1000; i++) {
    size_t len = strlen(s);
    s = realloc(s, len + 4 + 1);  /* +4 for " foo", +1 for '\0' */
    strcat(s, " foo");
}
/* 1000 realloc calls = 1000 potential memory copies */

realloc in the worst case: allocates new memory, copies the old contents, frees the old allocation. For a string growing from 1KB to 2KB, that is 1KB of data copied per realloc. For a 1MB string: 1MB copied per append.

SDS solution: Space preallocation (covered in 5.4) combined with lazy space release (5.5) dramatically reduces realloc frequency.

5.1.4 Cannot Store Binary Data Containing Null Bytes

The null terminator convention makes C strings fundamentally incompatible with binary data:

/* Attempt to store a JPEG file header */
char data[] = "\xff\xd8\xff\xe0\x00\x10JFIF";
strlen(data);   /* Returns 4! Stops at the '\xff' that happens to be... */
                /* Actually stops at the first 0x00 byte */

/* 0x00 at offset 4 terminates the string โ€” the "JFIF" marker is invisible */
memcpy(dest, data, strlen(data));   /* Copies only 4 bytes of the 10-byte header */

Any protocol that uses 0x00 bytes as part of its framing โ€” gzip, PNG, TLS records, MessagePack โ€” cannot be safely stored in a C string.

SDS solution: The len field records the actual byte count; buf[] is treated as an arbitrary byte sequence:

# Store a raw PNG header (contains 0x00 bytes)
redis-cli SET imgmeta $'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a\x00\x00\x00\x0d'
redis-cli STRLEN imgmeta   # โ†’ 12 (correct: counts all bytes including 0x00)

# Binary-safe GETRANGE
redis-cli GETRANGE imgmeta 0 3   # โ†’ \x89PNG (first 4 bytes, correct)

5.1.5 Unsafe Standard Library Functions

Functions like gets(), scanf("%s", buf), sprintf() without width limits, and strcpy() are the root cause of decades of security vulnerabilities. Even "safer" alternatives (snprintf, strncat) require the programmer to manually calculate remaining buffer space on every call โ€” and mistakes are common.

SDS eliminates all use of these functions in the hot path. All string operations go through the SDS API, which handles bounds checking internally.

5.2 The Five SDS Header Structures

SDS selects the smallest header that can represent the string's length, minimizing per-string overhead:

/* src/sds.h โ€” all headers use __attribute__((packed)) */

/* sdshdr5: length stored in high 5 bits of flags; max string: 31 bytes */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags;   /* bits [7:3] = length (0-31); bits [2:0] = SDS_TYPE_5 */
    char buf[];            /* no len or alloc fields โ€” cannot grow without type change */
};

/* sdshdr8: max string length = 255 bytes (uint8_t) */
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;           /* Used bytes in buf (not counting '\0') */
    uint8_t alloc;         /* Total allocated bytes in buf (not counting '\0') */
    unsigned char flags;   /* bits [2:0] = SDS_TYPE_8 (0x01) */
    char buf[];            /* String content + '\0' */
};

/* sdshdr16: max string length = 65,535 bytes (64 KB) */
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len;
    uint16_t alloc;
    unsigned char flags;   /* SDS_TYPE_16 (0x02) */
    char buf[];
};

/* sdshdr32: max string length = 4,294,967,295 bytes (~4 GB) */
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len;
    uint32_t alloc;
    unsigned char flags;   /* SDS_TYPE_32 (0x03) */
    char buf[];
};

/* sdshdr64: max string length = 2^64 - 1 bytes (theoretical; RAM is the real limit) */
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len;
    uint64_t alloc;
    unsigned char flags;   /* SDS_TYPE_64 (0x04) */
    char buf[];
};

5.2.1 __attribute__((packed)): Eliminating Alignment Padding

Without packed, the C compiler aligns struct fields to their natural alignment boundary:

/* Without packed: sdshdr8 layout on a 64-bit compiler */
struct sdshdr8_unaligned {
    uint8_t len;    /* offset 0 (1 byte) */
    /* 1 byte padding to align alloc to 2-byte boundary */
    uint8_t alloc;  /* offset 2 (1 byte) */
    /* 1 byte padding */
    unsigned char flags; /* offset 4 (1 byte) */
    /* padding to next alignment boundary */
    char buf[];     /* offset varies */
};
/* sizeof: potentially 6+ bytes */

/* With packed: sdshdr8 actual layout */
struct sdshdr8_packed {
    uint8_t len;         /* offset 0 */
    uint8_t alloc;       /* offset 1 */
    unsigned char flags; /* offset 2 */
    char buf[];          /* offset 3 */
};
/* sizeof: 3 bytes (zero waste) */

For sdshdr8, packed saves 3 bytes per header. At 100 million short strings, this is 300 MB of RAM savings.

Performance trade-off: Unaligned memory access is slower on some architectures (particularly older ARM and MIPS). On modern x86/x86_64 with hardware alignment handling, the penalty is negligible (the CPU silently issues a second fetch for crossing a cache line boundary). Redis's primary deployment target is x86_64 Linux servers, making this trade-off acceptable.

5.2.2 SDS Type Selection Logic

/* src/sds.c */
static inline char sdsReqType(size_t string_size) {
    if (string_size < 1<<5)          return SDS_TYPE_5;   /* < 32 bytes */
    if (string_size < 1<<8)          return SDS_TYPE_8;   /* < 256 bytes */
    if (string_size < 1<<16)         return SDS_TYPE_16;  /* < 65,536 bytes */
#if (LONG_MAX == LLONG_MAX)
    if (string_size < 1ll<<32)       return SDS_TYPE_32;  /* < 4 GB (64-bit OS) */
#endif
    return SDS_TYPE_64;
}

Header sizes by type:

SDS Type Header Size Max String Length Notes
sdshdr5 1 byte 31 bytes No alloc field; rarely used for values
sdshdr8 3 bytes 255 bytes Used by embstr for strings โ‰ค 44 bytes
sdshdr16 5 bytes 65,535 bytes (64 KB)
sdshdr32 9 bytes 4,294,967,295 bytes (4 GB)
sdshdr64 17 bytes Effectively unlimited RAM is the real constraint

Practical note: sdshdr5 is used in Redis primarily for key names (which are typically short). It has no alloc field, making it unsuitable for values that might grow. sdshdr8 is the workhorse for most short values.

5.2.3 The flags Field: Dual Purpose

sdshdr5 flags byte:
  Bit 7  Bit 6  Bit 5  Bit 4  Bit 3  |  Bit 2  Bit 1  Bit 0
  โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
        String length (5 bits)        |    SDS type (3 bits)
        0 to 31                       |    000 = SDS_TYPE_5

Other sdshdr* flags byte:
  Bit 7  Bit 6  Bit 5  Bit 4  Bit 3  |  Bit 2  Bit 1  Bit 0
  โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
              Reserved (0)            |    SDS type (3 bits)
                                      |    001 = SDS_TYPE_8
                                      |    010 = SDS_TYPE_16
                                      |    011 = SDS_TYPE_32
                                      |    100 = SDS_TYPE_64

The SDS pointer (of type sds, which is typedef char *) points to buf[0]. The flags byte is always at buf[-1]. This placement allows the SDS type to be determined from any sds pointer with a single byte read:

/* Determine SDS type from any sds pointer */
#define SDS_TYPE_MASK 7
unsigned char flags = s[-1];
int type = flags & SDS_TYPE_MASK;   /* Mask off upper bits */

/* Access the header of the correct type */
#define SDS_HDR(T, s)  ((struct sdshdr##T *)((s) - (sizeof(struct sdshdr##T))))

/* Example: get len for sdshdr8 */
uint8_t len = SDS_HDR(8, s)->len;   /* s - 3 bytes = start of sdshdr8 */

5.3 Complete Memory Layout Diagram

Storing the string "Redis" (5 bytes) as an sdshdr8:

Memory address    Hex     Meaning
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
ptr - 3           0x05    sdshdr8.len   = 5  (string length)
ptr - 2           0x05    sdshdr8.alloc = 5  (allocated capacity)
ptr - 1           0x01    sdshdr8.flags: bits[2:0] = 001 = SDS_TYPE_8
ptr + 0           0x52    buf[0] = 'R'
ptr + 1           0x65    buf[1] = 'e'
ptr + 2           0x64    buf[2] = 'd'
ptr + 3           0x69    buf[3] = 'i'
ptr + 4           0x73    buf[4] = 's'
ptr + 5           0x00    buf[5] = '\0'  (NUL terminator, not in len)
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

Hex dump of the entire sdshdr8 structure:
Offset: 00 01 02 03 04 05 06 07 08
Value:  05 05 01 52 65 64 69 73 00
        โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ”€  โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ”€โ”€
         hdr   flg   "Redis"     \0

The sds variable (the pointer returned by sdsnew()) points to offset 3 (the start of buf). This makes SDS pass-by-pointer compatible with C string functions โ€” printf("%s", sds_ptr) works correctly because buf is null-terminated.

Accessing the header from buf:

sds s = sdsnew("Redis");

/* s points to buf[0] */
printf("%s\n", s);             /* Works: C string semantics */
printf("%d\n", sdslen(s));     /* 5: O(1) length */

/* Navigate to header via negative offset */
struct sdshdr8 *hdr = (struct sdshdr8 *)(s - 3);
/* or equivalently: */
struct sdshdr8 *hdr = SDS_HDR(8, s);

printf("len=%d alloc=%d\n", hdr->len, hdr->alloc);  /* len=5 alloc=5 */

5.4 Space Preallocation

When SDS needs to grow (e.g., APPEND command), it allocates more than the minimum required โ€” called space preallocation:

/* src/sds.c โ€” simplified preallocation logic */
sds sdsMakeRoomFor(sds s, size_t addlen) {
    size_t avail = sdsavail(s);
    if (avail >= addlen) return s;   /* Already sufficient */

    size_t len = sdslen(s);
    size_t newlen = len + addlen;

    if (newlen < SDS_MAX_PREALLOC) {
        newlen *= 2;                  /* Double: less than 1 MB */
    } else {
        newlen += SDS_MAX_PREALLOC;   /* Add 1 MB: 1 MB or larger */
    }
    /* SDS_MAX_PREALLOC = 1 << 20 = 1,048,576 bytes */

    char type = sdsReqType(newlen);
    /* ... realloc, possibly upgrade type (e.g., sdshdr8 โ†’ sdshdr16) ... */
    return newsh->buf;
}

Effect on a growing string (100-byte segments appended repeatedly):

State len alloc Realloc?
Initial 0 0 โ€”
After 1st APPEND (100 bytes) 100 200 Yes
After 2nd APPEND (100 bytes) 200 200 No (avail = 100)
After 3rd APPEND (100 bytes) 300 600 Yes
After 4th APPEND (100 bytes) 400 600 No (avail = 200)
After 5th APPEND (100 bytes) 500 600 No (avail = 100)
After 6th APPEND (100 bytes) 600 600 No (avail = 0)
After 7th APPEND (100 bytes) 700 1400 Yes

Without preallocation: 7 reallocs for 7 appends. With preallocation: 3 reallocs. The savings ratio improves as the string grows and doubling allocates larger chunks.

The 1 MB cap rationale: Doubling a 500 MB string would allocate 1 GB of additional memory โ€” completely unreasonable. The cap at 1 MB means large strings grow in 1 MB increments, which is predictable and bounded. In practice, any value approaching 1 MB in Redis represents a design smell and should be refactored.

Practical verification:

SET mystring ""
DEBUG OBJECT mystring
# serializedlength:0 encoding:embstr

# Append enough to trigger RAW encoding and watch alloc grow
for i in {1..5}; do redis-cli APPEND mystring "$(python3 -c "print('x'*100, end='')") "; done

DEBUG OBJECT mystring
# You can observe the encoding change and length growth
STRLEN mystring    # โ†’ 505 (5 * 101 bytes)

5.5 Lazy Space Release

When a string is shortened โ€” by SETRANGE, by a Lua script overwriting with a shorter value, or by explicit SDSRANGE โ€” SDS does not immediately release the freed space:

/* src/sds.c */
void sdssetlen(sds s, size_t newlen) {
    unsigned char flags = s[-1];
    switch(flags & SDS_TYPE_MASK) {
    case SDS_TYPE_8:
        SDS_HDR(8, s)->len = newlen;     /* Reduce len */
        /* alloc is NOT changed โ€” free space is retained */
        break;
    case SDS_TYPE_16:
        SDS_HDR(16, s)->len = newlen;
        break;
    /* ... */
    }
    s[newlen] = '\0';    /* Place NUL at new end position */
}

After sdssetlen(s, 3) on a string with len=10, alloc=20:

Benefit: If the string grows again (APPEND within the same size range), the existing allocation is reused immediately. For strings that oscillate in size (common in protocol parsing buffers), lazy release prevents realloc oscillation.

Explicit release: When Redis needs to reclaim memory (e.g., before saving an RDB snapshot, or when resizing internal buffers), it calls sdsRemoveFreeSpace():

/* src/sds.c */
sds sdsRemoveFreeSpace(sds s, int would_regrow) {
    void *sh;
    char type, oldtype = s[-1] & SDS_TYPE_MASK;
    int hdrlen, oldhdrlen = sdsHdrSize(oldtype);
    size_t len = sdslen(s);
    size_t avail = sdsavail(s);

    if (avail == 0) return s;    /* Nothing to release */

    /* Possibly downgrade type: if len now fits in a smaller header */
    type = sdsReqType(len);
    hdrlen = sdsHdrSize(type);

    /* Realloc to exact size: hdrlen + len + 1 (NUL) */
    /* ... */
}

5.6 Binary Safety

The len field makes SDS binary-safe by construction. The string is delimited by len, not by the first \0 byte:

/* Binary-safe string comparison */
int sdscmp(const sds s1, const sds s2) {
    size_t l1 = sdslen(s1);
    size_t l2 = sdslen(s2);
    size_t minlen = (l1 < l2) ? l1 : l2;
    
    int cmp = memcmp(s1, s2, minlen);   /* memcmp, not strcmp */
    /* strcmp stops at '\0'; memcmp compares all minlen bytes */
    
    if (cmp == 0) return (l1 > l2) ? 1 : (l1 < l2) ? -1 : 0;
    return cmp;
}

memcmp compares exactly minlen bytes regardless of \0 bytes โ€” the core of binary-safe comparison. strcmp would give incorrect results for strings containing embedded nulls.

Practical binary storage:

# Store a TLS certificate (binary DER format)
# In production, use base64 or store as a file reference โ€” but SDS can handle raw binary
redis-cli SET cert "$(cat certificate.der | base64 -d)" 

# Store arbitrary bytes including 0x00
redis-cli SET sensor:reading $'\x02\xFF\x00\x1A\x3B\x00\xC4'
redis-cli STRLEN sensor:reading   # โ†’ 7 (counts all bytes)
redis-cli GETRANGE sensor:reading 2 3  # โ†’ \x00\x1A (bytes at offset 2-3)

5.7 SDS and C String Interoperability

SDS maintains a '\0' byte at buf[len] (not counted in len) specifically to allow SDS strings to be passed to C library functions:

/* sds is typedef'd as char* */
typedef char *sds;

sds s = sdsnew("Hello, World");

/* These all work correctly for non-binary SDS strings: */
printf("%s\n", s);         /* Prints via NUL terminator */
puts(s);                   /* Prints via NUL terminator */
fprintf(log_file, s);      /* Writes via NUL terminator */
fwrite(s, 1, sdslen(s), f);/* Binary-safe: uses len explicitly */

/* Correct internal Redis usage (always uses sdslen): */
size_t n = sdslen(s);      /* O(1) */

The compatibility guarantee: Since buf always has '\0' at position len, any C function that treats SDS as a C string will get correct behavior for strings that do not contain embedded nulls. For binary data, always use sdslen() and memcpy()/fwrite() rather than C string functions.

5.8 The embstr / raw Split: Deep Implementation Analysis

5.8.1 Why 44 Bytes Exactly?

The 44-byte threshold for embstr is derived from three constants:

jemalloc size class that encompasses the combined structure: 64 bytes

Budget allocation:
  sizeof(robj)        = 16 bytes  (type:4 + encoding:4 + lru:24 + refcount + ptr)
  sizeof(sdshdr8)     =  3 bytes  (len + alloc + flags, each 1 byte, packed)
  NUL terminator      =  1 byte
  โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  Total overhead      = 20 bytes
  Remaining for data  = 64 - 20 = 44 bytes

The 64-byte jemalloc size class is the critical constant. jemalloc's small size classes go: 8, 16, 32, 48, 64, 80, 96... bytes. A 44-byte string requires 20 + 44 + 1 = 65 bytes, which rounds up to jemalloc's 80-byte class โ€” but Redis sets the threshold at 44, not 43, because the 64-byte allocation (for strings โ‰ค 44 bytes) fits exactly in one CPU cache line.

Verification:

/* src/object.c */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44

robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr, len);
    else
        return createRawStringObject(ptr, len);
}
# Verify the boundary behavior
python3 -c "import subprocess; [subprocess.run(['redis-cli', 'SET', f'k{n}', 'x'*n]) for n in [43,44,45]]"
redis-cli OBJECT ENCODING k43   # โ†’ embstr
redis-cli OBJECT ENCODING k44   # โ†’ embstr
redis-cli OBJECT ENCODING k45   # โ†’ raw

5.8.2 embstr Allocation: One malloc Call

/* src/object.c */
robj *createEmbeddedStringObject(const char *ptr, size_t len) {
    robj *o = zmalloc(sizeof(robj) + sizeof(struct sdshdr8) + len + 1);
    /*           โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
                 16 bytes       3 bytes         len bytes   1 byte(\0)
                 robj           sdshdr8 header  content     NUL
    */
    
    struct sdshdr8 *sh = (void *)(o + 1);  /* sdshdr8 starts right after robj */

    o->type     = OBJ_STRING;
    o->encoding = OBJ_ENCODING_EMBSTR;
    o->ptr      = sh + 1;      /* ptr points to sh->buf (the string content) */
    o->refcount = 1;
    o->lru      = LRU_CLOCK();

    sh->len     = len;
    sh->alloc   = len;         /* alloc == len: embstr never has extra space */
    sh->flags   = SDS_TYPE_8;

    if (ptr) {
        memcpy(sh->buf, ptr, len);
    }
    sh->buf[len] = '\0';

    return o;
}

Memory layout for "hello" (5 bytes) as embstr:

Single allocation of 25 bytes, rounded up to 32 by jemalloc:

Byte  0: type:4=0(STRING), encoding:4=8(EMBSTR), lru:24=...  โŽซ robj
Byte  4: refcount = 1                                          โŽช (16 bytes)
Byte  8: ptr โ†’ &byte[19]                                       โŽญ
Byte 16: len = 5                                               โŽซ sdshdr8
Byte 17: alloc = 5                                             โŽช (3 bytes)
Byte 18: flags = 0x01 (SDS_TYPE_8)                            โŽญ
Byte 19: 'h'                                                   โŽซ buf[]
Byte 20: 'e'                                                   โŽช (6 bytes)
Byte 21: 'l'                                                   โŽช
Byte 22: 'l'                                                   โŽช
Byte 23: 'o'                                                   โŽช
Byte 24: '\0'                                                  โŽญ
(Bytes 25-31: jemalloc internal metadata / padding)

Both the robj and sdshdr8 fit within a single 64-byte cache line. Accessing o->ptr to get the string content, and sdslen(o->ptr) to get its length, both resolve to data in the same cache line โ€” zero additional cache misses after the initial load.

5.8.3 The embstr โ†’ raw Conversion Path

embstr strings are effectively immutable. Redis converts them to raw on any modification:

/* src/t_string.c โ€” APPEND command */
void appendCommand(client *c) {
    long long totlen;
    robj *append;
    robj *o = lookupKeyWrite(c->db, c->argv[1]);
    
    if (o == NULL) {
        o = createRawStringObject(c->argv[2]->ptr, sdslen(c->argv[2]->ptr));
        dbAdd(c->db, c->argv[1], o);
    } else {
        if (checkStringLength(c, curlen, applen) != C_OK) return;
        
        /* This call converts embstr or INT to raw before modifying */
        o = dbUnshareStringValue(c->db, c->argv[1], o);
        
        /* Now safe to modify: o->encoding is RAW */
        o->ptr = sdscatlen(o->ptr, append, applen);
    }
    // ...
}

/* src/db.c */
robj *dbUnshareStringValue(redisDb *db, robj *key, robj *o) {
    if (o->refcount != 1 || o->encoding == OBJ_ENCODING_EMBSTR) {
        robj *decoded = getDecodedObject(o);
        /* getDecodedObject converts INT โ†’ string, EMBSTR โ†’ RAW */
        o = createRawStringObject(decoded->ptr, sdslen(decoded->ptr));
        decrRefCount(decoded);
        dbOverwrite(db, key, o);
    }
    return o;
}
SET mykey "hello"
OBJECT ENCODING mykey   # โ†’ embstr (5 bytes โ‰ค 44)

APPEND mykey " world"   # dbUnshareStringValue called โ†’ converts to raw
OBJECT ENCODING mykey   # โ†’ raw
# "hello world" is 11 bytes โ€” still < 44, but encoding stays raw
# Once converted to raw, never reverts to embstr

STRLEN mykey            # โ†’ 11

This is by design: dbUnshareStringValue converts embstr to raw once and for all. Tracking whether the modified string would fit back in embstr would add complexity for marginal benefit โ€” the conversion cost (one malloc) is a one-time operation.

5.9 SDS API Reference

Function Purpose Complexity
sdsnew(s) Create SDS from C string O(n)
sdsnewlen(s, len) Create SDS from byte array + explicit length O(n)
sdsempty() Create empty SDS O(1)
sdslen(s) Get string length O(1)
sdsavail(s) Get unused allocated space O(1)
sdsdup(s) Duplicate SDS O(n)
sdsfree(s) Free SDS (includes header) O(1)
sdscpy(s, t) Copy C string into SDS (resizes if needed) O(n)
sdscat(s, t) Append C string (resizes if needed) O(n)
sdscatsds(s, t) Append SDS to SDS O(n)
sdscatlen(s, t, len) Append binary data of explicit length O(n)
sdscmp(s1, s2) Binary-safe comparison O(n)
sdsrange(s, start, end) In-place substring (like Python slice) O(n)
sdstrim(s, cset) Strip leading/trailing chars from set O(mร—n)
sdsMakeRoomFor(s, n) Ensure n bytes of additional capacity O(n) if realloc
sdsRemoveFreeSpace(s, r) Release unused allocated space O(n) realloc
sdsIncrLen(s, n) Advance len by n (after manual write into buf) O(1)

5.10 SDS Usage Throughout Redis Internals

SDS is not just for user-facing string values. It permeates Redis's internal implementation:

Context Usage
Key names Every key in the main dict is an SDS
Hash fields and values SDS in HT encoding; embedded in listpack in listpack encoding
ZSet members SDS in skiplist nodes
Set members SDS in HT encoding (non-integer members)
client.querybuf Incoming network data buffer (SDS)
client.buf Outgoing response buffer (fixed array, not SDS)
server.aof_buf AOF write buffer โ€” grows as commands are buffered
server.aof_rewrite_buf_blocks AOF rewrite background buffer
Lua script responses Intermediate string results

The AOF buffer deserves special attention:

/* src/aof.c โ€” command appended to AOF buffer after execution */
void feedAppendOnlyFile(struct redisCommand *cmd, int dictid, robj **argv, int argc) {
    sds buf = sdsempty();
    // ... format command into RESP protocol ...
    buf = sdscatlen(buf, cmd_str, cmd_len);
    
    server.aof_buf = sdscatlen(server.aof_buf, buf, sdslen(buf));
    /* server.aof_buf grows with each command; flushed to disk periodically */
}

The AOF buffer can grow to many megabytes between flushes. SDS's space preallocation reduces the realloc frequency even for this internal buffer โ€” the same mechanism that benefits user-facing APPEND operations.

5.11.1 Repeated APPEND to Large Strings

# Anti-pattern: building a large log in a single key
for i in {1..100000}; do
    redis-cli APPEND log:daily "$(date): event $i\n"
done
# When log:daily reaches 10 MB, each APPEND involves:
# 1. Lock the string
# 2. sdsMakeRoomFor: realloc (copies 10 MB of data)
# 3. Append new content
# 4. Total: O(N) per append on large string

Solution: Shard the log across multiple keys (by time bucket, sequence number, or hash), or use a Redis List/Stream, which grows via linked structure addition (O(1) append):

# Better: use a List (O(1) LPUSH/RPUSH)
RPUSH log:daily "$(date): event $i"
# Or Stream for time-series with consumer groups
XADD log:stream * msg "event $i" ts "$(date +%s)"

5.11.2 Unintentional embstr โ†’ raw Conversion

SET counter 0          # INT encoding (shared integer)
INCR counter           # โ†’ 1 (still INT)
OBJECT ENCODING counter  # โ†’ int

APPEND counter ""      # Empty append still triggers INT โ†’ embstr โ†’ raw conversion!
OBJECT ENCODING counter  # โ†’ raw

# From this point, counter is no longer INT-encoded
# Memory: 16 bytes (raw robj) + 4 bytes (sdshdr8) + 2 bytes ("1\0") โ‰ˆ 32 bytes
# vs. INT encoding: 16 bytes (robj, no SDS allocation)

Rule: Never mix APPEND with INCR/DECR on the same key. If you need both string manipulation and numeric operations, store the counter as a separate key.

5.11.3 Large Value Key Scan

# Checking encoding of all keys (dangerous on large datasets)
for key in $(redis-cli KEYS "*"); do
    redis-cli OBJECT ENCODING "$key"
done
# KEYS * is O(N) and blocks Redis; never use in production
# Use SCAN for safe iteration:
redis-cli --scan --pattern "*" | while read key; do
    echo "$key: $(redis-cli OBJECT ENCODING "$key")"
done

5.12 Summary

SDS addresses every fundamental shortcoming of C strings in a way that is both memory-efficient and practical:

Problem C String SDS
Length retrieval O(n) strlen scan O(1) len field read
Buffer overflow Undefined behavior Checked by sdsMakeRoomFor
Frequent realloc Every modification Amortized via preallocation
Binary data with 0x00 Impossible Native via len field
Unsafe stdlib functions Everywhere Avoided entirely

The five SDS header types (sdshdr5 through sdshdr64) each use __attribute__((packed)) to eliminate alignment padding, trading microscopic access penalties for significant memory savings at scale.

The embstr/raw split reflects a careful analysis of jemalloc's size classes: a 44-byte string with its 3-byte sdshdr8 header and 16-byte robj wrapper fits exactly in a 64-byte allocation โ€” one cache line, one malloc. This is not accidental; it represents the kind of deliberate system-level thinking that makes Redis code worth studying.

In the next chapter, we examine listpack โ€” the data structure that replaced ziplist in Redis 7.0 and serves as the compact encoding for small Hash, List, and ZSet values.

Rate this chapter
4.9  / 5  (73 ratings)

๐Ÿ’ฌ Comments