Chapter 10

Memory Is a Long Street

Imagine an infinitely long street where every house has a numbered address starting from zero. Every house is identical in size—exactly one byte. If you want to find any piece of data, you just need to know its address. That's the entire essence of computer memory.

When a program runs, all its variables, strings, function arguments, and return values live somewhere on this long street. The CPU finds data by address, reads it in, processes it, and writes it back. Once you understand the street, pointers stop being scary, alignment starts making sense, and the heap and stack become just two different neighborhoods on the same map.

Core Concepts

Address, Byte, Word: Three Levels

A byte is the smallest individually addressable unit of memory. Every byte has a unique address. On a 64-bit system, an address is a 64-bit integer, theoretically enough to address 2^64 bytes. In practice, most x86-64 systems running Linux or Windows use only 48 of those bits for virtual addresses: 256 TiB in total, split between user space and the kernel, so a single process typically sees about 128 TiB.

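A word is the CPU's natural unit of operation, usually the width of its general-purpose registers. On 32-bit CPUs a word is 4 bytes; on 64-bit CPUs it's 8 bytes. Even though addresses are measured in bytes, the CPU works on data a word at a time, and the memory system moves data in even larger chunks called cache lines (typically 64 bytes).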

Memory street (each cell = 1 byte):

Address: 0x00  0x01  0x02  0x03  0x04  0x05  0x06  0x07
        ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
Value:  │ 12  │ 34  │ 56  │ 78  │ 00  │ 00  │ 00  │ 00  │
        └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
          ↑
          int x = 0x12345678; stored at address 0x00 (4 bytes, shown most significant byte first)
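
The sizes quoted above are easy to check with a little arithmetic. The following is a small Python sketch; the variable names are illustrative, and the pointer width it reports depends on the interpreter build you run it with.

```python
import struct

# Width of a native pointer in this Python build ("P" means void *).
pointer_bytes = struct.calcsize("P")
pointer_bits = pointer_bytes * 8

TIB = 2**40   # one tebibyte
EIB = 2**60   # one exbibyte

full_space = 2**64        # bytes addressable with a full 64-bit address
practical_space = 2**48   # bytes addressable with a 48-bit address

print(f"pointer width: {pointer_bits} bits")
print(f"2^64 bytes = {full_space // EIB} EiB")
print(f"2^48 bytes = {practical_space // TIB} TiB")
```

On a 64-bit build the first line should report 64 bits; the other two lines are pure arithmetic and do not depend on the machine.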

Big-Endian vs Little-Endian: How Numbers Lie Down in Memory

A 4-byte integer 0x12345678 must be split into 4 bytes when stored. Which byte goes first? Does the most significant byte 0x12 occupy the lowest address, or does the least significant byte 0x78?

These two choices are called big-endian and little-endian:

Value: 0x12345678

Big-endian (network byte order, PowerPC, SPARC):
Address: 0x00  0x01  0x02  0x03
        ┌─────┬─────┬─────┬─────┐
        │ 12  │ 34  │ 56  │ 78  │
        └─────┴─────┴─────┴─────┘
        Most significant byte at lowest address → matches how humans write numbers

Little-endian (x86, x86-64, ARM default — almost certainly your machine):
Address: 0x00  0x01  0x02  0x03
        ┌─────┬─────┬─────┬─────┐
        │ 78  │ 56  │ 34  │ 12  │
        └─────┴─────┴─────┴─────┘
        Least significant byte at lowest address → reversed

Most of the time you never need to think about this. It matters when raw bytes leave the machine that produced them, chiefly in two situations: network communication (machines with different endianness exchanging data) and reading or writing binary files (file formats often specify a fixed endianness).
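
To see why the network case bites, here is a minimal sketch that decodes a made-up 6-byte packet header. The layout (a 2-byte port followed by a 4-byte payload length, both big-endian) is an assumption invented for this example, not a real protocol.

```python
import struct

# Hypothetical header: 2-byte port, then 4-byte payload length, big-endian.
packet = bytes([0x1F, 0x90, 0x00, 0x00, 0x01, 0x00])

# Correct: '>' tells struct the bytes are in big-endian (network) order.
port, length = struct.unpack(">HI", packet)

# Wrong: decoding the same bytes as little-endian silently gives garbage.
bad_port, bad_length = struct.unpack("<HI", packet)

print(port, length)          # 8080 256
print(bad_port, bad_length)  # 36895 65536
```

Nothing crashes in the wrong version, which is what makes endianness bugs unpleasant: the values are simply incorrect.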

Pointers: Variables That Hold Addresses

A pointer is simply a variable that stores an address—nothing more. It's not magic; it's just a number whose value happens to be the street address of something else in memory.

int  x = 42;        // x lives at some address, say 0x7fff5000
int *p = &x;        // p is a pointer; its value is 0x7fff5000
                    // p itself also lives in memory, say at 0x7fff5008

// *p means "go to the address stored in p, and read the value there"
printf("%d\n", *p); // prints 42

On mainstream platforms, the size of a data pointer equals the address width: 4 bytes on 32-bit systems, 8 bytes on 64-bit systems, regardless of what type it points to.
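
The same idea can be poked at from Python using the standard ctypes module, which exposes real C objects and their addresses. This is only an illustrative sketch; the address printed will differ on every run.

```python
import ctypes

x = ctypes.c_int(42)     # a C int living somewhere in memory
p = ctypes.pointer(x)    # p holds the address of x

address_of_x = ctypes.addressof(x)
address_in_p = ctypes.cast(p, ctypes.c_void_p).value

print(hex(address_of_x))   # the "house number" of x (varies per run)
print(p.contents.value)    # dereference: read the int stored at that address

p.contents.value = 7       # write through the pointer...
print(x.value)             # ...and x itself has changed

# A pointer to int is the same size as a generic void * on this platform.
int_pointer_size = ctypes.sizeof(p)
void_pointer_size = ctypes.sizeof(ctypes.c_void_p)
print(int_pointer_size, void_pointer_size)
```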

Memory Alignment: Why Structs Have Mysterious Gaps

When the CPU fetches data, it prefers that the address be a multiple of the data's size—this is called natural alignment. Fetching a 4-byte int is happiest when the address is divisible by 4; an 8-byte double prefers addresses divisible by 8. Misaligned access can cause hardware exceptions on strict RISC architectures (older ARM, MIPS) or just slow things down on x86.

To guarantee alignment, the compiler inserts padding bytes between struct members:

struct Bad {         // intuitively: 1+4+1 = 6 bytes
    char  a;         // 1 byte, at offset 0
    // [3 bytes padding]  ← compiler inserts this so b aligns to offset 4
    int   b;         // 4 bytes, at offset 4
    char  c;         // 1 byte, at offset 8
    // [3 bytes padding]  ← tail padding so total size is a multiple of 4
};
// sizeof(struct Bad) = 12, not 6!

struct Good {        // put larger fields first
    int   b;         // 4 bytes, at offset 0
    char  a;         // 1 byte, at offset 4
    char  c;         // 1 byte, at offset 5
    // [2 bytes padding]
};
// sizeof(struct Good) = 8  — saves 4 bytes

The rule of thumb: order fields from largest alignment to smallest to minimize wasted padding. The exact sizes above assume a typical ABI where int is 4 bytes and 4-byte aligned.
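
The padding rule can be worked out by hand and compared with what a real C ABI does. The sketch below assumes the usual layout rule (each field is placed at the next multiple of its alignment, and the total size is rounded up to the largest alignment) and checks it against ctypes. The helper names align_up and predict_layout are made up for this example.

```python
import ctypes

def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def predict_layout(field_types):
    """Apply the usual struct layout rule; return (offsets, total_size)."""
    offset, max_align, offsets = 0, 1, []
    for ctype in field_types:
        align = ctypes.alignment(ctype)
        offset = align_up(offset, align)      # padding before the field
        offsets.append(offset)
        offset += ctypes.sizeof(ctype)
        max_align = max(max_align, align)
    return offsets, align_up(offset, max_align)  # tail padding

class Bad(ctypes.Structure):
    _fields_ = [("a", ctypes.c_char), ("b", ctypes.c_int), ("c", ctypes.c_char)]

class Good(ctypes.Structure):
    _fields_ = [("b", ctypes.c_int), ("a", ctypes.c_char), ("c", ctypes.c_char)]

def actual_layout(struct_type):
    """Read the offsets and size that ctypes (i.e. the platform ABI) chose."""
    offsets = [getattr(struct_type, name).offset for name, _ in struct_type._fields_]
    return offsets, ctypes.sizeof(struct_type)

bad_predicted = predict_layout([ctypes.c_char, ctypes.c_int, ctypes.c_char])
good_predicted = predict_layout([ctypes.c_int, ctypes.c_char, ctypes.c_char])

print("Bad: ", bad_predicted, actual_layout(Bad))
print("Good:", good_predicted, actual_layout(Good))
```

On a platform with a 4-byte int, both columns should agree: offsets 0, 4, 8 and 12 bytes for Bad, and offsets 0, 4, 5 and 8 bytes for Good.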

Heap vs Stack: Two Neighborhoods on the Street

Process virtual memory layout (high to low address):

High  ┌─────────────────────────────────┐
      │  Kernel space (OS reserved)     │
      ├─────────────────────────────────┤
      │  Stack                          │ ← grows downward
      │  Local variables, arguments     │   managed automatically by compiler
      │  ↓  ↓  ↓                        │
      │                                 │
      │  (free space between them)      │
      │                                 │
      │  ↑  ↑  ↑                        │
      │  Heap                           │ ← grows upward
      │  malloc/new allocations         │   managed by programmer (or GC)
      ├─────────────────────────────────┤
      │  BSS (uninitialized globals)    │
      ├─────────────────────────────────┤
      │  Data segment (init. globals)   │
      ├─────────────────────────────────┤
      │  Text segment (program code)    │
Low   └─────────────────────────────────┘

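Stack: automatically allocated and freed by simply moving a stack pointer, so it's extremely fast. Size is limited (commonly 8 MB by default for the main thread on Linux). When a function returns, its frame is released instantly; the bytes are not erased, just made available for the next call.

Heap: manually allocated and freed (malloc/free), and much larger than the stack (bounded by the virtual address space and ultimately by physical RAM plus swap). Forget to free a block and you have a memory leak; free it and keep using the pointer and you have a dangling pointer (a use-after-free bug).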
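
To make "just moving a stack pointer" concrete, here is a toy model of a downward-growing stack. It is a deliberately simplified sketch: the ToyStack class, its addresses, and its 64-byte limit are invented for illustration and are not how an operating system actually implements a stack.

```python
class ToyStack:
    """Toy model of a downward-growing call stack (illustrative only)."""

    def __init__(self, top, limit):
        self.sp = top               # stack pointer starts at the high end
        self.floor = top - limit    # lowest address the stack may reach

    def push_frame(self, size):
        """'Allocate' a frame by moving the stack pointer down."""
        new_sp = self.sp - size
        if new_sp < self.floor:
            raise MemoryError("stack overflow")
        self.sp = new_sp
        return new_sp               # address of the new frame

    def pop_frame(self, size):
        """'Free' a frame by moving the stack pointer back up."""
        self.sp += size

stack = ToyStack(top=0x8000, limit=64)

outer = stack.push_frame(16)   # caller's frame
inner = stack.push_frame(32)   # callee's frame sits at a lower address
stack.pop_frame(32)            # callee returns: one addition, nothing else

try:
    stack.push_frame(64)       # too big for the space that is left
    overflowed = False
except MemoryError:
    overflowed = True

print(hex(outer), hex(inner), hex(stack.sp), overflowed)
```

Allocation and deallocation are each a single arithmetic operation, which is why the stack is so fast, and the fixed floor is why deep recursion eventually overflows it.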

Try It Yourself

Verify byte order in Python:

import struct, sys

value = 0x12345678

be = struct.pack('>I', value)   # '>' = big-endian
print([hex(b) for b in be])
# ['0x12', '0x34', '0x56', '0x78']

le = struct.pack('<I', value)   # '<' = little-endian
print([hex(b) for b in le])
# ['0x78', '0x56', '0x34', '0x12']

print(sys.byteorder)  # 'little' on most modern machines

Verify struct padding in C:

#include <stdio.h>
#include <stdlib.h>

struct Bad  { char a; int b; char c; };
struct Good { int b; char a; char c; };

int main(void) {
    printf("Bad:  %zu bytes\n", sizeof(struct Bad));   // 12
    printf("Good: %zu bytes\n", sizeof(struct Good));  // 8

    // See where different memory regions live
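    static int global = 100;   // static storage: lives in the data segment, like a global
    int local = 200;
    int *heap = malloc(sizeof(int));
    if (heap == NULL) {        // malloc can fail; check before dereferencing
        return 1;
    }
    *heap = 300;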

    printf("global (data):  %p\n", (void*)&global);
    printf("local  (stack): %p\n", (void*)&local);
    printf("heap:           %p\n", (void*)heap);
    // The three addresses will be far apart, in completely different regions

    free(heap);
    return 0;
}

🔬 Going Deeper

Alignment isn't just about performance—sometimes it's about correctness

On strict RISC architectures (early ARM, MIPS), accessing a misaligned address triggers a hardware exception (Bus Error) and crashes the program outright. Even the permissive x86 has sharp edges: aligned SSE instructions such as MOVAPS fault on misaligned operands, CMPXCHG16B requires a 16-byte-aligned operand, and an ordinary load or store that straddles a cache line is no longer guaranteed to be atomic. In C itself, dereferencing a misaligned pointer is undefined behavior regardless of the hardware. Embedded developers run into this constantly because they routinely work with memory-mapped hardware registers at fixed addresses, which may not be aligned to their natural boundary.

Pointer arithmetic: the type determines the stride

In C, adding 1 to a pointer doesn't increment the address by 1 byte—it increments by the size of the pointed-to type. int *p; p + 1 advances the address by 4 bytes (assuming int is 4 bytes). This makes array traversal natural: p[i] is exactly equivalent to *(p + i), and the stride automatically matches the element size. Understanding this explains why casting char * to int * and doing arithmetic is a minefield: the stride changes, and you're suddenly skipping 4 bytes at a time.
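
The stride rule can be observed from Python with ctypes, which lets the same block of memory be viewed through an int pointer and through a byte pointer. This is a sketch for illustration; the variable names are arbitrary.

```python
import ctypes
import sys

values = (ctypes.c_int * 4)(10, 20, 30, 40)   # like: int values[4] = {10, 20, 30, 40};
stride = ctypes.sizeof(ctypes.c_int)          # bytes per element
base = ctypes.addressof(values)

int_ptr = ctypes.cast(values, ctypes.POINTER(ctypes.c_int))
byte_ptr = ctypes.cast(values, ctypes.POINTER(ctypes.c_ubyte))

# p[i] is *(p + i): the address advances by i * sizeof(int) bytes.
via_index = [int_ptr[i] for i in range(4)]
via_address = [ctypes.c_int.from_address(base + i * stride).value for i in range(4)]

# Through a byte pointer the stride is 1, so index `stride` is only the
# first byte of values[1], not the second element.
second_element_bytes = bytes(byte_ptr[stride:2 * stride])

print(via_index)
print(via_address)
print(second_element_bytes.hex(), sys.byteorder)
```

Both lists contain the same four numbers, while the byte view shows the raw bytes of the second element in the machine's native byte order.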

NUMA: which neighborhood on which street?

In multi-socket servers (dual Xeon, for example), each CPU has its own local memory banks. Accessing local memory is noticeably faster than accessing memory attached to the other CPU, with the exact penalty depending on the hardware and the workload. This Non-Uniform Memory Access (NUMA) topology means that where you allocate memory matters as much as how much you allocate. Linux's numactl tool, the JVM's NUMA-aware allocation (-XX:+UseNUMA), and CPU-affinity settings in servers such as Redis are all engineering responses to this street-geography problem.
