Chapter 18

Map Internals: Hash Table, Growth and Concurrency

Chapter 18: Map Internals: Hash Table, Growth and Concurrency

Go's map is one of the most used — and most frequently misused — data structures in the standard library. Most programmers know its interface cold: m[key] = value, delete(m, key), v, ok := m[key]. But when a production service panics with concurrent map read and map write, or when you wonder why map iteration order differs on every run, or when a profiler reveals map operations as a hot spot, you realize your understanding of map stops at the interface layer rather than the implementation layer.

This chapter dives into Go's hash table implementation, examining the layout of hmap and bmap, explaining why maps are not safe for concurrent use (and why Go deliberately made this choice), and then presenting real-world concurrent-safe solutions: from sync.RWMutex to sync.Map to the sharded map pattern.

Level 1: Philosophy — Why Maps Are Not Concurrency-Safe

A Deliberate Design Decision

Go's map is not concurrency-safe. This is not an oversight; it is a deliberate, carefully considered design decision. The official Go blog explained this choice explicitly:

"After long discussion it was decided that the typical use of maps did not require safe access from multiple goroutines, and in those cases where it did, the map was probably part of some larger data structure or computation that was already synchronized."

In typical use cases, maps do not need concurrent multi-goroutine access; in those cases where they do, the map is usually already part of a larger, already-synchronized data structure.

The cost of this decision: concurrent reads and writes to a map cause a panic (runtime detection added in Go 1.6+). The benefit: when used from a single goroutine, maps carry zero overhead — no atomic operations, no locking, just direct array reads and writes. If maps were built-in concurrency-safe, every access would pay the cost of atomic operations, even though 99% of map usage never requires that guarantee.

This reflects a core Go philosophy: you don't pay for what you don't use.

What Is a Data Race

Before diving into map internals, let's pin down a concept: a data race occurs when two goroutines concurrently access the same memory location, at least one access is a write, and there is no synchronization mechanism guaranteeing the order of access.

// Classic data race
m := map[string]int{"a": 1}
go func() { m["a"] = 2 }()  // goroutine 1 writes
go func() { _ = m["a"] }()  // goroutine 2 reads
// Result: undefined behavior — may panic, may return wrong values

For simple types like int, a data race might only result in reading a "stale" or "new" value and appear harmless. But for a complex data structure like a map, concurrent writes corrupt internal pointer structures, causing segfaults or infinite loops. This is why Go chose to detect and panic at runtime rather than silently producing wrong results.

Level 2: Internals — The hmap and bmap Layouts

hmap: The Control Structure

Every make(map[K]V) call (or map literal) allocates an hmap struct on the heap (runtime/map.go):

type hmap struct {
    count      int            // number of key-value pairs (returned by len(m))
    flags      uint8          // state flags (concurrent read/write detection here)
    B          uint8          // log2 of number of buckets (2^B buckets total)
    noverflow  uint16         // approximate count of overflow buckets
    hash0      uint32         // hash seed (randomized, guards against HashDoS)
    buckets    unsafe.Pointer // pointer to bucket array (2^B bmap elements)
    oldbuckets unsafe.Pointer // during growth, points to old bucket array
    nevacuate  uintptr        // incremental migration progress counter
    extra      *mapextra      // overflow bucket linked list, etc.
}

The B field is the key to understanding map layout: a map has 2^B buckets, each capable of storing up to 8 key-value pairs. Initially B may be 0 (1 bucket) or larger (pre-allocated multiple buckets).

bmap: The Fine-Grained Bucket Layout

Each bucket corresponds to a bmap struct, but its actual memory layout is more intricate than the struct definition suggests:

bmap (bucket) memory layout:

  ┌────────────────────────────────────┐
  │ tophash [8]uint8                   │  ← high 8 bits of hash for each of 8 slots
  ├────────────────────────────────────┤
  │ key[0]  key[1]  ...  key[7]        │  ← 8 keys stored contiguously
  ├────────────────────────────────────┤
  │ val[0]  val[1]  ...  val[7]        │  ← 8 values stored contiguously
  ├────────────────────────────────────┤
  │ overflow *bmap                     │  ← pointer to next overflow bucket
  └────────────────────────────────────┘

Notice: keys and values are stored separately (all keys first, then all values), not interleaved as in []struct{K, V}. This layout's advantage: when values are much larger than keys, scanning keys does not unnecessarily load value data into cache lines; when keys or values are not alignment-sized, less padding is wasted.

The tophash Optimization: Avoiding Full Key Comparisons

The tophash array is a critical performance optimization for map lookups. Each slot stores the high 8 bits of the corresponding key's hash:

Looking up key "hello" (assume hash(hello) = 0xABCD1234):

1. Compute bucket index: hash & (2^B - 1) = 0x34 & 0x07 = bucket #4
2. Compute tophash: hash >> (64-8) = 0xAB

3. Search bucket #4's tophash array for 0xAB:
   tophash: [0xAB, 0x12, 0xAB, 0x00, ...]
              ↑              ↑
           match!         also matches!

4. Only perform full key comparison for slots where tophash matches
   (most slots require comparing just 1 byte, not the full key)

tophash also uses several special sentinel values to represent slot states (empty, deleted, etc.), eliminating the need for a separate bitmap to track slot validity.

Load Factor and Growth Trigger

A map triggers growth when either of the following conditions is met:

Condition 1: Load factor too high (count / 2^B > 6.5)

6.5 is an empirically determined threshold from benchmarking: at this value, the trade-off between hash collision rate and memory utilization is well balanced. When exceeded, each bucket averages more than 6.5 key-value pairs and lookup performance degrades significantly.

Trigger Condition 1: count > 6.5 * 2^B

With B=3 (8 buckets), triggers when count > 52
New B becomes 4 (16 buckets), capacity doubles

Condition 2: Too many overflow buckets

Even if the load factor is acceptable, if the number of overflow buckets exceeds a threshold (when B < 15, more than 2^B overflow buckets exist), growth is triggered. This happens after many insertions followed by many deletions: the element count is low, but buckets contain many "tombstones" (deleted slots), causing lookups to traverse long overflow chains. In this case, growth (same-size growth, B unchanged) reorganizes memory and eliminates fragmentation.

Incremental Rehashing

When the hash table grows, all old bucket data cannot be migrated to new buckets at once — that would cause large latency spikes. Go uses an incremental migration strategy:

Growth triggered (B: 3→4):
  oldbuckets = buckets (points to old 8 buckets)
  buckets = newly allocated 16 buckets
  nevacuate = 0 (start migrating from old bucket #0)

  ┌──────────────────────────────────────────┐
  │ buckets (new, 16 buckets)                │
  │  [0][1][2]...[7][8][9]...[15]            │
  │   ↑ gradually filled                     │
  └──────────────────────────────────────────┘
  ┌──────────────────────────────────────────┐
  │ oldbuckets (old, 8 buckets)              │
  │  [0][1][2][3][4][5][6][7]                │
  │   ↑ gradually drained                    │
  └──────────────────────────────────────────┘

Each map write additionally migrates 1-2 old buckets.
When nevacuate == 8 (all old buckets migrated),
oldbuckets is set to nil and GC reclaims old memory.

Each write to the map (mapassign or mapdelete) additionally migrates one or two old buckets. This amortizes the O(n) bulk migration cost across subsequent O(1) regular operations, guaranteeing amortized O(1) complexity for map operations.

During growth, read operations check both oldbuckets and buckets to ensure correct data access while migration is in progress.

Hash Seed: Defending Against HashDoS

hmap.hash0 is a random hash seed, initialized from OS randomness when make(map) is called. This is a defense against HashDoS (Hash Denial of Service) attacks, introduced in Go 1.4.

HashDoS attack mechanism: if the hash function is deterministic, an attacker can construct many keys with identical hash values, forcing them all into the same bucket and degrading lookup complexity from O(1) to O(n), exhausting server resources. With a random seed, even if an attacker knows the key content, they cannot predict the resulting hash values, neutralizing the attack.

The cost: map iteration order differs across runs (because identical keys have different hash values in different process instances, landing in different buckets, which are visited in order during iteration). The Go runtime additionally randomizes the starting bucket and slot for iteration, further ensuring that developers cannot rely on iteration order.

Level 3: Code Patterns

Map Iteration Order Randomness

m := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4}

// Iteration order may differ each run
for k, v := range m {
    fmt.Printf("%s=%d ", k, v)
}
// May print: a=1 c=3 b=2 d=4
// Or:        b=2 d=4 a=1 c=3

// For ordered iteration:
keys := make([]string, 0, len(m))
for k := range m {
    keys = append(keys, k)
}
sort.Strings(keys)
for _, k := range keys {
    fmt.Printf("%s=%d ", k, m[k])
}

The root cause of non-deterministic iteration order: map iteration starts at a random bucket at a random slot, then traverses buckets in order. Even if the bucket distribution of key-value pairs is fixed (the hash seed does not change within a single process), the starting position is randomized.

Using Structs as Map Keys

Go requires map key types to be comparable — they must support the == operator. A struct is comparable if all its fields are comparable:

type Point struct {
    X, Y int
}

// Point's fields are all int (comparable), so Point can be a key
m := map[Point]string{
    {1, 2}: "origin-ish",
    {0, 0}: "origin",
}

fmt.Println(m[Point{1, 2}])  // "origin-ish"

// Structs with slice fields cannot be keys
type Bad struct {
    Data []int  // slice is not comparable
}
// m2 := map[Bad]string{}  // compile error: invalid map key type Bad

Performance note: comparing struct keys requires field-by-field comparison across all fields. Larger structs mean slower comparisons. For large structs, consider using a hash or string representation as the key:

// Large struct: use string representation as key
type Config struct {
    Host    string
    Port    int
    Timeout time.Duration
    // ... many fields
}

func configKey(c Config) string {
    return fmt.Sprintf("%s:%d:%v", c.Host, c.Port, c.Timeout)
}
cache := map[string]Result{}
cache[configKey(cfg)] = result

Map vs Slice Performance Benchmark

// Scenario: integer-key lookup, comparing map[int]T vs []T
func BenchmarkMapLookup(b *testing.B) {
    m := make(map[int]int, 1000)
    for i := 0; i < 1000; i++ {
        m[i] = i * 2
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = m[i%1000]
    }
}

func BenchmarkSliceLookup(b *testing.B) {
    s := make([]int, 1000)
    for i := 0; i < 1000; i++ {
        s[i] = i * 2
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = s[i%1000]
    }
}

// Typical results:
// BenchmarkMapLookup-8    50000000    28.5 ns/op
// BenchmarkSliceLookup-8  500000000    2.1 ns/op
// slice is ~13x faster

When the key is a contiguous integer (or can be mapped to one), slice lookup is roughly an order of magnitude faster than map. Reason: a slice lookup is direct memory indexing (pointer + offset), while a map lookup requires computing a hash, locating a bucket, comparing tophash values, and comparing the full key — multiple memory indirections.

Selection Principles:

Key is a contiguous integer or enum → prefer slice/array
Key is sparse, dynamically added/removed, or is a string/struct → map
Fixed, small element count → consider sorted slice + binary search

sync.Map: Applicable Scenarios and Limitations

Go 1.9 introduced sync.Map, a concurrency-safe map optimized for specific access patterns:

var m sync.Map

// Write
m.Store("key", "value")

// Read
if v, ok := m.Load("key"); ok {
    fmt.Println(v.(string))
}

// Atomic read-or-write
actual, loaded := m.LoadOrStore("key", "default")
fmt.Println(actual, loaded)

// Delete
m.Delete("key")

// Iterate (order not guaranteed)
m.Range(func(k, v interface{}) bool {
    fmt.Println(k, v)
    return true  // return false to stop
})

sync.Map's internal structure contains two maps:

sync.Map {
    mu     Mutex           // protects dirty
    read   atomic.Value    // read-only map (atomic reads, lock-free)
    dirty  map[any]*entry  // writable map with all keys (requires lock)
    misses int             // read miss counter
}

A read operation first attempts to read from read (read-only view) without locking. Only when read doesn't contain the key does it lock and check dirty. When misses exceeds a threshold, dirty is promoted to become the new read, and dirty is cleared (lazy promotion).

sync.Map suitable for:

Read-heavy, write-light workloads (keys written once, then read many times)
Different goroutines reading/writing different keys (reduces lock contention)

sync.Map not suitable for:

Frequent writes to the same key set (dirty and read sync too often, performance degrades)
When len() is needed (sync.Map has no O(1) len; requires traversal)
Frequent iteration (Range is less efficient than a standard map)

Level 4: Advanced — Race Detection, Read-Only Maps and Sharded Maps

Runtime Concurrency Detection and the -race Flag

Go 1.6 introduced runtime detection of concurrent map reads and writes. However, this detection is not based on the happens-before memory model — it is a lightweight check on the hmap.flags field:

// Simplified from runtime/map.go
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    if h.flags&hashWriting != 0 {
        throw("concurrent map read and map write")
    }
    // ...
}

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }
    h.flags ^= hashWriting
    // ... write operations ...
    h.flags &^= hashWriting
}

This detection catches most concurrent read/write patterns, but it is not perfectly reliable: under extreme race conditions, it can miss cases where two goroutines' flag checks and flag sets interleave at exactly the wrong moment.

For precise data race detection, use the -race flag:

go test -race ./...
go run -race main.go
go build -race -o myapp main.go

-race uses the ThreadSanitizer (TSan) algorithm, tracking every memory access with shadow memory to precisely detect all data races. The cost: programs run 5-10x slower and use 5-10x more memory. Typically only enabled during testing and debugging.

// What -race catches: concurrent map read/write
func TestRace(t *testing.T) {
    m := map[string]int{}
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        defer wg.Done()
        m["a"] = 1  // write
    }()
    go func() {
        defer wg.Done()
        _ = m["a"]  // read
    }()
    wg.Wait()
    // go test -race reports: DATA RACE on map
}

Protecting a Map with sync.RWMutex

The most direct concurrency-safe map implementation: wrap a standard map with sync.RWMutex.

type SafeMap struct {
    mu sync.RWMutex
    m  map[string]int
}

func NewSafeMap() *SafeMap {
    return &SafeMap{m: make(map[string]int)}
}

func (s *SafeMap) Get(key string) (int, bool) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    v, ok := s.m[key]
    return v, ok
}

func (s *SafeMap) Set(key string, val int) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.m[key] = val
}

func (s *SafeMap) Delete(key string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    delete(s.m, key)
}

func (s *SafeMap) Len() int {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return len(s.m)
}

RWMutex characteristics: multiple goroutines may hold a read lock simultaneously (RLock), but a write lock (Lock) is exclusive. In read-heavy workloads, RWMutex significantly outperforms a plain Mutex.

Note: sync.RWMutex write operations can suffer starvation under high contention — a flood of read locks can prevent a write lock from being acquired for a long time. Go 1.14 improved the RWMutex implementation, giving write locks higher priority: when a write lock is waiting, new read lock requests are blocked.

Sharded Map: The High-Concurrency Tool

When concurrency is very high, a single lock protecting the entire map becomes a bottleneck. The solution is sharding: split the map into multiple independent shards, each with its own lock, mapping different keys to different shards to dramatically reduce lock contention.

const numShards = 32  // typically a power of 2

type ShardedMap struct {
    shards [numShards]mapShard
}

type mapShard struct {
    mu sync.RWMutex
    m  map[string]int
    _  [56]byte  // cache line padding to prevent false sharing
}

func NewShardedMap() *ShardedMap {
    sm := &ShardedMap{}
    for i := range sm.shards {
        sm.shards[i].m = make(map[string]int)
    }
    return sm
}

func (sm *ShardedMap) shardIndex(key string) int {
    // Map key to shard index using FNV hash
    h := fnv.New32a()
    h.Write([]byte(key))
    return int(h.Sum32()) % numShards
}

func (sm *ShardedMap) Get(key string) (int, bool) {
    shard := &sm.shards[sm.shardIndex(key)]
    shard.mu.RLock()
    defer shard.mu.RUnlock()
    v, ok := shard.m[key]
    return v, ok
}

func (sm *ShardedMap) Set(key string, val int) {
    shard := &sm.shards[sm.shardIndex(key)]
    shard.mu.Lock()
    defer shard.mu.Unlock()
    shard.m[key] = val
}

func (sm *ShardedMap) Delete(key string) {
    shard := &sm.shards[sm.shardIndex(key)]
    shard.mu.Lock()
    defer shard.mu.Unlock()
    delete(shard.m, key)
}

// Len must aggregate all shards
func (sm *ShardedMap) Len() int {
    total := 0
    for i := range sm.shards {
        sm.shards[i].mu.RLock()
        total += len(sm.shards[i].m)
        sm.shards[i].mu.RUnlock()
    }
    return total
}

Choosing numShards:

Too few: contention remains high
Too many: global operations like Len() are expensive; memory is fragmented
Typically choose a power of 2 between 16 and 256 (enables bitwise modulo)

Performance comparison (high concurrency, 100 goroutines reading and writing simultaneously):

Approach	Throughput (ops/s)	Notes
sync.Mutex + map	~2M	Heavy lock contention
sync.RWMutex + map	~8M	Good read concurrency, writes still serial
sync.Map	~12M	Best for read-heavy, write-light
ShardedMap (32)	~25M	Best for balanced read/write

Go 1.21+ maps Package

Go 1.21 introduced the maps standard library package with a collection of commonly needed map utility functions:

import "maps"

m1 := map[string]int{"a": 1, "b": 2}
m2 := map[string]int{"b": 3, "c": 4}

// Clone a map
cloned := maps.Clone(m1)

// Copy all key-value pairs from m2 into m1 (m2 overwrites m1 on conflict)
maps.Copy(m1, m2)
// m1 is now: {"a": 1, "b": 3, "c": 4}

// Delete key-value pairs satisfying a predicate
maps.DeleteFunc(m1, func(k string, v int) bool {
    return v < 2  // delete pairs where value < 2
})

// Check equality of two maps
fmt.Println(maps.Equal(m1, m2))

// Iterate keys (Go 1.23+ range over func iterator)
for k := range maps.Keys(m1) {
    fmt.Println(k)
}

maps.Clone performs a shallow copy: both keys and values are copied, but if a value is a pointer or contains pointers, both maps' values point to the same underlying objects after cloning.

// The shallow copy trap
m1 := map[string]*[]int{
    "nums": {1, 2, 3},
}
m2 := maps.Clone(m1)
(*m2["nums"])[0] = 99  // mutates the slice m1["nums"] also points to!
fmt.Println(*m1["nums"])  // [99 2 3] — unexpectedly modified

Read-Only Map Pattern

Sometimes you have a configuration map that is populated at initialization and never modified afterward, and you want multiple goroutines to safely read it concurrently. For truly read-only access, a standard map is concurrency-safe (concurrent reads are safe; only concurrent reads and writes are dangerous):

// Initialization phase (single goroutine)
config := map[string]string{
    "host": "localhost",
    "port": "8080",
}

// Passing map to multiple goroutines for read-only use — this is safe
for i := 0; i < 10; i++ {
    go func() {
        host := config["host"]  // pure read, no race
        _ = host
    }()
}

If you want to explicitly express "this map is immutable after initialization," encapsulate it through an interface or function type:

type ReadOnlyMap struct {
    m map[string]string
}

func NewReadOnlyMap(m map[string]string) ReadOnlyMap {
    // Defensive copy — prevents caller from modifying the original map later
    copied := make(map[string]string, len(m))
    for k, v := range m {
        copied[k] = v
    }
    return ReadOnlyMap{m: copied}
}

func (r ReadOnlyMap) Get(key string) (string, bool) {
    v, ok := r.m[key]
    return v, ok
}

// No Set/Delete methods — compile-time guarantee of read-only access

Summary

Go's map is an elegantly designed hash table. The two-level design of hmap control structure plus bmap bucket array, the tophash fast filter, incremental growth's amortized cost control, and hash seed security protection — each design decision reflects a concrete engineering trade-off.

Maps being non-concurrency-safe is a deliberate design: zero overhead for single-goroutine use, and a choice of sync.Mutex, sync.RWMutex, sync.Map, or sharded maps for concurrent scenarios. Understanding when to apply each approach and its performance characteristics is essential knowledge for writing high-performance Go programs.

Remember: iteration order is undefined, zero-value reads are safe (returning the type's zero value), concurrent writes cause a panic.

Rate this chapter

4.7 / 5 (15 ratings)