Chapter 11

sync Package: Mutex, WaitGroup, Once, Pool

Go's concurrency model centers on CSP (Communicating Sequential Processes), with channels as the preferred synchronization method. But in practice, not every concurrency problem is best solved with channels. When multiple goroutines need to access shared data structures, direct locking is often simpler and more efficient than passing ownership through channels. The standard library's sync package provides a carefully designed set of low-level synchronization primitives—the foundational tools for building high-performance concurrent programs.

The design philosophy of sync is "less is more"—it provides only the most essential primitives, each with a clear use case. As Russ Cox discussed in Go 2017: "The sync package is for those scenarios where channels can't solve the problem or would be too awkward."

Level 1: What You Need to Know

Mutex: Mutual Exclusion Lock

sync.Mutex is the most basic synchronization primitive—it ensures only one goroutine can access a critical section at a time.

type SafeCounter struct {
    mu    sync.Mutex
    count int
}

func (c *SafeCounter) Increment() {
    c.mu.Lock()
    c.count++
    c.mu.Unlock()
}

func (c *SafeCounter) Get() int {
    c.mu.Lock()
    defer c.mu.Unlock()
    return c.count
}

Key rules:

Lock() acquires the lock; blocks if already held
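Unlock() releases the lock; unlocking a mutex that isn't locked is a fatal runtime error ("sync: unlock of unlocked mutex") that recover cannot catch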
Locks aren't bound to goroutines—goroutine A can lock, goroutine B can unlock (but this is bad practice)
Always use defer for unlock, unless you have a clear performance reason not to

func (c *SafeCounter) IncrementBad() {
    c.mu.Lock()
    // If this panics, lock is never released—deadlock!
    riskyOperation()
    c.mu.Unlock()
}

func (c *SafeCounter) IncrementGood() {
    c.mu.Lock()
    defer c.mu.Unlock() // Released even on panic
    riskyOperation()
}
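To see the difference in practice, here is a minimal sketch that panics inside the critical section, recovers, and then checks whether the mutex is free again. The helper names incrementWith and lockFreeAfterPanic are made up for this illustration; TryLock requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sync"
)

type SafeCounter struct {
	mu    sync.Mutex
	count int
}

// incrementWith runs op while holding the lock. The deferred Unlock
// runs even if op panics.
func (c *SafeCounter) incrementWith(op func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	op()
	c.count++
}

// lockFreeAfterPanic reports whether the mutex can be acquired again
// after a panicking operation has been recovered.
func lockFreeAfterPanic() bool {
	var c SafeCounter
	func() {
		defer func() { _ = recover() }() // swallow the panic for this demo only
		c.incrementWith(func() { panic("boom") })
	}()
	// TryLock succeeds only if nobody currently holds the lock.
	if c.mu.TryLock() {
		c.mu.Unlock()
		return true
	}
	return false
}

func main() {
	fmt.Println("lock free after panic:", lockFreeAfterPanic())
}
```

Without the defer, the same experiment would leave the mutex locked forever, and every later Lock call would block.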

Complete Example: Thread-Safe Map

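The built-in map type is not safe for concurrent use. Since Go 1.6, the runtime makes a best-effort attempt to detect concurrent reads and writes on a map and, when it does, aborts the program with a fatal error such as "fatal error: concurrent map read and map write". This is not a panic, so recover cannot intercept it.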

type SafeMap[K comparable, V any] struct {
    mu sync.Mutex
    m  map[K]V
}

func NewSafeMap[K comparable, V any]() *SafeMap[K, V] {
    return &SafeMap[K, V]{m: make(map[K]V)}
}

func (sm *SafeMap[K, V]) Get(key K) (V, bool) {
    sm.mu.Lock()
    defer sm.mu.Unlock()
    v, ok := sm.m[key]
    return v, ok
}

func (sm *SafeMap[K, V]) Set(key K, value V) {
    sm.mu.Lock()
    defer sm.mu.Unlock()
    sm.m[key] = value
}

func (sm *SafeMap[K, V]) Delete(key K) {
    sm.mu.Lock()
    defer sm.mu.Unlock()
    delete(sm.m, key)
}

RWMutex: Read-Write Lock

If reads vastly outnumber writes, sync.Mutex makes all reads block each other—wasteful. sync.RWMutex allows multiple concurrent readers, only requiring mutual exclusion for writes.

type Config struct {
    mu   sync.RWMutex
    data map[string]string
}

func (c *Config) Get(key string) string {
    c.mu.RLock()         // Read lock: multiple reads can proceed concurrently
    defer c.mu.RUnlock()
    return c.data[key]
}

func (c *Config) Set(key, value string) {
    c.mu.Lock()          // Write lock: exclusive access
    defer c.mu.Unlock()
    c.data[key] = value
}

Read-write lock semantics:

RLock(): Acquire read lock. Blocks if write lock is held; otherwise succeeds (can coexist with other read locks)
RUnlock(): Release read lock
Lock(): Acquire write lock. Blocks if any lock (read or write) is held
Unlock(): Release write lock
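These rules can be observed without any goroutines by probing the lock with TryRLock and TryLock (Go 1.18+). The sketch below is only a demonstration of the semantics; the probe function is a made-up helper, and production code should rarely need the Try variants.

```go
package main

import (
	"fmt"
	"sync"
)

// probe reports what TryRLock and TryLock return while one read lock is held,
// and what TryLock returns after that read lock has been released.
func probe() (secondReader, writerWhileRead, writerAfter bool) {
	var rw sync.RWMutex

	rw.RLock() // first reader

	secondReader = rw.TryRLock() // read locks can coexist
	if secondReader {
		rw.RUnlock()
	}

	writerWhileRead = rw.TryLock() // a writer needs exclusive access
	if writerWhileRead {
		rw.Unlock()
	}

	rw.RUnlock() // last reader leaves

	writerAfter = rw.TryLock() // nothing is held now
	if writerAfter {
		rw.Unlock()
	}
	return
}

func main() {
	fmt.Println(probe()) // prints "true false true"
}
```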

When to use RWMutex?

Rule of thumb: RWMutex tends to pay off only when reads outnumber writes by roughly 10:1 or more. When the ratio is near 1:1, RWMutex's extra bookkeeping (an atomically updated reader count) typically makes it slower than a plain Mutex. Treat the table below as a starting point and benchmark your own workload.

Read:Write Ratio    Recommendation
1:1                 sync.Mutex
5:1                 sync.Mutex (borderline, benchmark needed)
10:1+               sync.RWMutex
100:1+              Consider sync.Map or atomic

WaitGroup: Waiting for a Group of Goroutines

sync.WaitGroup waits for a group of goroutines to complete. It's the core tool for the "fork-join" concurrency model.

func fetchAll(urls []string) []string {
    var wg sync.WaitGroup
    results := make([]string, len(urls))

    for i, url := range urls {
        wg.Add(1) // Call Add BEFORE launching goroutine
        go func(idx int, u string) {
            defer wg.Done() // Call Done when goroutine completes
            resp, err := http.Get(u)
            if err != nil {
                results[idx] = "error"
                return
            }
            defer resp.Body.Close()
            body, _ := io.ReadAll(resp.Body)
            results[idx] = string(body)
        }(i, url)
    }

    wg.Wait() // Blocks until all goroutines call Done
    return results
}

Key rules:

Add(n) must be called before the go statement (otherwise Wait might return before Add)
Done() is equivalent to Add(-1)
Counter going negative causes panic
WaitGroup can be reused (after counter returns to 0, you can Add again)
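As a small illustration of the reuse rule, the sketch below runs several rounds of workers on one WaitGroup. The function name runRounds and the atomic counter are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runRounds runs `rounds` batches of `workers` goroutines on a single
// WaitGroup and returns how many goroutines finished in total.
func runRounds(rounds, workers int) int64 {
	var wg sync.WaitGroup
	var done atomic.Int64
	for r := 0; r < rounds; r++ {
		wg.Add(workers) // Add before launching the goroutines
		for w := 0; w < workers; w++ {
			go func() {
				defer wg.Done()
				done.Add(1)
			}()
		}
		wg.Wait() // counter is back at zero, so the next round may Add again
	}
	return done.Load()
}

func main() {
	fmt.Println(runRounds(3, 4)) // prints 12
}
```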

Common mistake: Calling Add inside the goroutine

// Wrong! May cause Wait to return early
for _, url := range urls {
    go func(u string) {
        wg.Add(1) // Too late! main goroutine may already be at Wait()
        defer wg.Done()
        fetch(u)
    }(url)
}
wg.Wait()

Correct approach:

for _, url := range urls {
    wg.Add(1) // Add before launching goroutine
    go func(u string) {
        defer wg.Done()
        fetch(u)
    }(url)
}
wg.Wait()

sync.Once: Guarantee Single Execution

sync.Once ensures a function executes exactly once regardless of how many goroutines call it. The most common use is singleton initialization.

var (
    instance *Database
    once     sync.Once
)

func GetDB() *Database {
    once.Do(func() {
        // This function runs only once, even if 1000 goroutines call GetDB simultaneously
        instance = &Database{
            conn: connectToDB(),
        }
    })
    return instance
}

sync.Once guarantees:

Function executes only once (even with concurrent calls)
All callers wait until the first execution completes before returning
After first execution completes, all subsequent Do calls return immediately (near-zero overhead)

Note: If the function passed to Once.Do panics, Once still considers it "done." The panic propagates to the caller of Do, and subsequent calls won't re-execute the function:

var once sync.Once

once.Do(func() {
    panic("oops") // Panicked
})

once.Do(func() {
    fmt.Println("this will never print") // Won't execute
})
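Run as written, the snippet above would crash at the first panic. A runnable variant, assuming the caller recovers from the panic, could look like this (secondCallRuns is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"sync"
)

// secondCallRuns reports whether a second Do executes its function
// after the first call's function panicked.
func secondCallRuns() bool {
	var once sync.Once
	ran := false

	func() {
		defer func() { _ = recover() }() // keep the demo alive
		once.Do(func() { panic("oops") })
	}()

	once.Do(func() { ran = true }) // skipped: Once is already marked done
	return ran
}

func main() {
	fmt.Println("second call ran:", secondCallRuns()) // prints "second call ran: false"
}
```

If initialization can fail and should be retried, sync.Once is the wrong tool; a Mutex-guarded "initialized" flag gives you that control.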

Starting from Go 1.21, new helpers sync.OnceFunc, sync.OnceValue, and sync.OnceValues provide a more convenient API:

// Go 1.21+
getDB := sync.OnceValue(func() *Database {
    return &Database{conn: connectToDB()}
})

db := getDB() // First call initializes, subsequent calls return cached value
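A quick way to convince yourself that the initializer runs once is to count its invocations under concurrent callers. This sketch assumes Go 1.21+; initCalls and the constant 42 are placeholders for a real, expensive initialization.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// initCalls has n goroutines call the same OnceValue wrapper and returns
// how many times the initializer ran, plus the cached value.
func initCalls(n int) (int64, int) {
	var calls atomic.Int64
	get := sync.OnceValue(func() int {
		calls.Add(1)
		return 42 // stand-in for an expensive result
	})

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = get()
		}()
	}
	wg.Wait()
	return calls.Load(), get()
}

func main() {
	fmt.Println(initCalls(100)) // prints "1 42"
}
```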

Practical Example: Concurrency-Safe Cache

A simple TTL cache built on RWMutex, with a background goroutine that removes expired entries:

type Cache struct {
    mu    sync.RWMutex
    items map[string]*cacheItem
}

type cacheItem struct {
    value  interface{}
    expiry time.Time
}

func NewCache() *Cache {
    c := &Cache{items: make(map[string]*cacheItem)}
    // Launch background goroutine to clean expired items
    go c.janitor()
    return c
}

func (c *Cache) Get(key string) (interface{}, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    
    item, exists := c.items[key]
    if !exists {
        return nil, false
    }
    if time.Now().After(item.expiry) {
        return nil, false // Expired, treat as non-existent
    }
    return item.value, true
}

func (c *Cache) Set(key string, value interface{}, ttl time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()
    
    c.items[key] = &cacheItem{
        value:  value,
        expiry: time.Now().Add(ttl),
    }
}

func (c *Cache) janitor() {
    ticker := time.NewTicker(1 * time.Minute)
    defer ticker.Stop()
    
    for range ticker.C {
        c.mu.Lock()
        for key, item := range c.items {
            if time.Now().After(item.expiry) {
                delete(c.items, key)
            }
        }
        c.mu.Unlock()
    }
}
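One gap in the cache above is that the janitor goroutine runs for the life of the program. A possible way to add shutdown is sketched below; it is where WaitGroup and Once earn their place next to the RWMutex. The Close method, the stop channel, the configurable sweep interval, and the roundTrip helper are all additions for this sketch, not part of the original design.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type item struct {
	value  any
	expiry time.Time
}

type Cache struct {
	mu        sync.RWMutex
	items     map[string]item
	stop      chan struct{}
	closeOnce sync.Once      // makes Close idempotent
	wg        sync.WaitGroup // tracks the janitor goroutine
}

func NewCache(sweep time.Duration) *Cache {
	c := &Cache{items: make(map[string]item), stop: make(chan struct{})}
	c.wg.Add(1)
	go c.janitor(sweep)
	return c
}

func (c *Cache) Set(key string, v any, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = item{value: v, expiry: time.Now().Add(ttl)}
}

func (c *Cache) Get(key string) (any, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	it, ok := c.items[key]
	if !ok || time.Now().After(it.expiry) {
		return nil, false
	}
	return it.value, true
}

// Close stops the janitor and waits for it to exit. Safe to call repeatedly.
func (c *Cache) Close() {
	c.closeOnce.Do(func() { close(c.stop) })
	c.wg.Wait()
}

func (c *Cache) janitor(sweep time.Duration) {
	defer c.wg.Done()
	ticker := time.NewTicker(sweep)
	defer ticker.Stop()
	for {
		select {
		case <-c.stop:
			return
		case now := <-ticker.C:
			c.mu.Lock()
			for k, it := range c.items {
				if now.After(it.expiry) {
					delete(c.items, k)
				}
			}
			c.mu.Unlock()
		}
	}
}

// roundTrip stores one value, reads it back, and shuts the cache down twice.
func roundTrip() (any, bool) {
	c := NewCache(10 * time.Millisecond)
	c.Set("a", 1, time.Hour)
	v, ok := c.Get("a")
	c.Close()
	c.Close() // second call is a no-op
	return v, ok
}

func main() {
	fmt.Println(roundTrip()) // prints "1 true"
}
```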

Level 2: How It Works Under the Hood

sync.Pool: Object Reuse

sync.Pool is a temporary object pool that caches allocated objects for reuse, reducing memory allocation and GC pressure.

var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func processRequest(data []byte) string {
    buf := bufPool.Get().(*bytes.Buffer) // Get from pool
    buf.Reset()                          // Reset state
    defer bufPool.Put(buf)               // Return when done

    buf.Write(data)
    buf.WriteString(" processed")
    return buf.String()
}

sync.Pool characteristics:

Get(): Retrieves an object from the pool. If empty, calls New to create one
Put(): Returns an object to the pool
Cleared on GC: Every GC cycle, all objects in the Pool may be cleared (no survival guarantee)
No size limit: Pool grows as needed, GC reclaims

Critical constraint: Objects in Pool can be reclaimed at any time. Don't store persistent data in Pool, and don't rely on Pool size.

Real-world usage in standard library—fmt package:

// fmt/print.go (simplified)
var ppFree = sync.Pool{
    New: func() interface{} { return new(pp) },
}

func Fprintf(w io.Writer, format string, a ...interface{}) (n int, err error) {
    p := ppFree.Get().(*pp)
    p.doPrintf(format, a)
    n, err = w.Write(p.buf)
    p.free() // Internally calls ppFree.Put(p)
    return
}

fmt.Printf needs a pp struct for formatting on every call. If every call uses new, high-frequency calls generate massive garbage. With Pool reuse, GC pressure drops dramatically.

Performance comparison:

// Benchmark: without Pool
func BenchmarkNoPool(b *testing.B) {
    for i := 0; i < b.N; i++ {
        buf := new(bytes.Buffer)
        buf.WriteString("hello")
        _ = buf.String()
    }
}

// Benchmark: with Pool
func BenchmarkWithPool(b *testing.B) {
    pool := sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}
    for i := 0; i < b.N; i++ {
        buf := pool.Get().(*bytes.Buffer)
        buf.Reset()
        buf.WriteString("hello")
        _ = buf.String()
        pool.Put(buf)
    }
}

// Typical results (Go 1.21, Apple M1):
// BenchmarkNoPool-8     30000000    42 ns/op    64 B/op    1 allocs/op
// BenchmarkWithPool-8   50000000    28 ns/op     0 B/op    0 allocs/op

Pool reduces operation time by ~33%, and more importantly achieves 0 allocs/op—no burden on GC.

sync.Map: Concurrency-Safe Map

sync.Map, introduced in Go 1.9, is a concurrency-safe map optimized for specific scenarios.

var cache sync.Map

// Store
cache.Store("key1", "value1")

// Load
if val, ok := cache.Load("key1"); ok {
    fmt.Println(val.(string))
}

// LoadOrStore (atomic operation)
actual, loaded := cache.LoadOrStore("key2", "value2")
// loaded = false: didn't exist before, stored "value2"
// loaded = true:  already existed, returned old value

// Delete
cache.Delete("key1")

// Range
cache.Range(func(key, value interface{}) bool {
    fmt.Println(key, value)
    return true // Return false to stop iteration
})

sync.Map's two ideal scenarios (from official docs):

Entries written once but read many times (e.g., a growing cache)
Multiple goroutines read and write disjoint key sets

In other scenarios, sync.Mutex + regular map is usually faster.
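The "written once, read many times" case often starts with several goroutines racing to publish the same entry. A minimal sketch of that race, using LoadOrStore so exactly one value wins (the key "config" and the winners helper are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// winners has n goroutines race to publish a value for the same key and
// returns how many of them actually stored it.
func winners(n int) int64 {
	var m sync.Map
	var stored atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// loaded is false only for the one goroutine whose value was stored.
			if _, loaded := m.LoadOrStore("config", id); !loaded {
				stored.Add(1)
			}
		}(i)
	}
	wg.Wait()
	return stored.Load()
}

func main() {
	fmt.Println(winners(50)) // prints 1
}
```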

Why? Through Go 1.23, sync.Map internally uses two maps: a read-only read map and a lock-requiring dirty map. Read operations first check the read map (lock-free, via atomic operations); only on a miss do they lock and consult dirty. If the key set is stable (new keys are rarely added), most operations take the lock-free path. Go 1.24 replaced this design with a different internal structure, but the documented usage guidance is unchanged.

// sync.Map internals (simplified)
type Map struct {
    mu    Mutex
    read  atomic.Pointer[readOnly]  // Lock-free reads
    dirty map[interface{}]*entry     // Requires lock
    misses int
}

type readOnly struct {
    m       map[interface{}]*entry
    amended bool // Whether dirty has keys not in read
}

Performance comparison: choosing by scenario

Scenario                                    sync.Map    Mutex+Map
Read-heavy (99:1)                           2-5x faster  slower
Balanced read/write (50:50)                 1-3x slower  faster
Key set constantly growing                   slower       faster
Fixed keys, goroutines operate on diff keys  3-10x faster slower

Mutex vs Channel: When to Use Which

Use Mutex when:

Protecting shared data structures (maps, slices, struct fields)
Simple counters, flags
Short critical sections (a few lines)
Performance-sensitive hot paths

Use Channel when:

Transferring data ownership between goroutines
Coordinating execution order of multiple goroutines
Implementing timeout, cancellation
Complex patterns like fan-out/fan-in

Rob Pike's advice: "If you're protecting a data structure, use a mutex. If you're coordinating workflow, use a channel."

// Mutex: protecting shared state
type Counter struct {
    mu sync.Mutex
    n  int
}

// Channel: coordinating workflow
func pipeline(input <-chan int) <-chan int {
    output := make(chan int)
    go func() {
        defer close(output)
        for v := range input {
            output <- transform(v)
        }
    }()
    return output
}
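To make the pipeline stage runnable, the sketch below supplies a placeholder transform (doubling, chosen arbitrarily) and a run helper that feeds a slice through the stage. Both names are assumptions for illustration.

```go
package main

import "fmt"

// transform is a placeholder stage function for this sketch.
func transform(v int) int { return v * 2 }

// pipeline reads from input, applies transform, and closes output when done.
func pipeline(input <-chan int) <-chan int {
	output := make(chan int)
	go func() {
		defer close(output)
		for v := range input {
			output <- transform(v)
		}
	}()
	return output
}

// run feeds values through the stage and collects the results in order.
func run(values []int) []int {
	in := make(chan int)
	go func() {
		defer close(in)
		for _, v := range values {
			in <- v
		}
	}()
	var out []int
	for v := range pipeline(in) {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(run([]int{1, 2, 3})) // prints [2 4 6]
}
```

No mutex appears anywhere: each value is owned by exactly one goroutine at a time, which is the situation channels are designed for.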

WaitGroup Internal Implementation

sync.WaitGroup's core is a 64-bit atomic counter and a semaphore:

// Simplified WaitGroup structure
type WaitGroup struct {
    // High 32 bits: counter
    // Low 32 bits: waiter count
    state atomic.Uint64
    sema  uint32 // Semaphore for blocking/waking
}

Add(n): Atomically adds n to counter
Done(): Atomically decrements counter by 1; if counter reaches zero, wakes all waiters
Wait(): If counter > 0, increments waiter count, then calls runtime_Semacquire to block

Counter and waiter count are packed into a single 64-bit integer so Add can atomically check both whether counter reached zero and whether there are waiters—more efficient than using two separate variables.
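The packing trick can be modeled in a few lines. This is a teaching model of the bit layout only, with no blocking or wake-up logic; packedState and its methods are invented names, not the real WaitGroup internals.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// packedState keeps a counter in the high 32 bits and a waiter count
// in the low 32 bits of one atomic word.
type packedState struct {
	state atomic.Uint64
}

// add changes the counter by delta and returns the new counter and the
// waiter count, both taken from the same atomic update.
func (p *packedState) add(delta int) (counter int32, waiters uint32) {
	s := p.state.Add(uint64(delta) << 32) // negative deltas wrap correctly
	return int32(s >> 32), uint32(s)
}

// addWaiter registers one waiter in the low 32 bits.
func (p *packedState) addWaiter() {
	p.state.Add(1)
}

// demo walks through Add(2), one waiter arriving, and two Done calls.
func demo() (counter int32, waiters uint32) {
	var p packedState
	p.add(2)
	p.addWaiter()
	p.add(-1)
	return p.add(-1) // counter hits zero while one waiter is registered
}

func main() {
	fmt.Println(demo()) // prints "0 1"
}
```

The final add sees "counter is zero" and "one waiter exists" in a single read, which is exactly the moment the real implementation wakes the waiters.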

Cond: Condition Variable

sync.Cond is a relatively uncommon but highly valuable primitive for specific scenarios. It allows goroutines to wait until a condition becomes true.

type BoundedQueue struct {
    mu       sync.Mutex
    notEmpty *sync.Cond
    notFull  *sync.Cond
    buf      []int
    capacity int
}

func NewBoundedQueue(capacity int) *BoundedQueue {
    q := &BoundedQueue{
        buf:      make([]int, 0, capacity),
        capacity: capacity,
    }
    q.notEmpty = sync.NewCond(&q.mu)
    q.notFull = sync.NewCond(&q.mu)
    return q
}

func (q *BoundedQueue) Put(val int) {
    q.mu.Lock()
    defer q.mu.Unlock()
    
    for len(q.buf) == q.capacity {
        q.notFull.Wait() // Releases lock and waits; reacquires lock when woken
    }
    q.buf = append(q.buf, val)
    q.notEmpty.Signal() // Notify one waiting consumer
}

func (q *BoundedQueue) Get() int {
    q.mu.Lock()
    defer q.mu.Unlock()
    
    for len(q.buf) == 0 {
        q.notEmpty.Wait()
    }
    val := q.buf[0]
    q.buf = q.buf[1:]
    q.notFull.Signal()
    return val
}

Cond.Wait() performs three steps (the first two happen atomically; the lock is reacquired only after the goroutine is woken):

Release the associated lock
Suspend current goroutine
Reacquire lock when woken

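Why use a for loop instead of if? Because the lock is not held while Wait is waiting, another goroutine may change the state between the wakeup and the moment Wait reacquires the lock, so the condition may no longer be true when Wait returns. This is different from the "spurious wakeups" of other systems: in Go, Wait returns only after a Signal or Broadcast. The sync.Cond documentation still tells callers to check the condition in a loop.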

Signal() wakes one waiter; Broadcast() wakes all waiters.
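The queue above only uses Signal. Broadcast fits the case where one state change unblocks everyone, such as a "ready" gate. A small sketch (releaseAll is a hypothetical helper):

```go
package main

import (
	"fmt"
	"sync"
)

// releaseAll starts n goroutines that wait for ready to become true,
// flips the flag, and wakes them with a single Broadcast.
// It returns how many goroutines got past the gate.
func releaseAll(n int) int {
	var (
		mu     sync.Mutex
		ready  bool
		passed int
		wg     sync.WaitGroup
	)
	cond := sync.NewCond(&mu)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			for !ready { // re-check the condition after every wakeup
				cond.Wait()
			}
			passed++
		}()
	}

	mu.Lock()
	ready = true
	mu.Unlock()
	cond.Broadcast() // wake every waiter; Signal would wake only one

	wg.Wait()
	return passed
}

func main() {
	fmt.Println(releaseAll(10)) // prints 10
}
```

Goroutines that reach the loop after the flag is set never call Wait at all, which is why the condition is checked before waiting as well as after.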

Atomic Operations: sync/atomic

For simple counters and flags, atomic operations are lighter than Mutex:

import "sync/atomic"

var counter int64

func increment() {
    atomic.AddInt64(&counter, 1)
}

func get() int64 {
    return atomic.LoadInt64(&counter)
}

Go 1.19 introduced typed atomic variables for safer, more ergonomic usage:

var counter atomic.Int64

func increment() {
    counter.Add(1)
}

func get() int64 {
    return counter.Load()
}

Atomic operations:

Operation	Function	Go 1.19+ Type Method
Load	`LoadInt64(&x)`	`x.Load()`
Store	`StoreInt64(&x, v)`	`x.Store(v)`
Add	`AddInt64(&x, n)`	`x.Add(n)`
CAS	`CompareAndSwapInt64(&x, old, new)`	`x.CompareAndSwap(old, new)`
Swap	`SwapInt64(&x, new)`	`x.Swap(new)`

When atomic vs Mutex?

Single variable, simple operation → atomic
Multiple variables need updating together → Mutex
Complex logic (if-then-update) → Mutex
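There is a middle ground for check-then-update logic on a single variable: a compare-and-swap retry loop. The sketch below tracks a running maximum this way; storeMax and maxOf are invented names. Once a second variable has to change in the same step, a Mutex remains the simpler choice.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// storeMax raises *hi to v if v is larger, using a CAS retry loop.
func storeMax(hi *atomic.Int64, v int64) {
	for {
		cur := hi.Load()
		if v <= cur {
			return // nothing to do
		}
		if hi.CompareAndSwap(cur, v) {
			return // our update won
		}
		// Another goroutine changed the value in between; retry with a fresh read.
	}
}

// maxOf publishes every value concurrently and returns the largest one seen.
// The zero value of atomic.Int64 is 0, so inputs are assumed to be non-negative.
func maxOf(values []int64) int64 {
	var hi atomic.Int64
	var wg sync.WaitGroup
	for _, v := range values {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			storeMax(&hi, v)
		}(v)
	}
	wg.Wait()
	return hi.Load()
}

func main() {
	fmt.Println(maxOf([]int64{3, 9, 4})) // prints 9
}
```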

Level 3: What the Specification Says

Mutex Implementation: From Spinning to Semaphore

Go's Mutex implementation has evolved through multiple iterations. The current implementation (Go 1.9+) combines spinning and semaphores, introducing starvation mode.

// sync/mutex.go (simplified)
type Mutex struct {
    state int32  // Lock state (multiple flag bits)
    sema  uint32 // Semaphore
}

const (
    mutexLocked      = 1 << iota // 1: lock is held
    mutexWoken                    // 2: a goroutine has been woken
    mutexStarving                 // 4: starvation mode
    mutexWaiterShift = iota       // 3: bit offset for waiter count
)
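To make the bit layout concrete, here is a small decoder for a state word using the same constants. The decoded struct and decode function are illustrative helpers, not part of the sync package.

```go
package main

import "fmt"

const (
	mutexLocked      = 1 << iota // 1
	mutexWoken                   // 2
	mutexStarving                // 4
	mutexWaiterShift = iota      // 3
)

// decoded is a readable view of the packed state word.
type decoded struct {
	locked, woken, starving bool
	waiters                 int32
}

// decode splits the three flag bits from the waiter count stored above them.
func decode(state int32) decoded {
	return decoded{
		locked:   state&mutexLocked != 0,
		woken:    state&mutexWoken != 0,
		starving: state&mutexStarving != 0,
		waiters:  state >> mutexWaiterShift,
	}
}

func main() {
	// Two waiters queued behind a held lock: 2<<3 | 1 = 17.
	state := int32(2<<mutexWaiterShift | mutexLocked)
	fmt.Printf("%+v\n", decode(state))
}
```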

Complete Lock() flow:

Fast path: CAS attempts to set state from 0 to mutexLocked. If successful, returns immediately—this is the uncontended path, requiring only one atomic operation.
Slow path: If fast path fails (lock already held), enters lockSlow():
- Spinning phase: If lock is held and in normal mode, goroutine spins. Spin conditions:
  - Running on multicore machine
  - Current GOMAXPROCS > 1
  - At least one other P (processor) is running
  - Spin count < 4
  - Current P's local run queue is empty (no other runnable goroutine is waiting on it)
- Semaphore phase: After spin limit exceeded, goroutine calls runtime_SemacquireMutex to sleep
After waking: Goroutine woken by semaphore must compete with newly arriving goroutines for the lock.

Why spin first, then semaphore? Spinning avoids thread switch overhead (~1-2 microseconds). For short critical sections (tens of nanoseconds), spinning until lock release is much faster than sleep-wake. But infinite spinning wastes CPU, so after 4 iterations it switches to semaphore.

Starvation Mode (Go 1.9+)

Before Go 1.9, Mutex had a serious problem: newly arriving goroutines could acquire the lock more easily than already-waiting ones (because new goroutines are already running on CPU and can immediately spin-compete). This caused waiting goroutines to potentially "starve"—unbounded wait time.

Go 1.9 introduced starvation mode to solve this:

Normal mode:

Waiters queue in FIFO order
Woken waiters compete with newly arriving goroutines
New arrivals have advantage (already running on CPU)

Starvation mode:

Trigger condition: a waiter has waited over 1ms
Behavior: lock is handed directly to head of wait queue; new arrivals don't compete
Exit condition: goroutine that acquired the lock is the last in queue, or wait time < 1ms

Timeline (starvation problem in normal mode):

G1 holds lock
G2 waiting... (100us)
G3 arrives -> spins -> acquires lock (G2 keeps waiting)
G4 arrives -> spins -> acquires lock (G2 keeps waiting)
...
G2 may wait indefinitely

Timeline (starvation mode):

G1 holds lock
G2 waiting... (>1ms) -> triggers starvation mode
G1 unlocks -> directly handed to G2 (G3, G4 must queue)

Dmitry Vyukov implemented this change in the commit "sync: make Mutex more fair," which resolved Go issue #13086. Benchmarks showed starvation mode reduced worst-case latency from hundreds of milliseconds to ~1ms, though average throughput slightly decreased (fewer spinning opportunities).

sync.Pool and GC Interaction

sync.Pool's lifecycle is tightly coupled with GC. Its internal implementation uses per-P (processor) local pools to reduce lock contention:

// sync/pool.go (simplified)
type Pool struct {
    noCopy noCopy

    local     unsafe.Pointer // [P]poolLocal array
    localSize uintptr

    victim     unsafe.Pointer // local from previous GC cycle
    victimSize uintptr

    New func() interface{}
}

type poolLocal struct {
    poolLocalInternal
    pad [128 - unsafe.Sizeof(poolLocalInternal{})%128]byte // Prevent false sharing
}

type poolLocalInternal struct {
    private interface{} // Only current P can access (lock-free)
    shared  poolChain   // Other Ps can steal from (lock-free)
}

Get() flow:

Pin current goroutine to P (pin())
Check current P's private field—lock-free
If private is nil, pop from head of current P's shared list
If shared is also empty, steal from tail of other Ps' shared lists (work-stealing)
If all empty, check victim pool (leftover from previous GC)
If everything empty, call New()

GC cleanup:

Every GC cycle: victim = local; local = nil
This means objects survive at most two GC cycles: first cycle moves from local to victim, second cycle clears victim
This "double buffering" strategy avoids the performance cliff of clearing all objects immediately after GC

Why Pool isn't suitable for connection pools:

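// Wrong usage: pooling connections in sync.Pool
var connPool = sync.Pool{
    New: func() interface{} {
        // addr is the server address, assumed to be defined elsewhere
        conn, _ := net.Dial("tcp", addr)
        return conn
    },
}
// Problem: pooled connections can be dropped at any GC without being closed,
// and the next Get has to dial again (expensive)
// Correct: for databases, *sql.DB (returned by sql.Open) already manages a connection pool;
// otherwise implement a bounded pool yourself, for example channel-based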

sync.Pool is for "cheap to create but high frequency" temporary objects (like bytes.Buffer), not "expensive to create but low frequency" long-lived resources (like DB connections).

sync.Once Implementation

Once's implementation appears simple but has subtle performance considerations:

// sync/once.go
type Once struct {
    done atomic.Uint32
    m    Mutex
}

func (o *Once) Do(f func()) {
    // Fast path: already done, return immediately
    if o.done.Load() == 1 {
        return
    }
    // Slow path: first call (or first is still executing)
    o.doSlow(f)
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done.Load() == 0 {
        defer o.done.Store(1)
        f()
    }
}

Why not just use CAS?

// Wrong implementation (conceptual)
func (o *Once) Do(f func()) {
    if o.done.CompareAndSwap(0, 1) {
        f()
    }
}

Problem: If goroutine A wins the CAS and starts executing f(), goroutine B sees done=1 and returns immediately—but f() might not have finished yet! B could use an uninitialized object.

The correct implementation uses Mutex to ensure: all concurrent callers wait until f() completes before returning. This is stronger than "execute only once"—it guarantees "complete execution before anyone else proceeds."
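The flaw in the CAS-only version can be reproduced deterministically by holding the first caller's function open with a channel. In this sketch, casOnce is the flawed design from above and returnsBeforeInit is a made-up test helper.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// casOnce is the flawed CAS-only design: it elects one executor
// but never makes other callers wait for it.
type casOnce struct{ done atomic.Uint32 }

func (o *casOnce) Do(f func()) {
	if o.done.CompareAndSwap(0, 1) {
		f()
	}
}

// returnsBeforeInit reports whether a second caller got past Do while the
// first caller's function was still running.
func returnsBeforeInit() bool {
	var (
		once     casOnce
		ready    atomic.Bool
		started  = make(chan struct{})
		release  = make(chan struct{})
		finished = make(chan struct{})
	)

	go func() {
		defer close(finished)
		once.Do(func() {
			close(started) // initialization has begun
			<-release      // and is held open until we allow it to finish
			ready.Store(true)
		})
	}()

	<-started
	once.Do(func() {})     // loses the CAS and returns immediately
	early := !ready.Load() // initialization has not finished yet

	close(release)
	<-finished
	return early
}

func main() {
	fmt.Println("returned before init finished:", returnsBeforeInit()) // prints true
}
```

With sync.Once, the second Do would block until the first function returned, so the caller could never observe the uninitialized state.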

Memory Model Guarantees for sync

The Go Memory Model's happens-before guarantees for sync package:

Mutex: The nth Unlock() happens-before the (n+1)th Lock() returns
RWMutex: For any RLock() call, there exists some n such that the nth Unlock() happens-before that RLock() returns, and the corresponding RUnlock() happens-before the (n+1)th Lock() returns
Once: Completion of f in once.Do(f) happens-before any once.Do returns
WaitGroup: wg.Done() happens-before the corresponding wg.Wait() returns
atomic: The memory model revision published with Go 1.19 documents that sync/atomic operations behave as sequentially consistent and establish happens-before relationships (previously this was left unspecified)

These guarantees mean:

var data string
var mu sync.Mutex

// Goroutine A
mu.Lock()
data = "hello"
mu.Unlock()

// Goroutine B (acquires lock after A unlocks)
mu.Lock()
fmt.Println(data) // Guaranteed to see "hello"
mu.Unlock()

Without Mutex, even if A runs first, B isn't guaranteed to see A's write (due to CPU caches and compiler optimizations).

RWMutex Implementation Details

RWMutex uses a counter to track reader count, plus a Mutex to protect write operations:

// sync/rwmutex.go (simplified)
type RWMutex struct {
    w           Mutex  // Write lock mutex
    writerSem   uint32 // Writer semaphore
    readerSem   uint32 // Reader semaphore
    readerCount atomic.Int32 // Reader count (may be negative)
    readerWait  atomic.Int32 // Readers waiting to finish
}

const rwmutexMaxReaders = 1 << 30

RLock() implementation:

func (rw *RWMutex) RLock() {
    if rw.readerCount.Add(1) < 0 {
        // Writer waiting or holding write lock, block
        runtime_SemacquireRWMutexR(&rw.readerSem, false, 0)
    }
}

Lock() (write lock) implementation:

func (rw *RWMutex) Lock() {
    rw.w.Lock() // Exclude other writers
    // Notify readers a writer has arrived: subtract rwmutexMaxReaders from readerCount
    r := rw.readerCount.Add(-rwmutexMaxReaders) + rwmutexMaxReaders
    // If there are active readers, wait for them to finish
    if r != 0 && rw.readerWait.Add(r) != 0 {
        runtime_SemacquireRWMutex(&rw.writerSem, false, 0)
    }
}

The clever trick: readerCount going negative signals "a writer is waiting." New RLock() calls seeing a negative value know they must wait.
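The arithmetic is easier to follow with concrete numbers. The sketch below models only the counter updates, with no blocking or semaphores; simulate and maxReaders are illustrative names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const maxReaders = 1 << 30

// simulate starts with some active readers, lets a writer announce itself,
// and then lets one more reader arrive.
func simulate(activeReaders int32) (readersToWaitFor int32, newReaderMustWait bool) {
	var readerCount atomic.Int32
	readerCount.Store(activeReaders)

	// Writer's Lock: push the counter negative, then add the constant back
	// locally to recover how many readers are still active.
	readersToWaitFor = readerCount.Add(-maxReaders) + maxReaders

	// A new RLock increments the counter and sees a negative result,
	// which tells it a writer is pending.
	newReaderMustWait = readerCount.Add(1) < 0
	return
}

func main() {
	fmt.Println(simulate(3)) // prints "3 true"
}
```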

Level 4: Edge Cases and Pitfalls

Pitfall 1: Lock Copying (Mutex/WaitGroup Must Not Be Copied)

One of the most common mistakes for Go beginners:

type Service struct {
    mu sync.Mutex
    // ... fields
}

// Wrong! Value passing copies the Mutex
func process(s Service) {
    s.mu.Lock()
    // ... operating on copy's lock, original object unprotected
    s.mu.Unlock()
}

// Correct: pass pointer
func process(s *Service) {
    s.mu.Lock()
    defer s.mu.Unlock()
    // ...
}

Why can't locks be copied? Mutex's internal state (whether held, how many waiters) is specific to that instance. A copy is an independent lock: code that locks the copy does not exclude code that locks the original, and a copy made while the lock is held starts out locked with nobody left to unlock it.

go vet detection: Go's built-in go vet tool detects lock copying:

$ go vet ./...
# example.com/myapp
./main.go:15:17: process passes lock by value: example.com/myapp.Service contains sync.Mutex

All sync types must not be copied: Mutex, RWMutex, WaitGroup, Once, Cond, Pool, Map.

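The mechanism behind this check is go vet's copylocks analyzer, which reports copies of any value whose type has pointer-receiver Lock and Unlock methods. Mutex and RWMutex qualify directly; types such as WaitGroup, Cond, and Pool embed noCopy, an empty struct that defines no-op Lock and Unlock methods purely so that vet flags copies of the enclosing struct.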
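The same convention can be applied to your own types. The sketch below declares a local noCopy marker, since the one in the sync package is unexported; Handle is a hypothetical type used only for illustration.

```go
package main

import "fmt"

// noCopy is a zero-size marker. Because *noCopy has Lock and Unlock methods,
// go vet's copylocks check reports copies of any struct that contains it.
type noCopy struct{}

func (*noCopy) Lock()   {}
func (*noCopy) Unlock() {}

// Handle must not be copied after first use.
type Handle struct {
	_  noCopy
	id int
}

func NewHandle(id int) *Handle { return &Handle{id: id} }

func (h *Handle) ID() int { return h.id }

func main() {
	h := NewHandle(7)
	fmt.Println(h.ID()) // prints 7
	// c := *h // go vet would report this assignment as copying a lock value
}
```

The marker costs nothing at run time; it exists only for static analysis, so the compiler itself will still accept a copy.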

Pitfall 2: Deadlock Patterns

Pattern 1: Self-locking (recursive lock death)

Go's Mutex is not reentrant—the same goroutine locking the same Mutex twice deadlocks:

func (s *Service) A() {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.B() // Deadlock! B also needs the lock
}

func (s *Service) B() {
    s.mu.Lock() // Blocks forever—lock already held by current goroutine
    defer s.mu.Unlock()
    // ...
}

Why doesn't Go have reentrant locks? Russ Cox explained clearly in Go issue #14939: "Recursive mutexes do not protect invariants. Mutual exclusion locks protect invariants. If the lock protects some invariant, then no reentrant call is safe to make while the invariant may be broken."

Meaning: if A modifies shared data halfway then calls B, B re-acquiring the lock would see inconsistent intermediate state. Reentrant locks mask this problem rather than solving it.

Fix approaches:

// Approach 1: Split into internal unlocked version
func (s *Service) A() {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.bLocked() // Internal version without locking
}

func (s *Service) B() {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.bLocked()
}

func (s *Service) bLocked() {
    // Assumes caller holds lock
    // ...
}

Pattern 2: AB-BA deadlock

// Goroutine 1          Goroutine 2
mu1.Lock()             mu2.Lock()
mu2.Lock() // waits G2  mu1.Lock() // waits G1
// Deadlock!

Fix: Always lock in consistent order

// Convention: always lock mu1 before mu2
func transferLocked(mu1, mu2 *sync.Mutex) {
    // Sort by address to ensure global consistency
    if uintptr(unsafe.Pointer(mu1)) > uintptr(unsafe.Pointer(mu2)) {
        mu1, mu2 = mu2, mu1
    }
    mu1.Lock()
    mu2.Lock()
    // ...
    mu2.Unlock()
    mu1.Unlock()
}
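Sorting by address works but needs the unsafe package. When the locked objects carry a stable unique ID, ordering by that ID is a common alternative. The following is a sketch under that assumption; Account, transfer, and crossTransfers are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Account has a unique id that defines the global lock order.
type Account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// transfer always locks the account with the smaller id first,
// regardless of the direction of the transfer.
func transfer(from, to *Account, amount int) {
	if from.id == to.id {
		return // same account: nothing to do, and locking twice would deadlock
	}
	first, second := from, to
	if first.id > second.id {
		first, second = second, first
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()

	from.balance -= amount
	to.balance += amount
}

// crossTransfers runs n transfers in each direction concurrently
// and returns the final balances.
func crossTransfers(n int) (int, int) {
	a := &Account{id: 1, balance: 1000}
	b := &Account{id: 2, balance: 1000}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); transfer(a, b, 1) }()
		go func() { defer wg.Done(); transfer(b, a, 1) }()
	}
	wg.Wait()
	return a.balance, b.balance
}

func main() {
	fmt.Println(crossTransfers(500)) // prints "1000 1000"
}
```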

Pattern 3: Forgetting to unlock on an early return

func bad(mu *sync.Mutex) {
    mu.Lock()
    if someCondition {
        return // Forgot Unlock!
    }
    mu.Unlock()
}

Solution: Always use defer

Pitfall 3: RWMutex Writer Starvation

With a naive read-write lock, a steady stream of readers can keep a writer waiting indefinitely:

// Scenario: 100 goroutines continuously RLocking
// 1 goroutine tries to Lock
// If readers never pause, writer can never find a moment when all readers released

Go's RWMutex has protection for this: When a writer arrives (calls Lock), new readers are blocked (because readerCount becomes negative). Existing readers can continue to completion, but no new readers join. This ensures the writer eventually acquires the lock.

However, if existing readers do long operations in their critical sections (e.g., network requests), the writer still waits a long time.

Best practices:

Keep read critical sections short
If read operations involve IO, copy needed data inside the lock, then unlock before doing IO

// Wrong: network request inside read lock
func (c *Cache) GetAndFetch(key string) (string, error) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    if val, ok := c.data[key]; ok {
        return val, nil
    }
    // Network request inside read lock: blocks any writer for a long time.
    // fetchRemote is a hypothetical helper that performs the HTTP GET
    // and returns the response body as a string.
    return fetchRemote("http://example.com/" + key)
}

// Correct: minimize critical section
func (c *Cache) GetAndFetch(key string) (string, error) {
    c.mu.RLock()
    val, ok := c.data[key]
    c.mu.RUnlock() // Release immediately

    if ok {
        return val, nil
    }
    // Network request outside lock
    return fetchRemote("http://example.com/" + key)
}

Pitfall 4: sync.Pool Usage Mistakes

Mistake 1: Forgetting to Reset

var bufPool = sync.Pool{
    New: func() interface{} { return new(bytes.Buffer) },
}

func process(data string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    // Forgot buf.Reset()!
    // buf may still contain data from previous use
    buf.WriteString(data)
    result := buf.String() // May include leftover data from previous use
    bufPool.Put(buf)
    return result
}

Mistake 2: Using after Put

func process() {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    buf.WriteString("hello")
    bufPool.Put(buf)
    
    // Wrong! buf is back in pool, may be acquired and modified by another goroutine
    fmt.Println(buf.String()) // Data race!
}

Mistake 3: Returning oversized buffers to the Pool

// A buffer that has grown very large stays referenced by the Pool,
// so its backing array can't be reclaimed until the Pool is cleared
var bigBufPool = sync.Pool{
    New: func() interface{} {
        buf := make([]byte, 0, 1<<20) // 1MB
        return &buf
    },
}

// Better approach: limit size of objects returned to pool
func putBuf(buf *[]byte) {
    if cap(*buf) > 1<<20 {
        return // Too large, let GC reclaim
    }
    *buf = (*buf)[:0]
    bigBufPool.Put(buf)
}
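Putting the first two rules together, a safe Get/Put cycle could look like the sketch below. It resets the buffer right after Get and returns a copy of the data, so nothing refers to the buffer once it is back in the pool. The function body is one reasonable arrangement, not the only one.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// process borrows a buffer, uses it, and hands back an independent string.
func process(data string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // never trust the previous user's contents
	defer bufPool.Put(buf) // runs after the return value has been computed

	buf.WriteString(data)
	return buf.String() // String copies the bytes, so the result outlives the Put
}

func main() {
	fmt.Println(process("hello")) // prints hello
	fmt.Println(process("bye"))   // prints bye, with no leftover "hello"
}
```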

Pitfall 5: WaitGroup Reuse Race Condition

var wg sync.WaitGroup

// First round
wg.Add(2)
go func() { defer wg.Done(); work1() }()
go func() { defer wg.Done(); work2() }()
wg.Wait()

// Second round: reuse is safe here because Wait has already returned,
// which means every Done call from the first round has completed.
wg.Add(2)

Reuse is safe as long as each new Add happens after all previous Wait calls have returned, as in the code above. Problems come from design bugs: a stray goroutine that calls Done() after Wait() has returned can drive the counter negative (panic "sync: negative WaitGroup counter"), and calling Add() from another goroutine while a Wait() is still in progress is a misuse that the WaitGroup may detect and report with a panic.

Pitfall 6: sync.Map Type Safety

sync.Map uses interface{} for key and value types, losing compile-time type checking:

var m sync.Map

m.Store("count", 42)
m.Store("count", "not a number") // Type mismatch only discovered at runtime

val, _ := m.Load("count")
n := val.(int) // If stored value is string, panics here

Go 1.18+ solution—wrap with generics:

type TypedMap[K comparable, V any] struct {
    m sync.Map
}

func (tm *TypedMap[K, V]) Store(key K, value V) {
    tm.m.Store(key, value)
}

func (tm *TypedMap[K, V]) Load(key K) (V, bool) {
    val, ok := tm.m.Load(key)
    if !ok {
        var zero V
        return zero, false
    }
    return val.(V), true
}

Real-World Case: Deadlock Bug in Docker

Docker had a famous deadlock bug (docker/docker#22507): the container's Mutex and network's Mutex formed an AB-BA deadlock. Simplified:

// container.go
func (c *Container) Stop() {
    c.mu.Lock()         // Lock A
    defer c.mu.Unlock()
    c.network.Disconnect(c) // Internally needs Lock B
}

// network.go
func (n *Network) Disconnect(c *Container) {
    n.mu.Lock()         // Lock B
    defer n.mu.Unlock()
    c.UpdateState()     // Needs Lock A -> deadlock!
}

Fix: Reduce lock scope to avoid calling functions that may acquire another lock while holding one lock.
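As an illustration of that principle (not Docker's actual patch), the sketch below restructures the simplified example so that neither lock is held while calling into the other component. All types, fields, and methods here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

type Network struct {
	mu       sync.Mutex
	attached map[string]bool
}

type Container struct {
	mu      sync.Mutex
	id      string
	state   string
	network *Network
}

// Disconnect touches only network state and never calls back into
// Container, so it cannot try to take the container lock.
func (n *Network) Disconnect(id string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	delete(n.attached, id)
}

func (n *Network) Attached(id string) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.attached[id]
}

func (c *Container) Stop() {
	// Step 1: copy what is needed under the container lock, then release it.
	c.mu.Lock()
	id, nw := c.id, c.network
	c.state = "stopping"
	c.mu.Unlock()

	// Step 2: call into the network without holding the container lock.
	nw.Disconnect(id)

	// Step 3: reacquire the container lock to publish the final state.
	c.mu.Lock()
	c.state = "stopped"
	c.mu.Unlock()
}

func (c *Container) State() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.state
}

// stopDemo stops one attached container and reports the outcome.
func stopDemo() (state string, attached bool) {
	nw := &Network{attached: map[string]bool{"c1": true}}
	c := &Container{id: "c1", state: "running", network: nw}
	c.Stop()
	return c.State(), nw.Attached("c1")
}

func main() {
	fmt.Println(stopDemo()) // prints "stopped false"
}
```

The trade-off is that other goroutines can observe the intermediate "stopping" state, so callers must be written to tolerate it.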

Interview Questions

Is sync.Mutex reentrant? Why not?
- No. Reentrant locks don't protect invariants—functions called while holding the lock may see intermediate state
When are objects in sync.Pool reclaimed?
- May be cleared every GC cycle. Specifically: double buffering: local -> victim -> cleared
When is sync.Map faster than Mutex+map?
- Read-heavy/write-light, or multiple goroutines operating on disjoint key sets
How to detect lock copying?
- go vet tool automatically detects structs containing sync.Mutex etc. being passed by value
What problem does Go 1.9's Mutex starvation mode solve?
- Prevents waiters from being indefinitely preempted by new arrivals. Waiting over 1ms triggers starvation mode, lock handed over directly
Difference between sync.Once and atomic.CompareAndSwap?
- Once guarantees function execution completes before other callers return; CAS only guarantees one executor, doesn't wait for completion

Summary

The sync package is the "low-level but efficient" part of Go's concurrency toolbox. Each primitive has a clear use case:

Primitive	Core Use	Caveats
Mutex	Protecting shared data	Not reentrant, not copyable
RWMutex	Read-heavy shared data	Keep read critical sections short
WaitGroup	Waiting for goroutine group to finish	Add before go
Once	Single initialization	Panic counts as "done"
Pool	Reducing frequent small allocations	Not for connection pools
Map	Specific-pattern concurrent map	Use for read-heavy workloads
Cond	Waiting for condition	Check in for loop
atomic	Single-variable atomic ops	Can't protect multiple variables

Selection criteria:

Need to transfer ownership → channel
Need to protect shared state → mutex
Need to reduce GC → Pool
Need to wait for completion → WaitGroup
Need to do something once → Once
