Chapter 17

Channel Internals: Send, Receive and Select

"Do not communicate by sharing memory; instead, share memory by communicating."

These words from Rob Pike form the soul of Go's concurrency philosophy. Many programmers dismiss this as marketing copy the first time they read it — after all, data still lives in memory, so how can channels truly bypass "shared memory"?

This chapter dissects channel internals at the source-code level to answer that question, and shows why channels, when used correctly, make concurrent code safer and more tractable than locks. But we don't shy away from the costs and traps: channels are not a silver bullet, and in certain scenarios they are several times slower than a mutex. Understanding why lets you choose the right tool for each job.


Level 1: Philosophy — Why CSP

Two Paradigms of Concurrency

In the history of systems software, concurrency has taken two dominant forms:

Shared Memory: Threads cooperate by accessing the same block of memory. You protect critical sections with mutexes, notify waiters with condition variables, and synchronize lock-free code with atomic operations. C, C++, Java, and Python's threading module all follow this model.

The problem: shared memory makes data ownership implicit and runtime-determined. A lock doesn't protect data itself — it protects access to data. You must remember which data needs protection, where to lock, and where to unlock. Miss one spot and you have a data race; get the lock order wrong and you have a deadlock. This implicit ownership is extremely difficult to maintain correctly in large codebases.

Message Passing: Processes/coroutines cooperate by sending messages; data ownership transfers with each message. Erlang, Elixir, and Rust's channels all follow this model.

Go chose CSP (Communicating Sequential Processes), formalized by Tony Hoare in 1978. CSP's central thesis: cooperation between concurrent entities should be expressed through synchronized communication, not through shared state.

The Core Insight of CSP

CSP does not claim "avoid memory" — it claims "data ownership should be explicit and singular." When you send a value through a channel:

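  1. If you send a value, the receiver gets a copy. When the value contains no pointers (an int, or a struct of plain fields), both parties own independent copies of the data and nothing is shared. Slices, maps, and structs containing pointers are copied only shallowly: the receiver gets a copy of the header, but the underlying data is still shared.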
  2. If you send a pointer, Go's convention (not enforced) is that the sender stops using the pointer after sending; ownership has transferred.
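A small single-goroutine sketch makes the difference concrete. Point, sendStruct, and sendSlice are illustrative names, and a buffered channel of capacity 1 is used only so that no second goroutine is needed. Sending a pointer-free struct copies it; sending a slice copies just the slice header, so a write through the sender's slice after the send is visible to the receiver. The ownership convention for pointers therefore applies to slices and maps as well.

```go
package main

import "fmt"

// Point has no pointer fields, so sending it copies all of its data.
type Point struct{ X, Y int }

// sendStruct sends a Point, then modifies the sender's variable.
// The value the receiver gets is unaffected.
func sendStruct() Point {
    ch := make(chan Point, 1)
    p := Point{X: 1, Y: 2}
    ch <- p
    p.X = 100 // changes only the sender's copy
    return <-ch
}

// sendSlice sends a slice, then writes through the sender's slice.
// Only the header (pointer, len, cap) was copied, so the receiver
// sees the write in the shared backing array.
func sendSlice() []int {
    ch := make(chan []int, 1)
    s := []int{1, 2, 3}
    ch <- s
    s[0] = 100 // visible to whoever received s
    return <-ch
}

func main() {
    fmt.Println(sendStruct()) // {1 2}
    fmt.Println(sendSlice())  // [100 2 3]
}
```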

This transforms implicit "lock-protected regions" into explicit "ownership transfers." The code reads like a description of "this data flows from A to B," rather than "take the lock before touching this data."

That is the true meaning of "share memory by communicating": use the timing of communication to implicitly guarantee singular ownership, rather than using locks to explicitly protect shared access.

Channel Semantics

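Go channels come in two varieties, and every channel is in one of three states: nil (a channel variable declared but never initialized with make), open, or closed. The two varieties are: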

Type | Semantics
--- | ---
Unbuffered channel (make(chan T)) | Sender and receiver must both be ready before a transfer can complete (synchronous rendezvous)
Buffered channel (make(chan T, n)) | Sending does not block while the buffer is not full; receiving does not block while the buffer is not empty

An unbuffered channel is the pure form of CSP: send and receive are a synchronous handshake. A buffered channel introduces asynchrony; the buffer acts as a capacity-limited queue.
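The difference between the two varieties, and the behavior of the nil state, can be observed with a non-blocking send. This is a sketch; canSendNow is a hypothetical helper built on select with a default case, which takes the default branch whenever the send cannot proceed immediately.

```go
package main

import "fmt"

// canSendNow attempts a send without blocking. When it returns true,
// v has actually been sent on ch.
func canSendNow(ch chan int, v int) bool {
    select {
    case ch <- v:
        return true
    default:
        return false
    }
}

func main() {
    unbuffered := make(chan int)
    buffered := make(chan int, 2)
    var nilCh chan int // nil state: declared but never made

    fmt.Println(canSendNow(unbuffered, 1)) // false: no receiver is waiting
    fmt.Println(canSendNow(buffered, 1))   // true: buffer has room
    fmt.Println(canSendNow(buffered, 2))   // true: buffer has room
    fmt.Println(canSendNow(buffered, 3))   // false: buffer is full
    fmt.Println(canSendNow(nilCh, 1))      // false: a nil channel is never ready
}
```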


Level 2: Internals — The hchan Structure

The hchan Struct

Every make(chan T, n) call allocates an hchan struct on the heap (runtime/chan.go):

type hchan struct { // simplified; see runtime/chan.go for the full definition
    qcount   uint           // number of elements currently in the buffer
    dataqsiz uint           // buffer capacity (second argument to make)
    buf      unsafe.Pointer // pointer to the ring buffer
    elemsize uint16         // size of a single element in bytes
    closed   uint32         // whether the channel is closed (0=open, 1=closed)
    elemtype *_type         // element type info (for GC scanning)
    sendx    uint           // send index (write position in the ring buffer)
    recvx    uint           // receive index (read position in the ring buffer)
    recvq    waitq          // queue of goroutines waiting to receive (sudog linked list)
    sendq    waitq          // queue of goroutines waiting to send (sudog linked list)
    lock     mutex          // mutex protecting all hchan fields
}

This struct reveals a key fact: a channel is fundamentally a locked ring queue with two wait queues attached.
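Two of these fields are visible from ordinary Go code: the builtins len and cap on a channel report qcount and dataqsiz. A minimal sketch (lenCap is an illustrative name):

```go
package main

import "fmt"

// lenCap fills and partially drains a buffered channel, then reports
// len (the number of buffered elements) and cap (the buffer capacity).
func lenCap() (int, int) {
    ch := make(chan string, 3) // dataqsiz = 3
    ch <- "a"
    ch <- "b" // qcount = 2
    <-ch      // qcount = 1
    return len(ch), cap(ch)
}

func main() {
    fmt.Println(lenCap()) // 1 3
}
```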

Ring Buffer Memory Layout

For buffered channels, buf points to a contiguous memory region of size elemsize * dataqsiz:

buf (elemsize=8, dataqsiz=4)

        recvx=1     sendx=3
           ↓           ↓
  ┌─────┬─────┬─────┬─────┐
  │ [0] │ [1] │ [2] │ [3] │
  │  -  │  A  │  B  │  -  │   qcount=2
  └─────┴─────┴─────┴─────┘
           ↑
       next element to dequeue

sendx and recvx are cursors that move around the array circularly: each is incremented after use and wraps back to 0 when it reaches dataqsiz. The advantage of this design is that memory is contiguous and cache-friendly, and no per-element allocation is needed; sending and receiving only advance an index within a fixed-size array.
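The cursor bookkeeping can be modeled in a few lines. This is a simplified, single-goroutine sketch, not the runtime's code: the ring type and its methods are hypothetical, there is no lock, and where the real chansend would park the sender on sendq, push just returns false.

```go
package main

import "fmt"

// ring models hchan's buffer fields: buf, sendx, recvx, and qcount.
type ring struct {
    buf    []int
    sendx  int // next slot to write
    recvx  int // next slot to read
    qcount int // elements currently stored
}

func newRing(size int) *ring { return &ring{buf: make([]int, size)} }

// push mirrors the "write to buffer" path; false means the buffer is full.
func (r *ring) push(v int) bool {
    if r.qcount == len(r.buf) {
        return false
    }
    r.buf[r.sendx] = v
    r.sendx++
    if r.sendx == len(r.buf) { // wrap around to the start of the array
        r.sendx = 0
    }
    r.qcount++
    return true
}

// pop mirrors the "read from buffer" path; false means the buffer is empty.
func (r *ring) pop() (int, bool) {
    if r.qcount == 0 {
        return 0, false
    }
    v := r.buf[r.recvx]
    r.buf[r.recvx] = 0 // clear the vacated slot
    r.recvx++
    if r.recvx == len(r.buf) {
        r.recvx = 0
    }
    r.qcount--
    return v, true
}

func main() {
    r := newRing(4)
    for i := 1; i <= 5; i++ {
        fmt.Println("push", i, r.push(i)) // the 5th push fails: buffer full
    }
    v, _ := r.pop()
    r.push(6)                                  // reuses slot 0: sendx has wrapped
    fmt.Println(v, r.sendx, r.recvx, r.qcount) // 1 1 1 4
}
```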

sudog: The Goroutine Wait Proxy

When a goroutine attempts to send into a full buffer (or receive from an empty buffer), it cannot proceed and must "park." The runtime creates (or retrieves from a pool) a sudog struct:

type sudog struct {
    g        *g            // the waiting goroutine
    next     *sudog        // next node in the linked list
    prev     *sudog        // previous node in the linked list
    elem     unsafe.Pointer // pointer to data to send/receive
    c        *hchan        // the channel being waited on
    // ... other fields (select-related)
}

Both sendq and recvq are of type waitq, which is a doubly-linked list of sudogs:

recvq:
  head → [sudog: g=G1, elem=&x1] → [sudog: g=G2, elem=&x2] → nil
         ↑
       earliest waiting goroutine at the head (FIFO order)

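sudogs are taken from a per-P cache backed by a central cache (acquireSudog and releaseSudog in the runtime) to avoid frequent allocation. Outside of select, a goroutine waits on at most one channel at a time; a goroutine blocked in select has a sudog queued on every channel it is waiting on (detailed below).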
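The queue discipline can be sketched as a plain doubly linked list. This is a simplified model, not runtime code: node is a hypothetical stand-in for sudog that carries only an id, and remove plays the role of unlinking a sudog from the middle of a queue, which select needs when a goroutine that waited on several channels withdraws from the ones that did not fire.

```go
package main

import "fmt"

// node stands in for a sudog: it only records which goroutine waits.
type node struct {
    id         int
    prev, next *node
}

// waitq is a FIFO with head (first) and tail (last) pointers.
type waitq struct {
    first, last *node
}

// enqueue appends at the tail: the newest waiter goes to the back.
func (q *waitq) enqueue(n *node) {
    n.next = nil
    n.prev = q.last
    if q.last == nil {
        q.first = n
    } else {
        q.last.next = n
    }
    q.last = n
}

// dequeue removes from the head: the earliest waiter is served first.
func (q *waitq) dequeue() *node {
    n := q.first
    if n == nil {
        return nil
    }
    q.first = n.next
    if q.first == nil {
        q.last = nil
    } else {
        q.first.prev = nil
    }
    n.next = nil
    return n
}

// remove unlinks n (which must be in q) in O(1) thanks to the prev links.
func (q *waitq) remove(n *node) {
    if n.prev == nil {
        q.first = n.next
    } else {
        n.prev.next = n.next
    }
    if n.next == nil {
        q.last = n.prev
    } else {
        n.next.prev = n.prev
    }
    n.prev, n.next = nil, nil
}

func main() {
    var q waitq
    g1, g2, g3 := &node{id: 1}, &node{id: 2}, &node{id: 3}
    q.enqueue(g1)
    q.enqueue(g2)
    q.enqueue(g3)
    q.remove(g2)                // g2 was woken through a different channel
    fmt.Println(q.dequeue().id) // 1
    fmt.Println(q.dequeue().id) // 3
}
```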

Three Send Paths

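When ch <- val executes, the runtime's chansend function handles it. A send on a nil channel blocks forever, and a send on a closed channel panics; otherwise, with hchan.lock held, chansend checks three cases in priority order: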

Path 1: Direct Delivery

If recvq is non-empty, a goroutine is waiting to receive. The runtime can bypass the buffer entirely and write data directly into the waiting goroutine's memory location:

G_sender  --[direct write to elem]--> G_receiver's stack variable
                                             ↑
                                      sudog.elem points here

This is the fastest path: the data is copied only once (sender to receiver) and the buffer is never touched. The receiver's sudog was already removed from recvq when it was dequeued, so the runtime only needs to call goready to make the receiver runnable again.

Path 2: Write to Buffer

If the buffer has space (qcount < dataqsiz), data is copied into buf[sendx], sendx is incremented (modulo capacity), qcount is incremented, and the send returns immediately without blocking the sender.

Path 3: Block

The buffer is full (or the channel is unbuffered) and recvq is empty. The runtime acquires a sudog recording the sending goroutine and the address of the value being sent, enqueues it on sendq, and calls gopark. The current goroutine is suspended and hchan.lock is released, while its M (OS thread) returns to the scheduler to run other goroutines.
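The priority order can be summarized as a decision function. This is a model, not runtime code: chanState and sendPath are hypothetical names, and the real chansend reads these conditions from hchan while holding hchan.lock (the nil check happens before any locking).

```go
package main

import "fmt"

// chanState is a snapshot of the hchan fields a blocking send inspects.
type chanState struct {
    closed      bool
    recvWaiting int // number of sudogs on recvq
    qcount      int
    dataqsiz    int
}

// sendPath reports which branch a blocking send takes, in priority order.
func sendPath(c *chanState) string {
    switch {
    case c == nil:
        return "block forever (nil channel)"
    case c.closed:
        return "panic: send on closed channel"
    case c.recvWaiting > 0:
        return "direct delivery to waiting receiver"
    case c.qcount < c.dataqsiz:
        return "write to buffer"
    default:
        return "park on sendq"
    }
}

func main() {
    fmt.Println(sendPath(&chanState{recvWaiting: 1}))         // direct delivery
    fmt.Println(sendPath(&chanState{qcount: 1, dataqsiz: 4})) // write to buffer
    fmt.Println(sendPath(&chanState{qcount: 4, dataqsiz: 4})) // park on sendq
    fmt.Println(sendPath(&chanState{}))                       // unbuffered, no receiver: park
}
```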

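Receive Paths

The logic of chanrecv mirrors chansend. A receive from a nil channel blocks forever; otherwise, with hchan.lock held:

  1. Channel closed and buffer empty: Return the zero value immediately (v, ok := <-ch yields ok == false).
  2. sendq non-empty, unbuffered channel: Copy the data directly from the waiting sender's sudog, then wake the sender.
  3. sendq non-empty, buffered channel (which means the buffer is full): Hand the element at the buffer head to the receiver, copy the waiting sender's data into the freed slot at the tail (preserving FIFO order), then wake the sender.
  4. Buffer has data: Read from the buffer; no goroutine switch is involved.
  5. Block: The buffer and sendq are both empty; acquire a sudog, add it to recvq, and call gopark.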
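The buffered case with a waiting sender, and the closed-channel case, can both be observed from user code. In this sketch (receiveOrder is an illustrative name), whether the second sender actually parks on sendq depends on scheduling, but either way the values arrive in FIFO order, and a receive from the closed, drained channel returns the zero value with ok == false.

```go
package main

import "fmt"

// receiveOrder fills a capacity-1 buffer, lets a second sender wait for
// space, drains the channel with range, then receives once more after close.
func receiveOrder() ([]int, int, bool) {
    ch := make(chan int, 1)
    ch <- 1 // the buffer is now full

    go func() {
        ch <- 2 // no room: this sender waits until the receiver frees a slot
        close(ch)
    }()

    var got []int
    for v := range ch { // ends once ch is closed and its buffer is drained
        got = append(got, v)
    }
    v, ok := <-ch // closed and empty: returns immediately
    return got, v, ok
}

func main() {
    got, v, ok := receiveOrder()
    fmt.Println(got, v, ok) // [1 2] 0 false
}
```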

Parking and Waking Goroutines

gopark and goready are the core of the channel blocking mechanism:

gopark(unlockf, lock, reason, ...):
  1. Switch to the g0 stack and change the current goroutine's state from _Grunning to _Gwaiting
  2. Call unlockf to release hchan.lock; because this happens only after the state change, no other goroutine can find and wake this one before it is fully parked
  3. Call schedule() so this M picks another goroutine to run

goready(gp, ...):
  1. Change gp's state from _Gwaiting to _Grunnable
  2. Put gp into the current P's runnext slot (runqput with next=true), so it is usually the next goroutine this P runs
  3. Call wakep() to start another M if there is an idle P and no M is already spinning

The key insight: a goroutine "blocking" on a channel is not a thread blocking — it is a cooperative yield of execution. The blocked goroutine's M continues executing other goroutines. This is the foundation of Go's efficient concurrency model.
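A sketch of this: two goroutines bounce a counter over unbuffered channels, blocking on every receive, with GOMAXPROCS set to 1 so that only one thread runs Go code at a time. If blocking held on to the OS thread, this could not make progress; because each blocked goroutine is parked, the thread simply runs the other one. pingPong is an illustrative name.

```go
package main

import (
    "fmt"
    "runtime"
)

// pingPong performs n round trips between the caller and an echo goroutine.
func pingPong(n int) int {
    prev := runtime.GOMAXPROCS(1) // a single P: one goroutine runs at a time
    defer runtime.GOMAXPROCS(prev)

    ping, pong := make(chan int), make(chan int)
    go func() {
        for v := range ping { // parks here until the caller sends
            pong <- v + 1
        }
        close(pong)
    }()

    count := 0
    for i := 0; i < n; i++ {
        ping <- count  // blocks until the echo goroutine receives
        count = <-pong // blocks until it replies
    }
    close(ping)
    <-pong // wait for the echo goroutine to close pong and exit
    return count
}

func main() {
    fmt.Println(pingPong(10000)) // 10000
}
```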


Level 3: Code Patterns

Pattern 1: Pipeline

A pipeline is the most classic channel usage, decomposing processing into independent stages:

package main

import "fmt"

// generate produces a sequence of integers
func generate(nums ...int) <-chan int {
    out := make(chan int)
    go func() {
        for _, n := range nums {
            out <- n
        }
        close(out)
    }()
    return out
}

// square squares each integer
func square(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        for n := range in {
            out <- n * n
        }
        close(out)
    }()
    return out
}

func main() {
    // Build pipeline: generate -> square -> square
    c := generate(2, 3, 4, 5)
    c = square(c)
    c = square(c)

    for n := range c {
        fmt.Println(n)  // 16, 81, 256, 625
    }
}

The key rule of pipelines: the producer is responsible for closing the channel, and the consumer iterates with range, which exits automatically once the channel is closed and drained. Never close a channel from the receiver side: if the sender is still sending, its next send panics with "send on closed channel".
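The pipeline above assumes the consumer reads every value. If the consumer stops early, the upstream goroutines stay blocked on their sends forever. A common remedy, sketched here with hypothetical gen, sq, and firstTwo functions, is to pass a done channel to every stage and select on it alongside each send.

```go
package main

import "fmt"

// gen emits nums until they run out or done is closed.
func gen(done <-chan struct{}, nums ...int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for _, n := range nums {
            select {
            case out <- n:
            case <-done:
                return // consumer gave up: exit instead of blocking
            }
        }
    }()
    return out
}

// sq squares values from in until in is closed or done is closed.
func sq(done <-chan struct{}, in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for n := range in {
            select {
            case out <- n * n:
            case <-done:
                return
            }
        }
    }()
    return out
}

// firstTwo reads only two results; closing done lets both stages return.
func firstTwo() []int {
    done := make(chan struct{})
    defer close(done)

    out := sq(done, gen(done, 2, 3, 4, 5))
    return []int{<-out, <-out} // receives are evaluated left to right
}

func main() {
    fmt.Println(firstTwo()) // [4 9]
}
```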

Pattern 2: Fan-out and Fan-in

Fan-out distributes work from one channel to multiple workers; fan-in merges results from multiple channels:

package main

import (
    "fmt"
    "sync"
)

// fanOut distributes input to numWorkers workers
func fanOut(in <-chan int, numWorkers int) []<-chan int {
    outputs := make([]<-chan int, numWorkers)
    for i := 0; i < numWorkers; i++ {
        outputs[i] = worker(in)
    }
    return outputs
}

func worker(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        for n := range in {
            out <- n * n  // simulates compute-intensive work
        }
        close(out)
    }()
    return out
}

// fanIn merges multiple channels into one
func fanIn(inputs ...<-chan int) <-chan int {
    var wg sync.WaitGroup
    merged := make(chan int)

    output := func(c <-chan int) {
        defer wg.Done()
        for n := range c {
            merged <- n
        }
    }

    wg.Add(len(inputs))
    for _, c := range inputs {
        go output(c)
    }

    // Close merged after all inputs are exhausted
    go func() {
        wg.Wait()
        close(merged)
    }()

    return merged
}

func main() {
    in := make(chan int)
    go func() {
        for i := 0; i < 10; i++ {
            in <- i
        }
        close(in)
    }()

    outputs := fanOut(in, 3)
    merged := fanIn(outputs...)

    for n := range merged {
        fmt.Println(n)  // output order varies between runs
    }
}

Pattern 3: Timeout and Cancellation (select + time.After)

package main

import (
    "context"
    "fmt"
    "time"
)

func fetchData(ctx context.Context) (string, error) {
    result := make(chan string, 1) // buffered: the send never blocks, so the goroutine exits even after a timeout

    go func() {
        // simulate a slow operation
        time.Sleep(200 * time.Millisecond)
        result <- "data from server"
    }()

    select {
    case data := <-result:
        return data, nil
    case <-ctx.Done():
        return "", ctx.Err()  // context.DeadlineExceeded or Canceled
    }
}

func main() {
    // Approach 1: time.After (simple cases)
    ch := make(chan int, 1)
    select {
    case v := <-ch:
        fmt.Println("received:", v)
    case <-time.After(100 * time.Millisecond):
        fmt.Println("timeout!")
    }

    // Approach 2: context (recommended — propagates cancellation)
    ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
    defer cancel()

    data, err := fetchData(ctx)
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println("got:", data)
}

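Warning: time.After creates a new Timer on every call. Before Go 1.23, a Timer that was never stopped could not be garbage-collected until it fired, so calling time.After in a tight loop with a long timeout piled up live timers. Go 1.23 made unreferenced timers collectable, but such a loop still allocates a fresh Timer on every iteration. In loops, reuse one timer with time.NewTimer and timer.Reset: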

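// consumeWithTimeout (illustrative name) gives up after 100 ms without a value
func consumeWithTimeout(ch <-chan int) {
    timer := time.NewTimer(100 * time.Millisecond)
    defer timer.Stop()

    for {
        // Stop and drain before Reset: required before Go 1.23, harmless since.
        if !timer.Stop() {
            select {
            case <-timer.C:
            default:
            }
        }
        timer.Reset(100 * time.Millisecond)

        select {
        case v, ok := <-ch:
            if !ok {
                return // ch closed
            }
            _ = v
        case <-timer.C:
            return // timeout: no value within 100 ms
        }
    }
}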

Detecting Channel Leaks

A channel leak occurs when a goroutine blocks on a channel send or receive indefinitely and can never exit, causing the goroutine count to grow continuously.

// Leak example: no exit mechanism
func leaky(ch chan int) {
    for {
        v := <-ch  // blocks forever if nobody sends; spins on zero values once ch is closed
        _ = v
    }
}

// Correct: done channel provides an exit path
func notLeaky(ch <-chan int, done <-chan struct{}) {
    for {
        select {
        case v, ok := <-ch:
            if !ok {
                return  // ch was closed: no more work
            }
            _ = v
        case <-done:
            return  // external signal to exit
        }
    }
}

Tools for detecting leaks:

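package leak_test

import (
    "fmt"
    "runtime"
    "testing"

    "go.uber.org/goleak" // go get go.uber.org/goleak
)

// Coarse check: print the current goroutine count; a number that keeps
// growing under steady load suggests a leak.
func printGoroutineCount() {
    fmt.Println(runtime.NumGoroutine())
}

// goleak (recommended for tests) fails the test if goroutines started
// during it are still running when it returns.
func TestNoLeak(t *testing.T) {
    defer goleak.VerifyNone(t)
    // ... test code
}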

Sending to a Closed Channel Panics

ch := make(chan int, 1)
close(ch)
ch <- 1  // panic: send on closed channel

// Safe send pattern (using recover, but not recommended as a routine approach)
func safeSend(ch chan int, val int) (closed bool) {
    defer func() {
        if r := recover(); r != nil {
            closed = true
        }
    }()
    ch <- val
    return false
}

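The better approach is to guarantee architecturally that only the sender closes a channel and receivers never do. When more than one goroutine may close the same channel (for example, several senders sharing a shutdown path), use sync.Once to ensure it is closed exactly once. This prevents a "close of closed channel" panic, but it does not make a later send safe, so senders must still stop sending before Close is called: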

type SafeChan struct {
    ch   chan int
    once sync.Once
}

func (s *SafeChan) Close() {
    s.once.Do(func() { close(s.ch) })
}
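A quick check that the wrapper tolerates concurrent Close calls. This sketch repeats SafeChan so it is self-contained; newSafeChan and closeConcurrently are hypothetical helpers added for the example.

```go
package main

import (
    "fmt"
    "sync"
)

type SafeChan struct {
    ch   chan int
    once sync.Once
}

func newSafeChan(size int) *SafeChan { return &SafeChan{ch: make(chan int, size)} }

// Close closes the channel at most once, however many goroutines call it.
func (s *SafeChan) Close() {
    s.once.Do(func() { close(s.ch) })
}

// closeConcurrently has n goroutines race to close the same channel and
// reports whether the channel ended up closed. Without sync.Once, every
// close after the first would panic.
func closeConcurrently(n int) bool {
    s := newSafeChan(0)
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            s.Close()
        }()
    }
    wg.Wait()
    _, ok := <-s.ch // returns immediately because the channel is closed
    return !ok
}

func main() {
    fmt.Println(closeConcurrently(10)) // true
}
```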

Level 4: Advanced — select Internals and Performance Traps

How select Randomizes Case Selection

When multiple cases are simultaneously ready, Go's select randomly picks one rather than selecting the first ready case in code order. This randomization is intentional: if select always picked the first ready case, some channels might be permanently starved.

// Demonstrating select's randomness
ch1 := make(chan string, 1)
ch2 := make(chan string, 1)
ch1 <- "one"
ch2 <- "two"

// Across multiple runs, select randomly picks ch1 or ch2
select {
case v := <-ch1:
    fmt.Println("ch1:", v)
case v := <-ch2:
    fmt.Println("ch2:", v)
}
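The randomization is easy to observe statistically. In this sketch (countPicks is an illustrative name), both cases are made ready before every select; the two counts come out roughly equal, and the exact numbers vary from run to run.

```go
package main

import "fmt"

// countPicks runs a select with two ready cases n times and counts
// how often each case wins.
func countPicks(n int) (c1, c2 int) {
    ch1 := make(chan string, 1)
    ch2 := make(chan string, 1)
    for i := 0; i < n; i++ {
        // Refill so both cases are ready on every iteration.
        if len(ch1) == 0 {
            ch1 <- "one"
        }
        if len(ch2) == 0 {
            ch2 <- "two"
        }
        select {
        case <-ch1:
            c1++
        case <-ch2:
            c2++
        }
    }
    return c1, c2
}

func main() {
    fmt.Println(countPicks(10000)) // two roughly equal counts
}
```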

Inside the runtime (selectgo in runtime/select.go), select executes roughly as follows:

  1. Shuffle the poll order: Generate a random permutation of the cases using the runtime's cheap random number generator (fastrandn, called cheaprandn in newer releases).
  2. Compute the lock order: Sort the cases by hchan address.
  3. Lock all: Lock every channel in the select, in lock order (sellock).
  4. Pass 1, look for a ready case: Walk the cases in the shuffled poll order; the first one that can proceed immediately (a waiting sender or buffered data to receive, a waiting receiver or free space to send into, or a closed channel) wins. Execute it, unlock all, and return. If none is ready and there is a default case, unlock all and run default.
  5. Pass 2, enqueue and park: Create a sudog for every case, add each to its channel's wait queue, and gopark, which unlocks all the channels.
  6. Pass 3, after waking: Lock all again, remove the sudogs from the wait queues of the cases that did not fire, unlock all, and return the selected case.

select lock ordering (sorted by hchan address):

case <-ch3  (addr: 0xc00001a080)
case <-ch1  (addr: 0xc00001a0a0)   After sorting: ch2, ch3, ch1
case <-ch2  (addr: 0xc00001a060)        ↑
                                    Lock in ascending address order

This sorting is critical: if two goroutines each run a select with the same set of channels, inconsistent lock ordering would cause deadlock. Address-ordered locking guarantees a globally consistent acquisition order.
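The same rule applies to any code that must hold several locks at once. Below is a sketch using sync.Mutex rather than hchan locks: lockAll, unlockAll, and contend are hypothetical names, and addresses are compared through unsafe.Pointer, which is stable in the current gc implementation because its garbage collector does not move heap objects. Two goroutines request the same two mutexes in opposite orders, yet both finish because lockAll always locks in ascending address order.

```go
package main

import (
    "fmt"
    "sort"
    "sync"
    "unsafe"
)

// lockAll locks the mutexes in ascending address order and returns that
// order. Assumes no duplicates (selectgo skips repeated channels).
func lockAll(ms []*sync.Mutex) []*sync.Mutex {
    sorted := append([]*sync.Mutex(nil), ms...)
    sort.Slice(sorted, func(i, j int) bool {
        return uintptr(unsafe.Pointer(sorted[i])) < uintptr(unsafe.Pointer(sorted[j]))
    })
    for _, m := range sorted {
        m.Lock()
    }
    return sorted
}

// unlockAll releases in reverse lock order, as selectgo's selunlock does.
func unlockAll(sorted []*sync.Mutex) {
    for i := len(sorted) - 1; i >= 0; i-- {
        sorted[i].Unlock()
    }
}

// contend makes two goroutines take the same two locks, passed in
// opposite orders, iters times each.
func contend(iters int) int {
    a, b := new(sync.Mutex), new(sync.Mutex)
    counter := 0
    var wg sync.WaitGroup
    for _, order := range [][]*sync.Mutex{{a, b}, {b, a}} {
        wg.Add(1)
        go func(ms []*sync.Mutex) {
            defer wg.Done()
            for i := 0; i < iters; i++ {
                held := lockAll(ms)
                counter++ // safe: both goroutines hold a and b here
                unlockAll(held)
            }
        }(order)
    }
    wg.Wait()
    return counter
}

func main() {
    fmt.Println(contend(10000)) // 20000
}
```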

The Elegant Use of nil Channels in select

Sending to or receiving from a nil channel blocks forever. But within a select, a nil channel's case is always skipped (never selected). This lets you dynamically enable and disable individual cases:

func merge(ch1, ch2 <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for ch1 != nil || ch2 != nil {
            select {
            case v, ok := <-ch1:
                if !ok {
                    ch1 = nil  // disable this case once ch1 is closed
                    continue
                }
                out <- v
            case v, ok := <-ch2:
                if !ok {
                    ch2 = nil
                    continue
                }
                out <- v
            }
        }
    }()
    return out
}

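This is an elegant fan-in implementation. A closed channel is always ready in a select and yields zero values, so without this step the loop would spin forever receiving zeros with ok == false. Setting the variable to nil disables its case, and the loop ends once both inputs are nil.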

channel vs mutex: Performance Comparison and When to Use Each

Both channels and mutexes can achieve concurrent safety, but their performance characteristics are fundamentally different:

// Benchmark: buffered channel vs mutex counter
// Illustrative results (ns/op depends on hardware, Go version, and GOMAXPROCS):
// BenchmarkChannel-8    5000000    302 ns/op
// BenchmarkMutex-8     20000000     62 ns/op

// Channel-based counter
func chanCounter(n int) {
    ch := make(chan int, 1)
    ch <- 0
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            v := <-ch
            ch <- v + 1
        }()
    }
    wg.Wait()
}

// Mutex-based counter
func mutexCounter(n int) {
    var mu sync.Mutex
    count := 0
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock()
            count++
            mu.Unlock()
        }()
    }
    wg.Wait()
}
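The selection table below recommends atomic operations for the hottest paths. A third variant is sketched here for comparison: atomicCounter is a hypothetical name that uses atomic.Int64 (Go 1.19 or later), and chanCounterResult is chanCounter modified to return its final value so the variants can be checked against each other.

```go
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

// atomicCounter increments with a single atomic add: no lock, no parking.
func atomicCounter(n int) int64 {
    var count atomic.Int64
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            count.Add(1)
        }()
    }
    wg.Wait()
    return count.Load()
}

// chanCounterResult uses the single buffered slot as a lock.
func chanCounterResult(n int) int {
    ch := make(chan int, 1)
    ch <- 0
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            v := <-ch   // "acquire": take the value out of the buffer
            ch <- v + 1 // "release": put the incremented value back
        }()
    }
    wg.Wait()
    return <-ch
}

func main() {
    fmt.Println(atomicCounter(1000), chanCounterResult(1000)) // 1000 1000
}
```

To compare their cost on your own hardware, call each from a Benchmark function in a _test.go file and run go test -bench=.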

Why is a channel slower than a mutex in the counter scenario?

  1. More work per operation: Each goroutine performs two channel operations (a receive and a send), and each one copies the element into or out of the ring buffer and updates the indices. The mutex counter only increments a single integer.
  2. Channel operations may involve the scheduler: When a goroutine blocks, gopark/goready trigger a goroutine switch. An uncontended sync.Mutex Lock is a single atomic compare-and-swap.
  3. Every channel operation takes hchan.lock: The channel's internal runtime lock is acquired and released on every send and receive, even when nobody else is waiting, on top of the copying and bookkeeping above.

Selection Principles:

Scenario | Recommended
--- | ---
Transferring data ownership | channel
Goroutine coordination (notification, signals) | channel
Protecting shared state (simple counter, cache) | mutex
Collecting results from parallel computation | channel (fan-in)
High-frequency, low-latency critical section | mutex or atomic

High-Performance Scenario: Lock-Free Ring Queue

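When channel overhead becomes a bottleneck (millions of operations per second) and there is exactly one producer and one consumer, consider a lock-free single-producer single-consumer (SPSC) ring queue. Because each index has only one writer, it needs only atomic loads and stores, not CAS: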

import "sync/atomic"

// Simplified lock-free single-producer single-consumer queue (SPSC).
// Safe only if exactly one goroutine calls Push and exactly one calls Pop.
type RingBuffer struct {
    buf  []int64
    head uint64  // consumer read position (atomic)
    _    [56]byte // cache line padding
    tail uint64  // producer write position (atomic)
    _    [56]byte
}

func NewRingBuffer(size uint64) *RingBuffer {
    return &RingBuffer{buf: make([]int64, size)}
}

func (r *RingBuffer) Push(val int64) bool {
    tail := atomic.LoadUint64(&r.tail)
    head := atomic.LoadUint64(&r.head)
    if tail-head >= uint64(len(r.buf)) {
        return false  // full
    }
    r.buf[tail%uint64(len(r.buf))] = val
    atomic.StoreUint64(&r.tail, tail+1)
    return true
}

func (r *RingBuffer) Pop() (int64, bool) {
    head := atomic.LoadUint64(&r.head)
    tail := atomic.LoadUint64(&r.tail)
    if head >= tail {
        return 0, false  // empty
    }
    val := r.buf[head%uint64(len(r.buf))]
    atomic.StoreUint64(&r.head, head+1)
    return val, true
}

Note the [56]byte padding: this prevents head and tail from landing on the same CPU cache line (64 bytes), avoiding false sharing. This detail can deliver a 2-3x performance improvement in high-throughput scenarios.

The Subtle Relationship Between select and Goroutine Leaks

A common point of confusion: when several goroutines wait on the same channel, each sent value is received by only one of them. So can a single signal stop them all?

// Question: do all 5 goroutines exit after close(done)?
done := make(chan struct{})
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
    wg.Add(1)
    go func(id int) {
        defer wg.Done()
        select {
        case <-done:
            fmt.Printf("goroutine %d exiting\n", id)
        }
    }(i)
}

// close broadcasts to all goroutines waiting on the channel
close(done)  // Correct! All goroutines exit.
wg.Wait()    // returns only after all 5 have printed

Closing a channel is a broadcast mechanism: all goroutines waiting on the channel are woken and receive the zero value. This is fundamentally different from sending a value (only one goroutine receives it). The cancellation mechanism of context.Context is built on exactly this property.
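Since context cancellation is built on closing a channel, one cancel call should release every waiter. A sketch (cancelAll is an illustrative name) that starts n goroutines blocked on the same ctx.Done() channel and counts how many exit after a single cancel():

```go
package main

import (
    "context"
    "fmt"
    "sync"
)

// cancelAll blocks n goroutines on ctx.Done(), cancels once, and
// returns how many goroutines observed the cancellation.
func cancelAll(n int) int {
    ctx, cancel := context.WithCancel(context.Background())
    var wg sync.WaitGroup
    var mu sync.Mutex
    exited := 0
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            <-ctx.Done() // unblocks when cancel closes the Done channel
            mu.Lock()
            exited++
            mu.Unlock()
        }()
    }
    cancel() // one call wakes every waiter
    wg.Wait()
    return exited
}

func main() {
    fmt.Println(cancelAll(5)) // 5
}
```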


Summary

Channels are the central abstraction of Go's concurrency model. At the implementation level, a channel is a locked ring queue plus two wait queues. At the semantic level, it is a carrier of ownership transfer that converts implicit lock protection into explicit data flow.

Understanding hchan's three send paths (direct delivery, write to buffer, block) and select's lockAll/randomization mechanism helps you write correct and efficient concurrent code. In performance-sensitive scenarios, the overhead of channels (lock + possible scheduling switch) may be a bottleneck; in those cases, prefer mutex or atomic operations.

Two core rules to remember: the sender closes the channel; every goroutine must have an exit path.
