Chapter 13

The Go Compiler: From Source to Binary

You have probably never spoken directly to the Go compiler. You type go build, and a few seconds later you have a runnable binary. This transparency is deliberate — the Go compiler is designed to be an invisible tool, letting you focus on the code itself.

But this invisibility comes with a cost: if you do not understand what the compiler is doing, you cannot understand why one piece of code is slower than another, why a certain variable is allocated on the heap instead of the stack, or why a particular function cannot be inlined. More importantly, you cannot make genuinely informed performance optimizations — you are reduced to guessing and relying on intuition.

This chapter opens the black box. We will not only trace the compiler's pipeline — lexing, parsing, type-checking, SSA generation, code generation — but also understand the why behind each step: why was it designed this way? What problem does this design solve? What does Go's approach mean compared to GCC and LLVM?

Level 1: What You Need to Know

Why Understanding the Compiler Matters

Consider these two functionally equivalent pieces of code:

// Version A
func sum(nums []int) int {
    total := 0
    for _, n := range nums {
        total += n
    }
    return total
}

// Version B
func process(data []int) int {
    result := new(int)
    for i := 0; i < len(data); i++ {
        *result += data[i]
    }
    return *result
}

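Version A's total never has its address taken, so it lives in a register or on the stack. Version B's result comes from new(int), which looks like a heap allocation, but the compiler's escape analysis has the final say: because the pointer never leaves process, the compiler is free to keep the int on the stack as well. Had the pointer been returned or stored in a global, the int would be heap-allocated and tracked by the GC. If you do not understand escape analysis, you may write code with many unnecessary heap allocations, increasing GC pressure and reducing overall throughput.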

This is only the tip of the iceberg. The compiler also decides:

Which function calls get inlined (affecting call overhead and register usage)
Which loops get unrolled (affecting branch prediction and cache utilization)
Which objects are "pointer-free" (affecting GC scanning cost)
How to lay out a goroutine's stack frame (affecting stack growth trigger frequency)

Understanding the compiler means understanding the first causes of performance.

Go Compiler Philosophy: Different from GCC and LLVM

Go's compiler (commonly called gc — the Go Compiler, not the Garbage Collector) has fundamental design differences from GCC/LLVM.

GCC/LLVM philosophy: generality first.

GCC supports dozens of languages: C, C++, Fortran, Ada, Go (via gccgo), and more. LLVM is a general-purpose compiler infrastructure — Clang (C/C++), Rust, and Swift all build on it. Their core strength is an extremely mature optimizer — decades of accumulated optimization passes, including vectorization, auto-parallelization, and sophisticated register allocation.

The cost is complexity. LLVM IR (Intermediate Representation) is a general format; to support every feature of every language, it must be very complex. LLVM itself is millions of lines of code. Compile times are relatively long (though much better than early C++), and debugging optimization processes requires specialist expertise.

Go compiler philosophy: speed and predictability first.

Go's gc compiler is designed for Go alone and is roughly 150,000 lines of code. Its design goals are very clear:

Extremely fast compilation. Rob Pike has said on multiple occasions that Go's compilation speed is a design decision, not an accident. Each package compiles only once, there are no circular dependencies, and import paths correspond directly to filesystem paths — all of this minimizes the complexity of the compilation dependency graph.
Predictable optimization behavior. Go's compiler optimizations are conservative. It does not perform aggressive cross-function analysis (except for inlining), does not change your memory layout, and does not generally convert your loops to SIMD instructions. This predictability means you can understand program behavior by reading assembly output.
Self-hosted. The Go compiler is written in Go itself (since Go 1.5). This means improvements to Go directly improve the compiler itself, creating a positive feedback loop.
Tight runtime integration. The Go compiler and Go runtime are tightly coupled. The compiler knows how to generate hooks for goroutine scheduling, how to lay out the pointer maps required by GC, and how to generate the stack-growth prologue code. This coupling makes Go's concurrency and GC efficient, but it also means Go cannot be used as a general-purpose backend like LLVM.

GCC/LLVM architecture:
  Front-end (C/C++/Rust/Swift) → Generic IR → Optimization Passes → Back-end (x86/ARM/...)

Go gc architecture:
  Go source → Dedicated IR (AST/SSA) → Go-specific optimizations → Go-specific back-end
                  ↕ Tightly integrated with runtime (GC, goroutines, defer)

This is not to say Go's compiler is "worse" than GCC/LLVM — they serve different goals. Go's choice is: sacrifice some peak optimization performance in exchange for extremely fast compilation and predictable behavior.

The Compiler's Place in the Go Toolchain

When you run go build main.go, you actually trigger a chain of cooperating tools:

go build
  │
  ├─ compile (the gc compiler)
  │    Each package's .go files → package-level .a archive
  │    Output: object file (machine code + symbol info + relocation info)
  │
  ├─ asm (assembler)
  │    Processes .s assembly files (e.g., hand-written asm in runtime)
  │
  └─ link (linker)
       Merges all object files → final executable
       Handles symbol resolution, relocation, CGO linking

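Go's linker is also purpose-built for Go rather than a wrapper around GNU ld. It implements the output formats selected with go build's -buildmode flag, such as -buildmode=pie (position-independent executable) and -buildmode=plugin (dynamic plugins). The related -trimpath flag, which strips local source paths from the binary, is a go build flag rather than a linker feature.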

Level 2: The Principle — Compilation Pipeline in Detail

Stage 1: Lexical Analysis (Lexer)

The lexer (or scanner) converts a raw stream of characters into a token sequence. Tokens are the smallest semantic units of the language: keywords, identifiers, literals, operators, delimiters.

Go's lexical analysis has one famous special design: automatic semicolon insertion. Go source usually contains no semicolons, but Go's grammar conceptually has them (each statement ends with a semicolon). The lexer automatically inserts a semicolon when:

The last token on the current line is an identifier, or an integer, floating-point, imaginary, rune, or string literal
The last token on the current line is break, continue, fallthrough, return, ++, --, ), ], or }

This is why the following code is a compile error:

// Error: the lexer inserts a semicolon before '{', making the if statement incomplete
func f() {
    if x > 0
    {
        fmt.Println("positive")
    }
}

Go's lexer is implemented in src/cmd/compile/internal/syntax/scanner.go. It is extremely fast — a single-pass scan with no backtracking.
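The insertion rules can be observed with the standard library's go/scanner package. This is a minimal sketch, not the compiler's own scanner (which lives in the internal syntax package); the helper name insertedSemicolons is made up for this example. go/scanner reports an automatically inserted semicolon as a SEMICOLON token whose literal is "\n", which is what the helper counts.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// insertedSemicolons scans src and returns how many semicolons the scanner
// inserted automatically (go/scanner reports those with the literal "\n").
func insertedSemicolons(src string) int {
	fset := token.NewFileSet()
	file := fset.AddFile("demo.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil error handler: errors are ignored here

	n := 0
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			return n
		}
		if tok == token.SEMICOLON && lit == "\n" {
			n++
		}
	}
}

func main() {
	// One semicolon after "1" and one after "++".
	fmt.Println(insertedSemicolons("x := 1\ny++\n"))
	// Nothing after "+", so the expression continues; one after "b".
	fmt.Println(insertedSemicolons("total := a +\nb\n"))
	// The failing if statement: a semicolon lands after "0", before "{".
	fmt.Println(insertedSemicolons("if x > 0\n{\n}\n"))
}
```

The second case shows why a binary operator at the end of a line is the idiomatic way to break a long expression: an operator such as + is not in the trigger list, so no semicolon is inserted.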

Stage 2: Parsing and AST Construction

The parser consumes the token sequence and, following Go's grammar rules (a context-free grammar that the language specification writes in EBNF), constructs the Abstract Syntax Tree (AST).

The AST is a tree representation of the source code's structure, where each node represents a syntactic construct:

Source: a + b * c

AST:
    BinaryExpr (+)
    ├── Ident (a)
    └── BinaryExpr (*)
        ├── Ident (b)
        └── Ident (c)

Go's parser is a recursive descent parser — each grammar rule corresponds to a function, and the AST is built through recursive function calls. Recursive descent parsers are simple to implement and recover from errors easily; the trade-off is that they cannot handle certain left-recursive grammars (but Go's grammar deliberately avoids left recursion).

You can use the standard library's go/parser and go/ast packages to inspect the AST:

package main

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
)

func main() {
    src := `package main
func add(a, b int) int {
    return a + b
}`
    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, "", src, 0)
    if err != nil {
        panic(err)
    }
    ast.Print(fset, f)
}

Running this shows the complete AST node tree, including the type and position of each node. Understanding the AST is essential for writing Go code generation tools like stringer and mockgen.
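Code generators rarely print the whole tree; they usually walk it looking for specific node types. The following sketch uses ast.Inspect to collect function names, which is roughly the first step a generator would take. The helper name funcNames and the sample source are illustrative only.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcNames parses src and returns the names of all function and method
// declarations in source order.
func funcNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(f, func(n ast.Node) bool {
		if fn, ok := n.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
		return true // keep descending into child nodes
	})
	return names, nil
}

func main() {
	src := "package demo\n" +
		"func add(a, b int) int { return a + b }\n" +
		"func sub(a, b int) int { return a - b }\n"
	names, err := funcNames(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Note that go/ast is the standard library's AST for tools; the compiler itself parses into its own internal syntax tree, although the two are close in shape.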

Stage 3: Type Checking

Type checking is one of the compiler's most complex stages. It traverses the AST to accomplish:

Name Resolution. Links each use of an identifier to its definition. In x := 1; fmt.Println(x), the second x must resolve to the first x's definition.
Type Inference. Infers types for short variable declarations and composite literals. x := 1 → x has type int.
Type Compatibility Checking. Ensures operand types are compatible. var x int = "hello" produces an error.
Interface Satisfaction Verification. Checks whether a type implements an interface. This cannot be done at the AST level — it requires complete type information.
Constant Folding. Evaluates compile-time constant expressions. const x = 2 * 3 + 1 → x = 7, computed at type-check time.

The type checker is implemented in src/cmd/compile/internal/types2, a port of the standard library's go/types that the compiler has used since Go 1.18. With the introduction of generics in that release, the type checker became significantly more complex, since it must also handle type parameter instantiation and constraint checking.
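Because go/types exposes the same checking logic to ordinary programs, several of the tasks above can be observed directly. The sketch below type-checks a small import-free source string and reports the inferred type of a variable, the folded value of a constant, and a type error. The helper name describe and the package name demo are assumptions made for the example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// describe type-checks src and reports the type (and, for constants, the
// folded value) of the package-level object called name.
func describe(src, name string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return "", err
	}
	var conf types.Config // zero value: no importer, stop at the first error
	pkg, err := conf.Check("demo", fset, []*ast.File{f}, nil)
	if err != nil {
		return "", err
	}
	obj := pkg.Scope().Lookup(name)
	if obj == nil {
		return "", fmt.Errorf("%s is not declared at package level", name)
	}
	if c, ok := obj.(*types.Const); ok {
		return fmt.Sprintf("%s = %s", c.Type(), c.Val()), nil
	}
	return obj.Type().String(), nil
}

func main() {
	src := "package demo\nconst x = 2*3 + 1\nvar y = 1.5\n"
	for _, name := range []string{"x", "y"} {
		desc, err := describe(src, name)
		fmt.Println(name, desc, err)
	}
	// Incompatible types are rejected before any code is generated.
	_, err := describe("package demo\nvar z int = \"hello\"\n", "z")
	fmt.Println("type error:", err)
}
```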

Stage 4: Converting to IR and then SSA

After type checking, the compiler converts the AST into a lower-level Intermediate Representation (IR), then into SSA (Static Single Assignment) form.

Why SSA?

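SSA is an IR form where each variable is assigned exactly once. If a variable needs multiple assignments, multiple "versions" of the variable are created (distinguished by subscripts). Where control flow merges, for example after an if/else that assigns x on both branches, a φ (phi) function selects the version that corresponds to the path actually taken.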

Original code:          SSA form:
x = 1                   x₁ = 1
x = x + 2              x₂ = x₁ + 2
y = x * 3              y₁ = x₂ * 3
x = y                   x₃ = y₁
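The renaming in the table above can be sketched in a few lines of Go. This is a toy for straight-line code only, under the assumption that each statement is already split into tokens; it has no control flow and therefore no φ functions, and it is not how the compiler's ssagen package works. The type assign and the function toSSA are names invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// assign is one straight-line statement "dst = expr", with expr pre-split
// into tokens (variables, literals, operators).
type assign struct {
	dst  string
	expr []string
}

// toSSA renames variables so that every name is assigned exactly once.
func toSSA(prog []assign) []string {
	version := map[string]int{} // latest version number of each variable
	var out []string
	for _, a := range prog {
		rhs := make([]string, len(a.expr))
		for i, tok := range a.expr {
			if v, ok := version[tok]; ok {
				rhs[i] = fmt.Sprintf("%s%d", tok, v) // use: refer to the latest version
			} else {
				rhs[i] = tok // literal, operator, or not-yet-defined name
			}
		}
		version[a.dst]++ // definition: create a fresh version
		out = append(out, fmt.Sprintf("%s%d = %s", a.dst, version[a.dst], strings.Join(rhs, " ")))
	}
	return out
}

func main() {
	prog := []assign{
		{"x", []string{"1"}},
		{"x", []string{"x", "+", "2"}},
		{"y", []string{"x", "*", "3"}},
		{"x", []string{"y"}},
	}
	for _, line := range toSSA(prog) {
		fmt.Println(line)
	}
}
```

Every use now names exactly one definition, which is the property that makes the analyses listed in this section cheap to implement.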

The core advantage of SSA is making data-flow analysis straightforward:

Dead Code Elimination: If a variable version (like x₁) is never read, its defining statement is dead code and can be safely deleted.
Constant Propagation: If x₁ = 5 and x₁ has only one definition, every use of x₁ can be replaced with the constant 5.
Common Subexpression Elimination (CSE): If a + b is computed twice with no modification to a or b in between, the first computation's result can be reused.

Go's SSA representation and its optimization passes live in src/cmd/compile/internal/ssa/ (the code that builds SSA from the typed IR sits next to it in ssagen). At roughly 80,000 lines of code, it is the largest subpackage in the entire compiler.

You can output intermediate SSA states (very detailed, mainly for compiler development):

GOSSAFUNC=add go build -v .
# Generates ssa.html; open in a browser to see SSA changes through each optimization pass

Stage 5: SSA Optimization Passes

Go's SSA optimizer runs approximately 60 optimization passes in sequence (defined in src/cmd/compile/internal/ssa/compile.go). The most important ones:

Escape Analysis

Escape analysis decides whether each variable is allocated on the stack or the heap. Strictly speaking it is not an SSA pass: it runs earlier, on the typed IR, before SSA construction. It is listed here because it is one of the compiler decisions with the largest effect on performance. Level 3 covers it in detail.

Inlining

Replaces a function call site with the function's body. Like escape analysis, inlining runs on the IR before SSA construction. It eliminates function call overhead (argument passing, return address push, stack frame setup/teardown) and creates opportunities for subsequent optimizations (such as constant propagation after inlining).

Dead Code Elimination

Removes code that will never execute. Blocks like if false { ... } are deleted. The more powerful form uses SSA-based data-flow analysis to remove computations whose results are never used.

Register Allocation

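SSA values act as virtual registers that must ultimately be mapped to physical registers (x86-64 has 16 general-purpose registers). Go's register allocator is a variant of linear scan rather than graph coloring, a choice that favors fast compilation over the last few percent of code quality.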

Nil Check Elimination

If the compiler can prove that a pointer is definitely non-nil at a given code point (e.g., after a successful type assertion), it can eliminate subsequent nil checks, reducing unnecessary branches.

Stage 6: Code Generation

After SSA optimization, the compiler converts SSA into machine code for the target architecture. Go supports these major architectures:

Architecture	GOARCH value	Notes
x86-64	amd64	Most mainstream; primary Go toolchain target
ARM64	arm64	Apple Silicon, AWS Graviton
x86-32	386	32-bit x86, less common
ARM	arm	Embedded systems, Raspberry Pi
RISC-V 64	riscv64	Emerging; supported since Go 1.14
WebAssembly	wasm	Browser/WASI environments

Code generation output is an object file (.o) containing:

Machine instruction sequences
Symbol table (function names, variable names and their addresses)
Relocation table (addresses to be filled in at link time)
DWARF debug information (mapping of functions, variables, line numbers)
Go-specific pclntab (program counter to line number table, used for stack unwinding and panic messages)

Level 3: Code Practice

Escape Analysis: Precisely Understanding the Heap-Stack Boundary

Escape analysis answers the question: does this variable's lifetime exceed the lifetime of the function it lives in? If so, it must be allocated on the heap ("escapes" to the heap); otherwise, it can be allocated on the stack.

Advantages of stack allocation:

Extremely fast allocation (just moving the stack pointer)
No GC pressure (automatically reclaimed when the function returns)
Better locality (near the function's other variables in cache)

Use -gcflags="-m" to see escape analysis output:

go build -gcflags="-m" ./...
# Or for just the current package:
go build -gcflags="-m" .
# More detail (shows escape reasons):
go build -gcflags="-m=2" .

Case 1: Returning a pointer causes escape

// escape1.go
package main

func newInt() *int {
    x := 42      // x will escape
    return &x    // Returns x's address to the caller; x outlives this function
}

func main() {
    p := newInt()
    _ = p
}

$ go build -gcflags="-m" escape1.go
# command-line-arguments
./escape1.go:4:2: moved to heap: x

The compiler reports that x is moved to the heap. The reason is that &x is returned — the caller holds x's address, and the caller's lifetime exceeds newInt()'s, so x must be heap-allocated.

Case 2: Interface boxing causes escape

// escape2.go
package main

import "fmt"

func printValue(v interface{}) {
    fmt.Println(v)
}

func main() {
    x := 42
    printValue(x)  // x escapes: boxing into interface{} requires heap allocation
}

$ go build -gcflags="-m" escape2.go
./escape2.go:11:12: x escapes to heap
./escape2.go:6:14: v does not escape  # abbreviated; exact messages and positions vary by Go version

When boxing an int into interface{}, if the compiler cannot prove that the interface value will not escape, it allocates the underlying data on the heap. This is one of the most common sources of "hidden heap allocations" in Go — high-performance code should avoid unnecessary interface boxing where possible.

Case 3: Closure capture causes escape

// escape3.go
package main

func makeCounter() func() int {
    count := 0          // count will escape
    return func() int {
        count++
        return count
    }
}

func main() {
    counter := makeCounter()
    _ = counter()
}

$ go build -gcflags="-m" escape3.go
./escape3.go:4:2: moved to heap: count
./escape3.go:5:9: func literal escapes to heap

The closure captures count, and the closure itself escapes to the heap (as a return value), so count must also escape.

Case 4: Large objects are forced to the heap

// escape4.go
package main

func largeStack() {
    var buf [16 << 20]byte // 16 MB: above the 10 MB limit for declared variables (new/make/&T{} use 64 KB)
    _ = buf[0]
}

func main() { largeStack() }

$ go build -gcflags="-m" escape4.go
./escape4.go:4:6: moved to heap: buf

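A goroutine's stack starts small (about 2 KB in current releases) and grows on demand. Even so, the compiler sends very large objects straight to the heap rather than risk an expensive stack growth. The limits are currently 10 MB for explicitly declared variables (var x T, x := ...) and 64 KB for implicit allocations such as new(T), &T{}, or make([]T, n) with a constant n.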

Escape analysis decision tree:

Does variable v need heap allocation?

1. Is v's address passed to a longer-lived scope?
   - return &v → heap
   - assign to global variable → heap
   - store in interface{} and that interface escapes → heap

2. Is v captured by a closure, and does that closure escape? → heap

3. Does v's size exceed the compiler's stack limit (currently 10 MB for declared variables, 64 KB for new/make/&T{})? → heap

4. Can the compiler not determine v's size at compile time?
   - make([]T, n) where n is a runtime value → heap

5. None of the above → stack (safe, zero GC pressure)
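The -m output tells you what the compiler decided; testing.AllocsPerRun lets you confirm the effect at run time. The sketch below, which assumes the standard gc toolchain with optimizations enabled, compares a function whose local stays on the stack with one that returns a pointer to its local. The names sumStack, newInt, allocsPerCall, and sink are invented for the example.

```go
package main

import (
	"fmt"
	"testing"
)

var sink *int // package-level sink keeps the escaping pointer reachable

// sumStack never takes the address of total, so nothing escapes.
func sumStack(nums []int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

// newInt returns the address of a local, so x must be heap-allocated.
//
//go:noinline
func newInt() *int {
	x := 42
	return &x
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func()) float64 {
	return testing.AllocsPerRun(1000, f)
}

func main() {
	nums := []int{1, 2, 3, 4}
	total := 0
	fmt.Println("sumStack allocs/call:", allocsPerCall(func() { total += sumStack(nums) }))
	fmt.Println("newInt allocs/call:  ", allocsPerCall(func() { sink = newInt() }))
	_ = total
}
```

The //go:noinline directive on newInt is there only to keep the experiment honest: if the call were inlined into a caller that discards the pointer, escape analysis could keep x on the stack after all.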

Inlining Optimization: Eliminating Function Call Overhead

What is inlining?

Inlining replaces a call site with the callee's function body, eliminating the overhead of function calls:

The call instruction itself (CALL/RET)
Copying arguments and return values through memory
Stack frame setup and teardown (adjusting SP, saving BP)
Potential register save/restore

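For short functions (a few instructions), the overhead of the call itself can account for a large share of total execution time. Figures of 30-50% are plausible in tight loops, but the exact share depends heavily on the CPU and the surrounding code.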

Go's inlining conditions

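Go uses a "budget" model to control inlining: the compiler assigns each function an "inline cost" that roughly tracks the number of nodes in its IR, and a function is inlinable only if that cost stays within the default budget of 80. Passing -gcflags=-l disables inlining entirely; higher -l levels are unsupported debugging settings. PGO raises the budget for hot call sites.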

Factors that prevent inlining (the exact list changes between Go releases):

The function contains recover()
The function contains select (channel multiplexing)
The function is marked with //go:noinline
The function's inline cost exceeds the budget

Viewing inlining decisions

# -m=2 reports each inlining decision in detail with reasons
go build -gcflags="-m=2" .

Example:

// inline1.go
package main

import "fmt"

//go:noinline  // Force no inlining, for comparison
func addNoInline(a, b int) int {
    return a + b
}

func addInline(a, b int) int {  // Simple enough, will be inlined
    return a + b
}

func main() {
    x := addNoInline(1, 2)
    y := addInline(3, 4)
    fmt.Println(x, y)
}

$ go build -gcflags="-m=2" inline1.go
./inline1.go:12:6: can inline addInline with cost 4
./inline1.go:17:14: inlining call to addInline
./inline1.go:6:6: addNoInline cannot be inlined (marked go:noinline)

The cascade effect of inlining

Inlining not only eliminates call overhead but also creates opportunities for subsequent optimizations:

func isPositive(x int) bool {
    return x > 0
}

func classify(x int) string {
    if isPositive(x) {  // isPositive will be inlined
        return "positive"
    }
    return "non-positive"
}

// After inlining, equivalent to:
func classify(x int) string {
    if x > 0 {  // Direct comparison; compiler can optimize further
        return "positive"
    }
    return "non-positive"
}

After inlining, the compiler sees the complete data flow and may perform further constant propagation, dead code elimination, and other optimizations.

Performance comparison experiment

// bench_inline_test.go
package main

import "testing"

//go:noinline
func squareNoInline(x int) int { return x * x }

func squareInline(x int) int { return x * x }

func BenchmarkNoInline(b *testing.B) {
    sum := 0
    for i := 0; i < b.N; i++ {
        sum += squareNoInline(i)
    }
    _ = sum
}

func BenchmarkInline(b *testing.B) {
    sum := 0
    for i := 0; i < b.N; i++ {
        sum += squareInline(i)
    }
    _ = sum
}

$ go test -bench=. -benchmem
BenchmarkNoInline-8    1000000000    0.8 ns/op    0 B/op    0 allocs/op
BenchmarkInline-8      2000000000    0.4 ns/op    0 B/op    0 allocs/op

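In this run the inlined version is roughly twice as fast: for such a minimal function, call overhead dominates. Treat the absolute numbers as illustrative. At sub-nanosecond scale they vary with the CPU and Go version, and because sum is never really used, the compiler is free to simplify the inlined loop further.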

Level 4: Advanced Topics and Edge Cases

Link-Time Optimization and Build Constraints

Build Constraints

Go's build constraints let you provide different implementations for specific platforms, operating systems, architectures, or tags:

//go:build linux && amd64

package syscall

// This file only compiles on Linux + amd64

The new syntax (Go 1.17+) uses //go:build comments; the old syntax uses // +build. Both can coexist for backward compatibility.
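To see how a constraint line is evaluated, the standard library's go/build/constraint package can parse both syntaxes and evaluate the expression against a set of tags. This is a minimal sketch; the helper name matches is made up, and the real go command also considers file name suffixes, release tags, and other implicit tags that this example ignores.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// matches reports whether the build-constraint line is satisfied when exactly
// the given tags are set.
func matches(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	for _, tags := range [][]string{{"linux", "amd64"}, {"linux", "arm64"}} {
		ok, err := matches("//go:build linux && amd64", tags...)
		fmt.Println(tags, ok, err)
	}
	// The legacy syntax is accepted by the same parser; a comma means AND.
	ok, err := matches("// +build linux,amd64", "linux", "amd64")
	fmt.Println(ok, err)
}
```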

Advanced usage of build constraints:

# Also include files guarded by the production tag
go build -tags production .

# Conditional compilation of different database drivers
go build -tags mysql .
go build -tags postgres .

//go:build mysql

package db

import _ "github.com/go-sql-driver/mysql"

func init() {
    // Register MySQL driver
}

Cross-Compilation Internal Mechanics

Go's cross-compilation is one of its most powerful features and works almost out of the box:

# Compile a Linux amd64 binary on macOS
GOOS=linux GOARCH=amd64 go build -o server-linux .

# Compile a Windows binary
GOOS=windows GOARCH=amd64 go build -o server.exe .

# Compile ARM64 (Apple Silicon native)
GOOS=darwin GOARCH=arm64 go build -o server-arm64 .

Internal mechanics: Go's standard library has corresponding implementation files for each GOOS/GOARCH combination (distinguished by build constraints). The runtime package contains extensive assembly code separated by suffixes like _amd64.s and _arm64.s. Go's toolchain includes code generators for all architectures — no external toolchain is needed (but CGO, if used, requires the corresponding C cross-compiler).

CGO and the Complexity of Cross-Compilation

Using CGO makes cross-compilation significantly more complex:

# CGO_ENABLED=0 disables CGO, usually enabling true cross-platform static compilation
CGO_ENABLED=0 GOOS=linux go build -o server-linux .

# Use the pure-Go net package (no dependency on system DNS resolver)
CGO_ENABLED=0 go build -tags netgo .

Compiler Directives

Go passes instructions to the compiler through special comments (//go:xxx). These are not ordinary comments — the compiler parses and acts on them.

//go:noinline: Force no inlining

//go:noinline
func criticalFunc() {
    // Use when you need precise function attribution in profiling
    // Or when inlining would cause unacceptable code size bloat
}

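Use case: during profiling, an inlined function has no frame of its own, so its cost can be harder to separate from its caller (pprof reconstructs inlined frames from debug information where it can). Use //go:noinline to preserve function boundaries when you need unambiguous attribution.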

//go:nosplit: Disable stack growth check

//go:nosplit
func atomicOp() {
    // This function must not have a stack growth prologue
    // Must guarantee this function and its call chain will not trigger stack growth
}

Most Go functions begin with a "stack growth check" prologue that verifies the current goroutine's stack has enough room for the function's frame. If not, stack growth is triggered (runtime.morestack).

//go:nosplit disables this check. The function must keep its frame small (typically <128 bytes) and must not call any function that could trigger stack growth; the linker rejects nosplit call chains that could overflow the reserved stack space. Used primarily in:

Signal handler functions in the runtime package (cannot switch goroutine stacks during signal handling)
Assembly-implemented low-level atomic operations
Critical paths that interoperate with C code

//go:noescape: Tell the escape analyzer a parameter does not escape

// Declares a function implemented in assembly
//go:noescape
func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)

This directive tells the escape analyzer: the pointer arguments passed to this function will not escape to the heap through it. Since the function body is in assembly, the escape analyzer cannot automatically derive this — it must be declared manually.

Without this directive, the escape analyzer conservatively assumes the passed pointer might escape, causing unnecessary heap allocations.

//go:linkname: Access private symbols across packages

//go:linkname localName importpath.remoteName

This directive allows access to unexported symbols from another package. A file that uses it must import unsafe (typically as import _ "unsafe"). It is mainly used for internal cooperation between runtime and the standard library. Ordinary code should not use this directive: it breaks Go's encapsulation and has no compatibility guarantees across Go versions.

//go:generate: Code generation

//go:generate stringer -type=Direction
type Direction int
const (
    North Direction = iota
    South
    East
    West
)

Running go generate ./... executes all commands in //go:generate comments. This is not a compiler feature but a go toolchain feature; it is nonetheless an essential part of Go's code generation workflow.
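For orientation, this is roughly the kind of method stringer produces for the Direction type. It is a hand-written sketch, not the tool's actual output (the generated file uses a more compact index table), and the directionNames variable is an assumption of the example.

```go
package main

import "fmt"

type Direction int

const (
	North Direction = iota
	South
	East
	West
)

// directionNames maps each constant to its identifier, in declaration order.
var directionNames = [...]string{"North", "South", "East", "West"}

// String makes Direction satisfy fmt.Stringer, so fmt prints names, not numbers.
func (d Direction) String() string {
	if d < 0 || int(d) >= len(directionNames) {
		return fmt.Sprintf("Direction(%d)", int(d))
	}
	return directionNames[d]
}

func main() {
	fmt.Println(North, West, Direction(7))
}
```

Generating this method instead of writing it by hand means it cannot drift out of sync when a constant is added or renamed, provided go generate is rerun.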

Assembly Output Analysis: Seeing Through the Compiler

The most direct way to inspect compilation results is to look at the generated assembly:

# View assembly for all functions in the main package (Plan 9 assembly format)
go tool compile -S main.go

# Or view actual machine code via objdump
go build -o main . && go tool objdump -s main.main main

Example analysis:

// simple.go
package main

func add(a, b int) int {
    return a + b
}

func main() {
    println(add(3, 4))
}

$ go tool compile -S simple.go 2>&1 | grep -A 10 "\"\"\.add"
"".add STEXT nosplit size=19 args=0x18 locals=0x0 funcid=0x0
    0x0000 00000 TEXT    "".add(SB), NOSPLIT|ABIInternal, $0-24
    0x0000 00000 MOVQ    AX, "".a+8(SP)   // Note: Go 1.17+ uses register-based calling convention
    ...

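Since Go 1.17, Go uses a register-based calling convention (register-based ABI), first on amd64 and on more architectures in later releases: function arguments and return values are passed in registers rather than all through the stack. The Go 1.17 release notes report a performance improvement of about 5% on a representative set of benchmarks. The "". prefix in the listing above is how older toolchains printed the current package; newer releases print the full name (main.add), so adjust the grep pattern accordingly.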

Compiler Version and Performance Evolution

Understanding the evolution of compiler versions helps you make informed technology choices and set performance expectations:

Go version	Key compiler improvements
1.5	Compiler migrated from C to Go (self-hosting complete)
1.7	SSA back-end introduced; major AMD64 performance gains
1.9	Inlining improvements; support for more complex function inlining
1.13	Escape analysis reimplemented; more precise
1.17	Register-based calling convention (AMD64 first); reduced stack access
1.18	Generics support; type checker rewritten
1.20	PGO (Profile-Guided Optimization) experimental support
1.21	PGO officially available; automatic inlining optimization

PGO (Profile-Guided Optimization)

Go 1.21 introduced official PGO support. PGO uses profiling data from production environments to guide the compiler toward more aggressive optimizations:

# Step 1: collect profiling data
go build -o myapp .
./myapp &  # run the program; it must import net/http/pprof and serve HTTP on :6060
curl http://localhost:6060/debug/pprof/profile > cpu.prof

# Step 2: recompile using the profile data
go build -pgo=cpu.prof -o myapp-pgo .

PGO's main benefits come from more aggressive inlining (a relaxed budget for hot call sites) and better code layout (concentrating hot code to improve cache hit rates). The Go team reports that, as of Go 1.21, typical workloads see roughly 2-7% CPU usage improvements from enabling PGO.
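If the program does not expose an HTTP endpoint, a CPU profile can also be collected in-process with runtime/pprof. The sketch below is one way to do that under stated assumptions: cpuProfile and spin are names invented for the example, and spin stands in for a representative workload, which in practice should be real production traffic.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

var sink uint64 // keeps the result of spin observable so the loop is not removed

// cpuProfile runs work while CPU profiling is enabled and returns the
// profile in the pprof format that go build -pgo accepts.
func cpuProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // blocks until the profile has been written to buf
	return buf.Bytes(), nil
}

// spin is a stand-in for a representative workload (hypothetical).
func spin(n int) {
	var x uint64
	for i := 0; i < n; i++ {
		x = x*31 + uint64(i)
	}
	sink = x
}

func main() {
	data, err := cpuProfile(func() { spin(200_000_000) })
	if err != nil {
		panic(err)
	}
	fmt.Printf("collected %d bytes of CPU profile\n", len(data))
}
```

Writing the returned bytes to a file named default.pgo in the main package's directory lets go build pick the profile up automatically in Go 1.21 and later, without an explicit -pgo flag.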

Understanding the full picture of Go's compiler lets you make more informed code decisions — not based on folklore "best practices," but on precise knowledge of how your tools actually work. The next chapter dives into the memory allocator, which is tightly coupled with the compiler's escape analysis as two mutually reinforcing systems.
