Chapter 22

CGo: The Cost of Dancing with C

There is an unwritten consensus in the Go ecosystem: if your project uses CGo, you have chosen a path that demands exceptional care. CGo lets a Go program call C code, and that capability is sometimes irreplaceable — the world holds an enormous number of mature C libraries (SQLite, OpenSSL, BLAS/LAPACK, libpcap, Linux kernel interfaces) that have been refined over decades, and the equivalent pure-Go implementations either do not exist or have a significant quality gap.

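But CGo's costs are real, and frequently underestimated. Every call that crosses the Go/C boundary pays a fixed overhead on the order of tens of nanoseconds (often quoted as roughly 100 nanoseconds), compared with about a nanosecond for an ordinary Go function call. The goroutine must switch to a system stack for the duration of the call, and a long-running C call ties up an entire OS thread. Cross-compilation becomes much harder, because every target needs its own C cross-toolchain. Build times grow noticeably. Debugger support becomes unreliable.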

This chapter is not an argument against CGo. It is a guide to using it correctly when you genuinely need it.

Level 1: What You Need to Know

When CGo Is Necessary

Case 1: Calling mature C libraries

There are C libraries that have been battle-tested for decades, and whose reliability pure-Go implementations cannot match in the short term:

SQLite: The most widely deployed database in the world. mattn/go-sqlite3 wraps it via CGo and is the most widely used SQLite driver in the Go ecosystem. A pure-Go implementation (modernc.org/sqlite) exists, but it has historically shown a meaningful performance gap and some differences in edge-case behavior.
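OpenSSL / BoringSSL: In FIPS 140-2/3 regulated environments (finance, healthcare, government), a validated cryptographic module is mandatory. Historically, Go's standard library crypto packages were not FIPS-validated, so the usual route was a CGo-based build that delegates to a validated module: BoringSSL (the toolchain's GOEXPERIMENT=boringcrypto mode) or OpenSSL. Go 1.24 added a native FIPS 140-3 mode to the standard library, so check its current validation status before reaching for CGo.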
Graphics and audio libraries: OpenGL (go-gl), Vulkan, PortAudio. These bindings wrap system, driver, or vendor libraries for which no pure-Go replacement exists.
ML acceleration: Calling GPU libraries such as cuBLAS, or vendor-optimized CPU libraries such as Intel MKL, from Go.

Case 2: OS-level APIs

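Some OS interfaces are defined mainly in C headers (structs, constants, and macros). golang.org/x/sys/unix can issue many system calls without CGo, but CGo is sometimes the most practical way to get the exact definitions right:

Specific ioctl commands (character device driver interaction)
Linux's io_uring interface (the Go standard library provides no io_uring wrapper)
Special system calls for interacting with kernel modules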

Case 3: Legacy system integration

Enterprises with large existing C/C++ codebases use CGo as a bridge during incremental migration to Go — letting Go code call existing C libraries while gradually replacing them, rather than rewriting everything at once.

CGo Is Not Free: A Cost Overview

Before deciding to use CGo, understand its costs clearly:

Cost dimension	What it means
Call overhead	Tens of nanoseconds per Go→C call, often quoted as ~100ns (vs. ~1ns for a Go function call)
Build time	Introduces the C toolchain; cold builds can take several times longer
Cross-compilation	`CGO_ENABLED=0` enables full cross-compilation; with CGo you need a C cross-compiler for the target platform
Memory management	C memory is invisible to the GC and not counted by its pacer; requires manual `C.malloc`/`C.free`; risk of memory leaks
Debugging	Mixed stack frames; `dlv`/`gdb` behave unpredictably at C frames
Static analysis	`go vet`, `staticcheck`, etc. have limited coverage over CGo code
Container images	CGo binaries link dynamically against libc by default, so a bare `scratch` image won't run them; use a base image with a matching libc (a glibc-linked binary won't run on musl-based `alpine`) or link statically

Level 2: Principles

The Mechanics of a CGo Call

Understanding CGo's overhead requires understanding its call path. A single Go→C call goes through these steps:

Go goroutine (user stack, 2KB–1GB dynamic)
    ↓
1. The cgo-generated _Cfunc_xxx wrapper calls runtime.cgocall
    ↓
2. entersyscall: save the goroutine's PC/SP and set its status to _Gsyscall
    ↓
3. asmcgocall: switch to the M's system (g0) stack, i.e. the OS thread's own stack (typically 8MB on Linux)
    ↓
4. Execute the C function (on the system stack)
    ↓
5. C function returns
    ↓
6. Switch back to the goroutine's stack
    ↓
7. exitsyscall: restore the goroutine's status to _Grunning (reacquiring a P if the scheduler handed it off during a long call)
    ↓
8. Continue executing Go code

Why must we switch to the system stack?

Go goroutines use contiguous, growable stacks: a goroutine starts with a small stack (2KB), and each Go function begins with a check that grows the stack, by copying it to a larger allocation, when space runs low. C functions are compiled without that check and assume a large stack of fixed size (the C calling convention). If C code ran on a goroutine's small stack, any C function with deep recursion or large local variables could overflow it.

Therefore, every Go→C call must switch to the M's system stack (typically 8MB, the OS thread's default stack size). This switch, together with saving registers and updating scheduler state, accounts for most of the fixed per-call overhead: on the order of tens of nanoseconds, versus roughly a nanosecond for a plain Go call. The exact figure depends on the Go version and hardware.

GC and CGo interaction

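When a goroutine is executing C code:

Its status is _Gsyscall: it is not running Go code, so the GC does not wait for it. Stop-the-world phases proceed, and its stack is scanned as it was at the point of the call
Go objects reachable from that stack, including memory passed to C as an argument, stay alive until the call returns
The C code cannot be preempted, so the call occupies its OS thread (M) for its full duration; if it runs long, the scheduler's sysmon thread hands the goroutine's P to another M so other goroutines keep running

The Go runtime also tracks whether an M is inside a CGo call (via getg().m.incgo and related bookkeeping), which adds a small cost to every call.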

Stack Layout During a CGo Call

OS Thread M:
┌─────────────────────────────────────────────────────┐
│  System stack (g0, OS thread stack, often 8MB)      │
│  ┌───────────────────────────────────────────────┐  │
│  │  C frame N                                    │  │
│  │  C frame N-1                                  │  │
│  │  ...                                          │  │
│  │  asmcgocall / cgo-generated C wrapper         │  │
│  └───────────────────────────────────────────────┘  │
│                                                     │
│  Goroutine G:                                       │
│  ┌───────────────────────────────────────────────┐  │
│  │  Go frames (goroutine stack, 2KB–1GB)         │  │
│  │  _Cfunc_xxx → runtime.cgocall  ← call origin  │  │
│  │  [goroutine suspended, waiting for C return]  │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘

The Boundary Between C Memory and Go's GC

Go's GC manages only memory allocated on the Go heap. Memory allocated by C.malloc lives on the C heap, completely invisible to the GC:

Go heap (GC managed):
  ┌──────────┐    ┌──────────┐    ┌──────────┐
  │ Go object│    │ Go object│    │ Go object│
  └──────────┘    └──────────┘    └──────────┘
       ↑                                ↑
       GC marks and collects

C heap (libc managed):
  ┌──────────┐    ┌──────────┐
  │ C.malloc │    │ C.malloc │   ← GC cannot see this
  │ memory   │    │ memory   │   ← must manually C.free
  └──────────┘    └──────────┘

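A critical rule: C code must not retain a pointer into Go memory after the call returns. The GC does not know about pointers held by C, so it may free that memory (or, for stack memory, move it) while C still uses it. A second rule restricts what may be passed: Go memory handed to C must not itself contain pointers to unpinned Go memory (Go 1.21 added runtime.Pinner for pinning). The default GODEBUG=cgocheck=1 setting catches some violations of these rules at run time, but not all. Together they are among the strictest constraints in the CGo rules.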

CGo's Impact on the Build System

CGo introduces the C toolchain into Go's build pipeline:

go build with CGo:
  1. go tool cgo preprocesses C preambles in .go files
  2. Generates _cgo_gotypes.go, _cgo_export.h, and other intermediate files
  3. Calls the C compiler (gcc/clang) to compile C code into .o files
  4. Linker combines Go objects and C objects into the final binary

Illustrative build times for a medium project (cold build; the build cache makes repeat builds much faster):
  CGO_ENABLED=0:  ~3 seconds
  CGO_ENABLED=1:  ~15–30 seconds (extra C compilation time)

Cross-compilation impact:

# Pure Go: effortless cross-compilation
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build

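# CGo: requires a C cross-compiler for the target platform.
# cgo is disabled by default when cross-compiling, so CGO_ENABLED=1 must be set explicitly.
GOOS=linux GOARCH=arm64 CC=aarch64-linux-gnu-gcc CGO_ENABLED=1 go build
# If the cross-compiler is not installed, the build fails with an error like:
# cgo: C compiler "aarch64-linux-gnu-gcc" not found: exec: "aarch64-linux-gnu-gcc": executable file not found in $PATH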

Level 3: Code Practice

Calling C Functions from Go

The most basic CGo usage — embed C code in a Go file and call it via the C. prefix:

package main

/*
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

// C function: compute the dot product of two vectors
double dot_product(const double* a, const double* b, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

// C function: reverse a string (returns newly allocated string; caller must free)
char* reverse_string(const char* s) {
    int len = strlen(s);
    char* result = (char*)malloc(len + 1);
    if (!result) return NULL;
    for (int i = 0; i < len; i++) {
        result[i] = s[len - 1 - i];
    }
    result[len] = '\0';
    return result;
}
*/
import "C"
import (
    "fmt"
    "unsafe"
)

func DotProduct(a, b []float64) float64 {
    if len(a) != len(b) || len(a) == 0 {
        return 0
    }
    return float64(C.dot_product(
        (*C.double)(unsafe.Pointer(&a[0])),
        (*C.double)(unsafe.Pointer(&b[0])),
        C.int(len(a)),
    ))
}

func ReverseString(s string) string {
    cstr := C.CString(s)          // Go string → C string (heap allocation, must free)
    defer C.free(unsafe.Pointer(cstr))

    reversed := C.reverse_string(cstr) // C allocates the result
    if reversed == nil {
        return ""
    }
    defer C.free(unsafe.Pointer(reversed)) // must free!

    return C.GoString(reversed) // C string → Go string (copies to Go heap)
}

func main() {
    a := []float64{1, 2, 3, 4}
    b := []float64{5, 6, 7, 8}
    fmt.Printf("Dot product: %.2f\n", DotProduct(a, b)) // 70.00

    fmt.Println(ReverseString("Hello, CGo!")) // !oGC ,olleH
}

Key rules:

import "C" must immediately follow the C code comment with no blank lines between them
C.CString allocates C memory with malloc; you must release it with C.free, which requires #include <stdlib.h> in the preamble
C.GoString copies a C string to the Go heap; the result is an ordinary Go string, safe to use freely

Passing Go Slices to C

A Go slice's backing array can be passed directly to C, but CGo rules must be respected:

package main

/*
#include <stdlib.h>

// Multiply each array element by 2 in place
void double_array(int* arr, int n) {
    for (int i = 0; i < n; i++) {
        arr[i] *= 2;
    }
}

// Compute sum of array
long long sum_array(const int* arr, int n) {
    long long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += arr[i];
    }
    return sum;
}
*/
import "C"
import (
    "fmt"
    "unsafe"
)

func DoubleSlice(s []int32) {
    if len(s) == 0 {
        return
    }
    C.double_array((*C.int)(unsafe.Pointer(&s[0])), C.int(len(s)))
}

func SumSlice(s []int32) int64 {
    if len(s) == 0 {
        return 0
    }
    return int64(C.sum_array((*C.int)(unsafe.Pointer(&s[0])), C.int(len(s))))
}

func main() {
    data := []int32{1, 2, 3, 4, 5}
    fmt.Println("Before:", data)        // [1 2 3 4 5]
    DoubleSlice(data)
    fmt.Println("After:", data)         // [2 4 6 8 10]
    fmt.Println("Sum:", SumSlice(data)) // 30
}

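Safety of slice passing: the cgo rules guarantee that Go memory passed as an argument stays pinned for the duration of the call (and Go's current GC does not move heap objects anyway), so temporarily passing a pointer to a slice's backing array is safe. The guarantee ends when the call returns: if C stores that pointer for later use, the CGo rules are violated. The element types must also match: this example relies on C's int being 32 bits, which holds on every platform Go supports; declaring the slice as []C.int removes the assumption.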

Memory Management: C.malloc and C.free

package main

/*
#include <stdlib.h>
#include <string.h>

typedef struct {
    char*  name;
    int    age;
    double salary;
} Employee;

Employee* create_employee(const char* name, int age, double salary) {
    Employee* e = (Employee*)malloc(sizeof(Employee));
    if (!e) return NULL;
    e->name = strdup(name);  // strdup internally malloc's
    e->age = age;
    e->salary = salary;
    return e;
}

void free_employee(Employee* e) {
    if (e) {
        free(e->name);  // free internal pointer first
        free(e);        // then free the struct
    }
}
*/
import "C"
import (
    "fmt"
    "runtime"
    "unsafe"
)

// Employee wraps a C Employee struct
type Employee struct {
    ptr *C.Employee
}

// NewEmployee creates a C Employee and registers a finalizer
func NewEmployee(name string, age int, salary float64) *Employee {
    cname := C.CString(name)
    defer C.free(unsafe.Pointer(cname))

    cptr := C.create_employee(cname, C.int(age), C.double(salary))
    if cptr == nil {
        return nil
    }

    e := &Employee{ptr: cptr}
    // Register a finalizer: when Go's GC collects e, automatically call free_employee.
    // Note: finalizers are not guaranteed to run promptly; don't rely on them for
    // critical resource cleanup.
    runtime.SetFinalizer(e, func(emp *Employee) {
        C.free_employee(emp.ptr)
    })
    return e
}

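func (e *Employee) Name() string {
    name := C.GoString(e.ptr.name)
    runtime.KeepAlive(e) // keep e reachable until the copy is done, so the finalizer cannot free e.ptr mid-read
    return name
}

func (e *Employee) Age() int {
    age := int(e.ptr.age)
    runtime.KeepAlive(e) // same reason: the read from C memory must finish first
    return age
}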

// Close explicitly frees C memory (preferred over relying solely on the finalizer)
func (e *Employee) Close() {
    if e.ptr != nil {
        C.free_employee(e.ptr)
        e.ptr = nil
        runtime.SetFinalizer(e, nil) // cancel the finalizer to prevent double-free
    }
}

func main() {
    emp := NewEmployee("Alice", 30, 95000.0)
    if emp == nil {
        fmt.Println("Failed to create employee")
        return
    }
    defer emp.Close() // explicit resource management

    fmt.Printf("Name: %s, Age: %d\n", emp.Name(), emp.Age())
}

CGO_ENABLED=0: Building Pure Go

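For deployments requiring static linking and no external dependencies, disabling CGo is the right choice. With CGo disabled, standard-library packages that would otherwise call into libc, notably net (DNS resolution) and os/user, use their pure-Go implementations instead: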

# Build a statically linked binary with no C library dependencies
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o myapp ./cmd/myapp

# Verify: no dynamic link dependencies
file myapp
# myapp: ELF 64-bit LSB executable, x86-64, statically linked

# Docker: use a scratch base image (minimalist)
# Dockerfile:
# FROM scratch
# COPY myapp /myapp
# ENTRYPOINT ["/myapp"]
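To confirm how a binary was built, you can read the settings the Go toolchain embeds in it (Go 1.18+). A minimal sketch: `CgoSetting` is a hypothetical helper around `runtime/debug.ReadBuildInfo`; the same information is printed by `go version -m myapp`.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// CgoSetting returns the CGO_ENABLED value recorded in the binary's build
// settings ("0" or "1"), or "unknown" if no build info is available.
func CgoSetting() string {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return "unknown"
	}
	for _, s := range info.Settings {
		if s.Key == "CGO_ENABLED" {
			return s.Value
		}
	}
	return "unknown"
}

func main() {
	fmt.Println("CGO_ENABLED at build time:", CgoSetting())
}
```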

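Conditional compilation: if your package must work with and without CGo, use build constraints. The cgo constraint is satisfied only when CGo is enabled; the // +build lines are needed only for Go versions before 1.17: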

//go:build cgo
// +build cgo

// cgo_impl.go — used when CGo is available
package mydb

import "C"

func openDB(path string) (*DB, error) {
    // Use real C SQLite library
    ...
}

//go:build !cgo
// +build !cgo

// pure_impl.go — used when CGo is not available
package mydb

func openDB(path string) (*DB, error) {
    // Use pure-Go SQLite implementation (modernc.org/sqlite)
    ...
}

Calling Go Functions from C (Callbacks)

Having C call back into Go is the most complex CGo scenario, because the call direction is reversed:

package main

/*
#include <stdio.h>

// Declare the Go function (CGo generates the implementation)
extern void goCallback(int value);

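// C function: iterate an array, call the callback for each element.
// static inline: because this file uses //export, its preamble is compiled into
// two C files, so a non-static definition here would cause duplicate symbols.
static inline void process_array(int* arr, int n) {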
    for (int i = 0; i < n; i++) {
        goCallback(arr[i]);
    }
}
*/
import "C"
import (
    "fmt" // "unsafe" is not needed here; an unused import would not compile
)

//export goCallback
func goCallback(value C.int) {
    fmt.Printf("Callback received: %d\n", int(value))
}

func main() {
    data := []C.int{10, 20, 30, 40, 50}
    C.process_array(&data[0], C.int(len(data)))
}

Constraints of //export:

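Because the preamble of a file that uses //export is copied into two generated C files, it should contain only declarations; put C function definitions in another file's preamble or a separate .c file (marking small helpers static inline also avoids the duplicate-symbol link error)
Exported Go functions called from C go through the cgocallback path, the reverse of the CGo call path
A panic must not escape a //export function into C. recover does work inside an exported function, and you should use it there to stop the panic before it reaches the C frames (see Panic Safety in CGo Callbacks)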

Level 4: Advanced Topics and Edge Cases

go-sqlite3 Internals

mattn/go-sqlite3 is one of the most important CGo projects in the Go ecosystem, and studying its implementation teaches a great deal.

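Build mechanism: go-sqlite3 bundles the complete SQLite C source (the ~230,000-line sqlite3.c amalgamation) inside the repository and compiles it directly via CGo. This means:

No system-installed libsqlite3 is required (unless you build with the libsqlite3 tag to link the system library)
The SQLite version is controlled by go-sqlite3
But a cold build must compile those 230,000 lines of C, which can add tens of seconds; later builds reuse the compiled package from the Go build cache until the cache is invalidated (for example by different CGO_CFLAGS or a new target platform)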

Connection lifecycle (a simplified sketch modeled on go-sqlite3; the driver's actual code differs in detail):

func (d *SQLiteDriver) Open(dsn string) (driver.Conn, error) {
    var db *C.sqlite3
    cdsn := C.CString(dsn)
    defer C.free(unsafe.Pointer(cdsn))

    rv := C.sqlite3_open_v2(cdsn, &db,
        C.SQLITE_OPEN_FULLMUTEX|C.SQLITE_OPEN_READWRITE|C.SQLITE_OPEN_CREATE,
        nil)
    if rv != C.SQLITE_OK {
        return nil, fmt.Errorf("sqlite3: open %s: %d", dsn, rv)
    }

    conn := &SQLiteConn{db: db}
    runtime.SetFinalizer(conn, (*SQLiteConn).Close)
    return conn, nil
}

Query execution path: each db.Query crosses the Go/C boundary multiple times:

sqlite3_prepare_v2 (compile SQL to bytecode)
sqlite3_bind_* (bind parameters — one CGo call per parameter)
sqlite3_step (advance one row — one CGo call per row)
sqlite3_column_* (read column values — one CGo call per column per row)
sqlite3_finalize (release statement)

For a query returning 100 rows of 10 columns, reading the results therefore takes more than 1,100 CGo calls: 100 calls to sqlite3_step plus at least 1,000 sqlite3_column_* calls. On large result sets, this per-call overhead becomes one of go-sqlite3's main performance costs.

Profiling: Identifying CGo Overhead

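When using go tool pprof to analyze a CGo-heavy program, CGo calls appear as runtime.cgocall in CPU profiles. By default the profiler cannot unwind C stacks, so time spent inside the C function itself is usually attributed to runtime.cgocall as well. A large runtime.cgocall entry therefore means "time spent in or crossing into C", not necessarily boundary overhead. (Installing a traceback function with runtime.SetCgoTraceback, for example via github.com/ianlancetaylor/cgosymbolizer, lets the profiler see C frames.)

# Capture a CPU profile (-cpuprofile works with one package at a time)
go test -run='^$' -bench=. -cpuprofile=cpu.prof ./path/to/pkg

# Analyze
go tool pprof cpu.prof
(pprof) top 20
(pprof) web  # open a call graph in the browser

# Or open the interactive web UI, which includes a flame graph view
go tool pprof -http=:8080 cpu.prof

Things to look for in the profile:

runtime.cgocall: entry point for every Go→C call
runtime.cgocallbackg: entry point for every C→Go callback
syscall.cgocaller: CGo calls at the syscall layer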

Batching optimization: when CGo calls are the hot spot, process data in batches at the C level:

// Inefficient: N CGo calls from a Go loop
for _, v := range data {
    C.process_one(C.int(v))
}

// Efficient: one CGo call processes the entire batch
// (data is a []int32; &data[0] would panic on an empty slice)
if len(data) > 0 {
    C.process_batch((*C.int)(unsafe.Pointer(&data[0])), C.int(len(data)))
}

purego: Dynamic Library Calls Without CGo

purego, developed by the Ebitengine (Go game engine) team, lets you call system dynamic libraries (.so/.dylib/.dll) without CGo:

package main

import (
    "fmt"
    "github.com/ebitengine/purego"
)

func main() {
    // On macOS, load the system libSystem
    libc, err := purego.Dlopen("/usr/lib/libSystem.B.dylib", purego.RTLD_NOW|purego.RTLD_GLOBAL)
    if err != nil {
        panic(err)
    }

    var strlen func(string) int
    purego.RegisterLibFunc(&strlen, libc, "strlen")

    fmt.Println(strlen("hello")) // 5
}

How purego works: it uses the platform ABI (calling convention) to construct a call frame directly, then jumps to the dynamic library function via syscall or platform-specific assembly — completely bypassing CGo's bridge mechanism.

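Advantages:

Works with CGO_ENABLED=0
No C compiler required; full cross-compilation capability is preserved

Disadvantages:

Only works with dynamic libraries (not static libraries)
Type mappings are more manual than with CGo: there are no generated C types, so signatures and struct layouts must be declared by hand
Per-call overhead should not be expected to be lower than CGo's: calls still go through the runtime's cgo-call machinery, and RegisterLibFunc wrappers add reflection-based argument conversion
Callbacks from C into Go (purego.NewCallback) are supported only on some platforms and with a fixed upper limit on the number of callbacks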

WebAssembly as a CGo Alternative

For some C libraries, you can compile them to WebAssembly and call them from Go through a WASM runtime:

C library source → emscripten/wasi-sdk → .wasm file → Go WASM runtime (wazero) → call

wazero is a pure-Go WASM runtime with no CGo:

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/tetratelabs/wazero"
    "github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"
)

func main() {
    ctx := context.Background()

    // Create a WASM runtime (pure Go)
    r := wazero.NewRuntime(ctx)
    defer r.Close(ctx)

    wasi_snapshot_preview1.MustInstantiate(ctx, r)

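    // Load the compiled WASM module (mylib.wasm is a placeholder for a C library compiled to WASM)
    wasmBytes, err := os.ReadFile("mylib.wasm")
    if err != nil {
        panic(err)
    }
    mod, err := r.Instantiate(ctx, wasmBytes)
    if err != nil {
        panic(err)
    }

    // Call a function exported from the WASM module
    addFn := mod.ExportedFunction("add")
    if addFn == nil {
        panic(`module does not export "add"`)
    }
    results, err := addFn.Call(ctx, 5, 3)
    if err != nil {
        panic(err)
    }
    fmt.Println("5 + 3 =", results[0]) // 8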
}

When it's appropriate:

You need sandbox isolation (WASM provides memory isolation)
Cross-platform portability is required (WASM is architecture-neutral)
The C library's functionality is reasonably self-contained (limited I/O)

When it's not:

High-performance computation (WASM code typically runs noticeably slower than native code, and slower still on platforms where wazero falls back to its interpreter)
C libraries that make heavy use of system calls (WASI's syscall support is limited)

Panic Safety in CGo Callbacks

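C does not understand Go panics. If a panic escapes a //export function, no C cleanup runs: the panic either unwinds past the C frames, leaving locks, allocations, and library state however the C code left them, or, if there is no Go caller to unwind into (for example, the callback runs on a thread created by C), crashes the program. Exported functions must therefore catch all panics: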

//export safeGoCallback
func safeGoCallback(value C.int) (result C.int) {
    defer func() {
        if r := recover(); r != nil {
            // Log the error, but do not let the panic propagate into C
            fmt.Fprintf(os.Stderr, "panic in CGo callback: %v\n", r)
            result = -1 // return an error code instead
        }
    }()

    v := processValue(int(value))
    return C.int(v)
}

Production Best Practices

1. Isolate CGo code

Concentrate all CGo code in a dedicated internal package (e.g., internal/clib) and expose a pure-Go interface to the rest of the application:

myproject/
├── cmd/
│   └── myapp/main.go
├── internal/
│   └── clib/
│       ├── clib.go      ← CGo code lives here
│       ├── wrapper.go   ← pure-Go wrappers
│       └── sqlite.h
└── pkg/
    └── database/
        └── db.go        ← uses internal/clib; upper layers don't know about CGo

2. Always test the CGO_ENABLED=0 path

Test both builds in CI:

# .github/workflows/ci.yml
- name: Test with CGo
  run: CGO_ENABLED=1 go test ./...

- name: Test without CGo
  run: CGO_ENABLED=0 go test ./...

3. Memory leak detection

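Use AddressSanitizer (which on Linux includes LeakSanitizer) to detect memory errors and leaks in C code:

CGO_CFLAGS="-fsanitize=address -g" CGO_LDFLAGS="-fsanitize=address" \
    go test -count=1 ./...

# Go 1.18+ on supported platforms: let the go command enable ASan for both Go and C code
go test -asan -count=1 ./...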

4. Limit concurrent CGo calls

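High-concurrency CGo calls create many OS threads: each goroutine blocked in C occupies its own M, and the runtime crashes the program if the thread count exceeds its limit (10,000 by default, adjustable with runtime/debug.SetMaxThreads). Use a semaphore to bound concurrency: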

var cgoSem = make(chan struct{}, 16) // at most 16 concurrent CGo calls

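func callCWithLimit(data []byte) {
    if len(data) == 0 {
        return // &data[0] would panic on an empty slice
    }
    cgoSem <- struct{}{}        // acquire a slot (blocks while 16 calls are in flight)
    defer func() { <-cgoSem }() // release it when the C call returns
    C.process((*C.uchar)(unsafe.Pointer(&data[0])), C.int(len(data)))
}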

5. Prefer C.malloc for data that must outlive the call

If C needs to hold data beyond the return of a single call, give it C-allocated memory (C.malloc, or C.CString/C.CBytes, which allocate with malloc), not a pointer to Go memory:

// Wrong: C retains a pointer to Go memory (violates the CGo rules)
func badPattern(s string) {
    buf := append([]byte(s), 0) // NUL-terminated bytes on the Go heap
    C.store_for_later((*C.char)(unsafe.Pointer(&buf[0]))) // undefined behavior if C keeps this pointer
    // After this function returns, nothing keeps buf alive; the GC may reclaim it while C still holds the pointer
}

// Correct: if C needs to keep the data, C owns the allocation
func goodPattern(s string) {
    cstr := C.CString(s)    // allocated with C's malloc, invisible to the GC
    C.store_for_later(cstr) // C now owns this memory
    // Do NOT defer C.free here; C is responsible for calling free when done
}

Summary

CGo is a double-edged sword:

What it can do: call decades of accumulated C libraries, access OS-level APIs, integrate legacy systems — these needs exist in real engineering and cannot always be avoided.
What it costs: a fixed overhead on every call across the boundary (tens of nanoseconds, often quoted as ~100ns), build complexity, limited cross-compilation, and dual-track memory management between the GC and C. These costs are real and cannot be ignored.

Before choosing CGo, ask yourself three questions:

Does a pure-Go implementation of sufficient quality exist? (modernc.org/sqlite, cloudflare/circl, etc.)
Can I use purego to call a dynamic library, avoiding the C compiler dependency at build time?
Can I use wazero to load a WASM module, preserving cross-platform capability?

Only when all three paths are genuinely blocked is CGo the right answer. And when you do use CGo, the memory management rules, callback safety, batch-processing patterns, and code isolation principles covered in this chapter are your baseline protection against falling into deep and costly traps.
