CGo: The Cost of Dancing with C
There is an unwritten consensus in the Go ecosystem: if your project uses CGo, you have chosen a path that demands exceptional care. CGo lets a Go program call C code, and that capability is sometimes irreplaceable — the world holds an enormous number of mature C libraries (SQLite, OpenSSL, BLAS/LAPACK, libpcap, Linux kernel interfaces) that have been refined over decades, and the equivalent pure-Go implementations either do not exist or have a significant quality gap.
But CGo's costs are real, and frequently underestimated. Every call that crosses the Go/C boundary costs on the order of 100 nanoseconds, far more than an ordinary Go function call. Each call has to switch from the goroutine stack to a system stack. Long-running C calls can raise peak heap usage, and C-allocated memory has to be managed by hand. Cross-compilation requires a C cross-toolchain for every target platform. Build times grow noticeably. Debugger support becomes unreliable.
This chapter is not an argument against CGo. It is a guide to using it correctly when you genuinely need it.
Level 1: What You Need to Know
When CGo Is Necessary
Case 1: Calling mature C libraries
There are C libraries that have been battle-tested for decades, and whose reliability pure-Go implementations cannot match in the short term:
- SQLite: The most widely deployed database in the world. mattn/go-sqlite3 wraps it via CGo and is the most widely used SQLite driver in the Go ecosystem. A pure-Go implementation (modernc/sqlite) exists, but historically it has had a meaningful performance gap and differences in behavioral edge cases.
- OpenSSL / BoringSSL: In FIPS 140-2/3 regulated environments (finance, healthcare, government), a certified cryptographic implementation is mandatory, and Go's standard library crypto package has historically not been in scope for those certifications.
- Graphics and audio libraries: OpenGL (go-gl), Vulkan, PortAudio. No independent pure-Go implementations exist.
- ML acceleration: Calling cuBLAS, MKL, or similar hardware-accelerated libraries from Go to use GPUs or vectorized CPU kernels.
Case 2: OS-level APIs
Some OS APIs have no better calling mechanism:
- Specific ioctl commands (character device driver interaction)
- Linux's io_uring interface (before Go provides higher-level wrappers)
- Special system calls for interacting with kernel modules
Case 3: Legacy system integration
Enterprises with large existing C/C++ codebases use CGo as a bridge during incremental migration to Go — letting Go code call existing C libraries while gradually replacing them, rather than rewriting everything at once.
CGo Is Not Free: A Cost Overview
Before deciding to use CGo, understand its costs clearly:
| Cost dimension | What it means |
|---|---|
| Call overhead | ~100ns per Go→C call (vs. ~1ns for a Go function call) |
| Build time | Introduces the C toolchain; build times increase 2–10× |
| Cross-compilation | CGO_ENABLED=0 enables full cross-compilation; with CGo you need a C cross-compiler for the target platform |
| GC pressure | C memory is invisible to the GC; requires manual C.malloc/C.free; risk of memory leaks |
| Debugging | Mixed stack frames; dlv/gdb behave unpredictably at C frames |
| Static analysis | go vet, staticcheck, etc. have limited coverage over CGo code |
| Docker image size | A dynamically linked CGo binary needs a libc at runtime, so a scratch base image only works if you link statically; otherwise use alpine or debian |
Level 2: Principles
The Mechanics of a CGo Call
Understanding CGo's overhead requires understanding its call path. A single Go→C call goes through these steps:
Go goroutine (user stack, 2KB–1GB dynamic)
↓
1. entersyscall / cgocall entry point
↓
2. Switch goroutine state to _Gsyscall
↓
3. Save Go register state
↓
4. Switch to M (OS thread) system stack (fixed size, default 8MB)
↓
5. Execute C function (on system stack)
↓
6. C function returns
↓
7. Switch back to Go goroutine stack
↓
8. Restore goroutine state to _Grunning
↓
9. Continue executing Go code
Why must we switch to the system stack?
Go goroutines use a growable, contiguous stack that starts at 2KB and is copied to a larger allocation on demand (early Go releases used segmented stacks). C code knows nothing about this mechanism; it assumes a conventional, fixed-size thread stack and contains no stack-growth checks. If a C function ran on Go's small stack, any deep recursion or large local variable could overflow it.
Therefore, every Go→C call must switch to the M's system stack (typically 8MB, similar to an OS thread's default stack size). This switch requires saving and restoring registers and updating scheduler state — roughly 60–100ns of overhead.
GC and CGo interaction
When a goroutine is executing C code:
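- The Go GC does not know when the C code will finish
- The runtime puts the goroutine in the _Gsyscall state before entering C, so when a GC cycle starts it treats the goroutine as already stopped and proceeds without waiting
- Go memory that the call still references (for example, a slice passed as an argument) stays live until the call returns, so a long-running CGo call delays when that memory can be reclaimed and can raise peak heap usage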
The Go runtime tracks whether execution is inside a CGo call via getg().m.incgo and similar mechanisms, but this tracking also has overhead.
Stack Layout During a CGo Call
OS Thread M:
┌─────────────────────────────────────────────────────┐
│ System stack (fixed 8MB) │
│ ┌─────────────────────────────────────────────┐ │
│ │ C frame N │ │
│ │ C frame N-1 │ │
│ │ ... │ │
│ │ CGo bridge code (cgocall / cgocallback) │ │
│ └─────────────────────────────────────────────┘ │
│ │
│ Goroutine G: │
│ ┌─────────────────────────────────────────────┐ │
│ │ Go frames (goroutine stack, 2KB–1GB) │ │
│ │ callCgoFunc(...) ← call origin │ │
│ │ [goroutine suspended, waiting for C return] │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
The Boundary Between C Memory and Go's GC
Go's GC manages only memory allocated on the Go heap. Memory allocated by C.malloc lives on the C heap, completely invisible to the GC:
Go heap (GC managed):
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Go object│ │ Go object│ │ Go object│
└──────────┘ └──────────┘ └──────────┘
↑ ↑
GC marks and collects
C heap (libc managed):
┌──────────┐ ┌──────────┐
│ C.malloc │ │ C.malloc │ ← GC cannot see this
│ memory │ │ memory │ ← must manually C.free
└──────────┘ └──────────┘
A critical rule: C code must not retain a pointer into Go memory beyond the duration of the call. If C holds a pointer to Go memory, the GC is unaware of it and may move or collect that memory. This is one of the strictest constraints in the CGo rules.
CGo's Impact on the Build System
CGo introduces the C toolchain into Go's build pipeline:
go build with CGo:
1. go tool cgo preprocesses C preambles in .go files
2. Generates _cgo_gotypes.go, _cgo_export.h, and other intermediate files
3. Calls the C compiler (gcc/clang) to compile C code into .o files
4. Linker combines Go objects and C objects into the final binary
Build time comparison (medium project):
CGO_ENABLED=0: ~3 seconds
CGO_ENABLED=1: ~15–30 seconds (extra C compilation time)
Cross-compilation impact:
# Pure Go: effortless cross-compilation
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build
# CGo: requires a C cross-compiler for the target platform
GOOS=linux GOARCH=arm64 CC=aarch64-linux-gnu-gcc CGO_ENABLED=1 go build
# Without that cross-compiler installed, the build fails because aarch64-linux-gnu-gcc cannot be found
Level 3: Code Practice
Calling C Functions from Go
The most basic CGo usage — embed C code in a Go file and call it via the C. prefix:
package main
/*
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
// C function: compute the dot product of two vectors
double dot_product(const double* a, const double* b, int n) {
double sum = 0.0;
for (int i = 0; i < n; i++) {
sum += a[i] * b[i];
}
return sum;
}
// C function: reverse a string (returns newly allocated string; caller must free)
char* reverse_string(const char* s) {
int len = strlen(s);
char* result = (char*)malloc(len + 1);
if (!result) return NULL;
for (int i = 0; i < len; i++) {
result[i] = s[len - 1 - i];
}
result[len] = '\0';
return result;
}
*/
import "C"
import (
"fmt"
"unsafe"
)
func DotProduct(a, b []float64) float64 {
if len(a) != len(b) || len(a) == 0 {
return 0
}
return float64(C.dot_product(
(*C.double)(unsafe.Pointer(&a[0])),
(*C.double)(unsafe.Pointer(&b[0])),
C.int(len(a)),
))
}
func ReverseString(s string) string {
cstr := C.CString(s) // Go string → C string (heap allocation, must free)
defer C.free(unsafe.Pointer(cstr))
reversed := C.reverse_string(cstr) // C allocates the result
if reversed == nil {
return ""
}
defer C.free(unsafe.Pointer(reversed)) // must free!
return C.GoString(reversed) // C string → Go string (copies to Go heap)
}
func main() {
a := []float64{1, 2, 3, 4}
b := []float64{5, 6, 7, 8}
fmt.Printf("Dot product: %.2f\n", DotProduct(a, b)) // 70.00
fmt.Println(ReverseString("Hello, CGo!")) // !oGC ,olleH
}
Key rules:
- import "C" must immediately follow the C code comment, with no blank lines between them
- C.CString allocates C memory; you must call C.free on it
- C.GoString copies a C string to the Go heap; the result is safe to use freely
Passing Go Slices to C
A Go slice's backing array can be passed directly to C, but CGo rules must be respected:
package main
/*
#include <stdlib.h>
// Multiply each array element by 2 in place
void double_array(int* arr, int n) {
for (int i = 0; i < n; i++) {
arr[i] *= 2;
}
}
// Compute sum of array
long long sum_array(const int* arr, int n) {
long long sum = 0;
for (int i = 0; i < n; i++) {
sum += arr[i];
}
return sum;
}
*/
import "C"
import (
"fmt"
"unsafe"
)
func DoubleSlice(s []int32) {
if len(s) == 0 {
return
}
C.double_array((*C.int)(unsafe.Pointer(&s[0])), C.int(len(s)))
}
func SumSlice(s []int32) int64 {
if len(s) == 0 {
return 0
}
return int64(C.sum_array((*C.int)(unsafe.Pointer(&s[0])), C.int(len(s))))
}
func main() {
data := []int32{1, 2, 3, 4, 5}
fmt.Println("Before:", data) // [1 2 3 4 5]
DoubleSlice(data)
fmt.Println("After:", data) // [2 4 6 8 10]
fmt.Println("Sum:", SumSlice(data)) // 30
}
Safety of slice passing: while a C function executes, the Go GC will not move Go heap objects (Go does not currently use a moving GC), so temporarily passing a pointer to a slice's backing array is safe. But if C stores that pointer for later use, the CGo rules are violated.
Memory Management: C.malloc and C.free
package main
/*
#include <stdlib.h>
#include <string.h>
typedef struct {
char* name;
int age;
double salary;
} Employee;
Employee* create_employee(const char* name, int age, double salary) {
Employee* e = (Employee*)malloc(sizeof(Employee));
if (!e) return NULL;
e->name = strdup(name); // strdup internally malloc's
e->age = age;
e->salary = salary;
return e;
}
void free_employee(Employee* e) {
if (e) {
free(e->name); // free internal pointer first
free(e); // then free the struct
}
}
*/
import "C"
import (
"fmt"
"runtime"
"unsafe"
)
// Employee wraps a C Employee struct
type Employee struct {
ptr *C.Employee
}
// NewEmployee creates a C Employee and registers a finalizer
func NewEmployee(name string, age int, salary float64) *Employee {
cname := C.CString(name)
defer C.free(unsafe.Pointer(cname))
cptr := C.create_employee(cname, C.int(age), C.double(salary))
if cptr == nil {
return nil
}
e := &Employee{ptr: cptr}
// Register a finalizer: when Go's GC collects e, automatically call free_employee.
// Note: finalizers are not guaranteed to run promptly; don't rely on them for
// critical resource cleanup.
runtime.SetFinalizer(e, func(emp *Employee) {
C.free_employee(emp.ptr)
})
return e
}
func (e *Employee) Name() string { return C.GoString(e.ptr.name) }
func (e *Employee) Age() int { return int(e.ptr.age) }
// Close explicitly frees C memory (preferred over relying solely on the finalizer)
func (e *Employee) Close() {
if e.ptr != nil {
C.free_employee(e.ptr)
e.ptr = nil
runtime.SetFinalizer(e, nil) // cancel the finalizer to prevent double-free
}
}
func main() {
emp := NewEmployee("Alice", 30, 95000.0)
if emp == nil {
fmt.Println("Failed to create employee")
return
}
defer emp.Close() // explicit resource management
fmt.Printf("Name: %s, Age: %d\n", emp.Name(), emp.Age())
}
CGO_ENABLED=0: Building Pure Go
For deployments requiring static linking and no external dependencies, disabling CGo is the right choice:
# Build a statically linked binary with no C library dependencies
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o myapp ./cmd/myapp
# Verify: no dynamic link dependencies
file myapp
# myapp: ELF 64-bit LSB executable, x86-64, statically linked
# Docker: use a scratch base image (minimalist)
# Dockerfile:
# FROM scratch
# COPY myapp /myapp
# ENTRYPOINT ["/myapp"]
Conditional compilation: if your package must work with and without CGo, use build tags:
//go:build cgo
// +build cgo
// cgo_impl.go — used when CGo is available
package mydb
import "C"
func openDB(path string) (*DB, error) {
// Use real C SQLite library
...
}
//go:build !cgo
// +build !cgo
// pure_impl.go — used when CGo is not available
package mydb
func openDB(path string) (*DB, error) {
// Use pure-Go SQLite implementation (modernc/sqlite)
...
}
Calling Go Functions from C (Callbacks)
Having C call back into Go is the most complex CGo scenario, because the call direction is reversed:
package main
/*
#include <stdio.h>
// Declare the Go function (CGo generates the implementation)
extern void goCallback(int value);
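// C function: iterate an array, call the callback for each element.
// It is static because the preamble of a file that uses //export is copied
// into two generated C files; a non-static definition would be a duplicate symbol.
static void process_array(int* arr, int n) {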
for (int i = 0; i < n; i++) {
goCallback(arr[i]);
}
}
*/
import "C"
import "fmt"
//export goCallback
func goCallback(value C.int) {
fmt.Printf("Callback received: %d\n", int(value))
}
func main() {
data := []C.int{10, 20, 30, 40, 50}
C.process_array(&data[0], C.int(len(data)))
}
Constraints of //export:
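- In a file that uses //export, the preamble may contain only declarations, not definitions (static functions are the usual exception). CGo copies the preamble into two generated C files, so a non-static definition becomes a duplicate symbol at link time. Put other C definitions in a separate .c file or in the preamble of another Go file.
- Exported Go functions called from C go through the cgocallback path, the reverse of the CGo call path
- A panic must not escape a //export function, because C does not understand Go's panic mechanism; recover inside the exported function and translate the failure into an error code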
Level 4: Advanced Topics and Edge Cases
go-sqlite3 Internals
mattn/go-sqlite3 is one of the most important CGo projects in the Go ecosystem, and studying its implementation teaches a great deal.
Build mechanism: go-sqlite3 bundles the complete SQLite C source (the ~230,000-line sqlite3.c amalgamation) inside the repository and compiles it directly via CGo. This means:
- No system-installed libsqlite3 is required
- The SQLite version is controlled by go-sqlite3
- But a clean go build must compile those 230,000 lines of C (~30 extra seconds); later builds reuse the Go build cache until it is invalidated
Connection lifecycle (simplified):
func (d *SQLiteDriver) Open(dsn string) (driver.Conn, error) {
var db *C.sqlite3
cdsn := C.CString(dsn)
defer C.free(unsafe.Pointer(cdsn))
rv := C.sqlite3_open_v2(cdsn, &db,
C.SQLITE_OPEN_FULLMUTEX|C.SQLITE_OPEN_READWRITE|C.SQLITE_OPEN_CREATE,
nil)
if rv != C.SQLITE_OK {
return nil, fmt.Errorf("sqlite3: open %s: %d", dsn, rv)
}
conn := &SQLiteConn{db: db}
runtime.SetFinalizer(conn, (*SQLiteConn).Close)
return conn, nil
}
Query execution path: each db.Query crosses the Go/C boundary multiple times:
1. sqlite3_prepare_v2 (compile SQL to bytecode)
2. sqlite3_bind_* (bind parameters, one CGo call per parameter)
3. sqlite3_step (advance one row, one CGo call per row)
4. sqlite3_column_* (read column values, one CGo call per column per row)
5. sqlite3_finalize (release statement)
For a query returning 100 rows of 10 columns, reading the results alone requires approximately 1,000 CGo calls. This is one of the primary performance bottlenecks of go-sqlite3 under high concurrency.
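The arithmetic can be written down directly. The sketch below counts boundary crossings for one query using the call sequence described above and multiplies by the chapter's rough 100ns figure. cgoCalls is an illustrative helper, not part of go-sqlite3, and a real driver may make additional calls (for example, to inspect column types), so the result is a lower bound.

```go
package main

import (
	"fmt"
	"time"
)

// cgoCalls estimates how many Go/C boundary crossings one query needs.
func cgoCalls(params, rows, cols int) int {
	prepare := 1
	bind := params
	step := rows + 1 // one per row, plus the final call that returns SQLITE_DONE
	column := rows * cols
	finalize := 1
	return prepare + bind + step + column + finalize
}

func main() {
	const assumedCost = 100 * time.Nanosecond // rough per-call figure, an assumption

	n := cgoCalls(0, 100, 10)
	fmt.Println("CGo calls:", n)                                    // CGo calls: 1103
	fmt.Println("boundary overhead:", time.Duration(n)*assumedCost) // boundary overhead: 110.3µs
}
```

Roughly a tenth of a millisecond per query sounds small, but it is pure overhead that scales linearly with rows times columns.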
Profiling: Identifying CGo Overhead
When using go tool pprof to analyze a CGo-heavy program, CGo calls appear as runtime.cgocall in CPU profiles:
# Capture a CPU profile
go test -cpuprofile=cpu.prof -bench=. ./...
# Analyze
go tool pprof cpu.prof
(pprof) top 20
(pprof) web # open the call graph in the browser (requires Graphviz)
Things to look for in the profile:
- runtime.cgocall: entry point for every Go→C call
- runtime.cgocallbackg: entry point for every C→Go callback
- syscall.cgocaller: CGo calls at the syscall layer
Batching optimization: when CGo calls are the hot spot, process data in batches at the C level:
// Inefficient: N CGo calls from a Go loop
for _, v := range data {
C.process_one(C.int(v))
}
// Efficient: one CGo call processes the entire batch
C.process_batch((*C.int)(unsafe.Pointer(&data[0])), C.int(len(data)))
purego: Dynamic Library Calls Without CGo
purego, developed by the Ebitengine (Go game engine) team, lets you call system dynamic libraries (.so/.dylib/.dll) without CGo:
package main
import (
"fmt"
"github.com/ebitengine/purego"
)
func main() {
// On macOS, load the system libSystem
libc, err := purego.Dlopen("/usr/lib/libSystem.B.dylib", purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
var strlen func(string) int
purego.RegisterLibFunc(&strlen, libc, "strlen")
fmt.Println(strlen("hello")) // 5
}
How purego works: it uses the platform ABI (calling convention) to construct a call frame directly, then jumps to the dynamic library function via platform-specific assembly. No cgo-generated bridge code and no C compiler are involved at build time, although the call still has to leave the goroutine stack just as a CGo call does.
Advantages:
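- Works with CGO_ENABLED=0
- No C compiler required; full cross-compilation capability is preserved
- Call overhead is comparable to CGo rather than dramatically lower, because the call still switches stacks; benchmark your own workload before relying on any specific figure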
Disadvantages:
- Only works with dynamic libraries (not static libraries)
- Type mappings must be handled manually (no automatic type conversion as with CGo)
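- Callback support is limited: purego.NewCallback can turn a Go function into a C function pointer, but only on some platforms and with restrictions on the supported signatures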
WebAssembly as a CGo Alternative
For some C libraries, you can compile them to WebAssembly and call them from Go through a WASM runtime:
C library source → emscripten/wasi-sdk → .wasm file → Go WASM runtime (wazero) → call
wazero is a pure-Go WASM runtime with no CGo:
package main
import (
"context"
"fmt"
"os"
"github.com/tetratelabs/wazero"
"github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"
)
func main() {
ctx := context.Background()
// Create a WASM runtime (pure Go)
r := wazero.NewRuntime(ctx)
defer r.Close(ctx)
wasi_snapshot_preview1.MustInstantiate(ctx, r)
// Load the compiled WASM module (originally a C library)
wasmBytes, _ := os.ReadFile("mylib.wasm")
mod, _ := r.Instantiate(ctx, wasmBytes)
// Call a function exported from the WASM module
addFn := mod.ExportedFunction("add")
results, _ := addFn.Call(ctx, 5, 3)
fmt.Println("5 + 3 =", results[0]) // 8
}
When it's appropriate:
- You need sandbox isolation (WASM provides memory isolation)
- Cross-platform portability is required (WASM is architecture-neutral)
- The C library's functionality is reasonably self-contained (limited I/O)
When it's not:
- High-performance computation (WASM JIT is 1.5–3× slower than native)
- C libraries that make heavy use of system calls (WASI's syscall support is limited)
Panic Safety in CGo Callbacks
C does not understand Go panics. If a //export function panics, the program crashes rather than unwinding normally. Exported functions must catch all panics:
//export safeGoCallback
func safeGoCallback(value C.int) (result C.int) {
defer func() {
if r := recover(); r != nil {
// Log the error, but do not let the panic propagate into C
fmt.Fprintf(os.Stderr, "panic in CGo callback: %v\n", r)
result = -1 // return an error code instead
}
}()
v := processValue(int(value))
return C.int(v)
}
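The recover-and-return-a-code pattern can be factored into a small helper and tested without any C involved. The following is a sketch in plain Go; guard is a hypothetical helper name, and an exported callback would call it and pass only the integer code back to C.

```go
package main

import (
	"errors"
	"fmt"
)

// guard runs fn and converts any panic into an error code plus an error value,
// which is the shape an //export function can safely hand back to C.
func guard(fn func() int) (code int, err error) {
	defer func() {
		if r := recover(); r != nil {
			code = -1 // named results let the deferred function rewrite the return values
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	return fn(), nil
}

func main() {
	code, err := guard(func() int { return 42 })
	fmt.Println(code, err) // 42 <nil>

	code, err = guard(func() int { panic(errors.New("boom")) })
	fmt.Println(code, err) // -1 recovered panic: boom
}
```

Keeping the helper free of CGo types means the panic path can be covered by ordinary unit tests, including in builds with CGO_ENABLED=0.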
Production Best Practices
1. Isolate CGo code
Concentrate all CGo code in a dedicated internal package (e.g., internal/clib) and expose a pure-Go interface to the rest of the application:
myproject/
├── cmd/
│ └── myapp/main.go
├── internal/
│ └── clib/
│ ├── clib.go ← CGo code lives here
│ ├── wrapper.go ← pure-Go wrappers
│ └── sqlite.h
└── pkg/
└── database/
└── db.go ← uses internal/clib; upper layers don't know about CGo
2. Always test the CGO_ENABLED=0 path
Test both builds in CI:
# .github/workflows/ci.yml
- name: Test with CGo
run: CGO_ENABLED=1 go test ./...
- name: Test without CGo
run: CGO_ENABLED=0 go test ./...
3. Memory leak detection
Use AddressSanitizer to detect memory leaks in C code:
CGO_CFLAGS="-fsanitize=address -g" CGO_LDFLAGS="-fsanitize=address" \
go test -count=1 ./...
4. Limit concurrent CGo calls
High-concurrency CGo calls create many OS threads (each blocking CGo call occupies one M). Use a semaphore to limit this:
var cgoSem = make(chan struct{}, 16) // at most 16 concurrent CGo calls
func callCWithLimit(data []byte) {
	if len(data) == 0 {
		return // &data[0] would panic on an empty slice
	}
	cgoSem <- struct{}{}
	defer func() { <-cgoSem }()
	C.process((*C.uchar)(unsafe.Pointer(&data[0])), C.int(len(data)))
}
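The limiting behavior can be checked without C by substituting a sleep for the blocking call. In this sketch, limiter and peakConcurrency are illustrative names, and time.Sleep stands in for a C function that blocks its thread.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// limiter bounds how many callers may be inside Do at the same time.
type limiter chan struct{}

func (l limiter) Do(fn func()) {
	l <- struct{}{}        // acquire a slot (blocks when all slots are taken)
	defer func() { <-l }() // release it even if fn panics
	fn()
}

// peakConcurrency pushes jobs goroutines through a limiter with the given
// number of slots and reports the highest number that ran at once.
func peakConcurrency(slots, jobs int) int64 {
	l := make(limiter, slots)
	var running, peak atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Do(func() {
				n := running.Add(1)
				for { // record a new maximum if this call raised it
					p := peak.Load()
					if n <= p || peak.CompareAndSwap(p, n) {
						break
					}
				}
				time.Sleep(time.Millisecond) // stand-in for a blocking C call
				running.Add(-1)
			})
		}()
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	fmt.Println("peak concurrent calls:", peakConcurrency(4, 50))
}
```

The reported peak never exceeds the slot count, which is the property that keeps the number of threads blocked in C bounded.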
5. Prefer C.malloc for data that must outlive the call
If C needs to hold data beyond the return of a single call, allocate it with C.malloc, not by passing a pointer to Go memory:
// Wrong: C retains a pointer to Go memory (violates CGo rules)
func badPattern(s string) {
	buf := append([]byte(s), 0) // NUL-terminated copy on the Go heap
	C.store_for_later((*C.char)(unsafe.Pointer(&buf[0]))) // if C stores this pointer, this is undefined behavior!
	// After this function returns, the Go GC may collect (or, in a future runtime, move) the memory behind buf
}
// Correct: if C needs to keep the data, C owns the allocation
func goodPattern(s string) {
cstr := C.CString(s) // C.malloc'd memory
C.store_for_later(cstr) // C now owns this memory
// Do NOT defer C.free here; C is responsible for calling free when done
}
Summary
CGo is a double-edged sword:
- What it can do: call decades of accumulated C libraries, access OS-level APIs, integrate legacy systems — these needs exist in real engineering and cannot always be avoided.
- What it costs: ~100ns per call across the boundary, build complexity, limited cross-compilation, dual-track memory management between GC and C — these costs are real and cannot be ignored.
Before choosing CGo, ask yourself three questions:
- Does a pure-Go implementation of sufficient quality exist? (modernc/sqlite, cloudflare/circl, etc.)
- Can I use purego to call a dynamic library, avoiding the C compiler dependency at build time?
- Can I use wazero to load a WASM module, preserving cross-platform capability?
Only when all three paths are genuinely blocked is CGo the right answer. And when you do use CGo, the memory management rules, callback safety, batch-processing patterns, and code isolation principles covered in this chapter are your baseline protection against falling into deep and costly traps.