Chapter 9

The Secret of Function Calls

Every time you write result = some_function(a, b), you're asking the CPU to do something surprisingly intricate: pause its current work, travel to a completely different location in memory, execute a chunk of code, come back with a result, and restore everything to exactly how it was before—without disturbing a single variable you were using.

This deceptively simple operation is built on a carefully designed mechanism: the call stack and stack frames. Understanding this mechanism lets you read assembly code fluently, understand why deep recursion causes stack overflows, and grasp why certain bugs mysteriously corrupt seemingly unrelated data.

Core Concepts

The Stack: Last In, First Out

Every program gets a dedicated region of memory for function calls called the stack. Think of it as a physical stack of plates: you always add to the top and take from the top. Whatever went in last comes out first.

Memory layout (high addresses at top):
┌───────────────────────────┐ ← Stack bottom (high address)
│    main()'s data          │
│───────────────────────────│
│    add()'s data           │  ← pushed when add() is called
│───────────────────────────│
│    (free space)           │  ← RSP points here (stack top)
│                           │
└───────────────────────────┘ ← Low addresses

On x86, the stack grows downward (toward lower addresses). push decrements RSP; pop increments it. This feels backwards at first, but it's a convention so universal that all operating systems, compilers, and debuggers agree on it.

Anatomy of a Stack Frame

Each function call carves out a chunk of stack space called a stack frame. It holds three kinds of things:

A typical stack frame (high to low address):

┌──────────────────────────────────────────────────┐ ← previous frame's bottom
│         saved RBP (caller's base pointer)         │ ← RBP points here
│──────────────────────────────────────────────────│
│              return address                       │ ← pushed automatically by CALL
│──────────────────────────────────────────────────│
│         arguments (if more than 6)                │
│──────────────────────────────────────────────────│
│              local variables                      │
│──────────────────────────────────────────────────│
│         saved callee-saved registers              │ ← RSP points here (frame top)
└──────────────────────────────────────────────────┘

The return address is the heart of the whole system. The CALL instruction does two things atomically: pushes the address of the instruction immediately after the call onto the stack, then jumps to the function's entry point. RET does the inverse: pops the return address and jumps to it. The whole mechanism is elegant because the CPU doesn't need to know where it came from—the return address is always right there on the stack.

Calling Conventions: Who Does What

When calling a function, questions arise: how are arguments passed? who saves which registers? who cleans up? Calling conventions answer all of these. On x86-64 Linux and macOS, the standard is System V AMD64 ABI:

Argument passing (first 6 integer/pointer arguments):
┌─────────┬──────────┐
│ 1st arg │   RDI    │
│ 2nd arg │   RSI    │
│ 3rd arg │   RDX    │
│ 4th arg │   RCX    │
│ 5th arg │   R8     │
│ 6th arg │   R9     │
│ 7th+    │ on stack │
└─────────┴──────────┘

Return value: RAX (integer/pointer), XMM0 (float/double)

Caller-saved (RDI, RSI, RDX, RCX, R8, R9, R10, R11): if you need these across a call, save them yourself before calling.
Callee-saved (RBX, RBP, R12-R15): if a function uses these, it must restore their original values before returning.

A Complete Example: C Code and Assembly Side by Side

// C code
int add(int a, int b) {
    return a + b;
}

int main() {
    int result = add(3, 5);
    return result;
}

Corresponding x86-64 assembly (simplified):

main:
    ; Set up arguments
    mov  edi, 3         ; 1st argument → RDI
    mov  esi, 5         ; 2nd argument → RSI

    ; Make the call
    call add            ; ① push address of next instruction (return addr)
                        ; ② jump to add's entry point

    ; After call returns, result is in RAX
    ret

add:
    ; At this point: RDI=3, RSI=5
    mov  eax, edi       ; EAX = a = 3
    add  eax, esi       ; EAX = 3 + 5 = 8
    ret                 ; ① pop return address from stack
                        ; ② jump back to instruction after CALL in main

Stack state during the call:

Before CALL:              During add():
┌──────────────────┐      ┌──────────────────┐
│ main's locals    │      │ main's locals    │
│──────────────────│  →   │──────────────────│
│ (RSP)            │      │ return address   │ ← RBP (add's frame)
│                  │      │──────────────────│
│                  │      │ (RSP)            │
└──────────────────┘      └──────────────────┘

Recursion: The Stack Under Stress

A recursive function calls itself, creating a new stack frame on each call. They pile up, each waiting for the next to return:

int factorial(int n) {
    if (n <= 1) return 1;        // base case
    return n * factorial(n - 1); // recursive call
}

Calling factorial(4) builds this call stack:

Stack (high → low address):

┌──────────────────────────┐
│ factorial(4) frame       │  n=4, waiting for factorial(3)
│  return addr → main      │
├──────────────────────────┤
│ factorial(3) frame       │  n=3, waiting for factorial(2)
│  return addr → fact(4)   │
├──────────────────────────┤
│ factorial(2) frame       │  n=2, waiting for factorial(1)
│  return addr → fact(3)   │
├──────────────────────────┤
│ factorial(1) frame       │  n=1 → returns 1 immediately
│  return addr → fact(2)   │
└──────────────────────────┘

Results bubble back up: 1 → 2 → 6 → 24.

Why Stack Overflow Happens

Every process has a fixed stack size limit (8 MB on Linux by default, 1 MB on Windows). If recursion has no working base case, frames pile up until the stack exceeds its boundary. The OS detects the memory violation and kills the process with a segmentation fault—commonly reported as a stack overflow.

# Stack overflow in Python
def infinite():
    return infinite()

infinite()
# RecursionError: maximum recursion depth exceeded
# Python limits to ~1000 frames by default to give you a clean error
# instead of crashing the interpreter

Try It Yourself

View a live call stack with GDB:

gcc -g -o factorial factorial.c
gdb factorial
(gdb) break factorial
(gdb) run
(gdb) bt          # backtrace: show all frames
# Output:
# #0  factorial (n=1) at factorial.c:2
# #1  factorial (n=2) at factorial.c:3
# #2  factorial (n=3) at factorial.c:3
# #3  factorial (n=4) at factorial.c:3
# #4  main ()     at factorial.c:8

Or in Python, use traceback to inspect the call stack at any point:

import traceback

def a():
    b()
def b():
    c()
def c():
    traceback.print_stack()

a()
# Output:
#   File "...", line 10, in <module>  a()
#   File "...", line 2,  in a         b()
#   File "...", line 5,  in b         c()
#   File "...", line 8,  in c         traceback.print_stack()

🔬 Going Deeper

Stack overflow as a security vulnerability

The call stack has a famous security weakness baked into its design. Local variables and the return address live in the same stack frame, at adjacent memory addresses. If a function accepts an input into a local buffer (say, char buf[64]) without checking the length, an attacker can write past the end of the buffer, overwriting the return address with an address of their choosing. When the function returns, the CPU "returns" to the attacker's code instead. This is stack buffer overflow, the most historically prevalent class of security vulnerability. Modern defenses include stack canaries (a random value placed between locals and the return address, checked before returning), ASLR (randomizing where the stack lives in memory), and non-executable stack pages.

Tail call optimization

If the last thing a recursive function does is call itself (a tail call), a smart compiler can reuse the current stack frame instead of creating a new one—turning recursion into a loop with no stack growth risk. Haskell and Scheme guarantee this optimization. C compilers (GCC, Clang) perform it under specific conditions at -O2 and higher. Python deliberately does not, to preserve clean error tracebacks.

// Tail-recursive factorial (GCC -O2 may convert this to a loop)
int factorial_tail(int n, int acc) {
    if (n <= 1) return acc;
    return factorial_tail(n - 1, n * acc);  // tail position: nothing after the call
}

Where to learn more

Computer Systems: A Programmer's Perspective (CSAPP) — Section 3.7 is the gold standard explanation of stack frames, calling conventions, and x86-64 assembly, all tied together with worked examples.
Hacking: The Art of Exploitation by Jon Erickson — Approaches the call stack from an attacker's perspective. Understanding how exploits work is the fastest way to understand why the stack is designed the way it is.
OSDev Wiki, "Calling Conventions" (osdev.org) — A concise reference table for calling conventions across x86, x86-64, ARM, and other platforms. Bookmark it for whenever you write code that crosses language or OS boundaries.

Rate this chapter

4.7 / 5 (38 ratings)

The Secret of Function Calls

The Secret of Function Calls

Core Concepts

The Stack: Last In, First Out

Anatomy of a Stack Frame

Calling Conventions: Who Does What

A Complete Example: C Code and Assembly Side by Side

Recursion: The Stack Under Stress

Why Stack Overflow Happens

Try It Yourself

🔬 Going Deeper

💬 Comments