Chapter 5

Instructions Are the CPU's Language

Think about a vending machine. It's a computer of sorts: it has buttons, logic, and memory. But it can only do a handful of things: accept money, dispense items, make change. Now think about your smartphone. It can do almost anything. Yet under the hood, the chip inside your phone also knows only a limited set of operations; the list is just much longer.

That finite list of operations is called the Instruction Set Architecture, or ISA. Every command a CPU can understand is baked into the chip design from the very beginning. Every Python loop, every JavaScript function, every line of C code you've ever written ultimately runs as some combination of these instructions: C is translated into them ahead of time by a compiler, while Python and JavaScript are executed by interpreters and JIT compilers that are themselves built from them.

Core Concepts

The ISA Is a Contract

Think of the ISA as a legal contract between chip designers and software developers: "Here's the list of commands my chip understands. If your software translates itself into these commands, it'll run on my hardware."

The two dominant ISAs today couldn't be more different in philosophy:

┌──────────────────────────────────────────────────────────────┐
│                  x86-64  vs  ARM (AArch64)                   │
├────────────────┬─────────────────────┬───────────────────────┤
│ Feature        │ x86-64              │ ARM (AArch64)         │
├────────────────┼─────────────────────┼───────────────────────┤
│ Philosophy     │ CISC (complex)      │ RISC (reduced)        │
│ Instr. length  │ Variable (1-15 B)   │ Fixed (4 bytes)       │
│ Registers      │ 16 general purpose  │ 31 general purpose    │
│ Memory access  │ Most instructions   │ Load/store only       │
│ Power usage    │ Typically higher    │ Typically lower       │
│ Common use     │ PCs, servers        │ Phones, Apple Silicon │
└────────────────┴─────────────────────┴───────────────────────┘

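x86's "complex" instructions can do a lot in one shot: a single ADD can read a value from memory and add it to a register (ADD RAX, [RBX]), or read a memory location, add a register to it, and write the result back (ADD [RBX], RAX). At most one operand can be in memory, though; x86 has no memory-to-memory ADD. ARM's "reduced" instructions never touch memory except through dedicated load and store instructions, so you load data into a register first, operate on it there, and store it back. That regularity simplifies pipeline design and helps keep power consumption low, which is a large part of why your phone runs an ARM chip and not an x86 one.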
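The fixed-length row in the table is one concrete piece of that regularity. The Python sketch below is a toy (all names in it are made up for illustration) that splits two short byte strings into instructions: the x86-64 encodings of add rax, rbx / nop / ret, and the AArch64 encodings of add x0, x0, x1 / nop / ret. Its x86 length rule only knows those three encodings; a real x86 decoder must inspect prefixes, opcode, ModRM, SIB, displacement, and immediate bytes.

```python
# Toy comparison: finding instruction boundaries in fixed- vs variable-length code.

# x86-64 machine code for: add rax, rbx (48 01 D8); nop (90); ret (C3)
X86_CODE = bytes([0x48, 0x01, 0xD8, 0x90, 0xC3])

# AArch64 machine code for: add x0, x0, x1; nop; ret
# (one 32-bit word each, stored little-endian)
ARM_WORDS = (0x8B010000, 0xD503201F, 0xD65F03C0)
ARM_CODE = b"".join(w.to_bytes(4, "little") for w in ARM_WORDS)

def x86_length(code: bytes, pos: int) -> int:
    """Length rule for THIS toy subset only, not a real x86 decoder."""
    if code[pos] == 0x48:  # REX.W prefix; here always followed by opcode + ModRM
        return 3
    return 1               # 0x90 (nop) and 0xC3 (ret) are one byte each

def split_x86(code: bytes) -> list[bytes]:
    instrs, pos = [], 0
    while pos < len(code):
        n = x86_length(code, pos)  # must size this instruction to find the next one
        instrs.append(code[pos:pos + n])
        pos += n
    return instrs

def split_arm(code: bytes) -> list[bytes]:
    # Every boundary is known up front: instruction i starts at byte 4 * i.
    return [code[i:i + 4] for i in range(0, len(code), 4)]

print([b.hex() for b in split_x86(X86_CODE)])  # ['4801d8', '90', 'c3']
print([b.hex() for b in split_arm(ARM_CODE)])  # ['0000018b', '1f2003d5', 'c0035fd6']
```

The design point: the ARM splitter can compute every boundary before decoding anything, so hardware can start decoding several instructions in parallel, while the x86 splitter cannot know where instruction n+1 starts until it has sized instruction n.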

The Four Families of Instructions

No matter how large or exotic an instruction set gets, almost everything falls into four families:

1. Data Movement (MOV): shuffling data around

MOV  RAX, 42        ; Load the number 42 into register RAX
MOV  [RBX], RAX     ; Write the value in RAX to the memory address in RBX

2. Arithmetic and Logic (ADD/SUB/AND/OR): doing math

ADD  RAX, RBX       ; RAX = RAX + RBX
SUB  RCX, 1         ; RCX = RCX - 1
AND  RAX, 0xFF      ; Keep only the lowest 8 bits of RAX

3. Compare and Jump (CMP/JMP): controlling flow

CMP  RAX, 0         ; Compare RAX to 0 (sets flags, no output)
JE   done           ; Jump to "done" if they were Equal
JMP  loop_start     ; Unconditional jump to loop_start

4. Call and Return (CALL/RET): function calls

CALL my_function    ; Push the return address onto the stack, then jump to my_function
RET                 ; Pop the return address off the stack and jump back to it
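To see how the four families cooperate, here is a toy register machine in Python. Its mini-ISA (tuple-encoded instructions, a HALT instruction, a label table, and a single zero flag standing in for x86's flags register) is invented for illustration and only borrows x86 mnemonics. The sample program sums 5 + 4 + 3 + 2 + 1 and uses every family along the way.

```python
# Toy register machine for the four instruction families.
# The mini-ISA (tuples, HALT, label dict) is hypothetical; it is not real x86.

def run(program, labels):
    regs = {"RAX": 0, "RBX": 0, "RCX": 0}
    zero_flag = False      # stands in for x86's ZF, set by CMP
    call_stack = []        # return addresses saved by CALL
    pc = 0                 # program counter: index of the next instruction

    def val(x):
        # An operand is either a register name or an immediate number
        return regs[x] if isinstance(x, str) else x

    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "MOV":                      # 1. data movement
            regs[args[0]] = val(args[1])
        elif op == "ADD":                    # 2. arithmetic and logic
            regs[args[0]] += val(args[1])
        elif op == "SUB":
            regs[args[0]] -= val(args[1])
        elif op == "AND":
            regs[args[0]] &= val(args[1])
        elif op == "CMP":                    # 3. compare and jump
            zero_flag = regs[args[0]] - val(args[1]) == 0
        elif op == "JE":
            if zero_flag:
                pc = labels[args[0]]
        elif op == "JMP":
            pc = labels[args[0]]
        elif op == "CALL":                   # 4. call and return
            call_stack.append(pc)            # save the return address
            pc = labels[args[0]]
        elif op == "RET":
            pc = call_stack.pop()            # jump back to the saved address
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction {op}")
    return regs

# Sum 5 + 4 + 3 + 2 + 1 into RAX, calling a tiny "function" on each pass.
program = [
    ("MOV", "RAX", 0),
    ("MOV", "RCX", 5),
    ("CMP", "RCX", 0),        # loop_start (index 2)
    ("JE", "done"),
    ("CALL", "add_rcx"),
    ("SUB", "RCX", 1),
    ("JMP", "loop_start"),
    ("HALT",),                # done (index 7)
    ("ADD", "RAX", "RCX"),    # add_rcx (index 8)
    ("RET",),
]
labels = {"loop_start": 2, "done": 7, "add_rcx": 8}

print(run(program, labels)["RAX"])  # 15
```

Note that control flow is nothing more than overwriting the program counter: JE, JMP, CALL, and RET all just change which instruction index runs next.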

The Three-Layer Tower

Between the code you write and the raw bits the CPU executes, there are multiple layers of translation:

┌──────────────────────────────────────┐
│       High-Level Language (C/Python) │  ← You write this
│   int sum = a + b;                   │
└──────────────────┬───────────────────┘
                   │  compiler / interpreter
                   ▼
┌──────────────────────────────────────┐
│         Assembly Language            │  ← Humans can read this
│   ADD RAX, RBX                       │
└──────────────────┬───────────────────┘
                   │  assembler
                   ▼
┌──────────────────────────────────────┐
│    Machine Code (binary)             │  ← CPU executes this
│   01001000 00000001 11011000         │
└──────────────────────────────────────┘
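The assembler step at the bottom of the tower is mostly table lookup and bit packing. Here is a toy Python sketch for exactly one instruction form, ADD between two 64-bit registers, using the real x86-64 encoding: a REX.W prefix byte (0x48), the opcode 0x01 (ADD r/m64, r64), and a ModRM byte naming the two registers. The function names are made up, only the eight original registers are handled (R8 to R15 need extra REX bits), and a real assembler covers thousands of forms.

```python
# Toy assembler for one form: ADD r64, r64 using x86-64 opcode 01 /r.
REG_CODES = {"RAX": 0, "RCX": 1, "RDX": 2, "RBX": 3,
             "RSP": 4, "RBP": 5, "RSI": 6, "RDI": 7}

def assemble_add(dst: str, src: str) -> bytes:
    rex_w = 0x48   # REX prefix with W=1: 64-bit operand size
    opcode = 0x01  # ADD r/m64, r64: the r/m field holds the destination
    # ModRM = mod (2 bits) | reg (3 bits) | r/m (3 bits); mod=11 means both are registers
    modrm = (0b11 << 6) | (REG_CODES[src] << 3) | REG_CODES[dst]
    return bytes([rex_w, opcode, modrm])

def as_bits(code: bytes) -> str:
    # Render each byte as 8 binary digits, separated by spaces
    return " ".join(f"{b:08b}" for b in code)

print(as_bits(assemble_add("RAX", "RBX")))  # 01001000 00000001 11011000
print(as_bits(assemble_add("RBX", "RAX")))  # 01001000 00000001 11000011
```

Swapping the operands only changes the ModRM byte. Because opcode 01 puts the destination in the low three bits, the bytes 48 01 C3 mean ADD RBX, RAX, while ADD RAX, RBX is 48 01 D8.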

What does a simple C function look like once compiled? Here's a concrete example:

// C: add two integers
int add(int a, int b) {
    return a + b;
}

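And here's straightforward x86-64 assembly for it (Intel syntax). An optimizing compiler usually does even better (GCC at -O2, for example, folds the move and add into a single lea eax, [rdi+rsi]), but this version is easier to follow: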

add:
    ; System V AMD64 calling convention (Linux, macOS): first arg in EDI, second in ESI
    mov    eax, edi     ; put 'a' into EAX
    add    eax, esi     ; EAX = EAX + b
    ret                 ; return (return value lives in EAX)

Three instructions: your entire return a + b expression boils down to a move, an add, and a return. That's the tower in action.

Try It Yourself

On Linux or macOS, you can have the compiler stop at the assembly stage and show you what it generated:

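# Save the add function above as add.c, then compile to assembly only (-S).
# -O0 turns off optimization: the output follows the source closely but is
# verbose. Try -O2 to compare. On x86-64 machines, add -masm=intel to get
# the Intel syntax used in this chapter.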
gcc -O0 -S -o add.s add.c

# Read the output
cat add.s
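Which assembly you get depends on the machine you compile on: on an Apple Silicon Mac (where the gcc command is normally Apple's Clang) the output is AArch64, not the x86-64 shown in this chapter. One quick check from Python is platform.machine(). The isa_family helper below is a hypothetical convenience that maps the common strings it returns to the two ISAs in the table; the list of strings is not exhaustive.

```python
import platform

def isa_family(machine: str) -> str:
    """Map common platform.machine() strings to this chapter's two ISAs."""
    m = machine.lower()
    if m in ("x86_64", "amd64"):   # Linux/macOS report x86_64, Windows AMD64
        return "x86-64"
    if m in ("arm64", "aarch64"):  # macOS/Windows report arm64, Linux aarch64
        return "ARM (AArch64)"
    return f"other ({machine})"

print(platform.machine(), "->", isa_family(platform.machine()))
```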

If you prefer Python, the dis module shows you CPython's own internal instruction set:

import dis

def add(a, b):
    return a + b

dis.dis(add)
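# Output (abridged; Python 3.11+ also prints line numbers, offsets, and a
# leading RESUME instruction, and exact opcode names vary by version):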
#   LOAD_FAST    0 (a)
#   LOAD_FAST    1 (b)
#   BINARY_OP    0 (+)
#   RETURN_VALUE

This isn't x86 machine code; it's CPython's bytecode, the instruction set of its virtual machine. Python has its own mini-ISA that runs inside a software interpreter, which in turn runs on the real CPU's instructions.
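A minimal sketch of what "a software interpreter for a mini-ISA" means: a toy stack machine in Python that runs the four instructions from the dis listing above. The run_bytecode function and its (opname, arg) tuple format are hypothetical; CPython's real evaluation loop is written in C and handles far more opcodes and argument encodings.

```python
# A toy stack machine running instructions named after the dis listing.
# Illustration only; this is not how CPython is implemented.
import operator

# In the dis listing, BINARY_OP's argument 0 is shown as "+".
BINARY_OPS = {0: operator.add}

def run_bytecode(code, fast_locals):
    stack = []  # the value stack that instructions push to and pop from
    for opname, arg in code:
        if opname == "LOAD_FAST":
            stack.append(fast_locals[arg])  # push local variable number `arg`
        elif opname == "BINARY_OP":
            right = stack.pop()             # operands come off in reverse order
            left = stack.pop()
            stack.append(BINARY_OPS[arg](left, right))
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {opname}")
    raise RuntimeError("code ended without RETURN_VALUE")

add_code = [("LOAD_FAST", 0), ("LOAD_FAST", 1), ("BINARY_OP", 0), ("RETURN_VALUE", None)]
print(run_bytecode(add_code, [2, 3]))  # 5
```

Unlike the x86 examples, this mini-ISA has no registers at all: every instruction works through the stack, which is why CPython is described as a stack-based virtual machine. Because operator.add is Python's own +, run_bytecode(add_code, ["ab", "cd"]) returns "abcd", just as the real add function would.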

For a visual, browser-based experience, open Compiler Explorer at godbolt.org. Paste any C, C++, or Rust code and instantly see the assembly output from dozens of compilers and optimization levels side by side.

🔬 Going Deeper

Why do CISC and RISC still coexist?

In theory, RISC's simplicity should have won decades ago. But x86 carries an enormous legacy in the form of decades of software compiled for it. Intel and AMD's solution was clever: internally, modern x86 chips quietly decompose complex x86 instructions into simpler micro-operations (micro-ops) that look a lot like RISC instructions. The chip presents an x86 face to the world while internally running a RISC-style pipeline. CISC on the outside, RISC on the inside.
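A toy Python sketch of that decomposition for two-operand ADD-style instructions. The micro-op names (uLOAD, uSTORE, uADD) and the TMP register are invented, because real micro-op formats are internal to each chip and undocumented.

```python
# Toy decoder: split a CISC-style instruction into RISC-like micro-ops.
# Micro-op names and the TMP register are hypothetical.
def to_micro_ops(instr: str) -> list[str]:
    op, operands = instr.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    if dst.startswith("[") and src.startswith("["):
        raise ValueError("x86 ADD-style instructions allow at most one memory operand")
    uops = []
    if src.startswith("["):      # memory source: load it into TMP first
        uops.append(f"uLOAD TMP, {src}")
        src = "TMP"
    if dst.startswith("["):      # memory destination: load, compute, store back
        uops.append(f"uLOAD TMP, {dst}")
        uops.append(f"u{op} TMP, {src}")
        uops.append(f"uSTORE {dst}, TMP")
    else:                        # register destination: compute in place
        uops.append(f"u{op} {dst}, {src}")
    return uops

for instr in ("ADD RAX, RBX", "ADD RAX, [RBX]", "ADD [RBX], RAX"):
    print(f"{instr:16} -> {to_micro_ops(instr)}")
```

Real cores are subtler: they can keep a load and its dependent add fused as a single micro-op through parts of the pipeline, and the exact split varies by microarchitecture.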

Instructions go way beyond basic math

Modern ISAs include large extension families. x86's SSE and AVX instructions operate on wide registers that hold several values at once; with AVX's 256-bit registers, a single instruction such as VADDPS adds eight pairs of 32-bit floating-point numbers simultaneously. This is called SIMD (Single Instruction, Multiple Data). Video codecs, neural network inference, and image processing pipelines all rely heavily on SIMD. ARM's equivalents are called NEON and SVE. Without SIMD, running a large language model locally on a CPU would be far slower.
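A toy cost model makes the SIMD payoff concrete. This Python sketch (simd_add and LANES are made-up names) adds two lists and counts how many add instructions a scalar CPU versus an 8-lane SIMD CPU would issue; eight lanes matches AVX's 256-bit registers holding 32-bit floats. Python executes the lanes one after another, so this models instruction count, not real speed.

```python
# Toy SIMD cost model: count add instructions for scalar vs 8-lane SIMD.
LANES = 8  # 256-bit AVX register / 32-bit float = 8 lanes

def simd_add(a: list[float], b: list[float]) -> tuple[list[float], int, int]:
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = [0.0] * len(a)
    scalar_instrs = 0
    simd_instrs = 0
    for start in range(0, len(a), LANES):
        lanes = range(start, min(start + LANES, len(a)))
        for i in lanes:              # in hardware, all lanes complete in one instruction
            out[i] = a[i] + b[i]
        scalar_instrs += len(lanes)  # a scalar CPU needs one add per element
        simd_instrs += 1             # a SIMD CPU needs one add per chunk
    return out, scalar_instrs, simd_instrs

out, scalar, simd = simd_add([float(x) for x in range(20)], [1.0] * 20)
print(scalar, simd)  # 20 3
```

Real SIMD code also has to handle a leftover tail shorter than a full register, typically with a scalar cleanup loop or masked instructions; the toy simply counts the short final chunk as one instruction.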
