The Life of an Instruction
Imagine ordering a steak at a restaurant. From the moment you speak to the waiter until the plate arrives at your table, a sequence of handoffs happens: the waiter writes your order, runs it to the kitchen, a chef reads the ticket, prepares ingredients, cooks the steak, plates it, and hands it off to the runner. Each station has a dedicated job. Nothing is redundant; everything is specialized.
A CPU executing an instruction works the same way. Computer scientists call this the instruction cycle: a fixed sequence of stages that every instruction passes through, from birth to completion. The classic version has five stages: Fetch, Decode, Execute, Memory Access, and Write Back. Every instruction lives and dies in these five workshops.
Core Concepts
The Five Stages
Let's follow a single concrete instruction through its entire life: ADD R1, R2, R3, which means "add the values in R2 and R3, and store the result in R1."
┌─────────────────────────────────────────────────────────────────┐
│            The Five Stages of Instruction Execution             │
├───────────────┬─────────────────────────────────────────────────┤
│ Stage         │ What Happens                                    │
├───────────────┼─────────────────────────────────────────────────┤
│ IF  Fetch     │ Use PC to retrieve the instruction from memory  │
│ ID  Decode    │ Parse the binary; figure out op and operands    │
│ EX  Execute   │ The ALU adds R2 and R3 together                 │
│ MEM Memory    │ Read/write main memory if needed (ADD skips)    │
│ WB  Write Back│ Write the result into the destination reg (R1)  │
└───────────────┴─────────────────────────────────────────────────┘
Stage 1: Instruction Fetch (IF)
The CPU contains a special register called the Program Counter (PC). Its only job is to remember the address of the next instruction in memory. During IF, the CPU uses PC as an address, reaches into memory, and pulls out the instruction bytes, loading them into the Instruction Register (IR).
Once the instruction is fetched, the PC automatically advances by the instruction size: 4 bytes on fixed-width encodings such as MIPS or ARM's A64, a variable amount on x86. Think of the PC as a bookmark that always points to the next thing to read.
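The fetch step can be sketched in a few lines of Python. This is only a minimal sketch, assuming fixed 4-byte instructions and a byte-addressed memory modeled as a `bytes` object stored big-endian. The names `fetch`, `memory`, and `INSTRUCTION_SIZE` are illustrative and not taken from any real simulator.

```python
INSTRUCTION_SIZE = 4  # assumption: fixed-width 32-bit instructions


def fetch(memory: bytes, pc: int) -> tuple[int, int]:
    """Return (instruction_register, next_pc) for the instruction at pc."""
    raw = memory[pc:pc + INSTRUCTION_SIZE]
    if len(raw) < INSTRUCTION_SIZE:
        raise IndexError(f"PC {pc:#x} points past the end of memory")
    ir = int.from_bytes(raw, byteorder="big")  # assumption: big-endian storage
    return ir, pc + INSTRUCTION_SIZE


# Two 32-bit words standing in for a tiny program.
memory = bytes.fromhex("00430820" "00000000")
ir, pc = fetch(memory, 0)
print(f"IR = {ir:#010x}, PC = {pc}")  # IR = 0x00430820, PC = 4
```

Returning the new PC alongside the instruction keeps the "read, then advance the bookmark" pairing explicit.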
Stage 2: Instruction Decode (ID)
The IR now holds raw binary. For example, in a MIPS-style encoding:
00000000010000110000100000100000
The decoder breaks it apart:
Opcode  000000  → R-type (register-to-register) instruction
RS      00010   → first source: register R2
RT      00011   → second source: register R3
RD      00001   → destination: register R1
Shamt   00000   → shift amount: 0
Funct   100000  → specific operation: ADD
Simultaneously, the decoder reads the actual current values out of R2 and R3 from the register file, so they're ready for the next stage.
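The field extraction amounts to shifts and masks. The sketch below assumes the standard MIPS R-type layout (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code) and encodes ADD R1, R2, R3 with rs=R2, rt=R3, rd=R1. The function name `decode_r_type` is made up for illustration.

```python
def decode_r_type(word: int) -> dict[str, int]:
    """Split a 32-bit MIPS-style R-type word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source
        "rd": (word >> 11) & 0x1F,      # bits 15..11, destination
        "shamt": (word >> 6) & 0x1F,    # bits 10..6, shift amount
        "funct": word & 0x3F,           # bits 5..0, operation selector
    }


# opcode | rs=R2 | rt=R3 | rd=R1 | shamt | funct=ADD
word = 0b000000_00010_00011_00001_00000_100000
fields = decode_r_type(word)
print(fields)
```

In hardware all six fields are wired out in parallel; the sequential dictionary construction here is only a convenience of the sketch.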
Stage 3: Execute (EX)
The Arithmetic Logic Unit (ALU) takes the two values that were read in the decode stage and performs the actual computation:
R2 = 7
R3 = 5

         ┌─────┐
7 ──────►│     │
         │ ALU ├──► 12
5 ──────►│     │
         └─────┘
For branch and jump instructions, the EX stage instead computes the target address, which then replaces the value in the PC. The ALU is the muscle; IF and ID just set it up.
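Both jobs of the EX stage, arithmetic and target-address calculation, can be sketched as plain functions. This is a hedged illustration: it assumes 32-bit registers that wrap on overflow and MIPS-style PC-relative branches whose offset counts 4-byte instructions. The names `alu`, `branch_target`, and `MASK32` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers that wrap on overflow


def alu(op: str, a: int, b: int) -> int:
    """Combine two operand values according to the decoded operation."""
    if op == "ADD":
        return (a + b) & MASK32
    if op == "SUB":
        return (a - b) & MASK32
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    raise ValueError(f"unsupported ALU operation: {op}")


def branch_target(pc_next: int, offset_words: int) -> int:
    """PC-relative target; the offset is counted in 4-byte instructions."""
    return (pc_next + 4 * offset_words) & MASK32


print(alu("ADD", 7, 5))                # 12
print(hex(branch_target(0x1004, 3)))   # 0x1010
```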
Stage 4: Memory Access (MEM)
If the instruction needs to interact with main memory (a load like LW R1, 100(R2) or a store like SW R1, 100(R2)), that happens here. This stage talks directly to the memory subsystem.
Our ADD R1, R2, R3 doesn't touch memory at all, so this stage is a pass-through: the result just flows forward to the final stage.
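The three behaviors of this stage (load, store, pass-through) fit in one small function. The sketch assumes data memory is a dictionary from address to word and that unwritten addresses read as 0. Both are simplifications for illustration, and `memory_stage` is a made-up name.

```python
def memory_stage(op, alu_result, store_value, data_memory):
    """Perform the MEM stage; return the value handed on to write back."""
    if op == "LW":
        # Load: the ALU result is the address to read.
        return data_memory.get(alu_result, 0)
    if op == "SW":
        # Store: write the register value; nothing goes to write back.
        data_memory[alu_result] = store_value
        return None
    # Any other instruction (e.g. ADD): pass the ALU result through.
    return alu_result


data_memory = {}
registers = {"R1": 42, "R2": 0x2000}

# SW R1, 100(R2): the address R2 + 100 comes from the EX stage.
address = registers["R2"] + 100
memory_stage("SW", address, registers["R1"], data_memory)

# LW R1, 100(R2) reads the same word back.
loaded = memory_stage("LW", address, None, data_memory)
print(loaded)                                       # 42
print(memory_stage("ADD", 12, None, data_memory))   # 12
```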
Stage 5: Write Back (WB)
The result is written back to its destination register. For our instruction, that means placing the value 12 into R1.
R1 ← 12. The instruction's life is complete.
Full Flow Diagram
     Memory                                CPU
┌──────────────┐   PC    ┌──────────────────────────────────────┐
│ addr 0x1000  │◄────────│ Program Counter PC = 0x1000          │
│ ADD R1,R2,R3 │         │                                      │
│ addr 0x1004  │   IF    │ 1. Fetch: retrieve from 0x1000       │
│ ...          │────────►│    IR ← mem[PC]                      │
└──────────────┘         │    PC ← PC + 4 (now 0x1004)          │
                         │                                      │
                         │ 2. Decode: parse IR, read R2, R3     │
                         │    op=ADD rd=R1 rs=R2 rt=R3          │
                         │                                      │
                         │ 3. Execute: ALU computes R2 + R3     │
                         │    result = 7 + 5 = 12               │
                         │                                      │
                         │ 4. Memory: no memory op (pass)       │
                         │                                      │
                         │ 5. Write Back: R1 ← 12               │
                         └──────────────────────────────────────┘
Try It Yourself
Here's a toy Python simulation of the five-stage cycle:
# A minimal CPU simulator: 4 registers, only the ADD instruction
registers = [0, 7, 5, 0]  # R0=0, R1=7, R2=5, R3=0
memory = [
    ("ADD", 3, 1, 2),  # ADD R3, R1, R2 → R3 = R1 + R2
    ("ADD", 0, 3, 1),  # ADD R0, R3, R1 → R0 = R3 + R1
]
PC = 0

while PC < len(memory):
    # Stage 1: Fetch the instruction at PC, then advance PC
    instruction = memory[PC]
    PC += 1
    print(f"IF: fetched {instruction}")

    # Stage 2: Decode the fields and read the source registers
    op, rd, rs, rt = instruction
    a, b = registers[rs], registers[rt]
    print(f"ID: op={op} src={a},{b} dest=R{rd}")

    # Stage 3: Execute
    if op == "ADD":
        result = a + b
    else:
        raise ValueError(f"unsupported instruction: {op}")
    print(f"EX: result = {result}")

    # Stage 4: Memory (nothing to do for ADD)
    print("MEM: (no memory op)")

    # Stage 5: Write Back
    registers[rd] = result
    print(f"WB: R{rd} <- {result}\n")

print("Final register state:", registers)
Running this shows each instruction marching through all five stages in order. The sequence of stages mirrors what real hardware does, only at the speed of Python rather than in a few nanoseconds.
Going Deeper
The PC is the heart of control flow
The Program Counter is the CPU's "you are here" marker, but it doesn't always step forward quietly. A JMP instruction forcibly sets PC to a new address. A CALL saves the return address (on the stack on x86, in a link register on MIPS and ARM) and then jumps to a function. Loops, branches, and function calls are, at their core, all just manipulations of the PC. Every if, while, and function you've ever written compiles down to instructions that move the PC around.
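A toy interpreter makes the PC manipulation visible. The sketch below assumes instruction addresses are list indices and that CALL pushes the return address onto a stack. The opcodes and the `run` function are invented for illustration and do not correspond to a real instruction set.

```python
def run(program):
    """Execute a toy program; return the PC values in the order visited."""
    pc, stack, trace = 0, [], []
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "HALT":
            return trace
        if op == "JMP":
            pc = arg              # overwrite PC with the target
        elif op == "CALL":
            stack.append(pc + 1)  # save the return address
            pc = arg
        elif op == "RET":
            pc = stack.pop()      # restore the saved address
        else:
            pc += 1               # ordinary instruction: step forward


program = [
    ("CALL", 3),     # 0: call the function at address 3
    ("NOP", None),   # 1: execution resumes here after RET
    ("HALT", None),  # 2
    ("NOP", None),   # 3: function body
    ("RET", None),   # 4: return to address 1
]
print(run(program))  # [0, 3, 4, 1, 2]
```

The trace shows the PC leaving its straight-line path at the CALL and rejoining it at address 1 after the RET.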
How interrupts cut in line
Your program is running fine when suddenly a key is pressed. How does the CPU know? Through interrupts. When a hardware interrupt fires, the CPU finishes its current instruction, saves the PC (and other state) somewhere safe, then jumps to a special interrupt handler routine. After the handler finishes, everything is restored and the original program continues, none the wiser. Interrupts can inject themselves between any two instructions, which is what makes keyboards, timers, and network cards work.
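The save, handle, restore sequence can be imitated in a few lines. In this sketch, `interrupt_at` is a hypothetical set of PC values at which an interrupt happens to be pending; it stands in for a hardware interrupt line. Instructions are plain strings that are "executed" by logging them.

```python
def run_with_interrupts(program, interrupt_at, handler):
    """Run `program`, checking for a pending interrupt between instructions."""
    log = []
    pc = 0
    while pc < len(program):
        log.append(program[pc])   # finish the current instruction
        pc += 1
        if pc in interrupt_at:    # an interrupt arrived in the meantime
            saved_pc = pc         # save state before leaving the program
            for step in handler:  # run the handler to completion
                log.append(step)
            pc = saved_pc         # restore state and resume
    return log


program = ["i0", "i1", "i2"]
handler = ["handle_key"]
print(run_with_interrupts(program, interrupt_at={2}, handler=handler))
# ['i0', 'i1', 'handle_key', 'i2']
```

The check sits after an instruction completes, so the handler never observes a half-finished instruction.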
Five stages is just the beginning
The five-stage pipeline we've described is the classic MIPS textbook model. Modern x86 chips such as Intel's Core series or AMD's Zen family use pipelines well over a dozen stages deep, and some designs reach 20 or more. Splitting the work into more, shorter stages allows a higher clock rate and therefore higher theoretical throughput. But it also means that when something goes wrong (like a branch prediction miss), the CPU has to flush more work in progress, wasting more cycles. Pipeline depth is an engineering tradeoff, not a free lunch.
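A back-of-the-envelope calculation shows how the flush cost scales with depth. The sketch assumes an ideal pipeline that completes one instruction per cycle except when a mispredicted branch forces a flush. The branch fraction, misprediction rate, and flush penalties are illustrative numbers, not measurements of any real chip.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction when only branch flushes cause stalls."""
    return 1.0 + branch_fraction * mispredict_rate * flush_penalty


# Assumed workload: 20% branches, 5% of them mispredicted.
for penalty in (3, 15):  # assumed flush cost: shallow vs. deep pipeline
    cpi = effective_cpi(0.20, 0.05, penalty)
    print(f"flush penalty {penalty:2d} cycles -> CPI {cpi:.2f}")
```

Under these assumptions the deeper pipeline pays 0.15 extra cycles per instruction against 0.03 for the shallow one, so its faster clock has to make up that difference.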
Where to learn more
- Computer Organization and Design by Patterson & Hennessy: Chapter 4 is the definitive textbook treatment of the five-stage pipeline, complete with diagrams and worked examples.
- Computer Systems: A Programmer's Perspective (CSAPP): Chapter 4 walks through the design of a full pipelined processor (Y86-64) from scratch.
- nandgame.com: Build a CPU from logic gates upward, as an interactive puzzle game. You'll feel every stage click into place.