Processes: Programs Come Alive
A musical score is just ink on paper. When an orchestra performs it, it comes alive: there is sound, rhythm, memory, and progression through time. A program is the static file sitting on your disk. A process is that score being performed: dynamic, consuming resources, changing state moment by moment.
The same score can be performed by two orchestras simultaneously without interference. Likewise, you can open two Chrome windows at once: both stem from the same executable file, yet they run as completely independent processes. If you think in object-oriented terms: a program is the class, a process is an instance. One class, many possible instances.
Core Concepts
ELF: What a Program Actually Looks Like
On Linux, executable programs use the ELF (Executable and Linkable Format) file format. You can verify this instantly:
$ file /bin/ls
/bin/ls: ELF 64-bit LSB pie executable, x86-64, ...
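The `file` command makes that call by looking at the first few bytes of the file. A minimal Python sketch of the same check follows; `parse_elf_ident` and the synthetic `sample` header are illustrative names, not part of any library, and only the magic number, class byte, and byte-order byte are decoded.

```python
ELF_MAGIC = b"\x7fELF"  # the four magic bytes that open every ELF file


def parse_elf_ident(header: bytes) -> dict:
    """Decode the start of an ELF file: magic, word size, byte order."""
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit
    ei_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    # Byte 5 (EI_DATA): 1 = little-endian (LSB), 2 = big-endian (MSB)
    ei_data = {1: "LSB", 2: "MSB"}.get(header[5], "unknown")
    return {"class": ei_class, "byte_order": ei_data}


# Synthetic 16-byte header (an assumption for illustration): 64-bit, LSB
sample = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9)
print(parse_elf_ident(sample))  # {'class': '64-bit', 'byte_order': 'LSB'}
```

To try it on a real binary, read the first 16 bytes with `open("/bin/ls", "rb").read(16)` and pass them to the function.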
Inside every ELF file are several named sections. The three most important:
ELF File Layout
┌──────────────────────────────┐
│  ELF Header (magic + arch)   │
├──────────────────────────────┤
│  .text section               │
│  Machine instructions (R/O)  │
│  This is where your code is  │
├──────────────────────────────┤
│  .data section               │
│  Initialized global vars     │
│  e.g.: int x = 42;           │
├──────────────────────────────┤
│  .bss section                │
│  Uninitialized global vars   │
│  e.g.: int y;                │
│  (zero bytes on disk!)       │
├──────────────────────────────┤
│  Other sections (symtab...)  │
└──────────────────────────────┘
The .bss name comes from a 1950s assembler directive, "Block Started by Symbol", and has stuck ever since. It takes up almost no disk space: the file just records "I need X bytes of zeroed memory here." When the OS loads the program, it allocates and zeroes that region on demand.
Process Address Space: From 0 to Max
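When a program is loaded, the OS creates an isolated virtual address space for it. On 64-bit Linux the layout looks roughly like this (low addresses at the bottom; exact addresses vary with address space layout randomization and position-independent executables):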
High address
┌──────────────────────────────┐ 0xFFFFFFFFFFFFFFFF
│  Kernel space (invisible)    │
├──────────────────────────────┤ 0x00007FFFFFFFFFFF (top of user space on x86-64)
│  Stack                       │ ← grows downward
│  Function call frames        │
│  Local variables             │
│  ...                         │
│              ↓               │
│                              │
│              ↑               │
│  Heap                        │ ← grows upward
│  malloc / new allocations    │
├──────────────────────────────┤
│  .bss  (zeroed globals)      │
├──────────────────────────────┤
│  .data (initialized globals) │
├──────────────────────────────┤
│  .text (code)                │
└──────────────────────────────┘
Low address ~0x0000000000400000
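Stack and heap grow toward each other with a large gap between them. In practice each hits its own limit before they meet: recurse too deeply and the stack exceeds its size limit (typically 8 MB by default on Linux), producing a Stack Overflow; malloc endlessly without freeing and the system eventually runs out of memory. Both are errors you have almost certainly encountered.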
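To see where the heap and stack sit, you can parse the lines of /proc/<pid>/maps. The sketch below is one way to do it: `parse_maps_line`, `find_regions`, and the `SAMPLE` text are illustrative (the addresses in `SAMPLE` are made up, though they follow the usual shape of a maps entry).

```python
def parse_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/maps line into its address range, perms, and name."""
    fields = line.split(None, 5)  # range, perms, offset, dev, inode, [name]
    start, end = (int(part, 16) for part in fields[0].split("-"))
    name = fields[5].strip() if len(fields) > 5 else ""
    return {"start": start, "end": end, "perms": fields[1], "name": name}


def find_regions(maps_text: str, names=("[heap]", "[stack]")) -> dict:
    """Return the parsed entries whose name is in `names`."""
    regions = {}
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        entry = parse_maps_line(line)
        if entry["name"] in names:
            regions[entry["name"]] = entry
    return regions


# Hypothetical sample in the format of /proc/<pid>/maps (addresses invented)
SAMPLE = """\
55d0c8a00000-55d0c8a21000 rw-p 00000000 00:00 0                          [heap]
7ffc1b2e0000-7ffc1b301000 rw-p 00000000 00:00 0                          [stack]
"""

regions = find_regions(SAMPLE)
gap = regions["[stack]"]["start"] - regions["[heap]"]["end"]
print(f"heap ends at  {regions['[heap]']['end']:#x}")
print(f"stack starts  {regions['[stack]']['start']:#x}")
print(f"gap between them: {gap / 2**40:.1f} TiB")
```

On a Linux machine, passing `open("/proc/self/maps").read()` instead of `SAMPLE` shows the layout of the running Python interpreter.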
Creating Processes: fork and exec
Linux creates new processes with the classic fork + exec two-step:
Parent process (shell)
│
│  fork()
├──────────────────────────────► Child (exact copy of parent)
│                                │
│                                │  exec("ls", ...)
│                                │  Replace self with new program
│                                │
│  wait()                        ▼  ls starts executing
│◄───────────────────────────────┘  ls finishes, exit()
│
shell resumes
fork() creates a copy-on-write snapshot of the parent: no actual memory is copied until either process writes to a page, at which point only that one page is duplicated. exec() then replaces the entire address space with the new program's image and jumps to its entry point.
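The two-step pattern can be sketched in a few lines of Python using the os module's thin wrappers around the system calls. `run_command` is a hypothetical helper name, and the exit code 127 for a failed exec is borrowed from shell convention rather than required by the kernel.

```python
import os


def run_command(argv):
    """Run argv the way a shell would: fork, exec in the child, wait in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed (e.g. command not found)
    # Parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


print(run_command(["true"]))   # 0
print(run_command(["false"]))  # 1
```

Note that execvp only returns if it fails, which is why the child's error path calls os._exit directly instead of falling through into the parent's code.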
The Process State Machine
A process moves between five states during its lifetime:
    fork()
      │
      ▼
  ┌───────┐
  │  New  │
  └───┬───┘
      │  Enters ready queue
      ▼
  ┌───────┐   CPU frees up
  │ Ready │◄────────────────────┐
  └───┬───┘                     │
      │  Scheduler picks it     │
      ▼                         │
  ┌───────┐   Time slice ends   │
  │Running├─────────────────────┘
  └─┬───┬─┘
    │   │  exit()    ┌────────────┐
    │   └───────────►│ Terminated │
    │                └────────────┘
    │  wait for I/O
    ▼
┌───────┐
│Blocked│
└───┬───┘
    │  I/O completes
    └──────► Ready
"Blocked" does not mean the process has crashed. It simply means it's waiting for something: a disk read to finish, a network packet to arrive, keyboard input. Once that event occurs, the OS moves it back to the ready queue.
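The diagram above can be written down as a small transition table. This is a toy model, not how a kernel represents states; the event names (`admit`, `dispatch`, and so on) are invented for illustration.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"
    TERMINATED = "terminated"


# (current state, event) -> next state; anything not listed is illegal
TRANSITIONS = {
    (State.NEW, "admit"): State.READY,              # enters ready queue
    (State.READY, "dispatch"): State.RUNNING,       # scheduler picks it
    (State.RUNNING, "timeslice_end"): State.READY,  # preempted
    (State.RUNNING, "io_wait"): State.BLOCKED,      # waits for I/O
    (State.BLOCKED, "io_done"): State.READY,        # I/O completes
    (State.RUNNING, "exit"): State.TERMINATED,
}


def step(state: State, event: str) -> State:
    """Apply one event, rejecting transitions the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event!r}") from None


state = State.NEW
for event in ["admit", "dispatch", "io_wait", "io_done", "dispatch", "exit"]:
    state = step(state, event)
print(state.name)  # TERMINATED
```

The table makes one property of the diagram explicit: a blocked process never goes straight back to running. It must pass through the ready queue and be picked by the scheduler again.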
PCB: The Process's Identity Card
The OS tracks every process using a data structure called the PCB (Process Control Block). It contains:
- PID (process ID)
- Current state (ready / running / blocked)
- CPU register snapshot (saved when the process is switched out)
- Memory map (page table pointer)
- List of open file descriptors
- Signal handlers
- Parent PID and child process list
In the Linux kernel the PCB is the task_struct defined in include/linux/sched.h. It has hundreds of fields; everything the kernel needs to know about a process is in there.
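As a rough mental model, the fields listed above fit in a small record, and a context switch amounts to saving registers into one record and loading them from another. The sketch below is a toy: `ToyPCB`, `context_switch`, and the register dictionary are all invented for illustration and bear no resemblance to the real task_struct layout.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPCB:
    """A toy process control block holding the fields listed above."""
    pid: int
    ppid: int
    state: str = "ready"                            # ready / running / blocked
    registers: dict = field(default_factory=dict)   # saved CPU register snapshot
    page_table: int = 0                             # stand-in for the page table pointer
    open_fds: list = field(default_factory=lambda: [0, 1, 2])
    signal_handlers: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def context_switch(cpu: dict, old: ToyPCB, new: ToyPCB) -> None:
    """Save the CPU registers into `old`, then load the ones saved in `new`."""
    old.registers = dict(cpu)
    old.state = "ready"
    cpu.clear()
    cpu.update(new.registers)
    new.state = "running"


shell = ToyPCB(pid=100, ppid=1, state="running")
ls = ToyPCB(pid=101, ppid=100, registers={"pc": 0x401000})
shell.children.append(ls.pid)

cpu = {"pc": 0x7000}  # hypothetical register file with just a program counter
context_switch(cpu, shell, ls)
print(hex(cpu["pc"]), shell.state, ls.state)  # 0x401000 ready running
```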
Hands-On Verification
# View the process tree with PIDs
pstree -p
# See the address space of the current shell
cat /proc/$$/maps
# Inspect ELF sections of a binary
readelf -S /bin/ls | grep -E "\.text|\.data|\.bss"
# Quick size summary of each section
size /bin/ls
# Output:
# text data bss dec hex filename
# 143416 4824 4664 152904 25548 /bin/ls
# Demonstrate fork in Python
python3 -c "
import os
pid = os.fork()
if pid == 0:
    print(f'Child: PID={os.getpid()}, parent={os.getppid()}')
    os._exit(0)
else:
    print(f'Parent: PID={os.getpid()}, child={pid}')
    os.wait()
"
Going Deeper
How Clever Is Copy-on-Write?
If fork() actually duplicated all 500 MB of a parent's memory, the operation would take hundreds of milliseconds, which is completely unacceptable for a shell spawning commands every second. Copy-on-Write solves this: after fork(), parent and child share every memory page, all marked read-only. The moment either side tries to write to a page, the CPU raises a page fault, the kernel copies just that one page, and execution continues. In the extremely common fork() + exec() pattern (spawning a new program), the child never writes to the old pages at all; they get discarded immediately when exec() replaces the address space.
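Page faults are invisible from user space, but the guarantee they implement is easy to observe: a write in the child never shows up in the parent. The sketch below demonstrates that guarantee only, not the page-level mechanism; `fork_and_modify` is a hypothetical helper, and a pipe carries the child's view back to the parent.

```python
import os


def fork_and_modify():
    """Fork, let the child overwrite a buffer, and report both views of it."""
    data = bytearray(b"original")
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write lands on the child's private copy of the page.
        os.close(read_end)
        data[:] = b"changed!"
        os.write(write_end, bytes(data))
        os._exit(0)
    # Parent: collect what the child saw, then look at our own copy.
    os.close(write_end)
    child_view = os.read(read_end, 64)
    os.close(read_end)
    os.waitpid(pid, 0)
    return child_view, bytes(data)


print(fork_and_modify())  # (b'changed!', b'original')
```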
Process Isolation Is the Foundation of Security
Every process lives in its own virtual address space, enforced by the CPU's page tables. Process A's virtual address 0x401000 and Process B's virtual address 0x401000 map to completely different physical pages. The OS can also intentionally map the same physical page into multiple processes. This is how shared memory (via mmap or shmget) works, enabling high-speed inter-process communication without copying.
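The shared-memory case can be sketched with Python's mmap module. On Unix, `mmap.mmap(-1, size)` creates an anonymous mapping that is shared by default, so it stays shared across fork(). `shared_value_demo` is a hypothetical helper name, and the 4-byte integer layout is just one convenient choice.

```python
import mmap
import os
import struct


def shared_value_demo() -> int:
    """Let a child write into a shared page and read the result in the parent."""
    shm = mmap.mmap(-1, mmap.PAGESIZE)   # anonymous mapping, shared on Unix
    shm[0:4] = struct.pack("i", 1)       # initial value written by the parent
    pid = os.fork()
    if pid == 0:
        # Child: same physical page, so this write is visible to the parent.
        shm[0:4] = struct.pack("i", 42)
        os._exit(0)
    os.waitpid(pid, 0)                   # wait so the child's write has happened
    (value,) = struct.unpack("i", shm[0:4])
    shm.close()
    return value


print(shared_value_demo())  # 42
```

Compare this with an ordinary variable after fork(): there the child's write stays private, while here the page was explicitly mapped as shared, so both processes see the same bytes.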
Recommended Reading:
- Computer Systems: A Programmer's Perspective (CSAPP), Chapter 8, Exceptional Control Flow. The fork/exec/wait trio is explained more precisely here than anywhere else.
- Operating Systems: Three Easy Pieces (OSTEP), Part II, Virtualization. Address spaces, paging, and the TLB are all covered in a logical sequence that builds one concept cleanly on top of the last.
- Linux kernel source, kernel/fork.c. The copy_process() function is the real implementation of fork(). Reading it alongside OSTEP's paging chapters is a genuinely rewarding experience.