# Chapter 18: Build a Mini Shell in C
Every programmer who truly understands Linux should build their own shell. It doesn't have to be perfect, but it will make you genuinely understand the difference between fork() and exec(), why pipes require closing unused fds, and how signals propagate across process groups. This chapter implements a mini shell supporting pipes, redirection, and built-in commands in roughly 400 lines of C, then uses GDB debugging to deepen understanding.
## 1. Why Build Your Own Shell
A shell is one of the thinnest wrappers over the OS. It does almost nothing "magical": most functionality is a composition of a few syscalls: `fork()` to create a child, `execvp()` to replace its image, `waitpid()` to reap it, and `pipe()` + `dup2()` to implement pipes and redirection. Once you understand these, bash/zsh behavior stops being mysterious.
- Understand fundamentals: why fork/exec are separate (Unix philosophy history)
- Learn syscalls: real usage of pipe, dup2, open, waitpid, sigaction
- Kernel contribution foundation: understanding how a shell drives the terminal is useful background for working on tty/pty drivers
- Interview preparation: a frequent systems-programming interview topic
## 2. Shell REPL Loop
The core of a shell is a REPL (Read-Eval-Print-Loop): read user input, parse the command, execute it, display the result, repeat forever.
```c
/* repl.c - minimal REPL skeleton */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <readline/readline.h>
#include <readline/history.h>

int main(void) {
    char *line;
    /* readline provides line editing and history */
    while ((line = readline("mysh$ ")) != NULL) {
        if (*line) {
            add_history(line);   /* add to history (up/down arrows recall it) */
            /* TODO: parse and execute line */
            printf("got: %s\n", line);
        }
        free(line);              /* readline allocates; the caller frees */
    }
    printf("\n");                /* print a newline when exiting via Ctrl+D */
    return 0;
}
```
**readline:** The readline library provides line editing (arrow keys, Ctrl+A/E), history (up/down arrows), and Tab completion hooks. Install: `apt install libreadline-dev` (Debian) or `yum install readline-devel` (RHEL). Compile with `-lreadline`.
## 3. fork + execvp + waitpid
Unix process creation uses the "fork-then-exec" pattern: `fork()` duplicates the current process (copy-on-write: physical pages are shared until either process writes to them), the child calls `execvp()` to replace its address space with the new program, and the parent calls `waitpid()` to wait for and reap the child (preventing zombie processes).
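The status that `waitpid()` hands back is an encoded value, not the exit code itself. As a minimal sketch (the helper names `status_to_exit_code` and `child_outcome` are hypothetical, not part of the chapter's shell), decoding it with the standard `W*` macros and the common shell convention of `128 + signal` could look like this:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map a raw waitpid() status to a shell-style exit code. */
static int status_to_exit_code(int status) {
    if (WIFEXITED(status))
        return WEXITSTATUS(status);      /* normal exit: 0..255 */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);   /* killed by a signal */
    return -1;                           /* stopped/continued: not handled here */
}

/* Hypothetical demo: fork a child that exits with `code`, or, when
 * sig != 0, sends itself that signal instead. Returns the decoded result. */
static int child_outcome(int code, int sig) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                      /* child */
        if (sig != 0)
            kill(getpid(), sig);
        _exit(code);
    }
    int status;                          /* parent: reap the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status_to_exit_code(status);
}
```

Under these assumptions, `child_outcome(3, 0)` yields 3, while a child killed by `SIGKILL` yields `128 + SIGKILL`, which is how bash reports a killed command in `$?`.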
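```c
/* exec_cmd.c - basic command execution with fork/execvp/waitpid */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Run a single command (no pipes, no redirection).
 * argv: NULL-terminated argument array, e.g. {"ls", "-la", NULL}
 * Returns the child's exit status, or -1 on failure.
 */
int exec_simple(char **argv) {
    if (argv == NULL || argv[0] == NULL)
        return 0;
    pid_t pid = fork();
    if (pid < 0) {                       /* fork failed */
        perror("fork");
        return -1;
    }
    if (pid == 0) {                      /* child: replace the process image */
        execvp(argv[0], argv);
        /* execvp returns only on failure */
        fprintf(stderr, "mysh: %s: %s\n", argv[0], strerror(errno));
        _exit(127);
    }
    int status;                          /* parent: wait for and reap the child */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}
```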
## 4. Built-in Commands
Built-in commands must execute inside the shell process itself rather than in a forked child, because they modify the shell's own state (current directory, environment variables, etc.). A change made in a child process disappears when that child exits.
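To make the point concrete, here is a small sketch (the function name `cd_in_child_leaves_parent_unchanged` is hypothetical) that performs `chdir()` in a forked child and then checks the parent's working directory. It assumes only standard POSIX calls:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: run chdir(dir) in a forked child.
 * Returns 1 if the parent's working directory is unchanged afterwards,
 * 0 if it changed, -1 on error. */
static int cd_in_child_leaves_parent_unchanged(const char *dir) {
    char before[4096], after[4096];
    if (getcwd(before, sizeof before) == NULL)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)                        /* child: only its own cwd changes */
        _exit(chdir(dir) == 0 ? 0 : 1);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0;
}
```

Calling it with `"/"` returns 1: the child's `chdir()` succeeded but the parent never moved. A `cd` implemented as an external program would behave the same way, which is why it has to be a built-in.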
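```c
/* builtins.c - built-in command implementations */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

#define HISTORY_MAX 100
static char *history[HISTORY_MAX];
static int history_count = 0;

/* Append a line to the history list */
void history_add(const char *line) {
    if (history_count < HISTORY_MAX) {
        /* Keep a private copy: the caller frees its own buffer */
        char *copy = strdup(line);
        if (copy != NULL)
            history[history_count++] = copy;
    }
}
/* ... remaining built-ins not shown ... */
```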
## 5. Signal Handling
Shell signal handling follows one core principle: **the shell ignores SIGINT (Ctrl+C), but child processes (foreground programs) do not ignore it**. This makes Ctrl+C interrupt only the foreground program, not the shell. This is achieved via `sigaction()` and process groups.
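```c
/* signals.c - shell signal handling */
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

/* Shell startup: ignore the interactive signals */
void shell_init_signals(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    /* SIG_IGN: the shell itself ignores SIGINT (Ctrl+C) and SIGQUIT (Ctrl+\) */
    sa.sa_handler = SIG_IGN;
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
    /* SIGTSTP (Ctrl+Z): the shell ignores it too and leaves it to children */
    sigaction(SIGTSTP, &sa, NULL);
    /* SIGTTOU/SIGTTIN: raised when a background process reads or writes
     * the terminal; the shell ignores them */
    sigaction(SIGTTOU, &sa, NULL);
    sigaction(SIGTTIN, &sa, NULL);
}

/* After fork: the child sets up its process group, takes the terminal,
 * and then restores default signal handling */
void child_init_signals(pid_t pgid) {
    /* Put the child into its own process group (pgid == 0: use its own PID) */
    setpgid(0, pgid);
    /* Hand foreground control of the terminal to the child's process group.
     * This must run while SIGTTOU is still ignored (inherited from the shell):
     * the child is in a background group at this point, and tcsetpgrp() from
     * a background group with SIGTTOU at its default would stop the child. */
    tcsetpgrp(STDIN_FILENO, getpgrp());

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_DFL;   /* restore default behavior (SIGINT terminates) */
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
    sigaction(SIGTSTP, &sa, NULL);
    sigaction(SIGTTOU, &sa, NULL);
    sigaction(SIGTTIN, &sa, NULL);
}

/* Key differences between sigaction() and signal():
 * signal():    behavior differs across Unix implementations (some reset the
 *              handler or do not block the signal while the handler runs)
 * sigaction(): POSIX standard, consistent behavior, supports SA_RESTART
 *              (automatically restart interrupted system calls)
 *
 * SA_RESTART matters when a handler is installed: without it, a signal makes
 * read()/write() fail with EINTR and the caller must retry by hand. With
 * SA_RESTART, most interrupted system calls are restarted automatically.
 */
```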
## 6. Redirection Implementation
Redirection comes down to calling `dup2()` after fork but before exec to replace the child's standard fds (0/1/2). `dup2(oldfd, newfd)` makes `newfd` refer to the same open file as `oldfd`, first closing whatever `newfd` referred to. `oldfd` itself stays open, which is why the code closes it explicitly afterwards. From then on the program reads or writes through the standard fd number but actually operates on the redirected file.
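As a minimal sketch of the `> file` case (the helper name `run_with_stdout_to` is hypothetical, and error reporting is reduced to exit codes), the child opens the file, duplicates it onto fd 1, and then execs. The new program never knows its stdout is a file:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv with stdout redirected to `path`,
 * like "cmd > path". Returns the child's exit code, or -1 on error. */
static int run_with_stdout_to(char **argv, const char *path) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                               /* child */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(1);
        if (dup2(fd, STDOUT_FILENO) < 0)          /* fd 1 now refers to the file */
            _exit(1);
        close(fd);                                /* fd 1 keeps the file open */
        execvp(argv[0], argv);                    /* fd table survives exec */
        _exit(127);                               /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing `{"echo", "hello", NULL}` and a path under `/tmp` leaves a file containing `hello` followed by a newline, assuming an `echo` binary is on `PATH`.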
```c
/* redirect.c - redirection implementation */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>

typedef struct {
    char **argv;
    char  *redir_in;          /* file name for "< file" */
    char  *redir_out;         /* file name for "> file" */
    char  *redir_append;      /* file name for ">> file" */
    int    stderr_to_stdout;  /* "2>&1" flag */
} Cmd;

/* Apply redirections in the child (call after fork, before exec) */
void apply_redirects(Cmd *cmd) {
    int fd;
    /* Input redirection: < file */
    if (cmd->redir_in) {
        fd = open(cmd->redir_in, O_RDONLY);
        if (fd < 0) {
            fprintf(stderr, "mysh: %s: %s\n", cmd->redir_in, strerror(errno));
            exit(1);
        }
        dup2(fd, STDIN_FILENO);   /* stdin now refers to the file */
        close(fd);                /* close the original fd (dup2 made a copy) */
    }
    /* Output redirection: > file (truncate mode) */
    if (cmd->redir_out) {
        fd = open(cmd->redir_out,
                  O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            fprintf(stderr, "mysh: %s: %s\n", cmd->redir_out, strerror(errno));
            exit(1);
        }
        dup2(fd, STDOUT_FILENO);  /* stdout now refers to the file */
        close(fd);
    }
    /* Append redirection: >> file */
    if (cmd->redir_append) {
        fd = open(cmd->redir_append,
                  O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) {
            fprintf(stderr, "mysh: %s: %s\n", cmd->redir_append, strerror(errno));
            exit(1);
        }
        dup2(fd, STDOUT_FILENO);
        close(fd);
    }
    /* 2>&1: point stderr at whatever stdout currently refers to */
    if (cmd->stderr_to_stdout)
        dup2(STDOUT_FILENO, STDERR_FILENO);
}
```
## 7. Pipe Implementation
`pipe(pipefd)`, with `int pipefd[2]`, creates an anonymous pipe: `pipefd[0]` is the read end, `pipefd[1]` is the write end. Implementing `ls | grep txt` requires two child processes: the left child redirects its stdout to `pipefd[1]`; the right child redirects its stdin from `pipefd[0]`. Critical: every process, including the parent, must close the ends it does not use, or the reader will never see EOF.
```c
/* pipe_demo.c - C implementation of "ls | grep txt" */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int pipefd[2];
    /* Create the pipe: pipefd[0] = read end, pipefd[1] = write end */
    if (pipe(pipefd) < 0) {
        perror("pipe");
        return 1;
    }
    /* ... remainder of the listing not shown ... */
    return 0;
}
```

**Most Common Pipe Bug:** Forgetting to close the write end of the pipe in the parent process (or in any child that doesn't use it). The result is that the reader's `read()` blocks forever: as long as any process holds the write end open, `read()` never returns 0 (EOF). Debug with `lsof | grep pipe` to find lingering pipe fds.
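The two-child pattern described in this section can be sketched as a single function. This is an illustrative version rather than the chapter's original listing; the name `run_pipe` is hypothetical, and it returns the exit code of the right-hand command the way a shell reports a pipeline's status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" and return right's exit code
 * (128 + signal if it was killed), or -1 on error. */
static int run_pipe(char **left, char **right) {
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    pid_t lpid = fork();
    if (lpid < 0) {
        close(pipefd[0]); close(pipefd[1]);
        return -1;
    }
    if (lpid == 0) {                         /* left child: stdout -> pipe */
        dup2(pipefd[1], STDOUT_FILENO);
        close(pipefd[0]); close(pipefd[1]);  /* drop both original fds */
        execvp(left[0], left);
        _exit(127);
    }

    pid_t rpid = fork();
    if (rpid < 0) {
        close(pipefd[0]); close(pipefd[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {                         /* right child: stdin <- pipe */
        dup2(pipefd[0], STDIN_FILENO);
        close(pipefd[0]); close(pipefd[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* Parent: close both ends, otherwise the reader never sees EOF */
    close(pipefd[0]); close(pipefd[1]);

    int status;
    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}
```

Running `echo notes.txt | grep -q zzz` through this function is a useful check: `grep` finds no match, so it must read until EOF before exiting with status 1. If the parent's two `close()` calls are removed, that call hangs, which reproduces the bug described above.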
## 8. Complete Mini Shell Code
Below is the complete mini shell implementation (~230 lines of core code), supporting: simple commands, pipelines of up to `MAX_CMDS` commands, input/output/append redirection, built-in commands (cd/pwd/history/exit), and Ctrl+C signal handling.
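```c
/* mysh.c - complete mini shell
 * Build: gcc -Wall -Wextra -g -o mysh mysh.c -lreadline
 * Run:   ./mysh
 */
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() etc. even under -std=c11 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <readline/readline.h>
#include <readline/history.h>

/* ============ Constants and data structures ============ */
#define MAX_ARGS   64
#define MAX_CMDS   16                      /* max commands in one pipeline */
#define MAX_TOKENS (MAX_ARGS * MAX_CMDS)
#define HIST_MAX   100

typedef struct {
    char *argv[MAX_ARGS];   /* argument list, NULL-terminated */
    char *redir_in;         /* <  file */
    char *redir_out;        /* >  file */
    char *redir_append;     /* >> file */
    int   stderr_redir;     /* 2>&1 */
} Cmd;

typedef struct {
    Cmd cmds[MAX_CMDS];     /* the commands in the pipeline */
    int count;              /* number of commands */
} Pipeline;

static int last_exit_code = 0;

/* ============ Signal setup ============ */
static void init_signals(void) {
    struct sigaction sa = {0};
    sa.sa_handler = SIG_IGN;
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
    sigaction(SIGTSTP, &sa, NULL);
    sigaction(SIGTTOU, &sa, NULL);
    sigaction(SIGTTIN, &sa, NULL);
}

/* Children restore the default dispositions before exec;
 * ignored signals would otherwise stay ignored in the new program. */
static void reset_signals(void) {
    struct sigaction sa = {0};
    sa.sa_handler = SIG_DFL;
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
    sigaction(SIGTSTP, &sa, NULL);
    sigaction(SIGTTOU, &sa, NULL);
    sigaction(SIGTTIN, &sa, NULL);
}

/* ============ Lexing (tokenize) ============ */
/* Split the input line into a token array; returns the token count.
 * Tokens are whitespace-separated, so operators need surrounding spaces. */
static int tokenize(char *line, char **tokens, int max_tokens) {
    int n = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && n < max_tokens) {
        tokens[n++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    return n;
}

/* ============ Parsing ============ */
/* Parse tokens[0..ntokens) into one Cmd; returns the argument count */
static int parse_cmd(char **tokens, int ntokens, Cmd *cmd) {
    memset(cmd, 0, sizeof(*cmd));
    int argc = 0;
    int i = 0;
    while (i < ntokens) {
        char *t = tokens[i];
        if (strcmp(t, "<") == 0) {
            if (i + 1 < ntokens) cmd->redir_in = tokens[++i];
        } else if (strcmp(t, ">") == 0) {
            if (i + 1 < ntokens) cmd->redir_out = tokens[++i];
        } else if (strcmp(t, ">>") == 0) {
            if (i + 1 < ntokens) cmd->redir_append = tokens[++i];
        } else if (strcmp(t, "2>&1") == 0) {
            cmd->stderr_redir = 1;
        } else {
            if (argc < MAX_ARGS - 1) cmd->argv[argc++] = t;
        }
        i++;
    }
    cmd->argv[argc] = NULL;
    return argc;
}

/* Split the token array on the pipe symbol '|' and fill the Pipeline */
static void parse_pipeline(char **tokens, int ntokens, Pipeline *pl) {
    pl->count = 0;
    int start = 0;
    for (int i = 0; i <= ntokens; i++) {
        if (i == ntokens || strcmp(tokens[i], "|") == 0) {
            if (pl->count < MAX_CMDS) {
                parse_cmd(tokens + start, i - start, &pl->cmds[pl->count++]);
            }
            start = i + 1;
        }
    }
}

/* ============ Built-in commands ============ */
static int builtin_cd(char **argv) {
    const char *dir = argv[1] ? argv[1] : (getenv("HOME") ? getenv("HOME") : "/");
    if (chdir(dir) != 0) {
        fprintf(stderr, "cd: %s: %s\n", dir, strerror(errno));
        return 1;
    }
    return 0;
}

static int builtin_pwd(char **argv) {
    (void)argv;
    char buf[4096];
    if (!getcwd(buf, sizeof(buf))) { perror("getcwd"); return 1; }
    puts(buf);
    return 0;
}

/* Returns the built-in's exit code, or -1 if cmd is not a built-in */
static int run_builtin(Cmd *cmd) {
    char **av = cmd->argv;
    if (!av[0]) return -1;
    if (strcmp(av[0], "cd") == 0) return builtin_cd(av);
    if (strcmp(av[0], "pwd") == 0) return builtin_pwd(av);
    if (strcmp(av[0], "history") == 0) {
        HIST_ENTRY **list = history_list();
        if (list) for (int i = 0; list[i]; i++)
            printf("%4d  %s\n", i + 1, list[i]->line);
        return 0;
    }
    if (strcmp(av[0], "exit") == 0) exit(av[1] ? atoi(av[1]) : 0);
    return -1;   /* not a built-in */
}

/* ============ Child: apply redirections ============ */
static void apply_redirects(Cmd *cmd) {
    int fd;
    if (cmd->redir_in) {
        fd = open(cmd->redir_in, O_RDONLY);
        if (fd < 0) { perror(cmd->redir_in); _exit(1); }
        dup2(fd, STDIN_FILENO); close(fd);
    }
    if (cmd->redir_out) {
        fd = open(cmd->redir_out, O_WRONLY|O_CREAT|O_TRUNC, 0644);
        if (fd < 0) { perror(cmd->redir_out); _exit(1); }
        dup2(fd, STDOUT_FILENO); close(fd);
    }
    if (cmd->redir_append) {
        fd = open(cmd->redir_append, O_WRONLY|O_CREAT|O_APPEND, 0644);
        if (fd < 0) { perror(cmd->redir_append); _exit(1); }
        dup2(fd, STDOUT_FILENO); close(fd);
    }
    if (cmd->stderr_redir)
        dup2(STDOUT_FILENO, STDERR_FILENO);
}

/* ============ Execute a Pipeline ============ */
static int exec_pipeline(Pipeline *pl) {
    /* Single command: try the built-ins first */
    if (pl->count == 1) {
        int ret = run_builtin(&pl->cmds[0]);
        if (ret >= 0) return ret;
    }
    int n = pl->count;
    /* Reject input such as "ls |" that leaves a pipeline stage empty */
    for (int i = 0; i < n; i++) {
        if (pl->cmds[i].argv[0] == NULL) {
            fprintf(stderr, "mysh: empty command in pipeline\n");
            return 2;
        }
    }
    int pipes[MAX_CMDS - 1][2];   /* n-1 pipes */
    pid_t pids[MAX_CMDS];

    /* Create all pipes up front */
    for (int i = 0; i < n - 1; i++) {
        if (pipe(pipes[i]) < 0) {
            perror("pipe");
            for (int j = 0; j < i; j++) { close(pipes[j][0]); close(pipes[j][1]); }
            return 1;
        }
    }

    int spawned = 0;
    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] < 0) { perror("fork"); break; }
        if (pids[i] == 0) {
            /* Child: restore default signal handling */
            reset_signals();
            /* Connect pipe: read from the previous command */
            if (i > 0) {
                dup2(pipes[i-1][0], STDIN_FILENO);
            }
            /* Connect pipe: write to the next command */
            if (i < n - 1) {
                dup2(pipes[i][1], STDOUT_FILENO);
            }
            /* Close every original pipe fd; the dup2 copies remain */
            for (int j = 0; j < n - 1; j++) {
                close(pipes[j][0]);
                close(pipes[j][1]);
            }
            /* File redirections override the pipe connections */
            apply_redirects(&pl->cmds[i]);
            /* exec */
            execvp(pl->cmds[i].argv[0], pl->cmds[i].argv);
            fprintf(stderr, "mysh: %s: %s\n",
                    pl->cmds[i].argv[0], strerror(errno));
            _exit(127);
        }
        spawned++;
    }

    /* Parent: close all pipe fds */
    for (int i = 0; i < n - 1; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }

    /* Reap every child; the last command's status is the pipeline's status */
    int status = 0, last_status = 0;
    for (int i = 0; i < spawned; i++) {
        if (waitpid(pids[i], &status, 0) > 0 && i == n - 1)
            last_status = status;
    }
    if (spawned < n) return 1;
    if (WIFEXITED(last_status)) return WEXITSTATUS(last_status);
    if (WIFSIGNALED(last_status)) return 128 + WTERMSIG(last_status);
    return 1;
}

/* ============ Main loop ============ */
int main(void) {
    init_signals();
    char *line;
    while ((line = readline("mysh$ ")) != NULL) {
        if (*line == '\0') { free(line); continue; }
        add_history(line);   /* readline stores its own copy */
        /* Tokenize (strtok modifies line in place) */
        char *tokens[MAX_TOKENS];
        int ntok = tokenize(line, tokens, MAX_TOKENS);
        /* Skip comment lines */
        if (ntok > 0 && tokens[0][0] == '#') {
            free(line); continue;
        }
        /* Parse the pipeline */
        Pipeline pl;
        parse_pipeline(tokens, ntok, &pl);
        /* Execute */
        if (pl.count > 0 && pl.cmds[0].argv[0] != NULL)
            last_exit_code = exec_pipeline(&pl);
        free(line);
    }
    return last_exit_code;
}
```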
## 9. Makefile Build
```makefile
# Makefile - build file for the mini shell
# Save as "Makefile". Recipe lines must be indented with a Tab, not spaces.
CC      = gcc
# -std=c11 hides POSIX declarations such as sigaction(); _POSIX_C_SOURCE restores them
CFLAGS  = -Wall -Wextra -Wpedantic -g -std=c11 -D_POSIX_C_SOURCE=200809L
LDFLAGS = -lreadline
TARGET  = mysh
SRC     = mysh.c

.PHONY: all clean install bear asan

all: $(TARGET)

$(TARGET): $(SRC)
	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)

# bear generates compile_commands.json (for clangd/LSP)
bear: $(SRC)
	bear -- $(CC) $(CFLAGS) -o $(TARGET) $(SRC) $(LDFLAGS)

# Install to ~/bin
install: $(TARGET)
	install -m 755 $(TARGET) $(HOME)/bin/$(TARGET)

clean:
	rm -f $(TARGET) $(TARGET)-asan compile_commands.json

# Build and run with ASAN memory checking
asan:
	$(CC) $(CFLAGS) -fsanitize=address,undefined \
		-o $(TARGET)-asan $(SRC) $(LDFLAGS)
	./$(TARGET)-asan
```
**compile_commands.json:** Generated via `bear -- make`, this compilation database lets clangd (the LSP server used by VS Code and Neovim) provide accurate code completion, go-to-definition, and diagnostics. It is standard practice in modern C development.
## 10. GDB Debugging
Debugging a shell requires particular attention to child-process behavior after fork. By default GDB follows the parent after a fork; `set follow-fork-mode child` makes it follow the child instead.
```text
# Start GDB
gdb ./mysh

# Basic command cheat sheet
(gdb) b main                      # set a breakpoint at main
(gdb) b mysh.c:120                # set a breakpoint at line 120
(gdb) b exec_pipeline             # set a breakpoint at a function
(gdb) r                           # run the program (run)
(gdb) n                           # single-step (next, does not enter functions)
(gdb) s                           # single-step (step, enters functions)
(gdb) c                           # continue running (continue)
(gdb) p pid                       # print the value of variable pid
(gdb) p *cmd                      # print the contents of a struct
(gdb) p cmd->argv[0]              # print through a pointer
(gdb) x/s buf                     # show memory as a string
(gdb) bt                          # show the call stack (backtrace)
(gdb) frame 2                     # switch to frame 2 of the call stack
(gdb) info locals                 # show all local variables of the current function
(gdb) info registers              # show register values
(gdb) watch pids[0]               # watch a variable (stop when its value changes)

# Debugging the child after fork
(gdb) set follow-fork-mode child  # follow the child after fork
(gdb) set follow-fork-mode parent # follow the parent after fork (default)
(gdb) set detach-on-fork off      # keep both processes under the debugger (complex)

# Example: debugging a hung pipeline
(gdb) b exec_pipeline
(gdb) r
# type "ls | grep txt" to hit the breakpoint
(gdb) n                           # step to the pipe() call
(gdb) p pipes[0][0]               # inspect the pipe's read-end fd number
(gdb) p pipes[0][1]               # inspect the pipe's write-end fd number
# confirm that the parent closes all pipe fds after fork

# Debugging memory problems (with AddressSanitizer)
# Build: gcc -fsanitize=address -g -o mysh-asan mysh.c -lreadline
./mysh-asan
# ASAN prints a full report when it detects use-after-free or buffer overflow
```
## 11. Extension Exercises
After completing the basic version, try these extensions:
- `&&` / `||`: Parse `&&` (run next only if previous succeeded) and `||` (run next only if previous failed); check `last_exit_code` to decide
- `$?`: Expand `$?` to the last exit code; do string substitution after tokenizing but before parsing
- Tab completion: Register `rl_completion_entry_function`, use `glob()` for path expansion, use `opendir`/`readdir` to enumerate executables in PATH
- Background jobs `&`: Detect a trailing `&`, skip `waitpid` after fork, and use the `SIGCHLD` signal to reap the child asynchronously
- Multi-stage pipes: The current implementation already supports multi-stage pipes (limited by `MAX_CMDS`); verify that `ls | grep txt | wc -l` works correctly
- Variable expansion: After tokenizing, walk the tokens and replace `$VAR` with the value of `getenv("VAR")`
**Chapter Summary:** The mini shell covers the core patterns of Unix systems programming: fork-exec-wait (process creation), pipe+dup2 (inter-process communication), sigaction (signal handling), and open+dup2 (I/O redirection). Master these four syscall combinations and most of what bash does at its core stops being opaque. They are also a solid foundation for the kernel contributions in Chapter 19.