Chapter 18: Build a Mini Shell in C

Every programmer who truly understands Linux should build their own shell. It doesn't have to be perfect, but it will make you genuinely understand the difference between fork() and exec(), why pipes require closing unused fds, and how signals propagate across process groups. This chapter implements a mini shell supporting pipes, redirection, and built-in commands in roughly 400 lines of C, then uses GDB debugging to deepen understanding.

1. Why Build Your Own Shell

A shell is one of the thinnest wrappers over the OS. It does almost nothing "magical" — most functionality is a composition of a few syscalls: fork() to create a child, execvp() to replace its image, waitpid() to reap it, and pipe() + dup2() to implement pipes and redirection. Once you understand these, bash/zsh behavior stops being mysterious.

2. Shell REPL Loop

The core of a shell is a REPL (Read-Eval-Print Loop): read a line of user input, parse it into a command, execute it, display the result, and repeat until end of input.

/* repl.c: minimal REPL skeleton */
#include <stdio.h>
#include <stdlib.h>
#include <readline/readline.h>
#include <readline/history.h>

int main(void) {
    char *line;

    /* readline provides line editing and history */
    while ((line = readline("mysh$ ")) != NULL) {
        if (*line) {
            add_history(line);      /* record in history (browse with up/down arrows) */
            /* TODO: parse and execute line */
            printf("got: %s\n", line);
        }
        free(line);                 /* readline allocates; the caller must free */
    }

    printf("\n");                   /* print a newline when Ctrl+D exits */
    return 0;
}

readline: The readline library provides line editing (arrow keys, Ctrl+A/E), history (up/down arrows), and Tab completion hooks. Install: apt install libreadline-dev (Debian) or yum install readline-devel (RHEL). Compile with -lreadline.
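If readline is not available, the same loop can be driven by POSIX getline(). The sketch below is only a stand-in under that assumption, not part of the chapter's shell: read_line is a hypothetical helper name, and it trades line editing and history for having no external dependency.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read one line from `in` and strip the trailing newline.
 * Returns a malloc'd string the caller must free, or NULL at end of input. */
char *read_line(FILE *in) {
    char *line = NULL;
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, in);

    if (len < 0) {              /* EOF (Ctrl+D) or read error */
        free(line);
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';
    return line;
}
```

In the REPL skeleton, readline("mysh$ ") would be replaced by printing the prompt with fputs(), calling fflush(stdout), and then calling read_line(stdin); the free(line) at the end of the loop stays the same.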

3. fork + execvp + waitpid

Unix process creation uses the "fork-then-exec" pattern: fork() duplicates the current process (Copy-on-Write — physical memory is shared until a write occurs), the child calls execvp() to replace its address space with the new program, and the parent calls waitpid() to wait for and reap the child (preventing zombie processes).

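/* exec_cmd.c: basic command execution with fork/execvp/waitpid */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Execute a single command (no pipes, no redirection).
 * argv: NULL-terminated argument array, e.g. {"ls", "-la", NULL}
 * Returns the child's exit status, or -1 on failure.
 */
int exec_simple(char **argv) {
    if (argv == NULL || argv[0] == NULL)
        return 0;

    pid_t pid = fork();
    if (pid < 0) {                      /* fork failed */
        perror("fork");
        return -1;
    }
    if (pid == 0) {                     /* child: replace the process image */
        execvp(argv[0], argv);
        /* execvp returns only on failure */
        fprintf(stderr, "mysh: %s: %s\n", argv[0], strerror(errno));
        _exit(127);
    }

    int status;                         /* parent: wait for and reap the child */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}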
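To see that fork() gives the child its own copy of the parent's memory, the following sketch lets the child overwrite a variable and report its view through the exit status, while the parent keeps the original value. The function name fork_copy_demo and its out-parameters are hypothetical, chosen only for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child changes `value`; the parent does not see it.
 * On success stores the parent's copy in *parent_sees and the child's copy
 * (passed back via the exit status) in *child_sees, and returns 0. */
int fork_copy_demo(int *parent_sees, int *child_sees) {
    int value = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;          /* the write lands in the child's own copy */
        _exit(value);       /* report the child's view to the parent */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    *parent_sees = value;               /* unchanged in the parent */
    *child_sees = WEXITSTATUS(status);  /* what the child saw */
    return 0;
}
```

The same isolation is why anything the child does before execvp(), such as closing or redirecting file descriptors, leaves the shell itself untouched.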
  
## 4. Built-in Commands


  
Built-in commands must run inside the shell process itself rather than in a forked child, because they modify the shell's own state (current directory, environment variables, and so on). A change made in a child process disappears when that child exits.
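A small experiment makes the point concrete. This sketch (the helper name cd_in_child_changes_parent is hypothetical) performs chdir() in a forked child and then checks whether the parent's working directory moved.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: run chdir(dir) in a forked child, then compare the
 * parent's working directory before and after.
 * Returns 1 if it changed, 0 if it stayed the same, -1 on error. */
int cd_in_child_changes_parent(const char *dir) {
    char before[4096], after[4096];

    if (!getcwd(before, sizeof before))
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(chdir(dir) == 0 ? 0 : 1);   /* affects only the child */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!getcwd(after, sizeof after))
        return -1;

    return strcmp(before, after) != 0;
}
```

The function is expected to report no change, which is exactly what would happen if a shell implemented cd by forking: the command would appear to succeed and the prompt would stay where it was.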


  
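```c
/* builtins.c: built-in command implementations */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

#define HISTORY_MAX 100
static char *history[HISTORY_MAX];
static int   history_count = 0;

/* Append a line to the history list (dropped once the list is full) */
void history_add(const char *line) {
    if (history_count < HISTORY_MAX)
        history[history_count++] = strdup(line);
}
```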
  
## 5. Signal Handling


  
Shell signal handling follows one core principle: **the shell ignores SIGINT (Ctrl+C), but foreground child processes do not**. Ctrl+C then interrupts only the foreground program, not the shell. Because an ignored disposition is inherited across both fork() and exec(), the child must reset these signals to their defaults before calling exec. This is achieved via `sigaction()` and process groups.
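The effect of resetting (or not resetting) the disposition in the child can be observed directly. In this sketch the helper child_survives_sigint is a hypothetical name, and raise(SIGINT) stands in for the user pressing Ctrl+C.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the parent ignores SIGINT, forks, and the child sends
 * SIGINT to itself. If reset_in_child is nonzero, the child first restores
 * the default disposition, as a shell does before exec.
 * Returns 1 if the child survived, 0 if the signal killed it, -1 on error. */
int child_survives_sigint(int reset_in_child) {
    struct sigaction ign, old;
    memset(&ign, 0, sizeof ign);
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    if (sigaction(SIGINT, &ign, &old) < 0)   /* shell side: ignore Ctrl+C */
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        if (reset_in_child) {
            struct sigaction dfl;
            memset(&dfl, 0, sizeof dfl);
            dfl.sa_handler = SIG_DFL;
            sigemptyset(&dfl.sa_mask);
            sigaction(SIGINT, &dfl, NULL);
        }
        raise(SIGINT);      /* stands in for Ctrl+C */
        _exit(0);           /* reached only if SIGINT was ignored */
    }

    int status = 0;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid)
        result = WIFEXITED(status) ? 1 : 0;

    sigaction(SIGINT, &old, NULL);           /* restore the caller's setting */
    return result;
}
```

Called with 0, the child inherits SIG_IGN and exits normally; called with 1, the child is terminated by the signal. A shell that forgets the reset would launch programs that cannot be interrupted with Ctrl+C.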


  
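```c
/* signals.c: shell signal handling */
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

/* Shell start-up: ignore the interactive signals */
void shell_init_signals(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));

    /* SIG_IGN: the shell itself ignores SIGINT (Ctrl+C) and SIGQUIT (Ctrl+\) */
    sa.sa_handler = SIG_IGN;
    sigaction(SIGINT,  &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);

    /* SIGTSTP (Ctrl+Z): the shell ignores it too and leaves it to the child */
    sigaction(SIGTSTP, &sa, NULL);

    /* SIGTTOU/SIGTTIN: raised when a background process group touches the
     * terminal; the shell ignores them */
    sigaction(SIGTTOU, &sa, NULL);
    sigaction(SIGTTIN, &sa, NULL);
}

/* After fork: the child joins its process group, takes the terminal,
 * and only then restores default signal handling */
void child_init_signals(pid_t pgid) {
    /* Put the child in its own process group (pgid == 0 means "use my own PID") */
    setpgid(0, pgid);

    /* Hand the terminal to the child's process group. This must happen while
     * SIGTTOU is still ignored (inherited from the shell): calling tcsetpgrp()
     * from a background group with SIGTTOU at its default would stop the child. */
    tcsetpgrp(STDIN_FILENO, getpgrp());

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_DFL;   /* restore default behavior (SIGINT terminates) */

    sigaction(SIGINT,  &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
    sigaction(SIGTSTP, &sa, NULL);
    sigaction(SIGTTOU, &sa, NULL);
    sigaction(SIGTTIN, &sa, NULL);
}

/* Key differences between sigaction and signal:
 * signal():    behavior varies across Unix implementations; the signal is not
 *              necessarily blocked while its handler runs
 * sigaction(): specified by POSIX, consistent behavior, supports SA_RESTART
 *              (automatic restart of interrupted system calls)
 *
 * SA_RESTART matters: without it, a handled signal makes read()/write() fail
 * with EINTR and the caller has to retry by hand. With SA_RESTART the system
 * call is restarted automatically.
 */
```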

6. Redirection Implementation

Redirection comes down to calling dup2() after fork() but before exec() to replace the child's standard fds (0/1/2). dup2(oldfd, newfd) makes newfd refer to the same open file as oldfd, first closing whatever newfd previously referred to; oldfd itself stays open. After dup2(fd, STDOUT_FILENO), the program still writes to fd 1, but the data lands in the file that fd was opened on, and the now-redundant fd can be closed.

/* redirect.c: redirection implementation */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

typedef struct {
    char **argv;
    char  *redir_in;      /* file name for "< file" */
    char  *redir_out;     /* file name for "> file" */
    char  *redir_append;  /* file name for ">> file" */
    int    stderr_to_stdout; /* "2>&1" flag */
} Cmd;

/* Apply redirections in the child (call after fork, before exec) */
void apply_redirects(Cmd *cmd) {
    int fd;

    /* Input redirection: < file */
    if (cmd->redir_in) {
        fd = open(cmd->redir_in, O_RDONLY);
        if (fd < 0) {
            fprintf(stderr, "mysh: %s: %s\n", cmd->redir_in, strerror(errno));
            exit(1);
        }
        dup2(fd, STDIN_FILENO);   /* stdin now refers to the file */
        close(fd);                /* close the original fd (dup2 made a copy) */
    }

    /* Output redirection: > file (truncate mode) */
    if (cmd->redir_out) {
        fd = open(cmd->redir_out,
                  O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            fprintf(stderr, "mysh: %s: %s\n", cmd->redir_out, strerror(errno));
            exit(1);
        }
        dup2(fd, STDOUT_FILENO);  /* stdout now refers to the file */
        close(fd);
    }

    /* Append redirection: >> file */
    if (cmd->redir_append) {
        fd = open(cmd->redir_append,
                  O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) {
            fprintf(stderr, "mysh: %s: %s\n", cmd->redir_append, strerror(errno));
            exit(1);
        }
        dup2(fd, STDOUT_FILENO);
        close(fd);
    }

    /* 2>&1: point stderr at whatever stdout currently refers to */
    if (cmd->stderr_to_stdout)
        dup2(STDOUT_FILENO, STDERR_FILENO);
}
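Because 2>&1 copies whatever stdout refers to at that moment, the order of the dup2() calls matters. The sketch below compares the two orderings; redirect_order_demo is a hypothetical helper, and /dev/null is used as a stand-in for the terminal so the demo stays quiet.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: in a child, write "out\n" to stdout and "err\n" to
 * stderr after applying "> path" and "2>&1" in the chosen order.
 * Returns the number of bytes that ended up in `path`, or -1 on error. */
long redirect_order_demo(const char *path, int file_first) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int term = open("/dev/null", O_WRONLY);   /* pretend terminal */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (term < 0 || fd < 0)
            _exit(1);
        dup2(term, STDOUT_FILENO);
        dup2(term, STDERR_FILENO);

        if (file_first) {                         /* cmd > path 2>&1 */
            dup2(fd, STDOUT_FILENO);
            dup2(STDOUT_FILENO, STDERR_FILENO);   /* stderr follows stdout into the file */
        } else {                                  /* cmd 2>&1 > path */
            dup2(STDOUT_FILENO, STDERR_FILENO);   /* stderr copies the "terminal" */
            dup2(fd, STDOUT_FILENO);              /* only stdout moves to the file */
        }
        close(term);
        close(fd);

        if (write(STDOUT_FILENO, "out\n", 4) != 4) _exit(1);
        if (write(STDERR_FILENO, "err\n", 4) != 4) _exit(1);
        _exit(0);
    }

    int status;
    struct stat st;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_size;
}
```

With file_first set, both messages reach the file; otherwise only the stdout message does. Note that apply_redirects above always applies the 2>&1 flag last, so it behaves like the first ordering regardless of where 2>&1 appeared on the command line.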

7. Pipe Implementation

pipe(pipefd), with int pipefd[2], creates an anonymous pipe: pipefd[0] is the read end and pipefd[1] is the write end. Implementing ls | grep txt requires two child processes: the left child redirects its stdout to pipefd[1]; the right child redirects its stdin from pipefd[0]. Critical: every process, including the parent, must close the ends it doesn't use, or the reader will never see EOF.

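/* pipe_demo.c: a complete C implementation of "ls | grep txt" */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int pipefd[2];

    /* Create the pipe: pipefd[0] = read end, pipefd[1] = write end */
    if (pipe(pipefd) < 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                      /* left child: ls */
        dup2(pipefd[1], STDOUT_FILENO);     /* stdout goes into the pipe */
        close(pipefd[0]); close(pipefd[1]);
        execlp("ls", "ls", (char *)NULL);
        perror("ls"); _exit(127);
    }
    if (fork() == 0) {                      /* right child: grep txt */
        dup2(pipefd[0], STDIN_FILENO);      /* stdin comes from the pipe */
        close(pipefd[0]); close(pipefd[1]);
        execlp("grep", "grep", "txt", (char *)NULL);
        perror("grep"); _exit(127);
    }

    close(pipefd[0]); close(pipefd[1]);     /* the parent closes both ends */
    while (wait(NULL) > 0) { }              /* reap both children */
    return 0;
}

**Most Common Pipe Bug:** Forgetting to close the write end of the pipe in the parent process (or in any child that doesn't use it). The reader's `read()` then blocks forever: as long as any process holds the write end open, the kernel never reports EOF. Debug with `lsof | grep pipe` to find lingering pipe fds.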
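The EOF rule can be checked in isolation with a parent that reads from its own child. In this sketch (read_until_eof is a hypothetical helper), the parent's close of the write end is the line that allows read() to eventually return 0.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child writes `msg` into a pipe and exits; the parent
 * reads until EOF (or until buf is full).
 * Returns the number of bytes read, or -1 if pipe() or fork() fails. */
ssize_t read_until_eof(const char *msg, char *buf, size_t cap) {
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {
        close(pipefd[0]);                           /* child only writes */
        ssize_t w = write(pipefd[1], msg, strlen(msg));
        _exit(w == (ssize_t)strlen(msg) ? 0 : 1);   /* exiting closes the write end */
    }

    close(pipefd[1]);   /* without this, the loop below would never see EOF */

    size_t total = 0;
    ssize_t n;
    while (total < cap && (n = read(pipefd[0], buf + total, cap - total)) > 0)
        total += (size_t)n;

    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Commenting out the parent's close(pipefd[1]) reproduces the hang described above: the child has exited, but the parent itself still holds a write end, so its own read() waits forever.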


  
  
## 8. Complete Mini Shell Code


  
Below is the complete mini shell implementation (~230 lines of core code), supporting: simple commands, pipelines of up to MAX_CMDS (16) commands, input/output/append redirection, built-in commands (cd/pwd/history/exit), and Ctrl+C signal handling.


  
```c
/* mysh.c: complete mini shell
 * Build: gcc -Wall -Wextra -g -o mysh mysh.c -lreadline
 * Run:   ./mysh
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <readline/readline.h>
#include <readline/history.h>

/* ────────────── Constants and data structures ────────────── */
#define MAX_ARGS    64
#define MAX_CMDS    16     /* maximum number of commands in a pipeline */
#define HIST_MAX   100

typedef struct {
    char *argv[MAX_ARGS];  /* argument list, NULL-terminated */
    char *redir_in;        /* < file */
    char *redir_out;       /* > file */
    char *redir_append;    /* >> file */
    int   stderr_redir;    /* 2>&1 */
} Cmd;

typedef struct {
    Cmd  cmds[MAX_CMDS];   /* the commands in the pipeline */
    int  count;            /* number of commands */
} Pipeline;

static int last_exit_code = 0;

/* ────────────── Signal setup ────────────── */
static void init_signals(void) {
    struct sigaction sa = {0};
    sa.sa_handler = SIG_IGN;
    sigaction(SIGINT,  &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
    sigaction(SIGTSTP, &sa, NULL);
    sigaction(SIGTTOU, &sa, NULL);
    sigaction(SIGTTIN, &sa, NULL);
}

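/* ────────────── Lexing (tokenize) ────────────── */
/* Split the input line into a token array; returns the token count.
 * Tokens are separated by whitespace only, so operators such as | and >
 * must be surrounded by spaces. */
static int tokenize(char *line, char **tokens, int max_tokens) {
    int n = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && n < max_tokens - 1) {
        tokens[n++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    tokens[n] = NULL;
    return n;
}

/* ────────────── Parsing ────────────── */
/* Parse tokens[start..end) into a Cmd, separating redirections from
 * arguments; returns argc */
static int parse_cmd(char **tokens, int start, int end, Cmd *cmd) {
    memset(cmd, 0, sizeof(*cmd));
    int argc = 0;
    int i = start;

    while (i < end) {
        char *t = tokens[i];
        if (strcmp(t, "<") == 0) {
            if (i + 1 < end) cmd->redir_in = tokens[++i];
        } else if (strcmp(t, ">") == 0) {
            if (i + 1 < end) cmd->redir_out = tokens[++i];
        } else if (strcmp(t, ">>") == 0) {
            if (i + 1 < end) cmd->redir_append = tokens[++i];
        } else if (strcmp(t, "2>&1") == 0) {
            cmd->stderr_redir = 1;
        } else {
            if (argc < MAX_ARGS - 1) cmd->argv[argc++] = t;
        }
        i++;
    }
    cmd->argv[argc] = NULL;
    return argc;
}

/* Split the token array at each '|' and fill in the Pipeline */
static void parse_pipeline(char **tokens, int ntokens, Pipeline *pl) {
    pl->count = 0;
    int start = 0;

    for (int i = 0; i <= ntokens; i++) {
        if (i == ntokens || strcmp(tokens[i], "|") == 0) {
            if (i > start && pl->count < MAX_CMDS) {
                parse_cmd(tokens, start, i, &pl->cmds[pl->count++]);
            }
            start = i + 1;
        }
    }
}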

/* ────────────── Built-in commands ────────────── */
static int builtin_cd(char **argv) {
    const char *dir = argv[1] ? argv[1] : (getenv("HOME") ? getenv("HOME") : "/");
    if (chdir(dir) != 0) {
        fprintf(stderr, "cd: %s: %s\n", dir, strerror(errno));
        return 1;
    }
    return 0;
}

static int builtin_pwd(char **argv) {
    (void)argv;
    char buf[4096];
    if (!getcwd(buf, sizeof(buf))) { perror("getcwd"); return 1; }
    puts(buf);
    return 0;
}

static int run_builtin(Cmd *cmd) {
    char **av = cmd->argv;
    if (!av[0]) return -1;
    if (strcmp(av[0], "cd")      == 0) return builtin_cd(av);
    if (strcmp(av[0], "pwd")     == 0) return builtin_pwd(av);
    if (strcmp(av[0], "history") == 0) {
        HIST_ENTRY **list = history_list();
        if (list) for (int i = 0; list[i]; i++)
            printf("%4d  %s\n", i + 1, list[i]->line);
        return 0;
    }
    if (strcmp(av[0], "exit") == 0) exit(av[1] ? atoi(av[1]) : 0);
    return -1;   /* not a built-in command */
}

/* ────────────── Child process: apply redirections ────────────── */
static void apply_redirects(Cmd *cmd) {
    int fd;
    if (cmd->redir_in) {
        fd = open(cmd->redir_in, O_RDONLY);
        if (fd < 0) { perror(cmd->redir_in); exit(1); }
        dup2(fd, STDIN_FILENO); close(fd);
    }
    if (cmd->redir_out) {
        fd = open(cmd->redir_out, O_WRONLY|O_CREAT|O_TRUNC, 0644);
        if (fd < 0) { perror(cmd->redir_out); exit(1); }
        dup2(fd, STDOUT_FILENO); close(fd);
    }
    if (cmd->redir_append) {
        fd = open(cmd->redir_append, O_WRONLY|O_CREAT|O_APPEND, 0644);
        if (fd < 0) { perror(cmd->redir_append); exit(1); }
        dup2(fd, STDOUT_FILENO); close(fd);
    }
    if (cmd->stderr_redir)
        dup2(STDOUT_FILENO, STDERR_FILENO);
}

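/* ────────────── Execute a pipeline ────────────── */
static int exec_pipeline(Pipeline *pl) {
    /* Single command: try the built-ins first */
    if (pl->count == 1) {
        int ret = run_builtin(&pl->cmds[0]);
        if (ret >= 0) return ret;
    }

    int n = pl->count;
    int pipes[MAX_CMDS - 1][2];   /* n-1 pipes */
    pid_t pids[MAX_CMDS];

    /* Create all pipes up front */
    for (int i = 0; i < n - 1; i++) {
        if (pipe(pipes[i]) < 0) { perror("pipe"); return 1; }
    }

    /* Fork one child per command */
    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] < 0) { perror("fork"); continue; }
        if (pids[i] == 0) {
            /* Child: restore the default dispositions the shell ignores */
            struct sigaction sa = {0};
            sa.sa_handler = SIG_DFL;
            sigaction(SIGINT,  &sa, NULL);
            sigaction(SIGQUIT, &sa, NULL);
            sigaction(SIGTSTP, &sa, NULL);
            sigaction(SIGTTOU, &sa, NULL);
            sigaction(SIGTTIN, &sa, NULL);

            /* Connect pipe: read from the previous command */
            if (i > 0) {
                dup2(pipes[i-1][0], STDIN_FILENO);
            }
            /* Connect pipe: write to the next command */
            if (i < n - 1) {
                dup2(pipes[i][1], STDOUT_FILENO);
            }
            /* Close every pipe fd; the dup2 copies stay open */
            for (int j = 0; j < n - 1; j++) {
                close(pipes[j][0]);
                close(pipes[j][1]);
            }

            /* Redirections override the pipe connections */
            apply_redirects(&pl->cmds[i]);

            /* exec */
            execvp(pl->cmds[i].argv[0], pl->cmds[i].argv);
            fprintf(stderr, "mysh: %s: %s\n",
                    pl->cmds[i].argv[0], strerror(errno));
            exit(127);
        }
    }

    /* Parent: close all pipe fds */
    for (int i = 0; i < n - 1; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }

    /* Wait for every child; the pipeline's status is the last command's */
    int status = 0;
    for (int i = 0; i < n; i++) {
        int st = 0;
        if (pids[i] > 0 && waitpid(pids[i], &st, 0) > 0 && i == n - 1)
            status = st;
    }
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}

/* ────────────── Main loop ────────────── */
int main(void) {
    init_signals();

    char *line;
    while ((line = readline("mysh$ ")) != NULL) {
        if (*line) add_history(line);

        /* Tokenize (strtok splits the line in place) */
        char *tokens[MAX_ARGS * MAX_CMDS];
        int ntok = tokenize(line, tokens, MAX_ARGS * MAX_CMDS);

        /* Skip comment lines */
        if (ntok > 0 && tokens[0][0] == '#') {
            free(line); continue;
        }

        /* Parse the pipeline */
        Pipeline pl;
        parse_pipeline(tokens, ntok, &pl);

        /* Execute */
        if (pl.count > 0 && pl.cmds[0].argv[0] != NULL)
            last_exit_code = exec_pipeline(&pl);

        free(line);
    }

    return last_exit_code;
}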
```

9. Makefile Build

# Makefile: build file for the mini shell
# Save as Makefile (note: recipe lines must be indented with a Tab, not spaces)
CC      = gcc
# With glibc, -std=c11 alone hides POSIX declarations such as struct sigaction,
# so _POSIX_C_SOURCE is defined explicitly
CFLAGS  = -Wall -Wextra -Wpedantic -g -std=c11 -D_POSIX_C_SOURCE=200809L
LDFLAGS = -lreadline
TARGET  = mysh
SRC     = mysh.c

.PHONY: all clean install bear asan

all: $(TARGET)

$(TARGET): $(SRC)
	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)

# bear generates compile_commands.json (used by clangd/LSP)
bear: $(SRC)
	bear -- $(CC) $(CFLAGS) -o $(TARGET) $(SRC) $(LDFLAGS)

# Install into ~/bin
install: $(TARGET)
	install -m 755 $(TARGET) $(HOME)/bin/$(TARGET)

clean:
	rm -f $(TARGET) $(TARGET)-asan compile_commands.json

# Build and run with ASan/UBSan memory checking
asan:
	$(CC) $(CFLAGS) -fsanitize=address,undefined \
	    -o $(TARGET)-asan $(SRC) $(LDFLAGS)
	./$(TARGET)-asan

compile_commands.json: Generated via bear -- make, this compilation database enables clangd (the LSP server for VSCode/Neovim) to provide accurate code completion, go-to-definition, and diagnostics — standard practice in modern C development.

10. GDB Debugging

Debugging a shell means paying particular attention to what the child process does after fork(). By default GDB keeps following the parent after a fork; set follow-fork-mode child makes it follow the child instead.

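# Start GDB
gdb ./mysh

# Quick reference for basic commands
(gdb) b main              # set a breakpoint at main
(gdb) b mysh.c:120        # set a breakpoint at line 120
(gdb) b exec_pipeline     # set a breakpoint at a function
(gdb) r                   # run the program
(gdb) n                   # single-step (next: step over function calls)
(gdb) s                   # single-step (step: step into function calls)
(gdb) c                   # continue running
(gdb) p pid               # print the value of variable pid
(gdb) p *cmd              # print the contents of a struct
(gdb) p cmd->argv[0]      # print a member through a pointer
(gdb) x/s buf             # display memory as a string
(gdb) bt                  # show the call stack (backtrace)
(gdb) frame 2             # switch to frame 2 of the call stack
(gdb) info locals         # show all local variables of the current function
(gdb) info registers      # show register values
(gdb) watch pids[0]       # watch a variable (stop when its value changes)

# Debugging the child process after fork
(gdb) set follow-fork-mode child    # follow the child after fork
(gdb) set follow-fork-mode parent   # follow the parent after fork (default)
(gdb) set detach-on-fork off        # keep both processes under GDB after fork (more complex)

# Example: debugging a hanging pipeline
(gdb) b exec_pipeline
(gdb) r
# Type "ls | grep txt" to hit the breakpoint
(gdb) n  # step to the pipe() call
(gdb) p pipes[0][0]  # inspect the fd number of the pipe's read end
(gdb) p pipes[0][1]  # inspect the fd number of the pipe's write end
# Confirm that the parent closes all pipe fds after fork

# Debugging memory problems (with AddressSanitizer)
# Build: gcc -fsanitize=address -g -o mysh-asan mysh.c -lreadline
./mysh-asan
# ASan prints a full report when it detects a use-after-free or buffer overflow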

11. Extension Exercises

After completing the basic version, try these extensions:

Chapter Summary: The mini shell covers the core patterns of Unix systems programming: fork-exec-wait (process creation), pipe+dup2 (inter-process communication), sigaction (signal handling), open+dup2 (I/O redirection). Master these four syscall combinations and you understand most of what bash does at its core, and you have a solid foundation for the kernel contributions in Chapter 19.
