Pipes and File Descriptors
Chapter 11: Pipes, Redirection, and File Descriptors Deep Dive
Linux's "everything is a file" philosophy finds its fullest expression in pipes and file descriptors. Understanding how file descriptors work (how processes read and write data streams through the numbered handles 0, 1, and 2, and how pipes ferry bytes through kernel buffers) is the foundation for writing efficient, reliable shell scripts. This chapter dissects how redirection and pipes are implemented, from the kernel's perspective.
11.1 File Descriptor Basics: 0 / 1 / 2
A file descriptor (FD) is a non-negative integer index the kernel assigns to each open file or data stream. Every process has its own FD table. Three descriptors are open by default at process startup:
- 0: stdin, standard input (defaults to the keyboard)
- 1: stdout, standard output (defaults to the terminal)
- 2: stderr, standard error (defaults to the terminal)
Process (bash, PID 1234)
┌──────────────────────────────────────────┐
│ FD Table                                 │
│ ┌────┬──────────────────────────────┐    │
│ │ 0  │ → /dev/pts/0 (stdin)         │    │
│ │ 1  │ → /dev/pts/0 (stdout)        │    │
│ │ 2  │ → /dev/pts/0 (stderr)        │    │
│ │ 3  │ → /var/log/app.log (custom)  │    │
│ └────┴──────────────────────────────┘    │
└──────────────────────────────────────────┘

After: exec 1>app.log
┌──────────────────────────────────────────┐
│ FD Table                                 │
│ ┌────┬──────────────────────────────┐    │
│ │ 0  │ → /dev/pts/0 (stdin)         │    │
│ │ 1  │ → app.log (stdout now!)      │    │
│ │ 2  │ → /dev/pts/0 (stderr)        │    │
│ └────┴──────────────────────────────┘    │
└──────────────────────────────────────────┘
The kernel exposes a process's FD table through `/proc/PID/fd/`, where each entry is a symlink to the underlying file. `lsof -p PID` presents the same information in a more readable format.
# List the current bash process's file descriptors
ls -la /proc/$$/fd
# lrwx------ 1 user user 64 Apr 25 10:00 0 -> /dev/pts/0
# lrwx------ 1 user user 64 Apr 25 10:00 1 -> /dev/pts/0
# lrwx------ 1 user user 64 Apr 25 10:00 2 -> /dev/pts/0
# List another process's FDs (PID 1234 as an example)
ls -la /proc/1234/fd
# Use lsof to list the files a process has open
lsof -p $$
# Or keep only the rows whose FD column is numeric
lsof -p $$ | awk 'NR==1 || $4 ~ /^[0-9]/'
# Check FD limits
ulimit -n                   # current soft limit (commonly 1024)
cat /proc/sys/fs/file-max   # system-wide maximum number of open files
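As a small illustration of the FD table described in this section, the sketch below checks whether a descriptor is attached to a terminal and, where `/proc` is available, counts the entries in the shell's own table. The helper name `describe_fd` is made up for this example.

```bash
#!/usr/bin/env bash
# describe_fd is a hypothetical helper: reports whether an FD is a terminal.
describe_fd() {
    if [ -t "$1" ]; then
        echo "tty"
    else
        echo "not a tty"
    fi
}

# Inside $(...), FD 1 is the pipe bash uses to capture output, never a terminal.
captured=$(describe_fd 1)
echo "FD 1 inside command substitution: $captured"

# Count the descriptors currently open in this shell (Linux-specific path).
if [ -d "/proc/$$/fd" ]; then
    fd_count=$(ls "/proc/$$/fd" | wc -l)
    echo "open descriptors in PID $$: $fd_count"
fi
```

Scripts commonly use the same `[ -t 1 ]` test to decide whether to emit colors or progress bars.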
11.2 Redirection Operators — Complete Reference
Redirection works by modifying the child process's FD table after fork() but before exec(), so that the standard streams point at files instead of the terminal. The shell provides concise syntax for this kernel-level operation.
# === Output redirection ===
command > file       # stdout overwrites file (FD1 → file)
command >> file      # stdout appends to file
command 2> file      # stderr overwrites file (FD2 → file)
command 2>> file     # stderr appends to file
# === Merging stderr into stdout ===
command 2>&1         # make FD2 a duplicate of FD1 (both point to the same target)
command > file 2>&1  # stdout and stderr both go to file (order matters!)
command 2>&1 > file  # wrong: stderr is pointed at the old stdout (terminal) first, then stdout is redirected
# bash shorthand (equivalent to > file 2>&1)
command &> file
command &>> file     # append variant (bash 4+)
# === Discarding output ===
command > /dev/null        # discard stdout
command 2> /dev/null       # discard stderr
command > /dev/null 2>&1   # discard all output
command &> /dev/null       # shorthand
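# === Input redirection ===
command < file               # read stdin from file (FD0 ← file)
# === noclobber: protect existing files ===
set -C                       # enable noclobber: > refuses to overwrite an existing file
echo "test" > existing.txt   # error: cannot overwrite existing file
echo "test" >| existing.txt  # force overwrite (bypasses noclobber)
set +C                       # disable noclobber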
# === Practical combinations ===
# Log stdout and stderr to separate files
command > out.log 2> err.log
# Build and show only the errors
make 2>&1 | grep -i error
# Discard stdout, keep only stderr
command > /dev/null
# Test whether a command succeeds (without showing any output)
if grep -q "pattern" file 2>/dev/null; then
    echo "found"
fi
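The fork-then-exec ordering described at the start of this section can be observed directly on Linux. In this sketch, which assumes `/proc` is mounted and uses illustrative variable names, the child `readlink` process reports where its own FD 1 points, while the parent shell's FD 1 stays untouched.

```bash
#!/usr/bin/env bash
tmp=$(mktemp)

# The shell forks, points the child's FD 1 at $tmp, then execs readlink.
# readlink therefore prints the path of its own stdout into that file.
readlink /proc/self/fd/1 > "$tmp"
child_target=$(cat "$tmp")

# The parent shell's FD 1 was never modified by that redirection.
parent_target=$(readlink "/proc/$$/fd/1")

echo "child  FD 1 -> $child_target"
echo "parent FD 1 -> $parent_target"
```

The file ends up containing its own path, which is only possible because the redirection was in place before `readlink` started running.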
Pitfall: Order of 2>&1
`command > file 2>&1` and `command 2>&1 > file` behave completely differently. The shell processes redirections left to right. The first form points FD1 at the file and then copies FD1 into FD2, so both streams go to the file. The second form copies the current FD1 (the terminal) into FD2 and only then points FD1 at the file, so stderr still goes to the terminal.
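The ordering pitfall can be reproduced without a terminal. In the sketch below, `emit` is a hypothetical command that writes one line to each stream, and the capture pipe of `$(...)` stands in for the terminal.

```bash
#!/usr/bin/env bash
# emit is a hypothetical command that writes one line to each stream.
emit() {
    echo "to-stdout"
    echo "to-stderr" >&2
}

tmp=$(mktemp -d)

# Correct order: FD1 -> file first, then FD2 copies FD1, so both lines land in the file.
emit > "$tmp/both.log" 2>&1

# Reversed order: FD2 copies the current FD1 (here the capture pipe of $(...)),
# and only then FD1 moves to the file. The stderr line escapes the file.
leaked=$(emit 2>&1 > "$tmp/stdout_only.log")

echo "both.log has $(wc -l < "$tmp/both.log") lines"
echo "escaped the file: $leaked"
```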
| Operator | Effect | Equivalent |
|---|---|---|
| > file | stdout overwrite | 1> file |
| >> file | stdout append | 1>> file |
| 2> file | stderr overwrite | — |
| 2>&1 | stderr → stdout target | — |
| &> file | stdout+stderr write | > file 2>&1 |
| < file | stdin from file | 0< file |
| >\| file | force overwrite (bypass noclobber) | 1>\| file |
11.3 Pipe Internals: Kernel Buffer and Subprocesses
The pipe | is one of Unix's greatest inventions. The kernel creates a circular buffer (default 65536 bytes, i.e., 64 KB) for each pipe: the left command's stdout connects to the write end, the right command's stdin connects to the read end. Both commands run concurrently; the kernel coordinates data flow.
# Basic pipe: ls's stdout → grep's stdin
ls -la | grep "\.sh"
# Multi-stage pipeline
grep "error" /var/log/syslog | sort | uniq -c | sort -rn | head -20
# |& pipes both stdout and stderr (bash 4+)
command |& grep "ERROR"
# Equivalent to: command 2>&1 | grep "ERROR"
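# A new pipe on Linux has a default capacity of 65536 bytes. This file shows the
# maximum an unprivileged process may request via fcntl(F_SETPIPE_SZ) (default 1048576)
cat /proc/sys/fs/pipe-max-size
# Blocking behavior: when the buffer is full the writer blocks; when it is empty the reader blocks
# This gives pipelines built-in backpressure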
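# The pipeline exit-status problem
ls nonexistent | wc -l   # ls fails, but the pipeline's exit status is wc's (0!)
echo $?                  # 0: ls's error is masked
# PIPESTATUS holds the exit status of every command in the pipeline (bash-specific)
ls nonexistent | wc -l
echo "${PIPESTATUS[@]}"  # e.g. 2 0 (ls failed with 2, wc succeeded with 0)
# set -o pipefail: the pipeline returns the status of the rightmost command that failed (recommended!)
set -o pipefail
ls nonexistent | wc -l
echo $?                  # now 2 (ls's exit status)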
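The backpressure behavior mentioned above can be made visible with a slow reader. This is a rough sketch, assuming the pipe capacity is well below 1 MiB (true for the Linux default); the variable names are illustrative.

```bash
#!/usr/bin/env bash
tmp=$(mktemp)
start=$(date +%s)

# Writer: tries to push 1 MiB, far more than the pipe buffer holds.
# Reader: sleeps 2 seconds before it starts draining the pipe.
bytes=$(
    {
        head -c 1048576 /dev/zero
        # Reached only after every byte has been accepted by the pipe.
        echo $(( $(date +%s) - start )) > "$tmp"
    } | { sleep 2; wc -c; }
)
writer_elapsed=$(cat "$tmp")
bytes=$(( bytes ))   # normalize any padding printed by wc

echo "reader received $bytes bytes; writer was held back for about ${writer_elapsed}s"
```

The writer cannot finish until the reader wakes up, because the kernel suspends it as soon as the buffer is full.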
Variable Scope in Pipelines
In bash, each command in a pipeline runs in its own subshell by default, so a variable assigned inside the pipeline is gone once the pipeline finishes. This is one of the most common "lost variable" traps in bash scripts:
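# Trap: count is assigned in a subshell, so the parent shell never sees it
count=0
echo "a b c" | while read -r word; do
    count=$((count + 1))
done
echo "count=$count"   # prints count=0: the update was lost
# Fix 1: process substitution (the loop runs in the current shell)
count=0
while read -r word; do
    count=$((count + 1))
done < <(echo "a b c")
echo "count=$count"   # prints count=1 (one input line)
# Fix 2: hand the result back through a temporary file
echo "a b c" | {
    n=0
    while read -r word; do n=$((n + 1)); done
    echo "$n" > /tmp/count.txt
}
count=$(cat /tmp/count.txt)
# Check whether the current code runs in a subshell
echo "BASH_SUBSHELL=$BASH_SUBSHELL"   # 0 = current shell, 1 = subshell, 2 = nested subshell
(echo "inside subshell: BASH_SUBSHELL=$BASH_SUBSHELL")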
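Bash 4.2 and later offer one more way around this trap, the `lastpipe` shell option, which the text above does not cover. A minimal sketch, assuming a non-interactive script where job control is off:

```bash
#!/usr/bin/env bash
# lastpipe runs the final stage of a pipeline in the current shell.
# It only takes effect when job control is off, which is the default in scripts.
shopt -s lastpipe

count=0
printf '%s\n' a b c | while read -r word; do
    count=$((count + 1))
done
echo "count=$count"
```

Because the `while` loop is the last stage, its assignments to `count` now happen in the script's own shell. In an interactive shell with job control enabled the option has no effect.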
11.4 Here-Document: Elegant Multi-Line Input
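A here-document (`<<DELIMITER`) feeds the lines that follow, up to a line consisting solely of the delimiter, to a command's stdin. It is a clean way to embed multi-line input in a script.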
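A here-document can be sketched as follows. The example contrasts an unquoted delimiter, where the shell expands variables in the body, with a quoted one, where the body is taken literally. The variable names are illustrative.

```bash
#!/usr/bin/env bash
name="world"

# Unquoted delimiter: variables and command substitutions are expanded.
expanded=$(cat <<EOF
Hello, $name
EOF
)

# Quoted delimiter: the body is passed through literally.
literal=$(cat <<'EOF'
Hello, $name
EOF
)

echo "expanded: $expanded"
echo "literal:  $literal"
```

Quoting the delimiter is the usual choice when the body is itself a script or template that contains `$` characters.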
11.5 Here-String: Single-Line String to stdin
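A here-string (`<<<`) passes a single string, with a newline appended, to a command's stdin. It is a shell extension (bash, zsh, ksh) rather than POSIX `sh` syntax.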
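A few typical uses of the here-string operator `<<<`, as a sketch with made-up input values:

```bash
#!/usr/bin/env bash
# Split a string into fields without spawning a pipeline.
read -r first second rest <<< "alpha beta gamma delta"

# Feed a short string to a filter.
upper=$(tr '[:lower:]' '[:upper:]' <<< "hello")

# The loop runs in the current shell, so the counter survives.
count=0
while read -r _; do
    count=$((count + 1))
done <<< $'a\nb\nc'

echo "first=$first second=$second rest=$rest upper=$upper count=$count"
```

Since no pipe is involved, `read` assigns its variables in the current shell, which avoids the subshell problem of `echo ... | read`.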
11.6 Process Substitution: <() and >()
Process substitution is an advanced bash/zsh feature that presents a command's output or input as a file path (via /dev/fd/N or /proc/self/fd/N). This solves cases where a command needs two file arguments that pipes cannot satisfy.
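# <(command): exposes a command's output as a readable file path
# >(command): exposes a command's input as a writable file path
# tee writes a compressed backup as the data streams through (input.txt is a placeholder name)
tee >(gzip > backup.gz) < input.txt > /dev/null
# Send one stream to two processing pipelines at once
command | tee >(grep "ERROR" > errors.log) >(grep "WARN" > warnings.log) > /dev/null
# Compare the output of two commands without temporary files (file names are placeholders)
cmp <(sort file1.txt) <(sort file2.txt)
Why Not Temporary Files?
Process substitution beats temporary files on three counts: 1) no manual cleanup is needed; 2) data flows through memory without touching disk (great for large files); 3) both sides run concurrently. The downside is portability: it requires bash or zsh and is not available in POSIX `sh`.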
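The "two file arguments" case mentioned in this section can be sketched with `diff`. The file names and contents are made up for the example.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '%s\n' banana apple cherry > "$tmp/a.txt"
printf '%s\n' cherry banana apple > "$tmp/b.txt"

# diff needs two file arguments; each <(...) supplies one as a path.
if diff <(sort "$tmp/a.txt") <(sort "$tmp/b.txt") > /dev/null; then
    same="yes"
else
    same="no"
fi

# The substitution itself expands to a path such as /dev/fd/63.
fd_path=$(echo <(true))

echo "same after sorting: $same"
echo "process substitution path: $fd_path"
rm -rf "$tmp"
```

Printing the expansion with `echo` shows that the command only ever receives an ordinary-looking path.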
11.7 Manipulating File Descriptors with exec
The `exec` shell builtin can not only replace the current process — it can directly modify the current shell's FD table without replacing the process. This is the foundation for script-level log redirection and a core technique in advanced shell programming.
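# === Open custom file descriptors ===
exec 3>output.txt          # open FD3 for writing (truncate)
exec 4>>append.txt         # open FD4 for appending
exec 5<input.txt           # open FD5 for reading
exec 6<>bidirectional.txt  # open FD6 for reading and writing (rarely used)
# Write to FD3
echo "line 1" >&3
echo "line 2" >&3
# Read from FD5
read -r line <&5
# === Close file descriptors ===
exec 3>&-                  # close FD3 (write side)
exec 5<&-                  # close FD5 (read side)
# === Script-wide log redirection ===
LOGFILE=/tmp/script.log
exec 1>"$LOGFILE"
exec 2>&1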
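echo "This goes to $LOGFILE"   # written to the log
ls /nonexistent                # the error goes to the log too
# === Save and restore the original stdout/stderr ===
exec 3>&1                      # save the current stdout in FD3
exec 4>&2                      # save the current stderr in FD4
exec 1>/tmp/script.log 2>&1    # redirect
echo "In log"
ls /nonexistent
exec 1>&3                      # restore stdout
exec 2>&4                      # restore stderr
exec 3>&-                      # close temporary FD3
exec 4>&-                      # close temporary FD4
echo "Back to terminal"        # now printed on the terminal
# === Avoid FD collisions with automatically allocated (high-numbered) FDs ===
# bash 4.1+ supports {var} to allocate an FD automatically (no hard-coded numbers)
exec {myfd}>output.txt
echo "Using auto FD: $myfd" >&$myfd
exec {myfd}>&-                 # close it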
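One common use of a dedicated read descriptor is looping over a file while leaving the script's stdin free for commands inside the loop. A minimal sketch, assuming bash 4.1+ for the `{var}` syntax; `in_fd` and the sample data are illustrative.

```bash
#!/usr/bin/env bash
tmp=$(mktemp)
printf '%s\n' one two three > "$tmp"

# Let bash allocate the descriptor; in_fd is an illustrative name.
exec {in_fd}<"$tmp"

lines=()
# read pulls from the dedicated FD, so commands inside the loop
# can still use the script's normal stdin.
while read -r line <&"$in_fd"; do
    lines+=("$line")
done

exec {in_fd}<&-   # close the descriptor
rm -f "$tmp"

echo "read ${#lines[@]} lines, last one: ${lines[2]}"
```

This matters when the loop body runs something like `ssh` or `read -p`, which would otherwise consume the lines meant for the loop.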
Production-Grade Log Redirection Script
#!/usr/bin/env bash
# Production-style script: send output to both the terminal and a log file
# using exec + tee
set -euo pipefail   # abort on the first failed step
LOGFILE="/var/log/deploy-$(date +%Y%m%d-%H%M%S).log"
mkdir -p "$(dirname "$LOGFILE")"
# Trick: tee duplicates everything to the terminal and the log file
exec 1> >(tee -a "$LOGFILE") 2>&1
echo "[$(date '+%Y-%m-%d %H:%M:%S')] === Deploy started ==="
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Log file: $LOGFILE"
# Everything printed from here on is logged automatically
echo "Step 1: Pulling latest code..."
git pull origin main
echo "Step 2: Installing dependencies..."
npm install --production
echo "Step 3: Restarting service..."
systemctl restart myapp
echo "[$(date '+%Y-%m-%d %H:%M:%S')] === Deploy completed ==="
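The repeated timestamp prefix in the script above could be factored into a small helper. This is only a sketch: `log` is a hypothetical function, and `LOGFILE` points at a temporary file here so the example runs without root privileges.

```bash
#!/usr/bin/env bash
# LOGFILE points at a temporary file here so the sketch runs without root.
LOGFILE=$(mktemp)

# log is a hypothetical helper: timestamp + message, to terminal and log file.
log() {
    printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" | tee -a "$LOGFILE"
}

log "=== Deploy started ==="
log "Step 1: Pulling latest code..."
log "=== Deploy completed ==="
```

Unlike the `exec`-based redirection, this approach only logs messages passed to `log`, so the output of commands such as `git pull` would still need to be redirected separately.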
11.8 Named Pipes (FIFO): Persistent Pipes
Anonymous pipes (|) last only as long as the command, but a named pipe (FIFO) is a special file in the filesystem that lets unrelated processes communicate via a file path. FIFOs use the same kernel buffer but persist by name.
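# Create a named pipe
mkfifo /tmp/mypipe
ls -la /tmp/mypipe
# prw-r--r-- 1 user user 0 Apr 25 10:00 /tmp/mypipe
# The leading 'p' marks a FIFO
# === Producer / consumer pattern ===
# Terminal 1 (consumer; start it first, it blocks waiting for data)
cat /tmp/mypipe
# Terminal 2 (producer; once the data is written and the pipe closed, the consumer exits)
echo "Hello from producer" > /tmp/mypipe
# === Background consumer + live logging ===
mkfifo /tmp/logpipe
# Background: keep reading from the pipe and append to a file
while true; do
    cat /tmp/logpipe >> /var/log/myapp.log 2>/dev/null || break
done &
LOG_READER_PID=$!
# The application writes its log lines to the pipe
echo "App started" > /tmp/logpipe
echo "Processing..." > /tmp/logpipe
# Clean up
kill $LOG_READER_PID
rm /tmp/logpipe
# === Anonymous vs. named pipes ===
# Anonymous pipe: only between related processes (parent/child); created automatically on the command line
# Named pipe: any process can connect through the file path; must be created explicitly with mkfifo
# In common: both are kernel buffers with blocking reads and writes and one-way data flow
# A FIFO as a simple inter-process signal
mkfifo /tmp/ready_signal
# Process A: signal after initialization completes
echo "ready" > /tmp/ready_signal
# Process B: wait for the signal before starting work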
read -r signal < /tmp/ready_signal
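The producer/consumer pattern can also be exercised from a single script. The sketch below uses a temporary directory and made-up message contents, and relies on the consumer seeing EOF once the producer closes its end of the FIFO.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
fifo="$dir/demo.fifo"
mkfifo "$fifo"

# Consumer: blocks in the background until a writer opens the FIFO,
# then copies everything it receives until the writer closes it.
cat "$fifo" > "$dir/received.txt" &
consumer_pid=$!

# Producer: one open/close cycle carrying two lines.
printf '%s\n' "job 1" "job 2" > "$fifo"

# The consumer sees EOF when the producer closes its end.
wait "$consumer_pid"
received=$(cat "$dir/received.txt")

echo "consumer received:"
echo "$received"
rm -rf "$dir"
```

Creating the FIFO inside a `mktemp -d` directory avoids the name collisions that fixed paths under `/tmp` can cause.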
11.9 tee: Splitting Output Streams
`tee` works like a T-pipe fitting: it reads from stdin and simultaneously writes to stdout and one or more files. This makes it the perfect tool for "tapping into" a pipeline to record data midstream.
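# Basic use: print to the terminal and save to a file
ls -la | tee file_list.txt
# -a: append instead of overwriting
command | tee -a output.log
# Write to several files
command | tee file1.txt file2.txt file3.txt
# Keep processing further down the pipeline
ls -la | tee /tmp/raw_list.txt | grep "\.sh" | wc -l
# Combine with sudo to write files that need elevated privileges (a common idiom)
echo "new content" | sudo tee /etc/somefile.conf
# Note: sudo echo "..." > /etc/somefile does not work (the redirection is performed by the unprivileged shell)
# tee + process substitution: one stream, several processing pipelines (the address is a placeholder)
command | tee >(grep "ERROR" | mail -s "Errors" admin@example.com) \
    >(grep "WARN" >> warnings.log) \
    > full.log
# Monitor in real time and record (the full stream goes to the file, matching lines to the terminal)
tail -f /var/log/nginx/access.log | tee -a /tmp/monitoring.log | grep "500"
# Build log with timestamps (strftime requires gawk)
make 2>&1 | tee >(awk '{print strftime("[%H:%M:%S]"), $0}' > build.log)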
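Tapping a stream midway is handy when the same bytes must be both stored and fingerprinted. In this sketch the payload and file names are made up, and `cksum` merely stands in for whatever downstream consumer the pipeline has.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)

# One pass over the data: tee stores a copy, cksum fingerprints the stream.
stream_sum=$(printf '%s\n' "release payload" | tee "$tmp/copy.txt" | cksum)

# Fingerprint the stored copy separately to confirm it is identical.
copy_sum=$(cksum < "$tmp/copy.txt")

echo "stream: $stream_sum"
echo "copy:   $copy_sum"
```

Both checksums match because `tee` writes exactly the bytes it forwards.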
11.10 /dev Special Files: null / zero / random / tcp
Linux's /dev directory contains virtual device files backed by no physical hardware — they are special data sources or sinks provided by the kernel. Using them well produces elegant shell scripts.
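# === /dev/null: the black hole ===
# Reading: returns EOF immediately (behaves like an empty file)
# Writing: discards all data
command > /dev/null 2>&1   # discard all output
cat /dev/null > file.txt   # truncate a file (more explicit than echo -n > file)
: > file.txt               # same effect, more concise
# === /dev/zero: zero-byte generator ===
# Supplies an endless stream of zero-valued bytes (NUL characters)
# Create a zero-filled file of a given size (more portable than fallocate)
dd if=/dev/zero of=emptyfile bs=1M count=100   # 100 MB file of zeros
# Overwrite a sensitive file with zeros before deleting it
# (conv=notrunc rewrites in place; on SSDs, journaling, or copy-on-write filesystems old data may still survive)
dd if=/dev/zero of=secrets.txt bs=1 count=$(stat -c%s secrets.txt) conv=notrunc
# === /dev/random and /dev/urandom: randomness sources ===
# /dev/random: may block until the kernel's random pool is initialized
#              (kernels before 5.6 also blocked whenever the entropy estimate ran low)
# /dev/urandom: never blocks; suitable for general use
# Generate a random password (32 characters, base64-encoded)
head -c 24 /dev/urandom | base64
# Generate a random hex string
head -c 16 /dev/urandom | xxd -p | tr -d '\n'
# Generate a random UUID (the kernel provides one directly)
cat /proc/sys/kernel/random/uuid
# $RANDOM gives a simple random number (range 0-32767)
echo $RANDOM
echo $(( RANDOM % 100 ))   # random number in 0-99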
# === /dev/tcp: bash's built-in TCP client ===
# bash-specific feature (not a real device file; bash handles the path internally)
# Format: /dev/tcp/host/port
# Test whether a port is open (handy when nc is not installed)
if (: </dev/tcp/example.com/80) 2>/dev/null; then
    echo "Port 80 is open"
else
    echo "Port 80 is closed"
fi
# Send a simple HTTP request
exec 3<>/dev/tcp/example.com/80
printf 'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n' >&3
cat <&3
exec 3>&-
# Check whether a service is alive (for monitoring scripts)
check_port() {
    local host=$1 port=$2
    (: </dev/tcp/"$host"/"$port") 2>/dev/null
}
check_port localhost 3306 && echo "MySQL is up" || echo "MySQL is down"
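The behavior of the three data devices can be checked from a script. The following sketch uses illustrative variable names and assumes the usual `od` output layout, with bytes separated by spaces.

```bash
#!/usr/bin/env bash
# /dev/null reads as an empty file.
null_bytes=$(( $(wc -c < /dev/null) ))

# /dev/zero yields NUL bytes; od renders four of them as "00 00 00 00".
zero_hex=$(head -c 4 /dev/zero | od -An -tx1 | tr -s ' ' | sed 's/^ //')

# /dev/urandom filtered down to a 20-character alphanumeric token.
# tr keeps only A-Z, a-z, 0-9; head stops the otherwise endless stream.
token=$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 20)

echo "null_bytes=$null_bytes zero_hex=$zero_hex token_length=${#token}"
```

Filtering with `tr -dc` avoids the `+`, `/`, and `=` characters that base64 output contains, which is convenient for tokens embedded in URLs or file names.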
11.11 Subshells and Pipeline Environment Isolation
Understanding when subshells are created and how they isolate from the parent shell is essential for debugging "mysteriously disappearing" variables in shell scripts. Here are the main scenarios that spawn a subshell:
# === Explicit subshell: parentheses () ===
# A subshell inherits the parent's variables, but changes do not propagate back
x=10
(
    echo "In subshell: x=$x"            # 10 (inherited)
    x=999
    echo "Modified in subshell: x=$x"   # 999
)
echo "In parent: x=$x"                  # 10 (unaffected)
# The BASH_SUBSHELL variable
echo "Parent: $BASH_SUBSHELL"           # 0
(echo "Child: $BASH_SUBSHELL")          # 1
# ((echo "Grandchild: $BASH_SUBSHELL")) # careful: (( starts an arithmetic expression, not nested subshells
( ( echo "Grandchild: $BASH_SUBSHELL" ) )   # 2 (properly nested subshells)
# === Brace group {}: runs in the current shell ===
# Variable changes remain visible afterwards!
y=10
{
    y=999
    echo "In group: y=$y"     # 999
}
echo "After group: y=$y"      # 999 (the change persists!)
# Note: the last command inside {} must end with ; or a newline, and { must be followed by whitespace
# === Subshells in pipelines ===
# Every component of a pipeline runs in a subshell (by default)
VAR=""
echo "hello" | VAR=$(cat); echo "VAR=$VAR"   # VAR= (empty; assigned in a subshell)
# Correct: assign from a command substitution in the current shell
VAR=$(echo "hello"); echo "VAR=$VAR"         # VAR=hello
# === Command substitution $() is a subshell too ===
result=$(
    x=100
    echo $((x * 2))
)
echo "result=$result"   # result=200
echo "x=$x"             # x=10 (x=100 was set only inside the subshell)
# === Background commands & run in a subshell too ===
bg_var=""
{ bg_var="set in background"; } &
wait
echo "bg_var=$bg_var"   # empty
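Subshell isolation covers more than variables: the working directory and the umask are scoped the same way. A brief sketch with illustrative variable names:

```bash
#!/usr/bin/env bash
before=$PWD
umask_before=$(umask)

# cd and umask inside $( ... ) affect only the subshell.
sub_report=$(
    cd / || exit 1
    umask 077
    echo "$PWD $(umask)"
)

after=$PWD
echo "subshell saw: $sub_report"
echo "parent still in: $after (umask $(umask))"
```

This is why wrapping a `cd` in parentheses is a common way to run a few commands in another directory without having to change back afterwards.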
Chapter Summary: This chapter traced the complete working chain of file descriptors, redirection, and pipes from the kernel level. Key points: redirection is FD table modification between fork and exec; pipes are 64 KB kernel buffers; the order of `2>&1` matters critically; commands in a pipeline run in subshells (variables don't propagate); `exec N>file` persistently manipulates the current shell's FDs; named pipes enable communication between unrelated processes. With these mechanisms mastered, the next chapter's script engineering will come naturally.