# Chapter 20: Final Project — Production-Grade DevOps Script System
This chapter brings the skills from the whole book together in a complete, production-grade operations scripting system. The system covers service health monitoring, resource alerting, log analysis, automated deployment, backup and recovery, webhook notifications, systemd integration, and automated testing with bats. Each module builds on techniques from specific earlier chapters, as mapped in the table below. By the end of the chapter you will have a deployable foundation for ops automation.
## 1. Project Overview and Architecture
The system follows four design principles:

- Single responsibility: each script does one thing.
- Observable: every script logs in the same unified format.
- Idempotent: running a script again produces no additional side effects.
- Testable: critical logic is covered by bats tests.
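A small helper makes the idempotency principle easy to see. The sketch below uses two hypothetical functions, `ensure_dir` and `ensure_line`, which are not part of `lib/common.sh`. Each one inspects the current state first and changes it only when needed, so a second run finds nothing to do.

```bash
#!/usr/bin/env bash
# Hypothetical idempotent helpers (illustration only, not part of lib/common.sh).
set -euo pipefail

# ensure_dir DIR: create DIR only if it does not exist yet
ensure_dir() {
    [[ -d "$1" ]] || mkdir -p "$1"
}

# ensure_line FILE LINE: append LINE to FILE unless an identical line is already present
ensure_line() {
    local file="$1" line="$2"
    touch "$file"
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Both calls are safe to repeat. That property is what lets a systemd timer rerun a script every few minutes without accumulating duplicate state.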
| Module | Function | Source Chapter |
|---|---|---|
| lib/common.sh | Shared function library | Ch9, Ch10, Ch11 |
| monitor/health_check.sh | HTTP/TCP/process monitoring | Ch7, Ch5 |
| monitor/resource_alert.sh | CPU/memory/disk alerting | Ch14, Ch8 |
| monitor/log_analyzer.sh | Log keyword scanning | Ch4, Ch11 |
| alert/webhook.sh | DingTalk/Feishu notifications | Ch7, Ch12 |
| deploy/rolling_deploy.sh | Rolling deploy and rollback | Ch5, Ch12 |
| backup/backup.sh | rsync+encrypted backup | Ch8, Ch15 |
| systemd/ | systemd service/timer units | Ch13 |
| tests/ | bats automated tests | Ch12 |
## 2. Project Directory Structure
```text
ops-scripts/
├── lib/
│   └── common.sh              # shared function library
├── monitor/
│   ├── health_check.sh        # HTTP/TCP/process checks
│   ├── resource_alert.sh      # resource threshold alerting
│   └── log_analyzer.sh        # log keyword analysis
├── alert/
│   └── webhook.sh             # DingTalk/Feishu/email notify
├── deploy/
│   └── rolling_deploy.sh      # rolling deploy and rollback
├── backup/
│   ├── backup.sh              # rsync snapshot backup + encryption
│   └── restore.sh             # restore script
├── systemd/
│   ├── ops-monitor.service
│   ├── ops-monitor.timer
│   ├── ops-backup.service
│   └── ops-backup.timer
├── tests/
│   ├── test_common.bats
│   ├── test_webhook.bats
│   └── test_backup.bats
├── config.env                 # config file (not committed to git)
└── config.env.example         # config template
```
## 3. Shared Function Library (lib/common.sh)
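Every script first resolves its own directory with `SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"` and then loads the library with `source "$SCRIPT_DIR/../lib/common.sh"`, so it works regardless of the caller's current directory. The library provides unified log formatting, dependency checking, mutex locking, an HTTP POST helper, and a safe config loader.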
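```bash
#!/usr/bin/env bash
# lib/common.sh: shared function library
set -euo pipefail

# ── Color constants (enabled only when stderr is a tty) ────────
# $'...' (ANSI-C quoting) stores real escape bytes; a plain '\033[...]'
# string would be printed literally by printf '%s'.
if [[ -t 2 ]]; then
    RED=$'\033[0;31m'; YELLOW=$'\033[1;33m'
    GREEN=$'\033[0;32m'; CYAN=$'\033[0;36m'
    NC=$'\033[0m'
else
    RED=''; YELLOW=''; GREEN=''; CYAN=''; NC=''
fi

# ── Logging ────────────────────────────────────────────────────
LOG_FILE="${LOG_FILE:-/var/log/ops-scripts/ops.log}"
mkdir -p "$(dirname "$LOG_FILE")" 2>/dev/null || true

_log() {
    local level="$1"; shift
    local ts
    ts=$(date '+%Y-%m-%dT%H:%M:%S%z')
    # Write to both stderr and the log file
    printf '%s [%s] %s\n' "$ts" "$level" "$*" | tee -a "$LOG_FILE" >&2
}
log_info()  { _log "INFO " "${CYAN}$*${NC}"; }
log_warn()  { _log "WARN " "${YELLOW}$*${NC}"; }
log_error() { _log "ERROR" "${RED}$*${NC}"; }

die() {
    log_error "$*"
    exit 1
}

# ── Dependency check ───────────────────────────────────────────
require_cmd() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" &>/dev/null || die "Required command not found: $cmd"
    done
}

# ── Mutex lock ─────────────────────────────────────────────────
# mkdir is atomic: only one process can create the lock directory.
LOCK_DIR="${LOCK_DIR:-/tmp/ops-locks}"
mkdir -p "$LOCK_DIR"

lock_file() {
    local name="$1"
    local lock="$LOCK_DIR/${name}.lock"
    if ! mkdir "$lock" 2>/dev/null; then
        local pid
        pid=$(cat "$lock/pid" 2>/dev/null || echo "unknown")
        die "Lock held by PID $pid: $lock"
    fi
    echo $$ > "$lock/pid"
    # Release the lock on exit. INT/TERM are turned into a normal exit so
    # the EXIT trap also fires when the script is interrupted.
    trap "unlock_file '$name'" EXIT
    trap 'exit 130' INT
    trap 'exit 143' TERM
}

unlock_file() {
    local name="$1"
    rm -rf "$LOCK_DIR/${name}.lock"
}

# ── HTTP helper ────────────────────────────────────────────────
# http_post URL JSON_BODY: send a POST request and print the HTTP status
# code ("000" if the connection failed). Always returns 0, so callers can
# inspect the code instead of being aborted by set -e.
http_post() {
    local url="$1"
    local body="$2"
    curl -s -o /dev/null -w "%{http_code}" \
        -H 'Content-Type: application/json' \
        --data "$body" \
        --max-time 10 \
        "$url" || true
}

# ── Config loading ─────────────────────────────────────────────
# load_config [FILE]: export KEY=VALUE pairs without executing the file.
# If CONFIG_FILE is set in the environment, it overrides FILE (useful in tests).
load_config() {
    local cfg="${CONFIG_FILE:-${1:-config.env}}"
    [[ -f "$cfg" ]] || die "Config file not found: $cfg"
    local key="" value=""
    # Safe loading: only KEY=VALUE lines are accepted. Comments, blank lines
    # and anything else fail the key regex and are skipped. The || clause
    # also handles a last line without a trailing newline.
    while IFS='=' read -r key value || [[ -n "$key" ]]; do
        [[ "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
        # Strip one pair of surrounding double quotes, if present
        value="${value#\"}"; value="${value%\"}"
        export "$key=$value"
    done < "$cfg"
}
```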
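`lock_file` relies on `mkdir` being atomic. If a script is killed with `SIGKILL`, however, its EXIT trap never runs and the lock directory stays behind, which blocks every later run. One common refinement is to check whether the recorded PID is still alive with `kill -0` and reclaim the lock if it is not. The sketch below assumes that every lock holder runs on the same host and as the same user, because `kill -0` also fails for processes owned by other users. `try_lock` and `release_lock` are hypothetical names. Unlike `lock_file`, they return a status instead of calling `die`. Two processes that reclaim the same stale lock at the same instant can still race, so `flock(1)` is the better tool where strict guarantees matter.

```bash
#!/usr/bin/env bash
# Hypothetical stale-aware variant of lock_file (not part of lib/common.sh).
set -euo pipefail

# try_lock LOCK_PATH: return 0 if the lock was acquired, 1 if someone holds it
try_lock() {
    local lock="$1" pid
    if mkdir "$lock" 2>/dev/null; then
        echo $$ > "$lock/pid"
        return 0
    fi
    pid=$(cat "$lock/pid" 2>/dev/null || true)
    # An empty pid file means the holder may still be writing it: treat as held.
    # kill -0 sends no signal; it only tests whether the PID exists.
    if [[ -z "$pid" ]] || kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    # The recorded holder is gone: remove the stale lock and retry once
    rm -rf "$lock"
    if mkdir "$lock" 2>/dev/null; then
        echo $$ > "$lock/pid"
        return 0
    fi
    return 1
}

# release_lock LOCK_PATH
release_lock() {
    rm -rf "$1"
}
```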
## 4. Service Health Monitoring (monitor/health_check.sh)
The health check script supports three check types: HTTP status code check, TCP port reachability, and process liveness check. Configuration comes from `config.env`; failures call `alert/webhook.sh` to send alerts.
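```bash
#!/usr/bin/env bash
# monitor/health_check.sh
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd curl nc pgrep

ALERT_SCRIPT="$SCRIPT_DIR/../alert/webhook.sh"
FAIL_COUNT=0

# ── HTTP check ─────────────────────────────────────────────────
check_http() {
    local name="$1" url="$2" expected="${3:-200}"
    local code
    # No -f here: with -f, curl exits non-zero on 4xx/5xx and an
    # "|| echo 000" fallback would be appended to the code -w already
    # printed. On connection failure -w prints "000" by itself.
    code=$(curl -s -o /dev/null -w "%{http_code}" \
        --max-time 10 --connect-timeout 5 "$url" 2>/dev/null) || true
    if [[ "$code" == "$expected" ]]; then
        log_info "HTTP OK: $name ($url) => $code"
    else
        log_error "HTTP FAIL: $name ($url) expected=$expected got=$code"
        "$ALERT_SCRIPT" "HTTP check failed: $name returned $code (expected $expected)"
        # Not (( FAIL_COUNT++ )): that expression evaluates to 0 on the first
        # failure, returns status 1, and aborts the script under set -e
        FAIL_COUNT=$(( FAIL_COUNT + 1 ))
    fi
}

# ── TCP check ──────────────────────────────────────────────────
check_tcp() {
    local name="$1" host="$2" port="$3"
    if nc -z -w 3 "$host" "$port" &>/dev/null; then
        log_info "TCP OK: $name ($host:$port)"
    else
        log_error "TCP FAIL: $name ($host:$port) unreachable"
        "$ALERT_SCRIPT" "TCP check failed: $name ($host:$port) is unreachable"
        FAIL_COUNT=$(( FAIL_COUNT + 1 ))
    fi
}

# ── Process check ──────────────────────────────────────────────
check_process() {
    local name="$1" proc="$2"
    if pgrep -x "$proc" &>/dev/null; then
        log_info "PROC OK: $name ($proc running)"
    else
        log_error "PROC FAIL: $name ($proc not found)"
        "$ALERT_SCRIPT" "Process check failed: $name ($proc) is not running"
        FAIL_COUNT=$(( FAIL_COUNT + 1 ))
    fi
}

# ── Run all checks (target lists come from config.env) ─────────
# Format: HTTP_CHECKS="name,url,expected_code name2,url2,200"
IFS=' ' read -ra http_list <<< "${HTTP_CHECKS:-}"
for entry in "${http_list[@]}"; do
    IFS=',' read -r name url expected <<< "$entry"
    check_http "$name" "$url" "${expected:-200}"
done

# TCP_CHECKS and PROC_CHECKS use assumed formats that mirror HTTP_CHECKS:
#   TCP_CHECKS="name,host,port ..."   PROC_CHECKS="name,process_name ..."
IFS=' ' read -ra tcp_list <<< "${TCP_CHECKS:-}"
for entry in "${tcp_list[@]}"; do
    IFS=',' read -r name host port <<< "$entry"
    check_tcp "$name" "$host" "$port"
done

IFS=' ' read -ra proc_list <<< "${PROC_CHECKS:-}"
for entry in "${proc_list[@]}"; do
    IFS=',' read -r name proc <<< "$entry"
    check_process "$name" "$proc"
done

log_info "Health check finished: $FAIL_COUNT failure(s)"
# Exit non-zero when any check failed
(( FAIL_COUNT == 0 )) || exit 1
```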
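`health_check.sh` sends an alert on the first failed probe, so a single dropped packet can page someone. A common way to damp this flapping is to keep a per-check failure counter on disk, alert only after N consecutive failures, and reset the counter on success. The sketch below is a hypothetical extension, not part of the script above. The function name `record_result`, the state directory, and the default threshold of 3 are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical consecutive-failure tracker (not part of health_check.sh).
set -euo pipefail

STATE_DIR="${STATE_DIR:-/tmp/ops-health-state}"   # assumed location
FAIL_THRESHOLD="${FAIL_THRESHOLD:-3}"              # assumed default

# record_result NAME ok|fail
# Prints "alert" when NAME has just reached FAIL_THRESHOLD consecutive
# failures; prints nothing otherwise. A success resets the counter.
record_result() {
    local name="$1" result="$2"
    local file="$STATE_DIR/$name.fails" count=0
    mkdir -p "$STATE_DIR"
    if [[ "$result" == "ok" ]]; then
        rm -f "$file"
        return 0
    fi
    [[ -f "$file" ]] && count=$(<"$file")
    count=$(( count + 1 ))
    echo "$count" > "$file"
    # Alert once per outage, on the failure that crosses the threshold
    if (( count == FAIL_THRESHOLD )); then
        echo "alert"
    fi
}
```

In `check_http`, for example, the call to `$ALERT_SCRIPT` could be guarded with `[[ $(record_result "$name" fail) == alert ]]`, and the success branch could call `record_result "$name" ok`.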
## 5. Resource Alerting (monitor/resource_alert.sh)
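```bash
#!/usr/bin/env bash
# monitor/resource_alert.sh: CPU/memory/disk alerting
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd awk df free
ALERT_SCRIPT="$SCRIPT_DIR/../alert/webhook.sh"

# Default thresholds (may be overridden in config.env)
CPU_THRESHOLD="${CPU_THRESHOLD:-85}"
MEM_THRESHOLD="${MEM_THRESHOLD:-90}"
DISK_THRESHOLD="${DISK_THRESHOLD:-85}"

# ── CPU usage ──────────────────────────────────────────────────
cpu_usage() {
    # Read /proc/stat twice, one second apart, and compute the busy share
    # between the two snapshots. Fields after "cpu":
    #   user nice system idle iowait irq softirq steal ...
    local -a cpu1 cpu2
    local idle1 idle2 total1 total2
    read -r -a cpu1 < /proc/stat
    sleep 1
    read -r -a cpu2 < /proc/stat
    idle1=$(( cpu1[4] + cpu1[5] ))
    idle2=$(( cpu2[4] + cpu2[5] ))
    total1=$(( cpu1[1] + cpu1[2] + cpu1[3] + cpu1[4] + cpu1[5] + cpu1[6] + cpu1[7] + cpu1[8] ))
    total2=$(( cpu2[1] + cpu2[2] + cpu2[3] + cpu2[4] + cpu2[5] + cpu2[6] + cpu2[7] + cpu2[8] ))
    if (( total2 > total1 )); then
        echo $(( 100 * ((total2 - total1) - (idle2 - idle1)) / (total2 - total1) ))
    else
        echo 0
    fi
}

# ── Memory usage ───────────────────────────────────────────────
mem_usage() {
    # (total - available) / total, from the "Mem:" row of free(1)
    free | awk '/^Mem:/ { printf "%d\n", ($2 - $7) * 100 / $2 }'
}

# ── Disk usage ─────────────────────────────────────────────────
check_disk() {
    local usage mnt
    # -P: one line per filesystem; column 5 is Use%, column 6 the mount point
    df -P | awk 'NR > 1 { sub(/%/, "", $5); print $5, $6 }' |
    while read -r usage mnt; do
        [[ "$usage" =~ ^[0-9]+$ ]] || continue   # skip "-" (no size) entries
        if (( usage >= DISK_THRESHOLD )); then
            log_warn "Disk usage ${usage}% on $mnt (threshold: ${DISK_THRESHOLD}%)"
            "$ALERT_SCRIPT" "Disk alert: ${usage}% used on $mnt"
        fi
    done
}

# ── Run checks ─────────────────────────────────────────────────
cpu=$(cpu_usage)
mem=$(mem_usage)
log_info "Resource snapshot: CPU ${cpu}%, MEM ${mem}%"
if (( cpu >= CPU_THRESHOLD )); then
    log_warn "CPU usage ${cpu}% exceeds threshold ${CPU_THRESHOLD}%"
    "$ALERT_SCRIPT" "CPU alert: ${cpu}% (threshold: ${CPU_THRESHOLD}%)"
fi
if (( mem >= MEM_THRESHOLD )); then
    log_warn "Memory usage ${mem}% exceeds threshold ${MEM_THRESHOLD}%"
    "$ALERT_SCRIPT" "Memory alert: ${mem}% (threshold: ${MEM_THRESHOLD}%)"
fi
check_disk
```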
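The CPU percentage is derived from two `/proc/stat` samples. The busy share is the growth in total time minus the growth in idle time, divided by the growth in total time. The sketch below separates that arithmetic from the file reads so it can be checked by hand. It uses a hypothetical `cpu_busy_percent` helper that takes two `cpu` lines as strings. It assumes the common convention of counting `idle + iowait` as idle and summing the first eight counters (user through steal), because guest time is already included in user time.

```bash
#!/usr/bin/env bash
# Hypothetical pure helper: CPU busy percentage between two /proc/stat "cpu" lines.
set -euo pipefail

# cpu_busy_percent LINE1 LINE2
# Each line looks like: "cpu user nice system idle iowait irq softirq steal ..."
cpu_busy_percent() {
    local -a a b
    read -r -a a <<< "$1"
    read -r -a b <<< "$2"
    local i total1=0 total2=0
    # Sum user..steal (fields 1-8)
    for i in 1 2 3 4 5 6 7 8; do
        total1=$(( total1 + ${a[i]:-0} ))
        total2=$(( total2 + ${b[i]:-0} ))
    done
    # Idle time elapsed between the samples: idle + iowait
    local idle=$(( (${b[4]:-0} + ${b[5]:-0}) - (${a[4]:-0} + ${a[5]:-0}) ))
    local dt=$(( total2 - total1 ))
    if (( dt <= 0 )); then
        echo 0
    else
        # (elapsed - idle elapsed) / elapsed, truncated to an integer percent
        echo $(( 100 * (dt - idle) / dt ))
    fi
}
```

On Linux the helper can be fed live samples, for example `cpu_busy_percent "$(head -n 1 /proc/stat)" "$(sleep 1; head -n 1 /proc/stat)"`.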
## 6. Log Analysis (monitor/log_analyzer.sh)
```bash
#!/usr/bin/env bash
# monitor/log_analyzer.sh: keyword scanning and error-rate counting
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd grep awk tail

ALERT_SCRIPT="$SCRIPT_DIR/../alert/webhook.sh"
LOG_TARGET="${APP_LOG:-/var/log/app/app.log}"
ERROR_THRESHOLD="${ERROR_THRESHOLD:-10}"   # max errors per minute
WINDOW_SECONDS=60

# ── Count errors in the last N seconds ─────────────────────────
count_recent_errors() {
    local log="$1"
    local since
    since=$(date -d "${WINDOW_SECONDS} seconds ago" '+%Y-%m-%d %H:%M:%S' 2>/dev/null \
        || date -v -"${WINDOW_SECONDS}"S '+%Y-%m-%d %H:%M:%S')   # BSD/macOS fallback
    # Count ERROR/FATAL/PANIC lines newer than $since. Assumes each line
    # starts with "YYYY-MM-DD HH:MM:SS", so comparing strings compares
    # times; lines without a leading date (e.g. stack traces) are skipped.
    awk -v since="$since" '
        /^[0-9][0-9][0-9][0-9]-/ && substr($0, 1, 19) >= since && /ERROR|FATAL|PANIC/ { err++ }
        END { print err + 0 }
    ' "$log"
}

# ── Keyword alerts ─────────────────────────────────────────────
scan_keywords() {
    local log="$1"
    local -a keywords=("OutOfMemoryError" "SIGSEGV" "disk full" "connection refused")
    local kw cnt
    for kw in "${keywords[@]}"; do
        # grep -c prints "0" and exits 1 when nothing matches; an
        # "|| echo 0" fallback would yield "0\n0" and break the arithmetic
        cnt=$(grep -cF -- "$kw" "$log" 2>/dev/null) || true
        if (( ${cnt:-0} > 0 )); then
            log_warn "Keyword '$kw' found $cnt time(s) in $log"
            "$ALERT_SCRIPT" "Log alert: '$kw' occurred $cnt time(s) in $(basename "$log")"
        fi
    done
}

# ── Real-time follow mode (run in the background) ──────────────
follow_log() {
    local log="$1"
    log_info "Starting real-time log tail on $log"
    tail -F "$log" 2>/dev/null | grep --line-buffered -E 'ERROR|FATAL|PANIC' | \
    while IFS= read -r line; do
        log_error "Log event: $line"
        "$ALERT_SCRIPT" "Real-time log alert: $line"
    done
}

# ── Main flow ──────────────────────────────────────────────────
if [[ ! -f "$LOG_TARGET" ]]; then
    log_warn "Log file not found: $LOG_TARGET"
    exit 0
fi
err_count=$(count_recent_errors "$LOG_TARGET")
log_info "Errors in last ${WINDOW_SECONDS}s: $err_count (threshold: $ERROR_THRESHOLD)"
if (( err_count >= ERROR_THRESHOLD )); then
    "$ALERT_SCRIPT" "Log alert: $err_count errors in ${WINDOW_SECONDS}s on $(hostname)"
fi
scan_keywords "$LOG_TARGET"
```
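`count_recent_errors` compares strings rather than parsing dates. This works because log lines that begin with a `YYYY-MM-DD HH:MM:SS` timestamp sort lexically in time order. The sketch below isolates that idea in a hypothetical `count_errors_since` function that reads from stdin and takes the cutoff as an argument, which keeps it testable. It assumes the timestamp format above and skips lines that do not begin with a date, such as stack-trace continuations.

```bash
#!/usr/bin/env bash
# Hypothetical, testable variant of the windowed error count.
set -euo pipefail

# count_errors_since "YYYY-MM-DD HH:MM:SS" < logfile
count_errors_since() {
    awk -v since="$1" '
        # Only consider lines that begin with a date
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            # Compare the 19-character timestamp prefix as a string
            if (substr($0, 1, 19) >= since && /ERROR|FATAL|PANIC/) n++
        }
        END { print n + 0 }
    '
}
```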
## 7. Webhook Alerting (alert/webhook.sh)
The alert script supports three channels: DingTalk robot, Feishu robot, and email. To prevent alert storms, it keeps one timestamp file per message, named after the MD5 hash of the message text. An identical alert therefore fires at most once per `ALERT_COOLDOWN` seconds.
```bash
#!/usr/bin/env bash
# alert/webhook.sh: multi-channel alerting (DingTalk/Feishu/email) with deduplication
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd curl md5sum

MESSAGE="${1:-Alert from $(hostname)}"
ALERT_COOLDOWN="${ALERT_COOLDOWN:-300}"                   # 5-minute cooldown
COOLDOWN_DIR="${COOLDOWN_DIR:-/tmp/ops-alert-cooldown}"   # overridable, e.g. by tests
mkdir -p "$COOLDOWN_DIR"

# ── Deduplication ──────────────────────────────────────────────
# Key on the MD5 of the message (avoids special characters in file names)
msg_hash=$(printf '%s' "$MESSAGE" | md5sum | cut -d' ' -f1)
cooldown_file="$COOLDOWN_DIR/$msg_hash"
now=$(date +%s)
if [[ -f "$cooldown_file" ]]; then
    last=$(cat "$cooldown_file")
    if (( now - last < ALERT_COOLDOWN )); then
        log_info "Alert suppressed (identical alert $(( now - last ))s ago, cooldown ${ALERT_COOLDOWN}s): $MESSAGE"
        exit 0
    fi
fi
echo "$now" > "$cooldown_file"

# Escape backslashes and double quotes so the message cannot break the JSON body
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
ESC_MESSAGE=$(json_escape "$MESSAGE")

# ── DingTalk robot ─────────────────────────────────────────────
send_dingtalk() {
    [[ -z "${DINGTALK_WEBHOOK:-}" ]] && return 0
    local body code
    # "\\n" makes printf emit the JSON escape \n; a bare \n would put a raw
    # newline inside the JSON string, which is invalid JSON
    body=$(printf '{"msgtype":"text","text":{"content":"[OPS ALERT] %s\\nHost: %s\\nTime: %s"}}' \
        "$ESC_MESSAGE" "$(hostname)" "$(date '+%Y-%m-%d %H:%M:%S')")
    code=$(http_post "$DINGTALK_WEBHOOK" "$body") || true
    if [[ "$code" == "200" ]]; then
        log_info "DingTalk alert sent"
    else
        log_warn "DingTalk alert failed (HTTP $code)"
    fi
}

# ── Feishu robot ───────────────────────────────────────────────
send_feishu() {
    [[ -z "${FEISHU_WEBHOOK:-}" ]] && return 0
    local body code
    body=$(printf '{"msg_type":"text","content":{"text":"[OPS ALERT] %s\\nHost: %s\\nTime: %s"}}' \
        "$ESC_MESSAGE" "$(hostname)" "$(date '+%Y-%m-%d %H:%M:%S')")
    code=$(http_post "$FEISHU_WEBHOOK" "$body") || true
    if [[ "$code" == "200" ]]; then
        log_info "Feishu alert sent"
    else
        log_warn "Feishu alert failed (HTTP $code)"
    fi
}

# ── Email (requires a configured SMTP relay or local sendmail) ─
send_email() {
    [[ -z "${ALERT_EMAIL:-}" ]] && return 0
    local subject="[OPS ALERT] $(hostname) - $(date '+%H:%M')"
    if command -v mail &>/dev/null; then
        echo "$MESSAGE" | mail -s "$subject" "$ALERT_EMAIL"
        log_info "Email alert sent to $ALERT_EMAIL"
    fi
}

# ── Send on all channels ───────────────────────────────────────
log_warn "Sending alert: $MESSAGE"
send_dingtalk
send_feishu
send_email
```
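The deduplication rule reduces to one comparison. Send the alert if no timestamp is recorded for this message, or if at least `ALERT_COOLDOWN` seconds have passed since the recorded one. The sketch below is a hypothetical `should_send` function, not part of `webhook.sh`. It takes the current time as a parameter instead of calling `date`, so the rule can be tested deterministically.

```bash
#!/usr/bin/env bash
# Hypothetical, clock-injectable version of the cooldown check.
set -euo pipefail

# should_send MESSAGE NOW_EPOCH COOLDOWN_SECONDS STATE_DIR
# Returns 0 (and records NOW_EPOCH) if the alert should go out, 1 if suppressed.
should_send() {
    local msg="$1" now="$2" cooldown="$3" dir="$4"
    local key last
    key=$(printf '%s' "$msg" | md5sum | cut -d' ' -f1)
    mkdir -p "$dir"
    if [[ -f "$dir/$key" ]]; then
        last=$(<"$dir/$key")
        if (( now - last < cooldown )); then
            return 1   # still cooling down; keep the original timestamp
        fi
    fi
    echo "$now" > "$dir/$key"
}
```

A suppressed alert does not refresh the stored timestamp. As a result, a condition that keeps failing is re-announced once per cooldown period rather than being silenced indefinitely.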
## 8. Automated Deployment (deploy/rolling_deploy.sh)
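```bash
#!/usr/bin/env bash
# deploy/rolling_deploy.sh: rolling restart with automatic rollback
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd git systemctl curl make nproc
lock_file "rolling_deploy"
APP_DIR="${APP_DIR:-/opt/app}"
SERVICE_NAME="${SERVICE_NAME:-myapp}"
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/health}"
INSTANCES="${INSTANCES:-instance1 instance2 instance3}"
ROLLBACK_ON_FAIL="${ROLLBACK_ON_FAIL:-true}"
# ── Record the current commit for rollback ─────────────────────
OLD_COMMIT=$(git -C "$APP_DIR" rev-parse HEAD)
# ── Pull the latest code ───────────────────────────────────────
log_info "Pulling latest code..."
git -C "$APP_DIR" pull --ff-only || die "git pull failed"
NEW_COMMIT=$(git -C "$APP_DIR" rev-parse HEAD)
log_info "Deploying $OLD_COMMIT -> $NEW_COMMIT"
# ── Build ──────────────────────────────────────────────────────
# Rollback uses reset --hard: "checkout <sha>" would detach HEAD and break the next pull
log_info "Building..."
make -C "$APP_DIR" -j"$(nproc)" || {
    log_error "Build failed; rolling back"
    git -C "$APP_DIR" reset --hard "$OLD_COMMIT"
    die "Build failed"
}
# ── health_check helper ────────────────────────────────────────
# wait_healthy URL: poll URL up to 10 times (3 s apart, an assumed interval)
wait_healthy() {
    local url="$1" retries=10 i=0
    while (( i < retries )); do
        curl -sf -o /dev/null --max-time 5 "$url" && return 0
        i=$(( i + 1 ))
        sleep 3
    done
    return 1
}
# ── Rolling restart ────────────────────────────────────────────
# Assumed: instances are systemd template units ${SERVICE_NAME}@<instance> sharing HEALTH_URL
read -ra instances <<< "$INSTANCES"
restarted=()
for inst in "${instances[@]}"; do
    unit="${SERVICE_NAME}@${inst}"
    log_info "Restarting $unit"
    systemctl restart "$unit"
    restarted+=("$unit")
    if ! wait_healthy "$HEALTH_URL"; then
        log_error "$unit failed its health check"
        if [[ "$ROLLBACK_ON_FAIL" == "true" ]]; then
            log_warn "Rolling back to $OLD_COMMIT"
            git -C "$APP_DIR" reset --hard "$OLD_COMMIT"
            make -C "$APP_DIR" -j"$(nproc)"
            systemctl restart "${restarted[@]}"
        fi
        die "Deploy aborted at $unit"
    fi
done
log_info "Deploy complete: $NEW_COMMIT"
```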
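`wait_healthy` is a specific case of a general pattern: run a command up to N times with a pause between attempts. The sketch below is a reusable version. It uses a hypothetical `retry` function that is not part of the deploy script.

```bash
#!/usr/bin/env bash
# Hypothetical generic retry helper.
set -euo pipefail

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds; returns 1 after ATTEMPTS failed runs.
retry() {
    local attempts="$1" delay="$2" n=1
    shift 2
    # A failing command in an until condition does not trigger set -e
    until "$@"; do
        if (( n >= attempts )); then
            return 1
        fi
        n=$(( n + 1 ))
        sleep "$delay"
    done
    return 0
}
```

For example, `retry 10 3 curl -sf -o /dev/null "$HEALTH_URL"` polls the health endpoint up to 10 times, 3 seconds apart.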
## 9. Backup System (backup/backup.sh)
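Backups use `rsync --link-dest` to produce snapshot-style incremental backups. Each snapshot looks like a full copy of the source, but unchanged files are hard links into the previous snapshot, so only changed files consume additional disk space. The script can also encrypt each snapshot with AES-256 (`openssl enc -aes-256-cbc`).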
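```bash
#!/usr/bin/env bash
# backup/backup.sh: rsync snapshot backup + AES-256 encryption + rotation
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd rsync

BACKUP_SRC="${BACKUP_SRC:-/opt/app/data}"
BACKUP_DEST="${BACKUP_DEST:-/backup/app}"
KEEP_DAILY="${KEEP_DAILY:-7}"
KEEP_WEEKLY="${KEEP_WEEKLY:-4}"
ENCRYPT="${ENCRYPT_BACKUP:-false}"
# Exported so openssl can read it via -pass env:BACKUP_PASSPHRASE
export BACKUP_PASSPHRASE="${BACKUP_PASSPHRASE:-}"

lock_file "backup"

DATE=$(date '+%Y-%m-%d_%H-%M-%S')
SNAPSHOT_DIR="$BACKUP_DEST/daily/$DATE"
LATEST_LINK="$BACKUP_DEST/latest"
ARCHIVE=""
mkdir -p "$BACKUP_DEST/daily"

# ── rsync snapshot ─────────────────────────────────────────────
log_info "Starting rsync snapshot: $BACKUP_SRC -> $SNAPSHOT_DIR"
rsync_opts=(
    -aAX            # archive mode + ACLs + extended attributes
    --delete        # mirror deletions (only matters if the target already exists)
    --numeric-ids   # keep numeric UIDs/GIDs
    --info=progress2
)
# If a previous snapshot exists, --link-dest hard-links unchanged files to it
if [[ -d "$LATEST_LINK" ]]; then
    rsync_opts+=(--link-dest="$LATEST_LINK")
fi
rsync "${rsync_opts[@]}" "$BACKUP_SRC/" "$SNAPSHOT_DIR/" \
    || die "rsync failed"

# Point the "latest" symlink at the new snapshot
ln -sfn "$SNAPSHOT_DIR" "$LATEST_LINK"
log_info "Snapshot complete: $SNAPSHOT_DIR"

# ── Optional: AES-256 encryption ───────────────────────────────
if [[ "$ENCRYPT" == "true" ]]; then
    [[ -z "$BACKUP_PASSPHRASE" ]] && die "BACKUP_PASSPHRASE must be set when ENCRYPT_BACKUP=true"
    require_cmd openssl tar
    log_info "Encrypting backup..."
    ARCHIVE="$BACKUP_DEST/daily/${DATE}.tar.gz.enc"
    # -pass env:... keeps the passphrase out of the process list
    # (-pass pass:... would be visible to any user running ps)
    tar -czf - -C "$BACKUP_DEST/daily" "$DATE" | \
        openssl enc -aes-256-cbc -pbkdf2 -iter 100000 \
            -pass env:BACKUP_PASSPHRASE -out "$ARCHIVE"
    # Delete the plaintext snapshot once encryption has succeeded. "latest"
    # now dangles, so the next run makes a full (non-incremental) copy.
    rm -rf "$SNAPSHOT_DIR"
    log_info "Encrypted archive: $ARCHIVE"
fi

# ── Retention ──────────────────────────────────────────────────
# prune DIR KEEP: delete all but the newest KEEP entries. Names are
# timestamps, so reverse name order is newest first. ls -t is unreliable
# here because rsync -a copies the source directory's mtime onto each
# snapshot. find also picks up encrypted *.tar.gz.enc archives.
prune() {
    find "$1" -mindepth 1 -maxdepth 1 | sort -r | tail -n +"$(( $2 + 1 ))" | xargs -r rm -rf
}
log_info "Rotating daily backups (keep last $KEEP_DAILY)..."
prune "$BACKUP_DEST/daily" "$KEEP_DAILY"

# Weekly backup: keep one copy every Sunday (date +%u prints 7)
if [[ "$(date '+%u')" == "7" ]]; then
    mkdir -p "$BACKUP_DEST/weekly"
    WEEK="$BACKUP_DEST/weekly/$(date '+%G-W%V')"   # %G: ISO week-based year, pairs with %V
    if [[ -n "$ARCHIVE" ]]; then
        ln -f "$ARCHIVE" "$WEEK.tar.gz.enc"          # hard link: no extra space
    elif [[ ! -e "$WEEK" ]]; then
        # Copy the real directory: cp -a on the "latest" symlink would copy only the link
        cp -al "$SNAPSHOT_DIR" "$WEEK"
    fi
    prune "$BACKUP_DEST/weekly" "$KEEP_WEEKLY"
    log_info "Weekly backup saved"
fi

log_info "Backup finished successfully"
```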
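The rotation step keeps only the newest `KEEP_DAILY` entries. Because snapshot names are `YYYY-MM-DD_HH-MM-SS` timestamps, sorting by name is the same as sorting by age. Name sorting also avoids `ls -t`, which depends on directory modification times that `rsync -a` copies from the source. The hypothetical `names_to_prune` helper below only prints what it would delete. That makes the policy easy to test before it is wired to `rm -rf`.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper for the retention policy.
set -euo pipefail

# names_to_prune KEEP < list-of-names
# Reads one snapshot name (or path) per line and prints the entries that
# fall outside the newest KEEP, oldest last.
names_to_prune() {
    local keep="$1"
    sort -r | tail -n +"$(( keep + 1 ))"
}
```

A dry run against the real directory could be `find "$BACKUP_DEST/daily" -mindepth 1 -maxdepth 1 | names_to_prune "$KEEP_DAILY"`. Once the output looks right, it can be piped to `xargs -r rm -rf`.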
## 10. systemd Integration
Use systemd timers instead of crontab for full logging integration, dependency management, and error recovery (see Chapter 13).
```ini
# systemd/ops-monitor.service
[Unit]
Description=Ops health check and resource alert
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=opsuser
WorkingDirectory=/opt/ops-scripts
# A oneshot unit stops at the first failing ExecStart=; the "-" prefix
# lets resource_alert.sh run even when health_check.sh reports failures
ExecStart=-/opt/ops-scripts/monitor/health_check.sh
ExecStart=/opt/ops-scripts/monitor/resource_alert.sh
EnvironmentFile=/opt/ops-scripts/config.env
StandardOutput=journal
StandardError=journal
SyslogIdentifier=ops-monitor
```

```ini
# systemd/ops-monitor.timer
[Unit]
Description=Run ops monitoring every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
RandomizedDelaySec=30
# Persistent=true is omitted: it only takes effect for OnCalendar= timers

[Install]
WantedBy=timers.target
```

```bash
# Installation
sudo cp /opt/ops-scripts/systemd/*.service /etc/systemd/system/
sudo cp /opt/ops-scripts/systemd/*.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now ops-monitor.timer
sudo systemctl enable --now ops-backup.timer
# Verify
systemctl list-timers --all | grep ops
journalctl -u ops-monitor.service -f
```
## 11. bats Automated Testing
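bats (Bash Automated Testing System) is a unit testing framework for shell scripts. Each `@test` block is one test case. Inside a test, `run` executes a command or function in a subshell. It stores the exit code in `$status` and the combined stdout and stderr in `$output`, and you then assert on those values with ordinary test expressions.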
```bash
#!/usr/bin/env bats
# tests/test_common.bats

setup() {
    # Point logging and locking at temp paths, then source the library.
    # Sourcing after the exports keeps every test away from /var/log and /tmp/ops-locks.
    export LOG_FILE="$(mktemp)"
    export LOCK_DIR="$(mktemp -d)"
    source "$BATS_TEST_DIRNAME/../lib/common.sh"
}

teardown() {
    rm -f "$LOG_FILE"
    rm -rf "$LOCK_DIR"
}

@test "log_info writes timestamped INFO message" {
    run log_info "hello world"
    [ "$status" -eq 0 ]
    grep -q "INFO" "$LOG_FILE"
    grep -q "hello world" "$LOG_FILE"
}

@test "die exits with code 1 and logs error" {
    run die "something went wrong"
    [ "$status" -eq 1 ]
    grep -q "ERROR" "$LOG_FILE"
    grep -q "something went wrong" "$LOG_FILE"
}

@test "require_cmd succeeds for existing command" {
    run require_cmd bash
    [ "$status" -eq 0 ]
}

@test "require_cmd fails for nonexistent command" {
    run require_cmd this_command_does_not_exist_xyz
    [ "$status" -ne 0 ]
}

@test "lock_file refuses a lock that is already held" {
    # Simulate another process holding the lock. Calling lock_file directly in
    # the test body would replace the EXIT trap that bats installs for the test.
    mkdir "$LOCK_DIR/testlock.lock"
    echo 12345 > "$LOCK_DIR/testlock.lock/pid"
    run lock_file "testlock"
    [ "$status" -ne 0 ]
    [[ "$output" == *"Lock held by PID 12345"* ]]
}
```
```bash
#!/usr/bin/env bats
# tests/test_webhook.bats: alert deduplication logic

setup() {
    export LOG_FILE="$(mktemp)"
    export LOCK_DIR="$(mktemp -d)"
    export ALERT_COOLDOWN=300
    export COOLDOWN_DIR="$(mktemp -d)"
    # Empty config file, so a developer's real config.env (and its webhooks) is never loaded
    export CONFIG_FILE="$(mktemp)"
    # Disable real delivery: no webhook or email targets
    unset DINGTALK_WEBHOOK FEISHU_WEBHOOK ALERT_EMAIL
    WEBHOOK="$BATS_TEST_DIRNAME/../alert/webhook.sh"
}

teardown() {
    rm -f "$LOG_FILE" "$CONFIG_FILE"
    rm -rf "$LOCK_DIR" "$COOLDOWN_DIR"
}

@test "first alert goes through" {
    run bash "$WEBHOOK" "test alert message"
    [ "$status" -eq 0 ]
    # The cooldown file should have been created
    local hash
    hash=$(printf '%s' "test alert message" | md5sum | cut -d' ' -f1)
    [ -f "$COOLDOWN_DIR/$hash" ]
}

@test "duplicate alert within cooldown is suppressed" {
    # Send once...
    bash "$WEBHOOK" "dup message"
    # ...then again: this one should be suppressed
    run bash "$WEBHOOK" "dup message"
    [ "$status" -eq 0 ]
    grep -q "suppressed" "$LOG_FILE"
}
```
### GitHub Actions CI Integration
```yaml
# .github/workflows/test.yml
name: Shell Tests
on: [push, pull_request]
jobs:
  bats:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install bats
        run: |
          sudo apt-get update
          sudo apt-get install -y bats
          bats --version
      - name: Run bats tests
        run: bats tests/
      - name: Run shellcheck
        run: |
          sudo apt-get install -y shellcheck
          shellcheck lib/*.sh monitor/*.sh alert/*.sh deploy/*.sh backup/*.sh
```
## 12. Book Summary and Next Steps
This book built five capability layers. The final project is a concrete demonstration of all five:
| Capability Layer | Chapters | Embodied In |
|---|---|---|
| Foundational Ops | Ch1–Ch4 | directory structure, text processing (awk/grep), log analysis |
| System Understanding | Ch5–Ch8 | process checks (pgrep), network probes (nc/curl), disk monitoring (df/rsync) |
| Shell Scripting | Ch9–Ch12 | function library, arrays, pipelines, set -euo pipefail, bats tests |
| Production Practice | Ch13–Ch16 | systemd service/timer, metric collection, encrypted backup, mutex locks |
| Kernel & Contribution | Ch17–Ch19 | understanding syscall paths, kernel-level debugging tools, ability to contribute upstream |
### Recommended Next Steps
- Go Systems Programming — Shell excels at glue scripts; Go suits performance-sensitive tools. See: Systems Programming in Go
- eBPF Observability — BCC/bpftrace are modern Linux performance analysis tools — a natural extension of Chapter 14
- Ansible / Terraform — Elevate this chapter's script system into declarative infrastructure-as-code
- Kernel Contribution Practice — Follow Chapter 19's guidance: start with checkpatch fixes in drivers/staging and submit your first kernel patch
Congratulations on completing the book! Linux Shell is never merely a collection of command-line tools. It is a language for conversing with the operating system — a way of thinking that makes complex systems controllable, observable, and automatable. With the skills acquired in this book, you now have the foundation of a production-capable systems engineer.