# Chapter 20: Final Project - Production-Grade DevOps Script System
This chapter combines the skills from the whole book into a complete, production-grade operations scripting system: service health monitoring, resource alerting, log analysis, automated deployment, backup and recovery, webhook notifications, systemd integration, and automated testing with bats. Each module builds on specific earlier chapters, listed in the table below. By the end of the chapter you will have a deployable foundation for ops infrastructure.
## 1. Project Overview and Architecture
The system follows four core design principles:

- Single responsibility: each script does one thing.
- Observable: every script logs in the same unified format.
- Idempotent: running a script again leaves the system in the same end state, with no additional side effects.
- Testable: critical logic is covered by bats tests.

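In practice, idempotence means checking state before changing it. A minimal sketch, using a hypothetical helper `ensure_line` that is not part of the project: it appends a setting to a file only when that exact line is missing, so a second run changes nothing.

```bash
#!/usr/bin/env bash
# ensure_line FILE LINE: hypothetical helper illustrating idempotence.
# Appends LINE to FILE only if an identical line is not already present,
# so calling it any number of times yields the same final file.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  # -x: match the whole line; -F: treat LINE as a fixed string, not a regex
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage: the second call is a no-op
cfg=$(mktemp)
ensure_line "$cfg" "ALERT_COOLDOWN=300"
ensure_line "$cfg" "ALERT_COOLDOWN=300"
wc -l < "$cfg"   # prints 1
rm -f "$cfg"
```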
| Module | Function | Source Chapter |
|---|---|---|
| lib/common.sh | Shared function library | Ch9, Ch10, Ch11 |
| monitor/health_check.sh | HTTP/TCP/process monitoring | Ch7, Ch5 |
| monitor/resource_alert.sh | CPU/memory/disk alerting | Ch14, Ch8 |
| monitor/log_analyzer.sh | Log keyword scanning | Ch4, Ch11 |
| alert/webhook.sh | DingTalk/Feishu notifications | Ch7, Ch12 |
| deploy/rolling_deploy.sh | Rolling deploy and rollback | Ch5, Ch12 |
| backup/backup.sh | rsync+encrypted backup | Ch8, Ch15 |
| systemd/ | systemd service/timer units | Ch13 |
| tests/ | bats automated tests | Ch12 |
## 2. Project Directory Structure
```
ops-scripts/
├── lib/
│   └── common.sh              # shared function library
├── monitor/
│   ├── health_check.sh        # HTTP/TCP/process checks
│   ├── resource_alert.sh      # resource threshold alerting
│   └── log_analyzer.sh        # log keyword analysis
├── alert/
│   └── webhook.sh             # DingTalk/Feishu/email notify
├── deploy/
│   └── rolling_deploy.sh      # rolling deploy and rollback
├── backup/
│   ├── backup.sh              # rsync snapshot backup + encryption
│   └── restore.sh             # restore script
├── systemd/
│   ├── ops-monitor.service
│   ├── ops-monitor.timer
│   └── ops-backup.service
├── tests/
│   ├── test_common.bats
│   ├── test_webhook.bats
│   └── test_backup.bats
├── config.env                 # config file (not committed to git)
└── config.env.example         # config template
```
## 3. Shared Function Library (lib/common.sh)
Every script sources the common library with `source "$SCRIPT_DIR/../lib/common.sh"`, where `SCRIPT_DIR` is the script's own directory resolved from `${BASH_SOURCE[0]}`. The library provides unified log formatting, dependency checking, mutex locking, an HTTP POST helper, and config loading.
```bash
#!/usr/bin/env bash
# lib/common.sh - shared function library
set -euo pipefail

# ── Color constants (enabled only when stderr is a tty) ────────────
if [[ -t 2 ]]; then
  # $'...' quoting turns \033 into a real ESC character
  RED=$'\033[0;31m'; YELLOW=$'\033[1;33m'
  GREEN=$'\033[0;32m'; CYAN=$'\033[0;36m'
  NC=$'\033[0m'
else
  RED=''; YELLOW=''; GREEN=''; CYAN=''; NC=''
fi

# ── Logging ────────────────────────────────────────────────────────
LOG_FILE="${LOG_FILE:-/var/log/ops-scripts/ops.log}"

_log() {
  local level="$1" color="$2"; shift 2
  local ts line
  ts=$(date '+%Y-%m-%dT%H:%M:%S%z')
  line="$ts [$level] $*"
  # Plain text goes to the log file, colored text to stderr
  printf '%s\n' "$line" >> "$LOG_FILE"
  printf '%s%s%s\n' "$color" "$line" "$NC" >&2
}
log_info()  { _log "INFO " "$CYAN"   "$*"; }
log_warn()  { _log "WARN " "$YELLOW" "$*"; }
log_error() { _log "ERROR" "$RED"    "$*"; }

die() {
  log_error "$*"
  exit 1
}

# ── Dependency check ───────────────────────────────────────────────
require_cmd() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" &>/dev/null || die "Required command not found: $cmd"
  done
}

# ── Mutex lock (mkdir is atomic) ───────────────────────────────────
LOCK_DIR="${LOCK_DIR:-/tmp/ops-locks}"
mkdir -p "$LOCK_DIR"

lock_file() {
  local name="$1"
  local lock="$LOCK_DIR/${name}.lock"
  if ! mkdir "$lock" 2>/dev/null; then
    local pid
    pid=$(cat "$lock/pid" 2>/dev/null || echo "unknown")
    die "Lock held by PID $pid: $lock"
  fi
  echo $$ > "$lock/pid"
  # Release the lock on exit; INT/TERM become a normal exit so the EXIT trap runs
  trap "unlock_file '$name'" EXIT
  trap 'exit 130' INT
  trap 'exit 143' TERM
}

unlock_file() {
  local name="$1"
  rm -rf "$LOCK_DIR/${name}.lock"
}

# ── HTTP helper ────────────────────────────────────────────────────
# http_post URL JSON_BODY: send a POST request and print the HTTP status code
# ("000" if the connection failed). Never fails, so it is safe under set -e.
http_post() {
  local url="$1"
  local body="$2"
  curl -s -o /dev/null -w "%{http_code}" \
    -H 'Content-Type: application/json' \
    --data "$body" \
    --max-time 10 \
    "$url" || true
}

# ── Config loading ─────────────────────────────────────────────────
load_config() {
  local cfg="${1:-config.env}" key value
  [[ -f "$cfg" ]] || die "Config file not found: $cfg"
  # Safe loading: accept only KEY=VALUE lines (comments and blank lines fail
  # the key check) and never execute the file as shell code.
  # "|| [[ -n $key ]]" keeps a last line that has no trailing newline.
  while IFS='=' read -r key value || [[ -n "$key" ]]; do
    [[ "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    # Strip one pair of surrounding quotes, as systemd's EnvironmentFile= does
    if [[ "$value" =~ ^\"(.*)\"$ || "$value" =~ ^\'(.*)\'$ ]]; then
      value="${BASH_REMATCH[1]}"
    fi
    export "$key=$value"
  done < "$cfg"
}
```
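`lock_file` relies on `mkdir` being atomic: when several processes race to create the same directory, exactly one call succeeds and the others fail because the directory already exists. The self-contained sketch below demonstrates that property; `try_lock` and `race_for_lock` are illustrative helpers, not part of `common.sh`.

```bash
#!/usr/bin/env bash
# try_lock DIR: illustrative helper. Succeeds only for the caller whose
# mkdir actually created DIR; mkdir fails if DIR already exists.
try_lock() {
  mkdir "$1" 2>/dev/null
}

# race_for_lock N: start N background jobs that all try to take the same
# lock, then print how many of them succeeded.
race_for_lock() {
  local n="$1" tmp lock i
  tmp=$(mktemp -d)
  lock="$tmp/demo.lock"
  for (( i = 0; i < n; i++ )); do
    # Each winner leaves a marker file behind
    ( try_lock "$lock" && touch "$tmp/winner.$i" ) &
  done
  wait
  find "$tmp" -name 'winner.*' | wc -l
  rm -rf "$tmp"
}

race_for_lock 5   # prints 1
```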
## 4. Service Health Monitoring (monitor/health_check.sh)
The health check script supports three check types: HTTP status code check, TCP port reachability, and process liveness check. Configuration comes from `config.env`; failures call `alert/webhook.sh` to send alerts.
```bash
#!/usr/bin/env bash
# monitor/health_check.sh
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd curl nc pgrep
ALERT_SCRIPT="$SCRIPT_DIR/../alert/webhook.sh"
FAIL_COUNT=0
# ── HTTP check ─────────────────────────────────────────────────────
check_http() {
  local name="$1" url="$2" expected="${3:-200}"
  local code
  # No -f: curl then exits 0 on 4xx/5xx and -w prints exactly one code.
  # On connection errors -w prints "000"; "|| true" keeps set -e from aborting.
  code=$(curl -s -o /dev/null -w "%{http_code}" \
    --max-time 10 --connect-timeout 5 "$url" 2>/dev/null) || true
  code="${code:-000}"
  if [[ "$code" == "$expected" ]]; then
    log_info "HTTP OK: $name ($url) => $code"
  else
    log_error "HTTP FAIL: $name ($url) expected=$expected got=$code"
    "$ALERT_SCRIPT" "HTTP check failed: $name returned $code (expected $expected)"
    # Not (( FAIL_COUNT++ )): that returns status 1 when the old value is 0,
    # which aborts the script under set -e
    FAIL_COUNT=$(( FAIL_COUNT + 1 ))
  fi
}

# ── TCP check ──────────────────────────────────────────────────────
check_tcp() {
  local name="$1" host="$2" port="$3"
  if nc -z -w 3 "$host" "$port" &>/dev/null; then
    log_info "TCP OK: $name ($host:$port)"
  else
    log_error "TCP FAIL: $name ($host:$port) unreachable"
    "$ALERT_SCRIPT" "TCP check failed: $name ($host:$port) is unreachable"
    FAIL_COUNT=$(( FAIL_COUNT + 1 ))
  fi
}

# ── Process check ──────────────────────────────────────────────────
check_process() {
  local name="$1" proc="$2"
  if pgrep -x "$proc" &>/dev/null; then
    log_info "PROC OK: $name ($proc running)"
  else
    log_error "PROC FAIL: $name ($proc not found)"
    "$ALERT_SCRIPT" "Process check failed: $name ($proc) is not running"
    FAIL_COUNT=$(( FAIL_COUNT + 1 ))
  fi
}

# ── Run all checks (target lists are read from config.env) ─────────
# Format: HTTP_CHECKS="name,url,expected_code name2,url2,200"
IFS=' ' read -ra http_list <<< "${HTTP_CHECKS:-}"
for entry in "${http_list[@]}"; do
  IFS=',' read -r name url expected <<< "$entry"
  check_http "$name" "$url" "${expected:-200}"
done
```
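The `HTTP_CHECKS` value packs several targets into one space-separated string of `name,url,expected_code` entries. The parsing step can be tried in isolation with the sketch below; `parse_checks` is a hypothetical helper, the sample URLs are made up, and defaulting a missing code to 200 mirrors `check_http`'s own default. Because spaces and commas act as separators, URLs containing either would need a different format.

```bash
#!/usr/bin/env bash
# parse_checks SPEC: hypothetical helper. Splits a space-separated list of
# "name,url[,expected_code]" entries and prints "name|url|code" per entry.
parse_checks() {
  local spec="$1" entry name url code
  local -a entries
  IFS=' ' read -ra entries <<< "$spec"
  for entry in "${entries[@]}"; do
    IFS=',' read -r name url code <<< "$entry"
    printf '%s|%s|%s\n' "$name" "$url" "${code:-200}"
  done
}

parse_checks "api,http://localhost:8080/health,200 web,http://localhost/"
# api|http://localhost:8080/health|200
# web|http://localhost/|200
```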
## 5. Resource Alerting (monitor/resource_alert.sh)
```bash
#!/usr/bin/env bash
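# monitor/resource_alert.sh - CPU/memory/disk alerting
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd awk df free
ALERT_SCRIPT="$SCRIPT_DIR/../alert/webhook.sh"

# Default thresholds (can be overridden in config.env)
CPU_THRESHOLD="${CPU_THRESHOLD:-85}"
MEM_THRESHOLD="${MEM_THRESHOLD:-90}"
DISK_THRESHOLD="${DISK_THRESHOLD:-85}"

# ── CPU usage ──────────────────────────────────────────────────────
cpu_usage() {
  # Read /proc/stat twice, 1 s apart, and compute CPU usage between the snapshots.
  # First line: "cpu user nice system idle iowait irq softirq steal ..."
  local cpu1 cpu2 idle1 idle2 total1 total2
  read -r _ cpu1 < /proc/stat
  sleep 1
  read -r _ cpu2 < /proc/stat
  # idle = idle + iowait; total = user..steal (fields 1-8)
  read -r idle1 total1 < <(awk '{ print $4 + $5, $1+$2+$3+$4+$5+$6+$7+$8 }' <<< "$cpu1")
  read -r idle2 total2 < <(awk '{ print $4 + $5, $1+$2+$3+$4+$5+$6+$7+$8 }' <<< "$cpu2")
  (( total2 > total1 )) || { echo 0; return; }
  echo $(( 100 * ((total2 - total1) - (idle2 - idle1)) / (total2 - total1) ))
}

# ── Memory usage ───────────────────────────────────────────────────
mem_usage() {
  # Used % = (total - available) / total, from the "Mem:" row of procps-ng free
  free | awk '/^Mem:/ { printf "%d\n", ($2 - $7) * 100 / $2 }'
}

# ── Disk usage per mount point ─────────────────────────────────────
check_disk() {
  local usage mnt
  # -P: one line per filesystem; field 5 = Use%, field 6 = mount point.
  # Rows whose Use% is "-" are skipped.
  df -P -x tmpfs -x devtmpfs | awk 'NR > 1 && $5 ~ /%$/ { sub(/%/, "", $5); print $5, $6 }' |
  while read -r usage mnt; do
    if (( usage >= DISK_THRESHOLD )); then
      log_warn "Disk usage ${usage}% on $mnt (threshold: ${DISK_THRESHOLD}%)"
      "$ALERT_SCRIPT" "Disk alert: ${usage}% used on $mnt"
    fi
  done
}

# ── Run checks ─────────────────────────────────────────────────────
cpu=$(cpu_usage)
mem=$(mem_usage)
log_info "Resource snapshot: CPU ${cpu}%, MEM ${mem}%"
if (( cpu >= CPU_THRESHOLD )); then
  log_warn "CPU usage ${cpu}% exceeds threshold ${CPU_THRESHOLD}%"
  "$ALERT_SCRIPT" "CPU alert: ${cpu}% (threshold: ${CPU_THRESHOLD}%)"
fi
if (( mem >= MEM_THRESHOLD )); then
  log_warn "Memory usage ${mem}% exceeds threshold ${MEM_THRESHOLD}%"
  "$ALERT_SCRIPT" "Memory alert: ${mem}% (threshold: ${MEM_THRESHOLD}%)"
fi
check_disk
```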
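`cpu_usage` reduces to one formula: busy % = 100 * (Δtotal - Δidle) / Δtotal, where idle time is `idle + iowait` and total is the sum of the `user` through `steal` fields. To make the arithmetic checkable without live `/proc/stat` reads, the sketch below applies the same formula to two fixed sample lines; `cpu_busy_percent` and the numbers are illustrative.

```bash
#!/usr/bin/env bash
# cpu_busy_percent LINE1 LINE2: illustrative helper. Each argument is an
# aggregate "cpu" line of /proc/stat:
#   cpu user nice system idle iowait irq softirq steal [guest guest_nice]
# Prints the integer CPU busy % between the two samples.
cpu_busy_percent() {
  local -a a b
  read -ra a <<< "$1"
  read -ra b <<< "$2"
  local idle1=$(( a[4] + a[5] )) idle2=$(( b[4] + b[5] ))
  local total1=0 total2=0 i
  # Sum user..steal (fields 1-8); guest time is already counted in user/nice
  for (( i = 1; i <= 8; i++ )); do
    total1=$(( total1 + a[i] ))
    total2=$(( total2 + b[i] ))
  done
  local dtotal=$(( total2 - total1 )) didle=$(( idle2 - idle1 ))
  (( dtotal > 0 )) || { echo 0; return; }
  echo $(( 100 * (dtotal - didle) / dtotal ))
}

cpu_busy_percent "cpu 100 0 50 800 50 0 0 0" "cpu 160 0 90 900 50 0 0 0"   # prints 50
```

Here Δtotal is 1200 - 1000 = 200 jiffies and Δidle is 950 - 850 = 100, so half of the interval was busy.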
## 6. Log Analysis (monitor/log_analyzer.sh)
```bash
#!/usr/bin/env bash
# monitor/log_analyzer.sh - keyword scanning and error-rate statistics
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd grep awk tail
ALERT_SCRIPT="$SCRIPT_DIR/../alert/webhook.sh"
LOG_TARGET="${APP_LOG:-/var/log/app/app.log}"
ERROR_THRESHOLD="${ERROR_THRESHOLD:-10}"   # error-count threshold per window
WINDOW_SECONDS=60

# ── Count errors within the last N seconds ─────────────────────────
count_recent_errors() {
  local log="$1"
  local since
  since=$(date -d "${WINDOW_SECONDS} seconds ago" '+%Y-%m-%d %H:%M:%S' 2>/dev/null \
    || date -v -"${WINDOW_SECONDS}"S '+%Y-%m-%d %H:%M:%S')   # BSD/macOS date fallback
  # Count ERROR/FATAL/PANIC lines at or after $since. The plain string comparison
  # assumes every line starts with a "YYYY-MM-DD HH:MM:SS" timestamp.
  awk -v since="$since" '
    /ERROR|FATAL|PANIC/ && $0 >= since { err++ }
    END { print err + 0 }
  ' "$log"
}

# ── Keyword alerts ─────────────────────────────────────────────────
scan_keywords() {
  local log="$1"
  local -a keywords=("OutOfMemoryError" "SIGSEGV" "disk full" "connection refused")
  local kw cnt
  for kw in "${keywords[@]}"; do
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps set -e
    # from aborting without appending a second "0" to the count
    cnt=$(grep -cF -- "$kw" "$log" 2>/dev/null) || true
    cnt="${cnt:-0}"
    if (( cnt > 0 )); then
      log_warn "Keyword '$kw' found $cnt time(s) in $log"
      "$ALERT_SCRIPT" "Log alert: '$kw' occurred $cnt time(s) in $(basename "$log")"
    fi
  done
}

# ── Real-time follow mode (meant to run in the background) ─────────
follow_log() {
  local log="$1"
  log_info "Starting real-time log tail on $log"
  tail -F "$log" 2>/dev/null | grep --line-buffered -E 'ERROR|FATAL|PANIC' | \
  while IFS= read -r line; do
    log_error "Log event: $line"
    "$ALERT_SCRIPT" "Real-time log alert: $line"
  done
}

# ── Main flow ──────────────────────────────────────────────────────
if [[ ! -f "$LOG_TARGET" ]]; then
  log_warn "Log file not found: $LOG_TARGET"
  exit 0
fi
err_count=$(count_recent_errors "$LOG_TARGET")
log_info "Errors in last ${WINDOW_SECONDS}s: $err_count (threshold: $ERROR_THRESHOLD)"
if (( err_count >= ERROR_THRESHOLD )); then
  "$ALERT_SCRIPT" "Log alert: $err_count errors in ${WINDOW_SECONDS}s on $(hostname)"
fi
scan_keywords "$LOG_TARGET"
```
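`count_recent_errors` works because timestamps in `YYYY-MM-DD HH:MM:SS` form sort lexically in the same order as chronologically, so awk can compare them with the cutoff as plain strings. That holds only if each log line starts with such a timestamp, which is an assumption about the application's log format. A self-contained check with three sample lines and a fixed cutoff; `count_errors_since` is an illustrative name, and it compares only the 19-character timestamp prefix rather than the whole line.

```bash
#!/usr/bin/env bash
# count_errors_since SINCE FILE: illustrative helper. Counts ERROR/FATAL/PANIC
# lines whose leading "YYYY-MM-DD HH:MM:SS" timestamp is >= SINCE.
count_errors_since() {
  awk -v since="$1" '
    substr($0, 1, 19) >= since && /ERROR|FATAL|PANIC/ { err++ }
    END { print err + 0 }
  ' "$2"
}

log=$(mktemp)
cat > "$log" <<'EOF'
2024-05-01 10:00:05 ERROR db timeout
2024-05-01 10:01:10 INFO request ok
2024-05-01 10:01:30 FATAL out of memory
EOF
count_errors_since "2024-05-01 10:01:00" "$log"   # prints 1
rm -f "$log"
```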
## 7. Webhook Alerting (alert/webhook.sh)
The alert script supports three channels: DingTalk robot, Feishu robot, and email. To prevent alert storms, it records each message's MD5 hash in a timestamp file, so an identical message is sent at most once per `ALERT_COOLDOWN` seconds.
```bash
#!/usr/bin/env bash
# alert/webhook.sh - multi-channel alerting (DingTalk/Feishu/email) with deduplication
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd curl md5sum

MESSAGE="${1:-Alert from $(hostname)}"
ALERT_COOLDOWN="${ALERT_COOLDOWN:-300}"                  # 5-minute cooldown
COOLDOWN_DIR="${COOLDOWN_DIR:-/tmp/ops-alert-cooldown}"  # overridable, e.g. in tests
mkdir -p "$COOLDOWN_DIR"

# ── Alert deduplication ────────────────────────────────────────────
# Key on the message's MD5 hash (avoids special characters in file names)
msg_hash=$(printf '%s' "$MESSAGE" | md5sum | cut -d' ' -f1)
cooldown_file="$COOLDOWN_DIR/$msg_hash"
if [[ -f "$cooldown_file" ]]; then
  last=$(cat "$cooldown_file")
  now=$(date +%s)
  if (( now - last < ALERT_COOLDOWN )); then
    log_info "Alert suppressed (cooldown ${ALERT_COOLDOWN}s): $MESSAGE"
    exit 0
  fi
fi
date +%s > "$cooldown_file"

# Escape backslashes and double quotes so MESSAGE is safe inside a JSON string
json_msg=${MESSAGE//\\/\\\\}
json_msg=${json_msg//\"/\\\"}

# ── DingTalk robot ─────────────────────────────────────────────────
send_dingtalk() {
  [[ -z "${DINGTALK_WEBHOOK:-}" ]] && return 0
  local body code
  # "\\n" makes printf emit the two characters \n, a JSON newline escape;
  # a raw newline inside a JSON string would be invalid
  body=$(printf '{"msgtype":"text","text":{"content":"[OPS ALERT] %s\\nHost: %s\\nTime: %s"}}' \
    "$json_msg" "$(hostname)" "$(date '+%Y-%m-%d %H:%M:%S')")
  code=$(http_post "$DINGTALK_WEBHOOK" "$body")
  if [[ "$code" == "200" ]]; then
    log_info "DingTalk alert sent"
  else
    log_warn "DingTalk alert failed (HTTP $code)"
  fi
}

# ── Feishu robot ───────────────────────────────────────────────────
send_feishu() {
  [[ -z "${FEISHU_WEBHOOK:-}" ]] && return 0
  local body code
  body=$(printf '{"msg_type":"text","content":{"text":"[OPS ALERT] %s\\nHost: %s\\nTime: %s"}}' \
    "$json_msg" "$(hostname)" "$(date '+%Y-%m-%d %H:%M:%S')")
  code=$(http_post "$FEISHU_WEBHOOK" "$body")
  if [[ "$code" == "200" ]]; then
    log_info "Feishu alert sent"
  else
    log_warn "Feishu alert failed (HTTP $code)"
  fi
}

# ── Email alert (requires a configured SMTP relay or local sendmail) ─
send_email() {
  [[ -z "${ALERT_EMAIL:-}" ]] && return 0
  local subject="[OPS ALERT] $(hostname) - $(date '+%H:%M')"
  if command -v mail &>/dev/null; then
    echo "$MESSAGE" | mail -s "$subject" "$ALERT_EMAIL"
    log_info "Email alert sent to $ALERT_EMAIL"
  fi
}

# ── Send to all channels ───────────────────────────────────────────
log_warn "Sending alert: $MESSAGE"
send_dingtalk
send_feishu
send_email
```
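The deduplication step reduces to one question: was this message hash recorded less than `ALERT_COOLDOWN` seconds ago? A sketch of that decision as a function that takes the current time as an argument, so it can be exercised with explicit timestamps instead of waiting in real time; `should_send` and its argument order are illustrative, not part of `webhook.sh`.

```bash
#!/usr/bin/env bash
# should_send DIR MESSAGE NOW COOLDOWN: illustrative helper.
# Returns 0 (send) and records NOW if MESSAGE was not sent within COOLDOWN
# seconds; returns 1 (suppress) otherwise. State is one file per MD5 hash.
should_send() {
  local dir="$1" msg="$2" now="$3" cooldown="$4"
  local hash file last
  hash=$(printf '%s' "$msg" | md5sum | cut -d' ' -f1)
  file="$dir/$hash"
  if [[ -f "$file" ]]; then
    last=$(<"$file")
    (( now - last < cooldown )) && return 1
  fi
  printf '%s\n' "$now" > "$file"
}

dir=$(mktemp -d)
should_send "$dir" "disk full" 1000 300 && echo send       # prints send
should_send "$dir" "disk full" 1100 300 || echo suppress   # prints suppress
should_send "$dir" "disk full" 1400 300 && echo send       # prints send (cooldown over)
rm -rf "$dir"
```

A suppressed call does not refresh the timestamp, so a message that keeps firing is re-sent once per cooldown window rather than being silenced indefinitely.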
## 8. Automated Deployment (deploy/rolling_deploy.sh)
```bash
#!/usr/bin/env bash
# deploy/rolling_deploy.sh - rolling restart + automatic rollback
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd git systemctl curl make nproc
lock_file "rolling_deploy"
APP_DIR="${APP_DIR:-/opt/app}"
SERVICE_NAME="${SERVICE_NAME:-myapp}"
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/health}"
INSTANCES="${INSTANCES:-instance1 instance2 instance3}"
ROLLBACK_ON_FAIL="${ROLLBACK_ON_FAIL:-true}"

# ── Record the current commit for rollback ─────────────────────────
OLD_COMMIT=$(git -C "$APP_DIR" rev-parse HEAD)

# ── Pull the latest code ───────────────────────────────────────────
log_info "Pulling latest code..."
git -C "$APP_DIR" pull --ff-only || die "git pull failed"
NEW_COMMIT=$(git -C "$APP_DIR" rev-parse HEAD)
log_info "Deploying $OLD_COMMIT -> $NEW_COMMIT"

# ── Build ──────────────────────────────────────────────────────────
log_info "Building..."
make -C "$APP_DIR" -j"$(nproc)" || {
  log_error "Build failed; rolling back to $OLD_COMMIT"
  # reset --hard moves the branch back; checkout would leave a detached HEAD
  git -C "$APP_DIR" reset --hard "$OLD_COMMIT"
  die "Build failed"
}

# ── Health-check helper ────────────────────────────────────────────
# wait_healthy URL: poll URL until curl -f succeeds, up to 10 attempts
wait_healthy() {
  local url="$1" retries=10 i=0
  while (( i < retries )); do
    curl -sf -o /dev/null --max-time 5 "$url" && return 0
    sleep 3
    i=$(( i + 1 ))
  done
  return 1
}
```
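The rolling phase restarts instances one at a time and stops at the first one that fails its health check. A sketch of that loop, reusing the `wait_healthy` helper defined in `rolling_deploy.sh`; it assumes each instance is a templated systemd unit named `SERVICE@INSTANCE` and that the single `HEALTH_URL` reflects the restarted instance. Both are assumptions, and `rolling_restart` is a hypothetical name.

```bash
#!/usr/bin/env bash
# rolling_restart SERVICE URL INSTANCE...: hypothetical sketch of the rolling
# phase. Assumes templated units "SERVICE@INSTANCE" and a wait_healthy URL
# function (as defined in rolling_deploy.sh). Restarts one instance at a time
# and returns 1 at the first instance that fails to restart or become healthy.
rolling_restart() {
  local service="$1" url="$2" inst
  shift 2
  for inst in "$@"; do
    echo "Restarting ${service}@${inst}"
    if ! systemctl restart "${service}@${inst}"; then
      echo "Restart failed: ${service}@${inst}" >&2
      return 1
    fi
    if ! wait_healthy "$url"; then
      echo "Health check failed after restarting ${service}@${inst}" >&2
      return 1
    fi
  done
}

# Possible call site (INSTANCES is a space-separated string):
#   read -ra instances <<< "$INSTANCES"
#   rolling_restart "$SERVICE_NAME" "$HEALTH_URL" "${instances[@]}"
```

With `ROLLBACK_ON_FAIL=true`, a non-zero return is the natural point to `git reset --hard "$OLD_COMMIT"`, rebuild, and restart the instances again; stopping at the first unhealthy instance leaves the remaining ones on the old version.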
## 9. Backup System (backup/backup.sh)
Backups use `rsync --link-dest` for snapshot-style incremental backups: each snapshot looks like a full directory, but only changed files consume additional disk space. The backup can optionally be encrypted with AES-256.
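The space saving comes from hard links: an unchanged file in the new snapshot is the same inode as in the previous one. The sketch below shows that effect with `cp -al` followed by replacing the one changed file, which approximates what `rsync --link-dest` does without requiring rsync; the directory names and `demo_snapshots` are illustrative. Bash's `-ef` test is true when two paths refer to the same device and inode.

```bash
#!/usr/bin/env bash
# Hard-link snapshot illustration (cp -al stands in for rsync --link-dest)
demo_snapshots() {
  local root
  root=$(mktemp -d)
  mkdir -p "$root/data"
  echo "v1" > "$root/data/unchanged.txt"
  echo "v1" > "$root/data/changed.txt"

  # Snapshot 1: a real copy
  cp -a "$root/data" "$root/snap1"
  # The source changes after snapshot 1
  echo "v2" > "$root/data/changed.txt"

  # Snapshot 2: hard-link everything from snap1, then replace (not edit in
  # place) the changed file, so snap1 keeps its old content
  cp -al "$root/snap1" "$root/snap2"
  rm "$root/snap2/changed.txt"
  cp -a "$root/data/changed.txt" "$root/snap2/changed.txt"

  [[ "$root/snap1/unchanged.txt" -ef "$root/snap2/unchanged.txt" ]] && echo "unchanged.txt: shared inode"
  [[ "$root/snap1/changed.txt" -ef "$root/snap2/changed.txt" ]] || echo "changed.txt: separate copy"
  rm -rf "$root"
}

demo_snapshots
# unchanged.txt: shared inode
# changed.txt: separate copy
```

Editing a hard-linked file in place would change it in every snapshot at once, which is why changed files must be written as new files.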
```bash
#!/usr/bin/env bash
# backup/backup.sh - rsync snapshot backup + AES-256 encryption + rotation
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"
require_cmd rsync
BACKUP_SRC="${BACKUP_SRC:-/opt/app/data}"
BACKUP_DEST="${BACKUP_DEST:-/backup/app}"
KEEP_DAILY="${KEEP_DAILY:-7}"
KEEP_WEEKLY="${KEEP_WEEKLY:-4}"
ENCRYPT="${ENCRYPT_BACKUP:-false}"
PASSPHRASE="${BACKUP_PASSPHRASE:-}"
lock_file "backup"
DATE=$(date '+%Y-%m-%d_%H-%M-%S')
SNAPSHOT_DIR="$BACKUP_DEST/daily/$DATE"
LATEST_LINK="$BACKUP_DEST/latest"
mkdir -p "$BACKUP_DEST/daily"

# ── rsync snapshot backup ──────────────────────────────────────────
log_info "Starting rsync snapshot: $BACKUP_SRC -> $SNAPSHOT_DIR"
rsync_opts=(
  -aAX            # archive mode + ACLs + extended attributes
  --delete        # remove files that no longer exist in the source
  --numeric-ids   # keep numeric UIDs/GIDs instead of mapping by name
  --info=progress2
)
# If a previous snapshot exists, --link-dest hard-links unchanged files to it
# instead of copying them, which saves disk space
if [[ -d "$LATEST_LINK" ]]; then
  rsync_opts+=(--link-dest="$LATEST_LINK")
fi
rsync "${rsync_opts[@]}" "$BACKUP_SRC/" "$SNAPSHOT_DIR/" \
  || die "rsync failed"
# Point the "latest" symlink at the new snapshot
ln -sfn "$SNAPSHOT_DIR" "$LATEST_LINK"
log_info "Snapshot complete: $SNAPSHOT_DIR"

# ── Optional: AES-256 encryption ───────────────────────────────────
if [[ "$ENCRYPT" == "true" ]]; then
  [[ -z "$PASSPHRASE" ]] && die "BACKUP_PASSPHRASE must be set when ENCRYPT_BACKUP=true"
  require_cmd openssl tar
  log_info "Encrypting backup..."
  ARCHIVE="$BACKUP_DEST/daily/${DATE}.tar.gz.enc"
  tar -czf - -C "$BACKUP_DEST/daily" "$DATE" | \
    openssl enc -aes-256-cbc -pbkdf2 -iter 100000 \
      -pass "pass:$PASSPHRASE" -out "$ARCHIVE"
  # Remove the plaintext snapshot only after encryption succeeded
  # (under set -o pipefail a failed tar or openssl has already aborted the script)
  rm -rf "$SNAPSHOT_DIR"
  log_info "Encrypted archive: $ARCHIVE"
fi

# ── Retention: delete daily snapshots beyond KEEP_DAILY ────────────
log_info "Rotating daily backups (keep last $KEEP_DAILY)..."
# "|| true": with pipefail, ls fails the pipeline when the glob matches nothing
ls -1dt "$BACKUP_DEST"/daily/*/ 2>/dev/null | tail -n +"$((KEEP_DAILY+1))" | \
  xargs -r rm -rf || true
# Weekly backup: keep one copy every Sunday (date +%u prints 7 on Sunday)
if [[ "$(date '+%u')" == "7" ]]; then
  mkdir -p "$BACKUP_DEST/weekly"
  # Resolve the symlink first: cp -a on "latest" itself would copy only the link
  cp -al "$(readlink -f "$LATEST_LINK")" "$BACKUP_DEST/weekly/$(date '+%Y-W%V')" 2>/dev/null || true
  ls -1dt "$BACKUP_DEST"/weekly/*/ 2>/dev/null | tail -n +"$((KEEP_WEEKLY+1))" | \
    xargs -r rm -rf || true
  log_info "Weekly backup saved"
fi
log_info "Backup finished successfully"
```
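`restore.sh` appears in the directory tree but is not shown. For an encrypted archive, restoring means running `openssl enc` in reverse (`-d`) with the same cipher, `-pbkdf2`, and `-iter` values, then piping into `tar -xzf`. A round-trip sketch with a throwaway passphrase, assuming OpenSSL 1.1.1 or later (needed for `-pbkdf2`); `encrypt_dir` and `decrypt_archive` are illustrative names, not the book's restore script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# encrypt_dir PARENT NAME OUT PASS: tar+gzip PARENT/NAME and encrypt it to OUT
encrypt_dir() {
  tar -czf - -C "$1" "$2" |
    openssl enc -aes-256-cbc -pbkdf2 -iter 100000 -pass "pass:$4" -out "$3"
}

# decrypt_archive IN DEST PASS: decrypt IN and unpack it under DEST
decrypt_archive() {
  mkdir -p "$2"
  openssl enc -d -aes-256-cbc -pbkdf2 -iter 100000 -pass "pass:$3" -in "$1" |
    tar -xzf - -C "$2"
}

work=$(mktemp -d)
mkdir -p "$work/src/2024-05-01_02-00-00"
echo "hello" > "$work/src/2024-05-01_02-00-00/file.txt"
encrypt_dir "$work/src" "2024-05-01_02-00-00" "$work/backup.tar.gz.enc" "demo-pass"
decrypt_archive "$work/backup.tar.gz.enc" "$work/restore" "demo-pass"
cat "$work/restore/2024-05-01_02-00-00/file.txt"   # prints hello
rm -rf "$work"
```

A wrong passphrase makes `openssl` fail with "bad decrypt", and `pipefail` turns that into a failed restore. Note that `-pass pass:...` exposes the passphrase in the process list; OpenSSL also accepts `-pass env:VAR` and `-pass file:PATH`, which avoid that.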
## 10. systemd Integration
Use systemd timers instead of crontab for full logging integration, dependency management, and error recovery (see Chapter 13).
```ini
# systemd/ops-monitor.service
[Unit]
Description=Ops health check and resource alert
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=opsuser
WorkingDirectory=/opt/ops-scripts
# With Type=oneshot the ExecStart lines run in order; a failing one stops the rest
ExecStart=/opt/ops-scripts/monitor/health_check.sh
ExecStart=/opt/ops-scripts/monitor/resource_alert.sh
EnvironmentFile=/opt/ops-scripts/config.env
StandardOutput=journal
StandardError=journal
SyslogIdentifier=ops-monitor
```

```ini
# systemd/ops-monitor.timer
[Unit]
Description=Run ops monitoring every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
# Persistent= only takes effect for OnCalendar= timers
Persistent=true
RandomizedDelaySec=30

[Install]
WantedBy=timers.target
```
```bash
# Deployment steps
sudo cp /opt/ops-scripts/systemd/*.service /etc/systemd/system/
sudo cp /opt/ops-scripts/systemd/*.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now ops-monitor.timer
sudo systemctl enable --now ops-backup.timer
# Verify
systemctl list-timers --all | grep ops
journalctl -u ops-monitor.service -f
```
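The deployment steps also enable `ops-backup.timer`, although the directory tree lists only `ops-backup.service`. One way to provide both backup units is to generate them from a small script. In the sketch below, the daily 02:00 schedule, the unit contents, and the function name `write_backup_units` are assumptions, not taken from the book. A timer named `X.timer` activates `X.service` by default, so no `Unit=` line is needed.

```bash
#!/usr/bin/env bash
# write_backup_units DIR: hypothetical generator for ops-backup.service/.timer.
# Schedule and unit contents are illustrative assumptions.
write_backup_units() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/ops-backup.service" <<'EOF'
# systemd/ops-backup.service
[Unit]
Description=Ops rsync snapshot backup
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
# No User=: runs as root, because rsync can preserve file ownership only as root
WorkingDirectory=/opt/ops-scripts
EnvironmentFile=/opt/ops-scripts/config.env
ExecStart=/opt/ops-scripts/backup/backup.sh
SyslogIdentifier=ops-backup
EOF
  cat > "$dir/ops-backup.timer" <<'EOF'
# systemd/ops-backup.timer
[Unit]
Description=Run ops backup daily

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
RandomizedDelaySec=10min

[Install]
WantedBy=timers.target
EOF
}

# Example: write_backup_units /opt/ops-scripts/systemd
```

Because this timer uses `OnCalendar=`, `Persistent=true` does apply here: a run missed while the machine was off is triggered at the next boot.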
## 11. bats Automated Testing
bats (Bash Automated Testing System) is a testing framework for shell scripts. Each `@test` block is a test case; the `run` helper executes a command and records its exit code in `$status` and its output in `$output`, which the test then asserts on.
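Conceptually, `run` executes its arguments in a subshell, captures combined stdout and stderr in `$output` (and line by line in `$lines`), stores the exit code in `$status`, and returns 0 so a failing command does not abort the test. A stripped-down plain-bash imitation, for intuition only; `mini_run` is an illustrative name and omits most of what bats-core actually does.

```bash
#!/usr/bin/env bash
# mini_run CMD [ARGS...]: illustrative imitation of bats' run helper.
# Stores combined stdout+stderr in $output, the exit code in $status, and the
# output lines in the $lines array, then returns 0 so the caller can assert.
mini_run() {
  output=$("$@" 2>&1) && status=0 || status=$?
  lines=()
  if [[ -n "$output" ]]; then
    mapfile -t lines <<< "$output"
  fi
  return 0
}

fails() { echo "boom" >&2; return 3; }
mini_run fails
echo "status=$status output=$output"   # prints status=3 output=boom
```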
```bash
#!/usr/bin/env bats
# tests/test_common.bats

setup() {
  export LOG_FILE="$(mktemp)"
  export LOCK_DIR="$(mktemp -d)"
  # Source after LOG_FILE and LOCK_DIR are set so the library uses the temp paths
  source "$BATS_TEST_DIRNAME/../lib/common.sh"
}

teardown() {
  rm -f "$LOG_FILE"
  rm -rf "$LOCK_DIR"
}

@test "log_info writes timestamped INFO message" {
  run log_info "hello world"
  [ "$status" -eq 0 ]
  grep -q "INFO" "$LOG_FILE"
  grep -q "hello world" "$LOG_FILE"
}

@test "die exits with code 1 and logs error" {
  run die "something went wrong"
  [ "$status" -eq 1 ]
  grep -q "ERROR" "$LOG_FILE"
  grep -q "something went wrong" "$LOG_FILE"
}

@test "require_cmd succeeds for existing command" {
  run require_cmd bash
  [ "$status" -eq 0 ]
}

@test "require_cmd fails for nonexistent command" {
  run require_cmd this_command_does_not_exist_xyz
  [ "$status" -ne 0 ]
}

@test "lock_file prevents double locking" {
  # Simulate a lock held by another process; calling lock_file in the test
  # process itself would replace bats' own EXIT trap
  mkdir "$LOCK_DIR/testlock.lock"
  echo 12345 > "$LOCK_DIR/testlock.lock/pid"
  run bash -c "source '$BATS_TEST_DIRNAME/../lib/common.sh'; lock_file testlock"
  [ "$status" -ne 0 ]
  [[ "$output" == *"Lock held by PID 12345"* ]]
}
```
```bash
#!/usr/bin/env bats
# tests/test_webhook.bats - tests for alert deduplication

setup() {
  export LOG_FILE="$(mktemp)"
  export LOCK_DIR="$(mktemp -d)"
  export ALERT_COOLDOWN=300
  export COOLDOWN_DIR="$(mktemp -d)"
  # Disable real sending by clearing the webhook and email variables
  unset DINGTALK_WEBHOOK FEISHU_WEBHOOK ALERT_EMAIL
}

teardown() {
  rm -f "$LOG_FILE"
  rm -rf "$LOCK_DIR" "$COOLDOWN_DIR"
}

@test "first alert goes through" {
  run bash "$BATS_TEST_DIRNAME/../alert/webhook.sh" "test alert message"
  [ "$status" -eq 0 ]
  # The cooldown file should have been created
  local hash
  hash=$(printf '%s' "test alert message" | md5sum | cut -d' ' -f1)
  [ -f "$COOLDOWN_DIR/$hash" ]
}

@test "duplicate alert within cooldown is suppressed" {
  # Send once first
  bash "$BATS_TEST_DIRNAME/../alert/webhook.sh" "dup message"
  # Send again: this one should be suppressed
  run bash "$BATS_TEST_DIRNAME/../alert/webhook.sh" "dup message"
  [ "$status" -eq 0 ]
  grep -q "suppressed" "$LOG_FILE"
}
```
### GitHub Actions CI Integration
```yaml
# .github/workflows/test.yml
name: Shell Tests
on: [push, pull_request]
jobs:
  bats:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install bats
        run: |
          sudo apt-get update
          sudo apt-get install -y bats
          bats --version
      - name: Run bats tests
        run: bats tests/
      - name: Run shellcheck
        run: |
          sudo apt-get install -y shellcheck
          shellcheck lib/*.sh monitor/*.sh alert/*.sh deploy/*.sh backup/*.sh
```
## 12. Book Summary and Next Steps
This book built five capability layers. The final project is a concrete demonstration of all five:
| Capability Layer | Chapters | Embodied In |
|---|---|---|
| Foundational Ops | Ch1–Ch4 | directory structure, text processing (awk/grep), log analysis |
| System Understanding | Ch5–Ch8 | process checks (pgrep), network probes (nc/curl), disk monitoring (df/rsync) |
| Shell Scripting | Ch9–Ch12 | function library, arrays, pipelines, set -euo pipefail, bats tests |
| Production Practice | Ch13–Ch16 | systemd service/timer, metric collection, encrypted backup, mutex locks |
| Kernel & Contribution | Ch17–Ch19 | understanding syscall paths, kernel-level debugging tools, ability to contribute upstream |
### Recommended Next Steps
- Go Systems Programming: Shell excels at glue scripts; Go suits performance-sensitive tools. See: Systems Programming in Go
- eBPF Observability: BCC and bpftrace are modern Linux performance analysis tools, a natural extension of Chapter 14
- Ansible / Terraform: elevate this chapter's script system into declarative infrastructure as code
- Kernel Contribution Practice: follow Chapter 19's guidance, start with checkpatch fixes in drivers/staging, and submit your first kernel patch
Congratulations on completing the book! Linux Shell is never merely a collection of command-line tools. It is a language for conversing with the operating system, and a way of thinking that makes complex systems controllable, observable, and automatable. With the skills acquired in this book, you now have the foundation of a production-capable systems engineer.