Chapter 20

Final Project

Chapter 20: Final Project - Production-Grade DevOps Script System

This chapter synthesizes every skill from the book into a complete, production-grade operations scripting system. It covers service health monitoring, resource alerting, log analysis, automated deployment, backup and recovery, webhook notifications, systemd integration, and bats automated testing. Every line of code maps back to a specific earlier chapter. After finishing this chapter you will have a deployable ops infrastructure foundation.

1. Project Overview and Architecture

The ops scripting system follows four design principles: single responsibility (each script does one thing), observability (a unified log format), idempotency (running a script again leaves the system in the same state), and testability (critical logic is covered by bats tests).
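To make the idempotency principle concrete, here is a minimal sketch. The helper name `ensure_line` is hypothetical and not part of the project; it appends a line to a file only when that exact line is missing, so a second run changes nothing.

```shell
#!/usr/bin/env bash
# ensure_line FILE LINE: append LINE to FILE only if no identical line exists.
# Hypothetical helper for illustration; it is not part of lib/common.sh.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  # -x matches whole lines only, -F treats LINE as a fixed string, not a regex
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Calling `ensure_line /etc/sysctl.d/99-ops.conf "vm.swappiness=10"` from a script that runs every five minutes would, under these assumptions, write the setting once and then leave the file untouched.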

| Module | Function | Source Chapter |
| --- | --- | --- |
| lib/common.sh | Shared function library | Ch9, Ch10, Ch11 |
| monitor/health_check.sh | HTTP/TCP/process monitoring | Ch7, Ch5 |
| monitor/resource_alert.sh | CPU/memory/disk alerting | Ch14, Ch8 |
| monitor/log_analyzer.sh | Log keyword scanning | Ch4, Ch11 |
| alert/webhook.sh | DingTalk/Feishu notifications | Ch7, Ch12 |
| deploy/rolling_deploy.sh | Rolling deploy and rollback | Ch5, Ch12 |
| backup/backup.sh | rsync+encrypted backup | Ch8, Ch15 |
| systemd/ | systemd service/timer units | Ch13 |
| tests/ | bats automated tests | Ch12 |

2. Project Directory Structure

ops-scripts/
├── lib/
│   └── common.sh              # shared function library
├── monitor/
│   ├── health_check.sh        # HTTP/TCP/process checks
│   ├── resource_alert.sh      # resource threshold alerting
│   └── log_analyzer.sh        # log keyword analysis
├── alert/
│   └── webhook.sh             # DingTalk/Feishu/email notify
├── deploy/
│   └── rolling_deploy.sh      # rolling deploy and rollback
├── backup/
│   ├── backup.sh              # rsync snapshot backup+encryption
│   └── restore.sh             # restore script
├── systemd/
│   ├── ops-monitor.service
│   ├── ops-monitor.timer
│   └── ops-backup.service
├── tests/
│   ├── test_common.bats
│   ├── test_webhook.bats
│   └── test_backup.bats
├── config.env                 # config file (not committed to git)
└── config.env.example         # config template

3. Shared Function Library (lib/common.sh)

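Every script resolves its own directory into SCRIPT_DIR and loads the library with source "$SCRIPT_DIR/../lib/common.sh", so it works regardless of the caller's working directory. The library provides unified log formatting, dependency checking, mutex locking, HTTP helper functions, and config loading.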

#!/usr/bin/env bash
# lib/common.sh - shared function library
set -euo pipefail

# ── Color constants (enabled only when stderr is a terminal) ──────────
if [[ -t 2 ]]; then
  # $'...' quoting turns \033 into a real ESC byte so printf '%s' can emit it
  RED=$'\033[0;31m'; YELLOW=$'\033[1;33m'
  GREEN=$'\033[0;32m'; CYAN=$'\033[0;36m'
  NC=$'\033[0m'
else
  RED=''; YELLOW=''; GREEN=''; CYAN=''; NC=''
fi

# ── Logging ───────────────────────────────────────────────────────────
LOG_FILE="${LOG_FILE:-/var/log/ops-scripts/ops.log}"

_log() {
  local level="$1" color="$2"; shift 2
  local ts
  ts=$(date '+%Y-%m-%dT%H:%M:%S%z')
  # Plain text goes to the log file, the colored copy to stderr.
  # An unwritable log file must not abort the calling script under set -e.
  { printf '%s [%s] %s\n' "$ts" "$level" "$*" >> "$LOG_FILE"; } 2>/dev/null || true
  printf '%s [%s] %s%s%s\n' "$ts" "$level" "$color" "$*" "$NC" >&2
}

log_info()  { _log "INFO " "$CYAN" "$@"; }
log_warn()  { _log "WARN " "$YELLOW" "$@"; }
log_error() { _log "ERROR" "$RED" "$@"; }

die() {
  log_error "$*"
  exit 1
}

# ── Dependency check ──────────────────────────────────────────────────
require_cmd() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" &>/dev/null || die "Required command not found: $cmd"
  done
}

# ── Mutex lock ────────────────────────────────────────────────────────
LOCK_DIR="${LOCK_DIR:-/tmp/ops-locks}"
mkdir -p "$LOCK_DIR"

lock_file() {
  local name="$1"
  local lock="$LOCK_DIR/${name}.lock"
  # mkdir is atomic: it fails if another process already created the directory
  if ! mkdir "$lock" 2>/dev/null; then
    local pid
    pid=$(cat "$lock/pid" 2>/dev/null || echo "unknown")
    die "Lock held by PID $pid: $lock"
  fi
  echo $$ > "$lock/pid"
  # Release the lock on exit. INT/TERM are turned into a normal exit so that
  # the EXIT trap runs once and the script actually stops.
  trap "unlock_file '$name'" EXIT
  trap 'exit 130' INT
  trap 'exit 143' TERM
}

unlock_file() {
  local name="$1"
  rm -rf "$LOCK_DIR/${name}.lock"
}

# ── HTTP helpers ──────────────────────────────────────────────────────
# http_post URL JSON_BODY: send a POST request and print the HTTP status code
http_post() {
  local url="$1"
  local body="$2"
  # curl prints 000 when no HTTP response arrives; "|| true" keeps set -e
  # from aborting the caller, which inspects the code itself.
  curl -s -o /dev/null -w "%{http_code}" \
    -H 'Content-Type: application/json' \
    --data "$body" \
    --max-time 10 \
    "$url" || true
}

# ── Config loading ────────────────────────────────────────────────────
load_config() {
  local cfg="${1:-config.env}"
  local key value
  [[ -f "$cfg" ]] || die "Config file not found: $cfg"
  # Safe loading: only KEY=VALUE lines are accepted; comments and blank
  # lines fail the name check and are skipped. The file is never sourced.
  while IFS='=' read -r key value || [[ -n "$key" ]]; do
    [[ "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    # Strip one pair of surrounding double quotes, if present
    value="${value%\"}"; value="${value#\"}"
    export "$key=$value"
  # Assumption: the loop reads the config file directly (the source listing
  # is truncated after "done").
  done < "$cfg"
}
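The point of parsing the config file line by line, rather than sourcing it, is that nothing in the file is ever executed. The following self-contained sketch illustrates this under stated assumptions: `parse_env_file` is a hypothetical stand-in for `load_config`, and the config content is made up for the demonstration.

```shell
#!/usr/bin/env bash
# parse_env_file FILE: export KEY=VALUE pairs without evaluating FILE as shell code.
# Hypothetical stand-in for load_config, for illustration only.
parse_env_file() {
  local key value
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      # skip blank lines, comments, and anything that is not a valid variable name
      ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) continue ;;
    esac
    export "$key=$value"
  done < "$1"
}

# Demo: a config file that contains a command substitution
demo_cfg=$(mktemp)
printf '%s\n' '# thresholds' 'CPU_THRESHOLD=85' 'NOTE=$(date)' > "$demo_cfg"
parse_env_file "$demo_cfg"
echo "CPU_THRESHOLD=$CPU_THRESHOLD"
echo "NOTE=$NOTE"        # the literal text $(date); the command never ran
rm -f "$demo_cfg"
```

Had the file been loaded with `source`, the `$(date)` substitution would have executed with the privileges of the calling script.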
  
## 4. Service Health Monitoring (monitor/health_check.sh)


  
The health check script supports three check types: HTTP status code check, TCP port reachability, and process liveness check. Configuration comes from `config.env`; failures call `alert/webhook.sh` to send alerts.


  
```bash
#!/usr/bin/env bash
# monitor/health_check.sh
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"

require_cmd curl nc pgrep

ALERT_SCRIPT="$SCRIPT_DIR/../alert/webhook.sh"
FAIL_COUNT=0

# ── HTTP check ────────────────────────────────────────────────────────
check_http() {
  local name="$1" url="$2" expected="${3:-200}"
  local code
  # No -f: with -f curl exits non-zero on 4xx/5xx after printing the code,
  # and a fallback "echo 000" would then be appended to it (e.g. "404000").
  code=$(curl -s -o /dev/null -w "%{http_code}" \
    --max-time 10 --connect-timeout 5 "$url" 2>/dev/null) || true
  code="${code:-000}"

  if [[ "$code" == "$expected" ]]; then
    log_info "HTTP OK: $name ($url) => $code"
  else
    log_error "HTTP FAIL: $name ($url) expected=$expected got=$code"
    "$ALERT_SCRIPT" "HTTP check failed: $name returned $code (expected $expected)"
    # (( FAIL_COUNT++ )) returns status 1 when the old value is 0, which
    # would terminate the script under set -e.
    FAIL_COUNT=$(( FAIL_COUNT + 1 ))
  fi
}

# ── TCP check ─────────────────────────────────────────────────────────
check_tcp() {
  local name="$1" host="$2" port="$3"
  if nc -z -w 3 "$host" "$port" &>/dev/null; then
    log_info "TCP OK: $name ($host:$port)"
  else
    log_error "TCP FAIL: $name ($host:$port) unreachable"
    "$ALERT_SCRIPT" "TCP check failed: $name ($host:$port) is unreachable"
    FAIL_COUNT=$(( FAIL_COUNT + 1 ))
  fi
}

# ── Process check ─────────────────────────────────────────────────────
check_process() {
  local name="$1" proc="$2"
  if pgrep -x "$proc" &>/dev/null; then
    log_info "PROC OK: $name ($proc running)"
  else
    log_error "PROC FAIL: $name ($proc not found)"
    "$ALERT_SCRIPT" "Process check failed: $name ($proc) is not running"
    FAIL_COUNT=$(( FAIL_COUNT + 1 ))
  fi
}

# ── Run all checks (targets are read from config.env) ─────────────────
# Format: HTTP_CHECKS="name,url,expected_code name2,url2,200"
IFS=' ' read -ra http_list <<< "${HTTP_CHECKS:-}"
# ${arr[@]+...} keeps an empty array from tripping set -u on older Bash
for entry in ${http_list[@]+"${http_list[@]}"}; do
  IFS=',' read -r name url expected <<< "$entry"
  check_http "$name" "$url" "${expected:-200}"
done

# Assumption: the source listing is truncated after the first read above.
# TCP_CHECKS ("name,host,port ...") and PROC_CHECKS ("name,process ...")
# are hypothetical variables that follow the same pattern.
IFS=' ' read -ra tcp_list <<< "${TCP_CHECKS:-}"
for entry in ${tcp_list[@]+"${tcp_list[@]}"}; do
  IFS=',' read -r name host port <<< "$entry"
  check_tcp "$name" "$host" "$port"
done

IFS=' ' read -ra proc_list <<< "${PROC_CHECKS:-}"
for entry in ${proc_list[@]+"${proc_list[@]}"}; do
  IFS=',' read -r name proc <<< "$entry"
  check_process "$name" "$proc"
done

if (( FAIL_COUNT > 0 )); then
  log_error "$FAIL_COUNT check(s) failed"
  exit 1
fi
log_info "All checks passed"
```
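The `HTTP_CHECKS` format packs several targets into one variable: entries are separated by spaces and fields by commas. As a minimal sketch of how one entry can be taken apart (the function name `split_check` and the example URLs are placeholders, not part of the project):

```shell
#!/usr/bin/env bash
# Example value in the documented format; names and URLs are placeholders.
HTTP_CHECKS="web,https://example.com/health,200 api,https://api.example.com/ping"

# split_check ENTRY: print name, url and expected code; the code defaults to 200
split_check() {
  local name url expected
  IFS=',' read -r name url expected <<EOF
$1
EOF
  printf '%s %s %s\n' "$name" "$url" "${expected:-200}"
}

for entry in $HTTP_CHECKS; do      # unquoted on purpose: split on spaces
  split_check "$entry"
done
```

One consequence of this encoding is that a URL containing a comma or a space cannot be expressed; a file with one target per line would lift that restriction.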
  
## 5. Resource Alerting (monitor/resource_alert.sh)


  
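```bash
#!/usr/bin/env bash
# monitor/resource_alert.sh - CPU/memory/disk alerting
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"

require_cmd awk df free

ALERT_SCRIPT="$SCRIPT_DIR/../alert/webhook.sh"

# Default thresholds (can be overridden in config.env)
CPU_THRESHOLD="${CPU_THRESHOLD:-85}"
MEM_THRESHOLD="${MEM_THRESHOLD:-90}"
DISK_THRESHOLD="${DISK_THRESHOLD:-85}"

# Assumption: the source listing is garbled between the first read in
# cpu_usage and the threshold test in check_disk. The function bodies
# below are a reconstruction, not the original text.

# ── CPU usage ─────────────────────────────────────────────────────────
cpu_usage() {
  # Read the aggregate "cpu" line of /proc/stat twice and compute the busy
  # share of the interval between the two snapshots.
  local -a cpu1 cpu2
  local idle1 idle2 total1=0 total2=0 v
  read -r -a cpu1 < /proc/stat        # cpu1[0] is the literal label "cpu"
  sleep 1
  read -r -a cpu2 < /proc/stat
  # Sum user, nice, system, idle, iowait, irq, softirq, steal
  for v in "${cpu1[@]:1:8}"; do total1=$(( total1 + v )); done
  for v in "${cpu2[@]:1:8}"; do total2=$(( total2 + v )); done
  idle1=$(( cpu1[4] + cpu1[5] ))      # idle + iowait
  idle2=$(( cpu2[4] + cpu2[5] ))
  local dt=$(( total2 - total1 ))
  if (( dt <= 0 )); then echo 0; return; fi
  echo $(( 100 * (dt - (idle2 - idle1)) / dt ))
}

# ── Memory usage ──────────────────────────────────────────────────────
mem_usage() {
  # Percentage of memory not reported as "available" by free(1)
  free | awk '/^Mem:/ { printf "%d", ($2 - $7) * 100 / $2 }'
}

# ── Disk usage ────────────────────────────────────────────────────────
check_disk() {
  local usage mnt
  # -P prints one line per filesystem; tmpfs/devtmpfs are skipped
  df -P -x tmpfs -x devtmpfs | awk 'NR > 1 { sub(/%/, "", $5); print $5, $6 }' | \
  while read -r usage mnt; do
    if (( usage >= DISK_THRESHOLD )); then
      log_warn "Disk usage ${usage}% on $mnt (threshold: ${DISK_THRESHOLD}%)"
      "$ALERT_SCRIPT" "Disk alert: ${usage}% used on $mnt"
    fi
  done
}

# ── Run the checks ────────────────────────────────────────────────────
cpu=$(cpu_usage)
mem=$(mem_usage)

log_info "Resource snapshot: CPU ${cpu}%  MEM ${mem}%"

if (( cpu >= CPU_THRESHOLD )); then
  log_warn "CPU usage ${cpu}% exceeds threshold ${CPU_THRESHOLD}%"
  "$ALERT_SCRIPT" "CPU alert: ${cpu}% (threshold: ${CPU_THRESHOLD}%)"
fi

if (( mem >= MEM_THRESHOLD )); then
  log_warn "Memory usage ${mem}% exceeds threshold ${MEM_THRESHOLD}%"
  "$ALERT_SCRIPT" "Memory alert: ${mem}% (threshold: ${MEM_THRESHOLD}%)"
fi

check_disk
```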
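The CPU figure comes from the difference between two /proc/stat snapshots: the counters are cumulative, so only the deltas are meaningful. A minimal worked example of that arithmetic, with made-up counter values and a hypothetical helper name `cpu_percent`:

```shell
#!/usr/bin/env bash
# cpu_percent TOTAL1 IDLE1 TOTAL2 IDLE2: busy percentage between two samples.
# Illustrative helper; the inputs are cumulative jiffy counters.
cpu_percent() {
  local dt=$(( $3 - $1 )) di=$(( $4 - $2 ))
  [ "$dt" -gt 0 ] || { echo 0; return; }
  echo $(( 100 * (dt - di) / dt ))
}

# 1000 jiffies elapsed, 500 of them idle, so the CPU was busy half the time
cpu_percent 1000 800 2000 1300   # prints 50
```

Integer division truncates toward zero, which is acceptable here because the result is only compared against a whole-number threshold.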

6. Log Analysis (monitor/log_analyzer.sh)

#!/usr/bin/env bash
# monitor/log_analyzer.sh - keyword scanning and error-rate statistics
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"

require_cmd grep awk tail

ALERT_SCRIPT="$SCRIPT_DIR/../alert/webhook.sh"
LOG_TARGET="${APP_LOG:-/var/log/app/app.log}"
ERROR_THRESHOLD="${ERROR_THRESHOLD:-10}"  # errors per window that trigger an alert
WINDOW_SECONDS=60

# ── Count errors within the last N seconds ────────────────────────────
count_recent_errors() {
  local log="$1"
  local since
  since=$(date -d "${WINDOW_SECONDS} seconds ago" '+%Y-%m-%d %H:%M:%S' 2>/dev/null \
       || date -v -"${WINDOW_SECONDS}"S '+%Y-%m-%d %H:%M:%S')  # BSD/macOS fallback

  # Count ERROR/FATAL/PANIC lines that are newer than $since. This relies on
  # every line starting with a "YYYY-MM-DD HH:MM:SS" timestamp, which sorts
  # lexicographically in time order.
  awk -v since="$since" '
    /ERROR|FATAL|PANIC/ && $0 >= since { err++ }
    END { print err+0 }
  ' "$log"
}

# ── Keyword alerts ────────────────────────────────────────────────────
scan_keywords() {
  local log="$1"
  local -a keywords=("OutOfMemoryError" "SIGSEGV" "disk full" "connection refused")
  local kw cnt
  for kw in "${keywords[@]}"; do
    # grep -c prints 0 AND exits 1 when nothing matches, so a fallback
    # "echo 0" inside the substitution would yield two lines ("0" twice).
    cnt=$(grep -c -- "$kw" "$log" 2>/dev/null) || cnt=0
    if (( cnt > 0 )); then
      log_warn "Keyword '$kw' found $cnt time(s) in $log"
      "$ALERT_SCRIPT" "Log alert: '$kw' occurred $cnt time(s) in $(basename "$log")"
    fi
  done
}

# ── Real-time follow mode (runs until interrupted) ────────────────────
follow_log() {
  local log="$1"
  log_info "Starting real-time log tail on $log"
  tail -F "$log" 2>/dev/null | grep --line-buffered -E 'ERROR|FATAL|PANIC' | \
  while IFS= read -r line; do
    log_error "Log event: $line"
    "$ALERT_SCRIPT" "Real-time log alert: $line"
  done
}

# ── Main flow ─────────────────────────────────────────────────────────
if [[ ! -f "$LOG_TARGET" ]]; then
  log_warn "Log file not found: $LOG_TARGET"
  exit 0
fi

err_count=$(count_recent_errors "$LOG_TARGET")
log_info "Errors in last ${WINDOW_SECONDS}s: $err_count (threshold: $ERROR_THRESHOLD)"

if (( err_count >= ERROR_THRESHOLD )); then
  "$ALERT_SCRIPT" "Log alert: $err_count errors in ${WINDOW_SECONDS}s on $(hostname)"
fi

scan_keywords "$LOG_TARGET"
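The time-window filter in count_recent_errors depends on a property of ISO-style timestamps: comparing them as plain strings gives the same order as comparing them as times. A small self-contained sketch with invented log lines shows the effect; `count_errors_since` is an illustrative name, and it compares only the first 19 characters of each line rather than the whole line.

```shell
#!/usr/bin/env bash
# count_errors_since SINCE: count ERROR/FATAL/PANIC lines on stdin whose
# leading "YYYY-MM-DD HH:MM:SS" timestamp is at or after SINCE.
count_errors_since() {
  awk -v since="$1" '
    substr($0, 1, 19) >= since && /ERROR|FATAL|PANIC/ { n++ }
    END { print n + 0 }
  '
}

count_errors_since "2024-05-01 12:00:00" <<'EOF'
2024-05-01 11:59:58 ERROR too old, ignored
2024-05-01 12:00:03 INFO request served
2024-05-01 12:00:07 ERROR db timeout
2024-05-01 12:00:09 FATAL worker crashed
EOF
```

With the sample input the function prints 2. Logs that use a different timestamp layout (for example syslog's "May  1 12:00:07") do not sort this way and would need their timestamps converted first.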

7. Webhook Alerting (alert/webhook.sh)

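The alert script supports three channels: DingTalk robot, Feishu robot, and email. To prevent alert storms, each message is hashed and a timestamp file records when it was last sent, so an identical message goes out at most once per ALERT_COOLDOWN seconds. Messages that embed a changing value (such as a CPU percentage) hash differently and are therefore not deduplicated against each other.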

#!/usr/bin/env bash
# alert/webhook.sh - multi-channel alerting (DingTalk/Feishu/email) with deduplication
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"

require_cmd curl md5sum

MESSAGE="${1:-Alert from $(hostname)}"
ALERT_COOLDOWN="${ALERT_COOLDOWN:-300}"  # 5-minute cooldown
COOLDOWN_DIR="${COOLDOWN_DIR:-/tmp/ops-alert-cooldown}"  # overridable, e.g. in tests
mkdir -p "$COOLDOWN_DIR"

# ── Alert deduplication ───────────────────────────────────────────────
# Use the MD5 of the message as the key (avoids special characters in file names)
msg_hash=$(printf '%s' "$MESSAGE" | md5sum | cut -d' ' -f1)
cooldown_file="$COOLDOWN_DIR/$msg_hash"

if [[ -f "$cooldown_file" ]]; then
  last=$(cat "$cooldown_file")
  now=$(date +%s)
  if (( now - last < ALERT_COOLDOWN )); then
    # Assumption: the source listing is garbled at this point. The log text
    # is chosen to match tests/test_webhook.bats, which greps for "suppressed".
    log_info "Alert suppressed (cooldown active): $MESSAGE"
    exit 0
  fi
fi
date +%s > "$cooldown_file"

# json_escape STRING: escape backslashes, double quotes and newlines so the
# message can be embedded in a JSON string literal
json_escape() {
  local s="$1"
  s="${s//\\/\\\\}"
  s="${s//\"/\\\"}"
  s="${s//$'\n'/\\n}"
  printf '%s' "$s"
}

# ── DingTalk robot ────────────────────────────────────────────────────
send_dingtalk() {
  [[ -z "${DINGTALK_WEBHOOK:-}" ]] && return 0
  local body
  # \\n in the format string emits the two characters \n, a valid JSON escape;
  # a bare \n would put a raw newline inside the JSON string.
  body=$(printf '{"msgtype":"text","text":{"content":"[OPS ALERT] %s\\nHost: %s\\nTime: %s"}}' \
    "$(json_escape "$MESSAGE")" "$(hostname)" "$(date '+%Y-%m-%d %H:%M:%S')")
  local code
  code=$(http_post "$DINGTALK_WEBHOOK" "$body")
  if [[ "$code" == "200" ]]; then
    log_info "DingTalk alert sent"
  else
    log_warn "DingTalk alert failed (HTTP $code)"
  fi
}

# ── Feishu robot ──────────────────────────────────────────────────────
send_feishu() {
  [[ -z "${FEISHU_WEBHOOK:-}" ]] && return 0
  local body
  body=$(printf '{"msg_type":"text","content":{"text":"[OPS ALERT] %s\\nHost: %s\\nTime: %s"}}' \
    "$(json_escape "$MESSAGE")" "$(hostname)" "$(date '+%Y-%m-%d %H:%M:%S')")
  local code
  code=$(http_post "$FEISHU_WEBHOOK" "$body")
  if [[ "$code" == "200" ]]; then
    log_info "Feishu alert sent"
  else
    log_warn "Feishu alert failed (HTTP $code)"
  fi
}

# ── Email alert (requires SMTP or a local sendmail) ───────────────────
send_email() {
  [[ -z "${ALERT_EMAIL:-}" ]] && return 0
  local subject="[OPS ALERT] $(hostname) - $(date '+%H:%M')"
  if command -v mail &>/dev/null; then
    echo "$MESSAGE" | mail -s "$subject" "$ALERT_EMAIL"
    log_info "Email alert sent to $ALERT_EMAIL"
  fi
}

# ── Send on all channels ──────────────────────────────────────────────
log_warn "Sending alert: $MESSAGE"
send_dingtalk
send_feishu
send_email
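The cooldown logic is easier to reason about when it is isolated from the sending code. The sketch below is one possible shape for it, not the script's actual structure: `should_send` and `STATE_DIR` are illustrative names, and the optional third argument lets a caller supply the current time so the behaviour can be checked without waiting.

```shell
#!/usr/bin/env bash
# should_send KEY COOLDOWN_SECONDS [NOW]: return 0 when an alert may be sent
# (and record the send time); return 1 while the cooldown is still active.
# Illustrative names; STATE_DIR is a throwaway directory for the demo.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"

should_send() {
  local key="$1" cooldown="$2" now="${3:-$(date +%s)}"
  local stamp="$STATE_DIR/$key" last=0
  if [ -f "$stamp" ]; then
    last=$(cat "$stamp")
  fi
  if [ $(( now - last )) -lt "$cooldown" ]; then
    return 1
  fi
  echo "$now" > "$stamp"
}
```

With a 300-second cooldown, a first call at time 1000 returns 0, a call at 1100 returns 1, and a call at 1300 returns 0 again. Keeping one timestamp file per key means unrelated alerts never block each other.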

8. Automated Deployment (deploy/rolling_deploy.sh)

#!/usr/bin/env bash
# deploy/rolling_deploy.sh - rolling restart with automatic rollback
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"

require_cmd git make systemctl curl

lock_file "rolling_deploy"

APP_DIR="${APP_DIR:-/opt/app}"
SERVICE_NAME="${SERVICE_NAME:-myapp}"
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/health}"
INSTANCES="${INSTANCES:-instance1 instance2 instance3}"
ROLLBACK_ON_FAIL="${ROLLBACK_ON_FAIL:-true}"

# ── Record the current commit for rollback ────────────────────────────
OLD_COMMIT=$(git -C "$APP_DIR" rev-parse HEAD)

# ── Pull the latest code ──────────────────────────────────────────────
log_info "Pulling latest code..."
git -C "$APP_DIR" pull --ff-only || die "git pull failed"

NEW_COMMIT=$(git -C "$APP_DIR" rev-parse HEAD)
log_info "Deploying $OLD_COMMIT -> $NEW_COMMIT"

# ── Build ─────────────────────────────────────────────────────────────
log_info "Building..."
make -C "$APP_DIR" -j"$(nproc)" || {
  log_error "Build failed; rolling back"
  git -C "$APP_DIR" checkout "$OLD_COMMIT"
  die "Build failed"
}

# ── Health-check helper ───────────────────────────────────────────────
# Assumption: only the function header and the start of the loop survive in
# the source listing. The loop body (curl probe, 3-second pause) is a
# reconstruction.
wait_healthy() {
  local url="$1" retries=10 i=0
  while (( i < retries )); do
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$(( i + 1 ))
    sleep 3
  done
  return 1
}
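The listing above does not include the loop that restarts the instances one by one. As a hedged sketch of what such a loop can look like, the following assumes systemd template units named `SERVICE_NAME@instance` and a single health URL; `restart_instance`, `instance_healthy`, `DONE_INSTANCES` and `FAILED_INSTANCE` are all hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical hooks; replace them with whatever restarts and probes one instance.
restart_instance() { systemctl restart "${SERVICE_NAME:-myapp}@$1"; }
instance_healthy() {
  curl -sf --max-time 5 -o /dev/null "${HEALTH_URL:-http://localhost:8080/health}"
}

# rolling_restart INSTANCE...
# Restart instances one at a time and stop at the first one that does not
# become healthy. DONE_INSTANCES lists the instances already restarted
# (the candidates for a rollback); FAILED_INSTANCE names the one that failed.
rolling_restart() {
  local inst
  DONE_INSTANCES=""
  FAILED_INSTANCE=""
  for inst in "$@"; do
    if ! restart_instance "$inst" || ! instance_healthy "$inst"; then
      FAILED_INSTANCE="$inst"
      return 1
    fi
    DONE_INSTANCES="${DONE_INSTANCES:+$DONE_INSTANCES }$inst"
  done
}
```

A caller could run `rolling_restart $INSTANCES` and, on failure with ROLLBACK_ON_FAIL=true, check out OLD_COMMIT, rebuild, and restart the instances named in DONE_INSTANCES plus FAILED_INSTANCE. Stopping at the first failure keeps the remaining instances on the old, known-good build.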
  
## 9. Backup System (backup/backup.sh)


  
Backups use `rsync --link-dest` for snapshot-style incremental backups: each snapshot looks like a full directory, but only the changed files consume additional disk space. The backup can optionally be encrypted with AES-256.


  
```bash
#!/usr/bin/env bash
# backup/backup.sh - rsync snapshot backup + AES-256 encryption + rotation
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/common.sh"
load_config "$SCRIPT_DIR/../config.env"

require_cmd rsync

BACKUP_SRC="${BACKUP_SRC:-/opt/app/data}"
BACKUP_DEST="${BACKUP_DEST:-/backup/app}"
KEEP_DAILY="${KEEP_DAILY:-7}"
KEEP_WEEKLY="${KEEP_WEEKLY:-4}"
ENCRYPT="${ENCRYPT_BACKUP:-false}"
PASSPHRASE="${BACKUP_PASSPHRASE:-}"

lock_file "backup"

DATE=$(date '+%Y-%m-%d_%H-%M-%S')
SNAPSHOT_DIR="$BACKUP_DEST/daily/$DATE"
LATEST_LINK="$BACKUP_DEST/latest"

mkdir -p "$BACKUP_DEST/daily"

# ── rsync snapshot backup ─────────────────────────────────────────────
log_info "Starting rsync snapshot: $BACKUP_SRC -> $SNAPSHOT_DIR"
rsync_opts=(
  -aAX                    # archive mode + ACLs + extended attributes
  --delete                # remove files that no longer exist in the source
  --numeric-ids           # keep numeric UID/GID
  --info=progress2
)

# If a previous backup exists, use --link-dest to save disk space
# (unchanged files become hard links into the previous snapshot)
if [[ -d "$LATEST_LINK" ]]; then
  rsync_opts+=(--link-dest="$LATEST_LINK")
fi

rsync "${rsync_opts[@]}" "$BACKUP_SRC/" "$SNAPSHOT_DIR/" \
  || die "rsync failed"

# Update the "latest" symlink
ln -sfn "$SNAPSHOT_DIR" "$LATEST_LINK"
log_info "Snapshot complete: $SNAPSHOT_DIR"

# Weekly backup: keep one copy every Sunday. This runs before the optional
# encryption step so the snapshot directory still exists. SNAPSHOT_DIR is
# used instead of the symlink because cp -a copies a symlink as a symlink,
# which would dangle once the daily snapshot is rotated away.
if [[ "$(date '+%u')" == "7" ]]; then
  mkdir -p "$BACKUP_DEST/weekly"
  cp -al "$SNAPSHOT_DIR" "$BACKUP_DEST/weekly/$(date '+%Y-W%V')" 2>/dev/null || true
  # Names sort chronologically; "|| true" covers an empty directory under pipefail
  ls -1d "$BACKUP_DEST"/weekly/*/ 2>/dev/null | sort -r | tail -n +"$((KEEP_WEEKLY+1))" | \
    xargs -r rm -rf || true
  log_info "Weekly backup saved"
fi

# ── Optional: AES-256 encryption ──────────────────────────────────────
if [[ "$ENCRYPT" == "true" ]]; then
  [[ -z "$PASSPHRASE" ]] && die "BACKUP_PASSPHRASE must be set when ENCRYPT_BACKUP=true"
  require_cmd openssl tar
  log_info "Encrypting backup..."
  ARCHIVE="$BACKUP_DEST/daily/${DATE}.tar.gz.enc"
  # Pass the passphrase through the environment: "pass:..." on the command
  # line would be visible to other users in the process list.
  tar -czf - -C "$BACKUP_DEST/daily" "$DATE" | \
    BACKUP_PASSPHRASE="$PASSPHRASE" openssl enc -aes-256-cbc -pbkdf2 -iter 100000 \
      -pass env:BACKUP_PASSPHRASE -out "$ARCHIVE"
  # Remove the plaintext snapshot once encryption succeeded (set -e and
  # pipefail stop the script before this line if tar or openssl fails).
  # The "latest" link now dangles, so the next run makes a full copy.
  rm -rf "$SNAPSHOT_DIR"
  log_info "Encrypted archive: $ARCHIVE"
fi

# ── Retention: delete daily backups beyond the limit ──────────────────
log_info "Rotating daily backups (keep last $KEEP_DAILY)..."
# Sort by name, not mtime: rsync -a copies the source directory's mtime
# onto each snapshot directory, so "ls -t" does not reflect backup time.
ls -1d "$BACKUP_DEST"/daily/*/ 2>/dev/null | sort -r | tail -n +"$((KEEP_DAILY+1))" | \
  xargs -r rm -rf || true

log_info "Backup finished successfully"
```
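The space saving of `--link-dest` rests on hard links: an unchanged file in the new snapshot is a second name for the same inode, not a second copy. The sketch below demonstrates only that mechanism with plain `ln`; it does not invoke rsync, and all paths are temporary demo paths.

```shell
#!/usr/bin/env bash
# Hard-link sharing shown with ln; rsync itself is not used here.
demo=$(mktemp -d)
mkdir "$demo/snap1" "$demo/snap2"
echo "unchanged data" > "$demo/snap1/a.txt"
echo "old version"    > "$demo/snap1/b.txt"

ln "$demo/snap1/a.txt" "$demo/snap2/a.txt"      # unchanged file: same inode, second name
echo "new version" > "$demo/snap2/b.txt"        # changed file: a fresh copy

# -ef is true when both paths refer to the same inode
if [ "$demo/snap1/a.txt" -ef "$demo/snap2/a.txt" ]; then echo "a.txt is shared"; fi
if ! [ "$demo/snap1/b.txt" -ef "$demo/snap2/b.txt" ]; then echo "b.txt is separate"; fi
```

Deleting snap1 afterwards does not harm snap2: the data of a.txt stays on disk until its last name is removed, which is why old snapshots can be rotated away independently.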

10. systemd Integration

Use systemd timers instead of crontab for full logging integration, dependency management, and error recovery (see Chapter 13).

# systemd/ops-monitor.service
[Unit]
Description=Ops health check and resource alert
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=opsuser
WorkingDirectory=/opt/ops-scripts
# With several ExecStart= lines the commands run in order; if the first
# one fails, the following ones are skipped.
ExecStart=/opt/ops-scripts/monitor/health_check.sh
ExecStart=/opt/ops-scripts/monitor/resource_alert.sh
EnvironmentFile=/opt/ops-scripts/config.env
# Creates /var/log/ops-scripts owned by opsuser, matching the default LOG_FILE
LogsDirectory=ops-scripts
StandardOutput=journal
StandardError=journal
SyslogIdentifier=ops-monitor

# systemd/ops-monitor.timer
[Unit]
Description=Run ops monitoring every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
# Persistent= only affects OnCalendar= timers; it is harmless but has no
# effect on the monotonic settings above.
Persistent=true
RandomizedDelaySec=30

[Install]
WantedBy=timers.target

# Deployment steps
sudo cp /opt/ops-scripts/systemd/*.service /etc/systemd/system/
sudo cp /opt/ops-scripts/systemd/*.timer  /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now ops-monitor.timer
# Requires an ops-backup.timer unit, which is not listed in this chapter
sudo systemctl enable --now ops-backup.timer

# Verify
systemctl list-timers --all | grep ops
journalctl -u ops-monitor.service -f
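The deployment steps enable ops-backup.timer, but the chapter shows neither that timer nor the ops-backup.service it would trigger. The following is a sketch under stated assumptions rather than the book's units: the nightly 02:30 schedule, the five-minute random delay, and running as root are all made up for illustration, and `write_backup_units` is a hypothetical helper that writes both files into a directory for review.

```shell
#!/usr/bin/env bash
# write_backup_units DIR: generate ops-backup.service and ops-backup.timer in DIR.
# Schedule and unit contents are assumptions for illustration.
write_backup_units() {
  local dir="$1"
  cat > "$dir/ops-backup.service" <<'EOF'
[Unit]
Description=Ops rsync snapshot backup

[Service]
Type=oneshot
WorkingDirectory=/opt/ops-scripts
ExecStart=/opt/ops-scripts/backup/backup.sh
SyslogIdentifier=ops-backup
EOF
  cat > "$dir/ops-backup.timer" <<'EOF'
[Unit]
Description=Run ops backup nightly

[Timer]
OnCalendar=*-*-* 02:30:00
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target
EOF
}
```

Because this timer uses OnCalendar=, Persistent=true does take effect here: a run missed while the machine was powered off is started shortly after the next boot.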

11. bats Automated Testing

bats (Bash Automated Testing System) is a unit testing framework for shell scripts. Each @test block is a test case; the run command executes the script under test and you then assert output and exit code.

#!/usr/bin/env bats
# tests/test_common.bats

setup() {
  export LOG_FILE="$(mktemp)"
  export LOCK_DIR="$(mktemp -d)"
  # Source the library after the overrides so it never touches /var/log or
  # /tmp/ops-locks. BATS_TEST_DIRNAME is the directory of this test file.
  source "$BATS_TEST_DIRNAME/../lib/common.sh"
}

teardown() {
  rm -f "$LOG_FILE"
  rm -rf "$LOCK_DIR"
}

@test "log_info writes timestamped INFO message" {
  run log_info "hello world"
  [ "$status" -eq 0 ]
  grep -q "INFO" "$LOG_FILE"
  grep -q "hello world" "$LOG_FILE"
}

@test "die exits with code 1 and logs error" {
  run die "something went wrong"
  [ "$status" -eq 1 ]
  grep -q "ERROR" "$LOG_FILE"
  grep -q "something went wrong" "$LOG_FILE"
}

@test "require_cmd succeeds for existing command" {
  run require_cmd bash
  [ "$status" -eq 0 ]
}

@test "require_cmd fails for nonexistent command" {
  run require_cmd this_command_does_not_exist_xyz
  [ "$status" -ne 0 ]
}

@test "lock_file prevents double locking" {
  # Simulate another process holding the lock. Calling lock_file directly
  # here would install traps inside the bats test process.
  mkdir "$LOCK_DIR/testlock.lock"
  echo 12345 > "$LOCK_DIR/testlock.lock/pid"
  run bash -c "source '$BATS_TEST_DIRNAME/../lib/common.sh'; lock_file testlock"
  [ "$status" -ne 0 ]
  [[ "$output" == *"Lock held"* ]]
}

#!/usr/bin/env bats
# tests/test_webhook.bats - tests for the alert deduplication logic
# Run from the repository root; the script paths below are relative to it.

setup() {
  export LOG_FILE="$(mktemp)"
  export LOCK_DIR="$(mktemp -d)"
  export ALERT_COOLDOWN=300
  export COOLDOWN_DIR="$(mktemp -d)"
  # Disable real delivery by unsetting the webhook variables
  unset DINGTALK_WEBHOOK FEISHU_WEBHOOK ALERT_EMAIL
}

teardown() {
  rm -f "$LOG_FILE"
  rm -rf "$LOCK_DIR" "$COOLDOWN_DIR"
}

@test "first alert goes through" {
  run bash alert/webhook.sh "test alert message"
  [ "$status" -eq 0 ]
  # The cooldown file should have been created
  local hash
  hash=$(printf '%s' "test alert message" | md5sum | cut -d' ' -f1)
  [ -f "$COOLDOWN_DIR/$hash" ]
}

@test "duplicate alert within cooldown is suppressed" {
  # Send once
  bash alert/webhook.sh "dup message"
  # Send again; this one should be suppressed
  run bash alert/webhook.sh "dup message"
  [ "$status" -eq 0 ]
  grep -q "suppressed" "$LOG_FILE"
}
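For readers new to bats, it can help to see roughly what `run` does: it executes a command, stores the combined stdout and stderr in `$output`, and stores the exit code in `$status` instead of letting a failure abort the test. The plain-Bash approximation below is only a mental model (the real helper also fills a `lines` array and has more options), and `run_like_bats` is a made-up name.

```shell
#!/usr/bin/env bash
# run_like_bats CMD [ARGS...]: rough stand-in for the bats `run` helper.
# Stores combined stdout/stderr in $output and the exit code in $status.
run_like_bats() {
  status=0
  output=$("$@" 2>&1) || status=$?
}

run_like_bats sh -c 'echo "Lock held by PID 42" >&2; exit 1'
if [ "$status" -eq 1 ]; then echo "status ok"; fi
case "$output" in *"Lock held"*) echo "output ok" ;; esac
```

This is why the tests above can assert on a non-zero exit code: the failing command runs inside `run`, so its failure is recorded rather than propagated.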

GitHub Actions CI Integration

# .github/workflows/test.yml
name: Shell Tests

on: [push, pull_request]

jobs:
  bats:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

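      - name: Install bats
        run: |
          sudo apt-get update
          sudo apt-get install -y bats
          bats --version

      - name: Run bats tests
        run: |
          # config.env is not committed, and the scripts refuse to start without it
          cp config.env.example config.env
          bats tests/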

      - name: Run shellcheck
        run: |
          sudo apt-get install -y shellcheck
          shellcheck lib/*.sh monitor/*.sh alert/*.sh deploy/*.sh backup/*.sh

12. Book Summary and Next Steps

This book built five capability layers. The final project is a concrete demonstration of all five:

| Capability Layer | Chapters | Embodied In |
| --- | --- | --- |
| Foundational Ops | Ch1–Ch4 | directory structure, text processing (awk/grep), log analysis |
| System Understanding | Ch5–Ch8 | process checks (pgrep), network probes (nc/curl), disk monitoring (df/rsync) |
| Shell Scripting | Ch9–Ch12 | function library, arrays, pipelines, set -euo pipefail, bats tests |
| Production Practice | Ch13–Ch16 | systemd service/timer, metric collection, encrypted backup, mutex locks |
| Kernel & Contribution | Ch17–Ch19 | understanding syscall paths, kernel-level debugging tools, ability to contribute upstream |

Congratulations on completing the book! Linux Shell is never merely a collection of command-line tools. It is a language for conversing with the operating system, a way of thinking that makes complex systems controllable, observable, and automatable. With the skills acquired in this book, you now have the foundation of a production-capable systems engineer.
