Chapter 3

File Operations Mastery

| Algorithm | Extension | Ratio | Speed | Best For |
|---|---|---|---|---|
| gzip | .tar.gz / .tgz | Medium | Fast | Daily use, best compatibility |
| bzip2 | .tar.bz2 | High | Slow | Source releases (legacy) |
| xz | .tar.xz | Highest | Very slow | Kernel/package releases, storage-first |
| zstd | .tar.zst | High | Very fast | Modern default: best speed/ratio balance |

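**Recommendation:** Prefer zstd for new projects. Developed at Facebook, zstd reaches compression ratios close to xz at its higher levels while decompressing at speeds comparable to gzip. Linux kernel 5.9 added support for zstd-compressed kernel images, and some distributions have since switched to it.

rsync: Incremental Sync and Remote Backup

rsync is a standard tool for production backups. By default it compares file size and modification time on both ends to decide which files need updating; for remote transfers it then uses its delta-transfer algorithm (rolling checksums) to send only the changed blocks, which greatly reduces network overhead.

Key Options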

# Local directory sync (trailing slash on src = "contents of dir", no slash = "the dir itself")
rsync -avz /var/www/html/ /backup/html/

# Dry run first, then execute for real once the output looks right
rsync -avz --dry-run /var/www/ /backup/www/
rsync -avz /var/www/ /backup/www/

# Mirror sync (extra files at the destination are deleted)
rsync -avz --delete /var/www/ /backup/www/

# Remote sync (rsync over SSH)
rsync -avz -e ssh /local/data/ user@remote:/backup/data/

# Custom SSH port
rsync -avz -e "ssh -p 2222" /local/ user@host:/remote/

# Rate-limited transfer (--bwlimit is in KiB/s, so 1024 is about 1 MiB/s)
rsync -avz --bwlimit=1024 /data/ user@host:/data/

# Exclude multiple patterns
rsync -avz \
  --exclude='*.log' \
  --exclude='cache/' \
  --exclude='.git/' \
  /var/www/ /backup/www/

# Sync only files modified within the last day.
# rsync has no age filter, so find selects the files and --files-from reads the list.
(cd /data && find . -type f -mtime -1 -print0 |
  rsync -av --from0 --files-from=- . /backup/)

Production Backup Script Example

#!/bin/bash
# backup.sh: daily incremental backup script

set -euo pipefail

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SRC="/var/www"
BACKUP_ROOT="/mnt/backup"
DEST="${BACKUP_ROOT}/${TIMESTAMP}"
LATEST="${BACKUP_ROOT}/latest"
LOG="/var/log/backup.log"

echo "[${TIMESTAMP}] Starting backup..." | tee -a "$LOG"

# --link-dest enables snapshot-style incremental backups
# (only changed files consume new disk space)
rsync -avz --delete \
  --link-dest="${LATEST}" \
  --exclude='*.tmp' \
  --exclude='cache/' \
  "${SRC}/" "${DEST}/" 2>&1 | tee -a "$LOG"

# rsync -a copies the source directory's mtime onto DEST; reset it to now
# so the age-based cleanup below measures the backup's age, not the source's
touch "${DEST}"

# Update the 'latest' symlink
rm -f "${LATEST}"
ln -s "${DEST}" "${LATEST}"

echo "[$(date +%Y%m%d_%H%M%S)] Backup complete: ${DEST}" | tee -a "$LOG"

# Keep the last 30 days of backups, delete older ones
find "${BACKUP_ROOT}" -maxdepth 1 -type d -name "20*" \
  -mtime +30 -exec rm -rf {} +

**How --link-dest works:** For files that are unchanged since the previous backup, rsync creates hard links in the new backup directory instead of copying them. Each backup therefore looks like a full snapshot, but only changed files consume additional disk space. Depending on how much data changes per day, 30 daily backups may occupy only a few times the size of the original data.

xargs: Batch Processing Power

xargs converts standard input into command-line arguments. A pipe only feeds a command's standard input, but many commands (rm, cp, mv) take their operands as arguments; xargs bridges that gap.

# Basic: pass find results to rm (null-delimited so unusual filenames are safe)
find /tmp -name "*.tmp" -print0 | xargs -0 rm -f

# -I{} placeholder: insert the argument at a specific position
find . -name "*.log" -print0 | xargs -0 -I{} cp {} /backup/

# Handle filenames with spaces: pair with find -print0
find . -name "*.txt" -print0 | xargs -0 wc -l

# -P parallel execution (4 processes simultaneously)
find . -name "*.jpg" -print0 | xargs -0 -P4 -I{} convert {} -resize 800x {}-resized.jpg

# -n: number of arguments per invocation (2 at a time)
echo "a b c d e f" | xargs -n2 echo

# Find files containing TODO and count their lines (-Z makes grep emit null-delimited names)
grep -rlZ "TODO" ./src | xargs -0 wc -l

# Safely delete many files (avoids "argument list too long"; -r skips rm when nothing matches)
find /var/log -name "*.gz" -mtime +90 -print0 | xargs -0 -r rm -f

# Interactive confirmation (-p prompts before each execution)
find . -name "*.bak" -print0 | xargs -0 -p rm

**Filenames with spaces require -print0 + -0:** By default xargs splits its input on whitespace (spaces and newlines), so filenames containing spaces are broken into several arguments. Always use the find -print0 | xargs -0 combination when working with real file paths.

watch: Periodic Real-time Monitoring

watch repeatedly executes a command at a fixed interval and refreshes the screen, which makes it a simple tool for monitoring system state in real time.

# Refresh disk usage every 2 seconds
watch -n 2 df -h

# Highlight changes (-d)
watch -d -n 1 'ss -tnp'

# Monitor processes (a simplified top)
watch -n 1 'ps aux --sort=-%cpu | head -15'

# Monitor the number of entries in a directory (ls -1A avoids counting the "total" line of ls -l)
watch -n 5 'ls -1A /var/spool/mail/ | wc -l'

# Watch the nginx access log line count
watch -n 2 'wc -l /var/log/nginx/access.log'

# Monitor network connection statistics
watch -n 2 'ss -s'

# Hide the header bar (--no-title)
watch --no-title -n 1 uptime

inotifywait: Real-time File Change Detection

inotifywait uses the Linux kernel's inotify interface to watch filesystem events and can trigger a response as soon as files are created, modified, or deleted. It is a basic building block for automated deployment and configuration hot reload.

# Install
sudo apt install inotify-tools

# Continuously monitor a directory (-m keep running, -r recursive, -e select events)
inotifywait -m -r -e create,modify,delete /etc/nginx/

# Monitor specific events with formatted output
inotifywait -m -r \
  --format '%T %w%f %e' \
  --timefmt '%Y-%m-%d %H:%M:%S' \
  -e create,modify,delete \
  /var/www/html/

# Auto-reload nginx when nginx.conf changes.
# Watch the directory, not the file: editors often save by replacing the file,
# which would silently drop a watch placed on the file itself.
inotifywait -m -e close_write,moved_to --format '%e %f' /etc/nginx/ |
while read -r event file; do
  [ "$file" = "nginx.conf" ] || continue
  echo "Config changed: $file ($event)"
  nginx -t && systemctl reload nginx
done

# Auto-sync: trigger rsync when the local directory changes
inotifywait -m -r -e create,modify,delete /var/www/html/ |
while read -r dir event file; do
  echo "[$event] $dir$file"
  rsync -az /var/www/html/ user@remote:/var/www/html/
done

**inotify kernel limits:** The per-user watch limit is stored in /proc/sys/fs/inotify/max_user_watches. Its default was 8192 on older kernels; since Linux 5.11 it is scaled with available memory. If watching a large code repository hits the limit, raise it with echo 524288 | sudo tee /proc/sys/fs/inotify/max_user_watches, and add fs.inotify.max_user_watches=524288 to /etc/sysctl.conf to make the change permanent.

File Lookup Tools Compared

Linux has several tools for "finding a command". They work differently and serve different purposes, and many people confuse them.

| Command | Search Scope | Speed | Notes |
|---|---|---|---|
| which | PATH environment variable | Instant | Finds executable path in PATH |
| type | Shell built-in | Instant | Distinguishes builtins/aliases/functions |
| whereis | Fixed path list | Fast | Finds binary, man page, source |
| locate | Database index | Very fast | Requires updatedb, new files have delay |
| find | Real-time filesystem scan | Slow (large dirs) | Most powerful, supports complex criteria |

# which: find the location of an executable
which python3
# /usr/bin/python3

# type: classify a command (builtin/alias/function/file)
type ls
# ls is aliased to `ls --color=auto'
type cd
# cd is a shell builtin

# whereis: find binary, man page, and source locations at once
whereis nginx
# nginx: /usr/sbin/nginx /etc/nginx /usr/share/man/man8/nginx.8.gz

# locate: database lookup, very fast
sudo updatedb           # update the database first
locate "*.conf" | grep nginx

# find: precise real-time search
find /etc -name "*.conf" -mtime -7   # config files modified in the last 7 days
find /var/log -size +100M            # log files larger than 100MB
find . -perm /4000                   # files with the SUID bit (security audit)

Text Processing Foundation Tools

The following tools are the basic building blocks of text-processing pipelines, and they become more powerful when combined with grep/awk/sed (covered in detail in Chapter 4).

# wc: count lines/words/bytes
wc -l access.log          # line count
wc -w document.txt        # word count
wc -c binary.dat          # byte count

# sort
sort -n numbers.txt        # numeric sort
sort -rn numbers.txt       # reverse numeric sort
sort -t: -k3,3n /etc/passwd   # sort by field 3 (UID) numerically, delimiter :
sort -u names.txt          # sort and deduplicate

# uniq: deduplicate (requires sorted input)
sort access.log | uniq -c | sort -rn | head -20   # top 20 most frequent lines

# cut: field extraction
cut -d: -f1 /etc/passwd    # first column (username), fields split on :
cut -c1-10 file.txt        # first 10 characters of each line

# paste: merge files side by side
paste file1.txt file2.txt  # merge two files column-wise, tab-separated
paste -d, file1.txt file2.txt  # use a comma as the separator

# tee: output to both screen and file
ls -la | tee listing.txt                    # display and save
echo "start" | tee -a build.log             # append mode

{{else}}

Chapter 3: File Operations Mastery

Deep options and safe usage of cp/mv/rm, complete tar guide (gzip/bzip2/xz/zstd comparison), rsync incremental sync and remote backup, xargs batch processing, watch periodic monitoring, inotifywait real-time file change detection.

cp Deep Dive

cp is the most commonly used file copy command, but most users ignore its powerful options. Mastering these options lets you copy files more precisely and safely.

Core Options Explained

# Archive copy: preserve all attributes (great for server migrations)
cp -a /var/www/html/ /backup/html-20260425/

# Only copy files newer than destination (incremental)
cp -u /src/*.conf /dst/

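# Back up before overwriting: the old file is kept as nginx.conf.bak
# (--suffix applies to simple backups; numbered backups are named nginx.conf.~1~ instead)
cp --backup=simple --suffix=.bak nginx.conf /etc/nginx/nginx.conf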

# Copy directory following all symlinks (copies real files)
cp -rL /opt/app/ /backup/app/

# Copy directory preserving symlinks as-is (-a already implies recursion)
cp -a /opt/app/ /backup/app/

-a vs -p difference: -p preserves only mode, ownership, and timestamps. -a is equivalent to -dR --preserve=all: it also recurses, keeps symlinks as symlinks (-d), and preserves extended attributes (xattr) where the filesystem supports them. Use -a for backups, -p for everyday copies.
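
The difference is easiest to see on a symlink. The following is a minimal sketch, assuming GNU cp and a scratch directory created with mktemp; the file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: compare how cp -p and cp -a treat a symlink (scratch paths are hypothetical).
work=$(mktemp -d)
echo "real data" > "$work/target.txt"
ln -s target.txt "$work/link"

cp -p "$work/link" "$work/p_copy"   # follows the link: the result is a regular file
cp -a "$work/link" "$work/a_copy"   # keeps the link: the result is a symlink

for f in p_copy a_copy; do
  if [ -L "$work/$f" ]; then echo "$f: symlink"; else echo "$f: regular file"; fi
done
```

With -p the copy contains the data the link pointed to; with -a the link itself is reproduced, which is what a faithful backup usually wants.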

mv and Renaming

mv is atomic on the same filesystem: it is a rename that only rewrites directory entries, with no data movement. Across filesystems it degrades to a copy followed by a delete, so moving large files between disks generates heavy I/O and can leave a partial copy at the destination if interrupted.

# Move file (same partition: instant; cross-partition: actual copy)
mv largefile.tar.gz /mnt/backup/

# Rename
mv old-name.txt new-name.txt

# Prompt before overwriting
mv -i source.txt target.txt

# Never overwrite existing destination
mv -n draft.txt production.txt

# Batch rename: change .jpeg to .jpg (requires the Perl-based rename; the util-linux rename uses a different syntax)
rename 's/\.jpeg$/.jpg/' *.jpeg

# Batch rename with bash loop (no extra tools needed)
for f in *.log.1; do mv "$f" "${f%.1}.old"; done

Risk of cross-filesystem mv with large files: mv across disks equals cp + rm. If interrupted by a power cut or Ctrl-C, the source file is untouched but the destination is incomplete. For large cross-disk transfers, use rsync --remove-source-files and verify after completion.
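
The same copy, verify, then delete idea can be sketched with core utilities alone. This is an illustration rather than a replacement for rsync: safe_move is a hypothetical helper name, and cp plus cmp stand in for rsync's own transfer and verification.

```shell
#!/usr/bin/env bash
# Sketch of a copy-verify-delete move; safe_move is a hypothetical helper name.
safe_move() {
  local src="$1" dest="$2"
  cp -p -- "$src" "$dest.partial" || return 1          # copy under a temporary name
  if cmp -s -- "$src" "$dest.partial"; then            # byte-for-byte verification
    mv -- "$dest.partial" "$dest" && rm -f -- "$src"   # publish, then drop the source
  else
    rm -f -- "$dest.partial"
    return 1
  fi
}

work=$(mktemp -d)
printf 'payload\n' > "$work/big.bin"
mkdir "$work/other-disk"
safe_move "$work/big.bin" "$work/other-disk/big.bin" && echo "moved and verified"
```

Because the source is removed only after the comparison succeeds, an interruption at any earlier point leaves the original file intact.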

rm Safety Practices

rm -rf is one of the most dangerous commands in Linux. No recycle bin, no confirmation, no undo.

Safer Alternatives

# Install trash-cli
sudo apt install trash-cli   # Debian/Ubuntu

# Safe delete (recoverable)
trash-put /tmp/old-logs/

# List trash contents
trash-list

# Restore a file
trash-restore

# Empty trash
trash-empty

# Confirm each file before deletion
rm -ri ./temp-dir/

# Remove only empty directories (including empty subdirs)
find . -type d -empty -delete

Production rule: Before running rm -rf in a script, always print the path or use echo rm -rf to simulate. Never write rm -rf "$VAR/" without first validating that VAR is non-empty: with an empty variable the command expands to rm -rf /.
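
One way to enforce this rule is a small guard function. The sketch below is an assumption about how such a guard could look; safe_rmrf is a hypothetical helper, not a standard command.

```shell
#!/usr/bin/env bash
# Sketch: refuse to run rm -rf on an empty, root, or non-directory path.
# safe_rmrf is a hypothetical helper, not a standard command.
safe_rmrf() {
  local target="${1:-}"
  if [ -z "$target" ] || [ "$target" = "/" ]; then
    echo "refusing to delete: '${target}'" >&2
    return 1
  fi
  [ -d "$target" ] || { echo "not a directory: $target" >&2; return 1; }
  echo "rm -rf -- $target"       # print the path first, as the rule suggests
  rm -rf -- "$target"
}

scratch=$(mktemp -d)
safe_rmrf "$scratch"             # removes the scratch directory
EMPTY_VAR=""
safe_rmrf "$EMPTY_VAR/" || echo "blocked"   # "$EMPTY_VAR/" expands to "/"
```

Inside a script, writing rm -rf -- "${VAR:?}/" gives similar protection with less code: bash aborts with an error when VAR is unset or empty.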

mkdir and Directory Operations

# Recursively create nested directories
mkdir -p /opt/app/{logs,conf,data,tmp}

# install -d: create directory and set permissions in one step
install -d -m 755 -o www-data -g www-data /var/www/html

# View directory tree
tree -L 2 /opt/app/

# Show file sizes
tree -sh /opt/app/

# Show only directories
tree -d /etc/

touch: More Than Creating Files

touch isn't just for creating empty files. It can precisely control file timestamps, which matters for Makefiles and build systems.

# Create empty file
touch newfile.txt

# Create multiple files
touch file{01..10}.txt

# Update timestamps to now (file must exist; -c skips creation)
touch -c existing.txt

# Update only access time (atime)
touch -a logfile.txt

# Update only modification time (mtime)
touch -m config.conf

# Set a specific timestamp
touch -d "2026-01-01 00:00:00" archive.tar.gz

# Copy timestamps from one file to another
touch -r reference.txt target.txt
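
To see why timestamps matter to build systems, here is a rough sketch of the staleness check that tools like make perform. It assumes GNU touch (for -d) and bash (for -nt); is_stale and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: how a build tool decides a target is stale by comparing mtimes.
work=$(mktemp -d)
touch -d "2020-01-01 00:00:00" "$work/main.c"   # source last changed on Jan 1
touch -d "2020-01-02 00:00:00" "$work/main.o"   # object built a day later

is_stale() {  # hypothetical helper: true when the source is newer than the target
  [ "$1" -nt "$2" ]
}

if is_stale "$work/main.c" "$work/main.o"; then echo "rebuild"; else echo "up to date"; fi
touch "$work/main.c"                            # simulate an edit: mtime becomes now
if is_stale "$work/main.c" "$work/main.o"; then echo "rebuild"; else echo "up to date"; fi
```

The first check reports the object as up to date; after the source is touched, the same check asks for a rebuild even though the file contents never changed.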

tar Complete Guide

tar (Tape ARchive) is the core Linux archiving tool. It does not compress by itself: compression is handled by external programs (gzip/bzip2/xz/zstd), selected with the -z/-j/-J/--zstd flags.

Common Operations Quick Reference

# Create gzip archive (most common)
tar -czf archive.tar.gz /path/to/dir/

# Create with verbose output (list each file)
tar -czvf archive.tar.gz /path/to/dir/

# Extract to current directory
tar -xzf archive.tar.gz

# Extract to a specific directory
tar -xzf archive.tar.gz -C /opt/

# List archive contents without extracting
tar -tzf archive.tar.gz

# Extract a single file from the archive
tar -xzf archive.tar.gz path/inside/archive/file.conf

# Exclude specific directories and patterns
# (options are listed before the paths, which is the most portable ordering)
tar -czf backup.tar.gz \
    --exclude='/var/www/cache' \
    --exclude='*.log' \
    --exclude='.git' \
    /var/www/

# bzip2 compression (higher ratio, slower)
tar -cjf archive.tar.bz2 /path/to/dir/

# xz compression (best ratio, slowest)
tar -cJf archive.tar.xz /path/to/dir/

# zstd compression (modern: fast + good ratio)
tar --zstd -cf archive.tar.zst /path/to/dir/

# Incremental backup: only pack files newer than given date
tar -czf incremental.tar.gz \
    --newer-mtime="2026-04-01" /var/data/
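
The create, list, and extract operations above can be tried safely in a scratch directory. This is a minimal sketch with made-up file names; it assumes tar with gzip support.

```shell
#!/usr/bin/env bash
# Sketch: create, list, and extract an archive in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/site/conf" "$work/restore"
echo "listen 80;" > "$work/site/conf/app.conf"
echo "noise"      > "$work/site/debug.log"

# -C switches directory first, so member names are stored relative ("site/...")
tar -czf "$work/site.tar.gz" --exclude='*.log' -C "$work" site

tar -tzf "$work/site.tar.gz"                       # list members
tar -xzf "$work/site.tar.gz" -C "$work/restore"    # extract into another directory
cat "$work/restore/site/conf/app.conf"
```

Listing the archive before extracting is a cheap way to confirm that the exclusions worked and that member paths are relative.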

Compression Algorithm Comparison

| Algorithm | Extension | Ratio | Speed | Best For |
|---|---|---|---|---|
| gzip | .tar.gz / .tgz | Medium | Fast | Daily use, best compatibility |
| bzip2 | .tar.bz2 | High | Slow | Source releases (legacy) |
| xz | .tar.xz | Highest | Very slow | Kernel/package releases, storage-first |
| zstd | .tar.zst | High | Very fast | Modern default: best speed/ratio balance |

Recommendation: Choose zstd for new projects. Developed at Facebook, zstd reaches compression ratios close to xz at its higher levels while decompressing at gzip-comparable speed. Linux kernel 5.9 added support for zstd-compressed kernel images, and some distributions have since switched to it.
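
Ratios depend heavily on the data, so it is worth measuring on your own files. The sketch below compares compressed sizes using whichever of the four tools are installed; the sample log is generated, highly repetitive, and purely hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare compressed sizes on sample data; results vary with content.
work=$(mktemp -d)
# Hypothetical sample: repetitive text compresses well; real data will differ
for i in $(seq 1 2000); do echo "line $i: GET /index.html 200"; done > "$work/sample.log"

orig_size=$(wc -c < "$work/sample.log")
echo "original: $orig_size bytes"
for tool in gzip bzip2 xz zstd; do
  if command -v "$tool" >/dev/null 2>&1; then
    size=$("$tool" -c "$work/sample.log" | wc -c)
    echo "$tool: $size bytes"
  else
    echo "$tool: not installed, skipped"
  fi
done
```

Prefixing each compression command with time adds the speed dimension of the table to the comparison.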

rsync: Incremental Sync and Remote Backup

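rsync is a standard tool for production backups. By default it compares file size and modification time on both ends to decide which files need updating; for remote transfers it then uses its delta-transfer algorithm (rolling checksums) to send only the changed blocks, drastically reducing network overhead.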

Key Options

# Local directory sync (trailing slash on src = "contents of dir", no slash = "the dir itself")
rsync -avz /var/www/html/ /backup/html/

# Dry run first, then execute for real
rsync -avz --dry-run /var/www/ /backup/www/
rsync -avz /var/www/ /backup/www/

# Mirror sync (delete files at destination not in source)
rsync -avz --delete /var/www/ /backup/www/

# Remote sync over SSH
rsync -avz -e ssh /local/data/ user@remote:/backup/data/

# Custom SSH port
rsync -avz -e "ssh -p 2222" /local/ user@host:/remote/

# Rate-limited transfer (--bwlimit is in KiB/s, so 1024 is about 1 MiB/s) to avoid saturating bandwidth
rsync -avz --bwlimit=1024 /data/ user@host:/data/

# Exclude multiple patterns
rsync -avz \
  --exclude='*.log' \
  --exclude='cache/' \
  --exclude='.git/' \
  /var/www/ /backup/www/

Production Backup Script Example

#!/bin/bash
# backup.sh: daily incremental backup script

set -euo pipefail

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SRC="/var/www"
BACKUP_ROOT="/mnt/backup"
DEST="${BACKUP_ROOT}/${TIMESTAMP}"
LATEST="${BACKUP_ROOT}/latest"
LOG="/var/log/backup.log"

echo "[${TIMESTAMP}] Starting backup..." | tee -a "$LOG"

# --link-dest enables snapshot-style incremental backups
# (only changed files consume new disk space)
rsync -avz --delete \
  --link-dest="${LATEST}" \
  --exclude='*.tmp' \
  --exclude='cache/' \
  "${SRC}/" "${DEST}/" 2>&1 | tee -a "$LOG"

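# rsync -a copies the source directory's mtime onto DEST; reset it to now
# so the age-based cleanup below measures the backup's age, not the source's
touch "${DEST}"

# Update the 'latest' symlink
rm -f "${LATEST}"
ln -s "${DEST}" "${LATEST}"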

echo "[$(date +%Y%m%d_%H%M%S)] Backup complete: ${DEST}" | tee -a "$LOG"

# Keep last 30 days of backups, delete older ones
find "${BACKUP_ROOT}" -maxdepth 1 -type d -name "20*" \
  -mtime +30 -exec rm -rf {} +

How --link-dest works: for files that are unchanged since the previous backup, rsync creates hard links in the new backup directory rather than copying them. Each backup therefore looks like a full snapshot, but only changed files consume additional disk space. Depending on how much data changes per day, 30 daily backups may occupy only a few times the size of the original data.
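
The hard-link behavior can be observed directly. The following sketch takes two snapshots of a scratch directory and checks whether files share an inode; it assumes rsync is installed and bash's -ef test, and all paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: two snapshots with --link-dest; unchanged files share one inode.
# Requires rsync; all paths are hypothetical scratch locations.
work=$(mktemp -d)
mkdir -p "$work/src"
echo "stable"   > "$work/src/keep.txt"
echo "version1" > "$work/src/edit.txt"

if command -v rsync >/dev/null 2>&1; then
  rsync -a "$work/src/" "$work/snap1/"
  echo "version2 with more bytes" > "$work/src/edit.txt"   # change one file
  rsync -a --link-dest="$work/snap1" "$work/src/" "$work/snap2/"

  # -ef is true when both names refer to the same inode (a hard link)
  [ "$work/snap1/keep.txt" -ef "$work/snap2/keep.txt" ] && echo "keep.txt: hard-linked"
  [ "$work/snap1/edit.txt" -ef "$work/snap2/edit.txt" ] || echo "edit.txt: new copy"
fi
```

Deleting snap1 afterwards does not damage snap2: a hard-linked file's data stays on disk until its last name is removed.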

xargs: Batch Processing Power

xargs converts standard input into command-line arguments. A pipe only feeds a command's standard input, but many commands (rm, cp, mv) take their operands as arguments; xargs bridges that gap.

# Basic: pass find results to rm (null-delimited so unusual filenames are safe)
find /tmp -name "*.tmp" -print0 | xargs -0 rm -f

# -I{} placeholder: insert argument at a specific position
find . -name "*.log" -print0 | xargs -0 -I{} cp {} /backup/

# Handle filenames with spaces: use find -print0
find . -name "*.txt" -print0 | xargs -0 wc -l

# -P parallel execution (4 processes simultaneously)
find . -name "*.jpg" -print0 | xargs -0 -P4 -I{} convert {} -resize 800x {}-resized.jpg

# -n: number of arguments per invocation (2 at a time)
echo "a b c d e f" | xargs -n2 echo

# Find files containing TODO and count their lines (-Z makes grep emit null-delimited names)
grep -rlZ "TODO" ./src | xargs -0 wc -l

# Safely delete many files (avoids "argument list too long"; -r skips rm when nothing matches)
find /var/log -name "*.gz" -mtime +90 -print0 | xargs -0 -r rm -f

# Interactive confirmation (-p prompts before each execution)
find . -name "*.bak" -print0 | xargs -0 -p rm

Filenames with spaces require -print0 + -0: xargs splits on whitespace by default, so filenames with spaces will cause errors. Always use the find -print0 | xargs -0 combination when working with real file paths.
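
The splitting behavior is easy to demonstrate without touching any real files. In this small sketch, count_args is a hypothetical helper that reports how many arguments xargs produces from a single filename containing a space.

```shell
#!/usr/bin/env bash
# Sketch: count how many arguments xargs produces from one filename with a space.
count_args() { xargs "$@" -n1 echo | wc -l | tr -d ' '; }   # hypothetical helper

printf 'my report.txt\n' | count_args        # whitespace splitting: prints 2
printf 'my report.txt\0' | count_args -0     # null-delimited: prints 1
```

In the first case a command such as rm would receive "my" and "report.txt" as two separate paths, which is exactly the failure the -print0 and -0 pairing prevents.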

watch: Periodic Real-time Monitoring

watch repeatedly executes a command at a fixed interval and refreshes the screen, making it a simple but effective real-time monitoring tool.

# Refresh disk usage every 2 seconds
watch -n 2 df -h

# Highlight changes (-d)
watch -d -n 1 'ss -tnp'

# Monitor processes (simplified top alternative)
watch -n 1 'ps aux --sort=-%cpu | head -15'

# Monitor the number of entries in a directory (ls -1A avoids counting the "total" line of ls -l)
watch -n 5 'ls -1A /var/spool/mail/ | wc -l'

# Watch nginx access log line count
watch -n 2 'wc -l /var/log/nginx/access.log'

# Monitor network connection stats
watch -n 2 'ss -s'

# Hide the header bar
watch --no-title -n 1 uptime
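
Conceptually, watch is a run, wait, repeat loop. The sketch below imitates that loop in bash for illustration only; watch_lite is a hypothetical helper, it takes an iteration count so that it terminates, and unlike the real watch it does not clear the screen.

```shell
#!/usr/bin/env bash
# Sketch of what watch does conceptually: run, wait, repeat.
# watch_lite is a hypothetical helper with a fixed iteration count so it terminates.
watch_lite() {
  local interval="$1" count="$2"; shift 2
  local i
  for ((i = 1; i <= count; i++)); do
    printf -- '--- run %d: %s ---\n' "$i" "$*"
    "$@"
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
}

watch_lite 1 3 date +%T
```

A loop like this is handy in scripts and cron jobs, where watch's full-screen display is not available.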

inotifywait: Real-time File Change Detection

inotifywait uses the Linux kernel's inotify interface to watch filesystem events, triggering immediately when files are created, modified, or deleted. It's a foundation for automated deployments and hot-reload configuration systems.

# Install
sudo apt install inotify-tools

# Continuously monitor a directory (-m continuous, -r recursive, -e events)
inotifywait -m -r -e create,modify,delete /etc/nginx/

# Monitor with formatted output
inotifywait -m -r \
  --format '%T %w%f %e' \
  --timefmt '%Y-%m-%d %H:%M:%S' \
  -e create,modify,delete \
  /var/www/html/

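# Auto-reload nginx when nginx.conf changes.
# Watch the directory, not the file: editors often save by replacing the file,
# which would silently drop a watch placed on the file itself.
inotifywait -m -e close_write,moved_to --format '%e %f' /etc/nginx/ |
while read -r event file; do
  [ "$file" = "nginx.conf" ] || continue
  echo "Config changed: $file ($event)"
  nginx -t && systemctl reload nginx
done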

# Auto-sync: trigger rsync when local directory changes
inotifywait -m -r -e create,modify,delete /var/www/html/ |
while read -r dir event file; do
  echo "[$event] $dir$file"
  rsync -az /var/www/html/ user@remote:/var/www/html/
done
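
A loop like the auto-sync one starts one rsync per event, so saving many files at once triggers many runs. One possible refinement is to debounce the event stream. The sketch below assumes bash 4 or later for fractional read timeouts; debounce is a hypothetical helper, and the events are simulated with printf rather than produced by inotifywait.

```shell
#!/usr/bin/env bash
# Sketch: collapse a burst of events into a single action.
# debounce is a hypothetical helper; it reads one event per line from stdin.
debounce() {
  local quiet="$1"; shift
  local line
  while read -r line; do
    # Drain further events until none arrives for $quiet seconds (or input ends)
    while read -r -t "$quiet" line; do :; done
    "$@" </dev/null      # defensive: keep the action from consuming pending events
  done
}

# Three events arriving together trigger the action once
printf 'e1\ne2\ne3\n' | debounce 0.2 echo "sync triggered"
```

In practice the inotifywait pipeline would feed it, for example inotifywait -m -r -e create,modify,delete /var/www/html/ | debounce 2 rsync -az /var/www/html/ user@remote:/var/www/html/.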

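inotify kernel limits: The per-user watch limit is stored in /proc/sys/fs/inotify/max_user_watches. Its default was 8192 on older kernels; since Linux 5.11 it is scaled with available memory. If watching a large codebase hits the limit, raise it with echo 524288 | sudo tee /proc/sys/fs/inotify/max_user_watches, and persist it by adding fs.inotify.max_user_watches=524288 to /etc/sysctl.conf.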

File Lookup Tools Compared

| Command | Search Scope | Speed | Notes |
|---|---|---|---|
| which | PATH variable | Instant | Finds executable path in PATH only |
| type | Shell built-in | Instant | Distinguishes builtins/aliases/functions/executables |
| whereis | Fixed path list | Fast | Finds binary, man page, and source simultaneously |
| locate | Database index | Very fast | Requires updatedb; new files have a delay |
| find | Real-time filesystem scan | Slow (large dirs) | Most powerful; supports time/permission/size criteria |

# which: find executable location
which python3
# /usr/bin/python3

# type: classify a command
type ls
# ls is aliased to `ls --color=auto'
type cd
# cd is a shell builtin

# whereis: find binary, man page, and source at once
whereis nginx
# nginx: /usr/sbin/nginx /etc/nginx /usr/share/man/man8/nginx.8.gz

# locate: database-based search, very fast
sudo updatedb           # update the database first
locate "*.conf" | grep nginx

# find: real-time precise search
find /etc -name "*.conf" -mtime -7    # configs modified in last 7 days
find /var/log -size +100M             # log files over 100MB
find . -perm /4000                    # files with SUID bit (security audit)
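
In scripts, the classification that type prints for humans is available in machine-readable form through bash's type -t. A brief sketch, assuming bash; greet and the non-existent command name are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: classify names the way the table describes, using bash's type -t.
greet() { echo "hi"; }          # hypothetical function, defined only for the demo

for name in cd greet ls no-such-command-xyz; do
  kind=$(type -t "$name" || true)
  echo "$name -> ${kind:-not found}"
done
```

Here cd is reported as builtin and greet as function. The result for ls depends on the environment: an interactive shell with the usual alias reports alias, while a plain script reports file.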

Text Processing Foundation Tools

# wc: count lines/words/bytes
wc -l access.log          # line count
wc -w document.txt        # word count
wc -c binary.dat          # byte count

# sort
sort -n numbers.txt        # numeric sort
sort -rn numbers.txt       # reverse numeric sort
sort -t: -k3,3n /etc/passwd   # sort by field 3 (UID) numerically, delimiter :
sort -u names.txt          # sort and deduplicate

# uniq: deduplicate (requires sorted input)
sort access.log | uniq -c | sort -rn | head -20   # top 20 most frequent lines

# cut: field extraction
cut -d: -f1 /etc/passwd    # first column (username) with : delimiter
cut -c1-10 file.txt        # first 10 characters of each line

# paste: horizontal file merge
paste file1.txt file2.txt        # merge columns with tab
paste -d, file1.txt file2.txt   # merge with comma delimiter

# tee: output to both screen and file
ls -la | tee listing.txt                    # display and save
echo "start" | tee -a build.log             # append mode
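
These tools are most useful chained together. As a small illustration, the sketch below ranks client IPs by request count; sample_log is a hypothetical stand-in for a real access log.

```shell
#!/usr/bin/env bash
# Sketch: rank client IPs by request count from a hypothetical access log sample.
sample_log() {
  printf '%s\n' \
    '10.0.0.1 GET /' \
    '10.0.0.2 GET /about' \
    '10.0.0.1 GET /contact' \
    '10.0.0.3 GET /' \
    '10.0.0.1 GET /about'
}

# cut takes field 1, sort groups duplicates, uniq -c counts, sort -rn ranks
sample_log | cut -d' ' -f1 | sort | uniq -c | sort -rn | head -3
```

The first line of output shows 10.0.0.1 with a count of 3. The initial sort matters because uniq only merges adjacent duplicate lines.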

{{end}}
