Chapter 2

Core Python for Automation

This chapter is not a Python primer; it is a curated toolkit aimed specifically at automation. It covers six topics, each of which you will return to repeatedly in later chapters. The Chinese section above contains fully annotated code examples for each. This section provides the key rules and a quick reference for English readers.

Path Handling: pathlib vs os.path

Rule: use pathlib.Path for all path operations. Forget os.path.

Old: os.path

import os
sub = os.path.join("/data", "2024", "r.csv")
name = os.path.basename(sub)
exists = os.path.exists(sub)

New: pathlib

from pathlib import Path
sub = Path("/data") / "2024" / "r.csv"
name = sub.name        # "r.csv"
exists = sub.exists()

Key methods: p.name, p.stem, p.suffix, p.parent, p.with_stem(), p.with_suffix(), p.glob("*.py"), p.rglob("*.csv"), p.mkdir(parents=True, exist_ok=True), Path.home(), Path.cwd(). The / operator joins path components cross-platform.
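
As a quick illustration of several of the methods listed above, here is a minimal sketch. The directory layout (data/2024/r.csv) and the use of a temporary directory are assumptions made only so the example can run anywhere without leaving files behind.

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # parents=True creates missing parent directories;
    # exist_ok=True makes the call safe to repeat.
    year_dir = root / "data" / "2024"
    year_dir.mkdir(parents=True, exist_ok=True)

    report = year_dir / "r.csv"
    report.write_text("id,value\n1,10\n", encoding="utf-8")

    # Pieces of the path, available without any string slicing.
    name, stem, suffix = report.name, report.stem, report.suffix
    parent_name = report.parent.name

    # with_suffix() returns a new Path; nothing on disk is renamed.
    backup = report.with_suffix(".bak")

    # rglob() searches recursively; glob("*.csv") matches direct children only.
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))
    top_level = list(root.glob("*.csv"))
```

Because with_suffix() and with_stem() only build new Path objects, renaming a file on disk still needs an explicit call such as report.rename(backup).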

File I/O: Encoding and Context Managers

Always use with open(...) as f, which guarantees the file is closed even if an exception is raised. Key rules:
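
A minimal sketch of the context-manager pattern, assuming UTF-8 text files. The helper names (append_line, read_lines) and the file name run.log are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path


def append_line(path: Path, line: str) -> None:
    """Append one line of text; the file is closed when the block exits."""
    # An explicit encoding avoids depending on the platform default.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def read_lines(path: Path) -> list[str]:
    """Read all lines back, without their trailing newlines."""
    with open(path, "r", encoding="utf-8") as f:
        return [row.rstrip("\n") for row in f]


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "run.log"  # hypothetical file name
    append_line(log_path, "started")
    append_line(log_path, "café processed")  # non-ASCII text round-trips
    lines = read_lines(log_path)

    # The handle is closed as soon as the with block ends.
    with open(log_path, encoding="utf-8") as f:
        first = f.readline()
    closed_after_block = f.closed
```

Passing encoding explicitly matters most for scripts that move between machines, since the default text encoding can differ from one platform to another.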

Regular Expressions: The re Module

Compile patterns once, outside loops, with re.compile(). This skips the per-call pattern-cache lookup and keeps each pattern defined in one named place. Core methods:

Common patterns for automation: email (r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"), ISO date (r"\d{4}-\d{2}-\d{2}"), URL (r"https?://[^\s<>\"']+").
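
The three patterns above can be compiled once at module level and reused across a loop. This is a minimal sketch; the extract function and the sample lines are hypothetical, and the patterns are copied from the text as-is.

```python
import re

# Compiled once at import time, then reused for every line.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
ISO_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract(lines):
    """Collect every email, ISO-style date, and URL found in the lines."""
    emails, dates, urls = [], [], []
    for line in lines:
        emails.extend(EMAIL_RE.findall(line))
        dates.extend(ISO_DATE_RE.findall(line))
        urls.extend(URL_RE.findall(line))
    return emails, dates, urls


# Hypothetical sample input.
sample = [
    "2024-03-01 report sent to ops@example.com",
    'see <a href="https://example.com/r.csv">link</a> dated 2024-03-02',
]
emails, dates, urls = extract(sample)
```

Note that the date pattern checks shape only: it would also accept a string such as 2024-99-99, so values that matter should still be validated, for example with datetime.date.fromisoformat().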

Data Structures: Comprehensions, collections, dataclasses
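
As a brief sketch of the three tools named in the heading, the example below combines a dataclass, comprehensions, and two collections helpers. The FileRecord class and its sample data are hypothetical and are not taken from the chapter.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record describing one file in a batch."""
    path: PurePosixPath
    size: int


records = [
    FileRecord(PurePosixPath("logs/a.log"), 120),
    FileRecord(PurePosixPath("logs/b.log"), 80),
    FileRecord(PurePosixPath("data/r.csv"), 300),
]

# List comprehension: filter and transform in one expression.
large = [r.path.name for r in records if r.size >= 100]

# Dict comprehension: build a lookup table.
size_by_name = {r.path.name: r.size for r in records}

# Counter: tally occurrences without manual bookkeeping.
suffix_counts = Counter(r.path.suffix for r in records)

# defaultdict: group items without checking whether the key exists yet.
by_dir = defaultdict(list)
for r in records:
    by_dir[r.path.parent.as_posix()].append(r.path.name)
```

frozen=True makes instances immutable and hashable, which can help when records are shared between threads or used as dictionary keys.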

Concurrency: Choosing the Right Model

| Model | Best For | Typical Use | Caveat |
| --- | --- | --- | --- |
| threading / ThreadPoolExecutor | I/O-bound (network, file I/O) | Concurrent API calls, batch file processing | GIL blocks true CPU parallelism |
| multiprocessing | CPU-bound (image processing, heavy math) | Parallel image compression | High IPC overhead, slow startup |
| asyncio | Massive concurrent I/O (100+ simultaneous) | High-volume web scraping | Requires async-native libraries |

The standard pattern for I/O-bound automation: use ThreadPoolExecutor as a context manager, submit tasks with executor.submit(fn, arg), and collect results with as_completed(futures). Set max_workers to 4 to 16 for local file I/O and 10 to 50 for network I/O. Larger is not always better: excess threads add context-switch overhead.
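
The pattern described above can be sketched as follows. The fetch function is a hypothetical stand-in for a real network or file operation (time.sleep simulates the wait), and the failure on item 3 is simulated to show per-task error handling.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(item_id: int) -> str:
    """Hypothetical I/O-bound task; sleep stands in for a network call."""
    if item_id == 3:
        raise ValueError("simulated failure")
    time.sleep(0.05)
    return f"item-{item_id}"


def run_batch(item_ids, max_workers=8):
    """Run fetch() concurrently and separate successes from failures."""
    results, errors = {}, {}
    # The context manager waits for all threads before exiting.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its input so failures can be traced.
        futures = {executor.submit(fetch, i): i for i in item_ids}
        # as_completed yields futures as they finish, not in submit order.
        for future in as_completed(futures):
            item_id = futures[future]
            try:
                results[item_id] = future.result()
            except Exception as exc:  # one bad item should not stop the batch
                errors[item_id] = exc
    return results, errors


results, errors = run_batch(range(1, 6))
```

Calling future.result() re-raises any exception from the worker thread, so wrapping it in try/except is what keeps a single failure from aborting an unattended batch.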

Type Hints: Why Automation Scripts Need Them Too

Automation scripts run unattended. A wrong parameter type can silently corrupt an entire batch run. Type hints enable your IDE and mypy to catch these errors before execution.

Python 3.10+ key rules: use str | None instead of Optional[str]; use built-in list[str], dict[str, int], tuple[Path, Path] instead of typing.List etc.; use Sequence[Path] when you want to accept both lists and tuples. Always annotate function signatures; annotate local variables only when the type isn't obvious.

Enforce types before deployment: Run mypy your_script.py (install with pip install mypy) to catch type errors statically. VS Code + Pylance highlights problems in real time as you type.
