Chapter 2: Core Python for Automation
This chapter is not a Python primer; it is a curated toolkit aimed specifically at automation. It covers six topics, each of which you will return to repeatedly in later chapters. The Chinese section above contains fully annotated code examples for each topic. This section provides the key rules and a quick reference for English readers.
Path Handling: pathlib vs os.path
Rule: use pathlib.Path for all path operations. Forget os.path.
Old: os.path
import os
sub = os.path.join("/data", "2024", "r.csv")
name = os.path.basename(sub)
exists = os.path.exists(sub)
New: pathlib
from pathlib import Path
sub = Path("/data") / "2024" / "r.csv"
name = sub.name # "r.csv"
exists = sub.exists()
Key methods: p.name, p.stem, p.suffix, p.parent, p.with_stem(), p.with_suffix(), p.glob("*.py"), p.rglob("*.csv"), p.mkdir(parents=True, exist_ok=True), Path.home(), Path.cwd(). The / operator joins path components cross-platform.
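As a quick illustration of the methods listed above, here is a minimal sketch. The file names (report.csv, a.csv, b.csv) and the logs/2024 folder are hypothetical, and the filesystem part runs inside a temporary directory so it leaves nothing behind. Note that with_stem() needs Python 3.9 or later.

```python
import tempfile
from pathlib import Path

# Pure path manipulation: none of these calls touch the disk.
report = Path("/data") / "2024" / "report.csv"  # hypothetical path
name = report.name                      # "report.csv"
stem = report.stem                      # "report"
suffix = report.suffix                  # ".csv"
renamed = report.with_stem("summary")   # .../summary.csv (Python 3.9+)
as_json = report.with_suffix(".json")   # .../report.json

# Filesystem operations, done in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    nested = root / "logs" / "2024"
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
    nested.mkdir(parents=True, exist_ok=True)
    (nested / "a.csv").write_text("x\n", encoding="utf-8")
    (root / "b.csv").write_text("y\n", encoding="utf-8")

    # glob() looks only in root; rglob() also descends into subfolders.
    top_level = sorted(p.name for p in root.glob("*.csv"))
    recursive = sorted(p.name for p in root.rglob("*.csv"))

print(top_level)   # ['b.csv']
print(recursive)   # ['a.csv', 'b.csv']
```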
File I/O: Encoding and Context Managers
Always use with open(...) as f — it guarantees the file is closed even on exceptions. Key rules:
- Always specify encoding="utf-8" explicitly; do not rely on platform defaults.
- For files from Windows Chinese systems: try gbk if utf-8 raises UnicodeDecodeError.
- For large files: iterate line by line (for line in f) rather than calling f.read(), to avoid loading everything into memory.
- For CSVs: always use the csv module with csv.DictReader / csv.writer; never split manually.
- For CSVs targeting Windows Excel: use encoding="utf-8-sig" (BOM-prefixed) to avoid garbled Chinese characters.
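The rules above can be sketched together as follows. The helper read_text_with_fallback and the file names legacy.txt and report.csv are hypothetical, made up for this example. Passing newline="" when opening CSV files is what the csv module documentation recommends; it is not one of the rules listed above.

```python
import csv
import tempfile
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "gbk")):
    """Hypothetical helper: try each encoding in order, re-raise if none fits."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError as exc:
            last_error = exc
    raise last_error


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)

    # Simulate a file saved on a Chinese Windows system (GBK bytes).
    legacy = tmp_dir / "legacy.txt"
    legacy.write_bytes("中文".encode("gbk"))
    text = read_text_with_fallback(legacy)  # utf-8 fails, gbk succeeds

    # Write a CSV that Windows Excel opens without garbling: utf-8-sig adds a BOM.
    out_csv = tmp_dir / "report.csv"
    with open(out_csv, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "note"])
        writer.writerow(["张三", "contains, a comma"])  # csv handles the quoting

    # Read it back; utf-8-sig strips the BOM so the first header stays "name".
    with open(out_csv, encoding="utf-8-sig", newline="") as f:
        rows = list(csv.DictReader(f))

    # Large-file pattern: iterate line by line instead of f.read().
    line_count = 0
    with open(out_csv, encoding="utf-8-sig") as f:
        for line in f:
            line_count += 1

print(text)        # 中文
print(rows)        # [{'name': '张三', 'note': 'contains, a comma'}]
print(line_count)  # 2
```

The embedded comma in the second field is the reason to avoid splitting CSV lines by hand: line.split(",") would break that row into three pieces.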
Regular Expressions: The re Module
Compile patterns once outside loops with re.compile() for better performance. Core methods:
- pattern.search(text): first match anywhere in the string (returns a Match or None)
- pattern.findall(text): all matches as a list of strings (a list of tuples if the pattern contains more than one group)
- pattern.finditer(text): all matches as Match objects, yielded lazily (memory-efficient)
- pattern.sub(replacement, text): replace matches; replacement can be a function
- (?P<name>...): named capture groups; access with m.group("name") or m.groupdict()
Common patterns for automation: email (r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"), ISO date (r"\d{4}-\d{2}-\d{2}"), URL (r"https?://[^\s<>\"']+").
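Here is a small sketch that exercises the core methods with the email and ISO date patterns above. The date pattern is extended with named groups for illustration, and the sample log text and the to_slashes helper are made up for this example.

```python
import re

# Compile once at module level, outside any loop.
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical sample input.
log = (
    "2024-03-01 backup ok, contact ops@example.com; "
    "2024-03-02 backup failed, contact admin@example.org"
)

# search(): first match only; always check for None before using it.
first = DATE_RE.search(log)
first_parts = first.groupdict() if first else {}

# findall(): plain strings here because EMAIL_RE has no capture groups.
emails = EMAIL_RE.findall(log)

# finditer(): Match objects, produced one at a time.
dates = [m.group(0) for m in DATE_RE.finditer(log)]


def to_slashes(m):
    """Replacement function: receives each Match, returns the new text."""
    return f"{m.group('day')}/{m.group('month')}/{m.group('year')}"


rewritten = DATE_RE.sub(to_slashes, "due 2024-03-01")

print(first_parts)  # {'year': '2024', 'month': '03', 'day': '01'}
print(emails)       # ['ops@example.com', 'admin@example.org']
print(dates)        # ['2024-03-01', '2024-03-02']
print(rewritten)    # due 01/03/2024
```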
Data Structures: Comprehensions, collections, dataclasses
- List/dict/set comprehensions: prefer these over for loops for simple transformations; they are more readable and often faster
- collections.defaultdict(list): group items by key without KeyError; eliminates the "check if key exists, then append" pattern
- collections.Counter: count element frequencies; .most_common(n) returns the top n items
- @dataclass: declare structured data containers with auto-generated __init__, __repr__, __eq__; use field(default_factory=list) for mutable defaults
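The four tools above often appear together in one script. A minimal sketch follows; the FileRecord class and its sample data are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical record describing one file in a batch."""
    name: str
    suffix: str
    # A mutable default must go through default_factory so that each
    # instance gets its own list instead of sharing one.
    tags: list[str] = field(default_factory=list)


records = [
    FileRecord("a.csv", ".csv"),
    FileRecord("b.txt", ".txt"),
    FileRecord("c.csv", ".csv"),
]

# Comprehension: filter and transform in one expression.
csv_names = [r.name for r in records if r.suffix == ".csv"]

# defaultdict(list): group without checking whether the key exists yet.
by_suffix = defaultdict(list)
for r in records:
    by_suffix[r.suffix].append(r.name)

# Counter: frequency table with a built-in "top n" query.
suffix_counts = Counter(r.suffix for r in records)
top = suffix_counts.most_common(1)

# Each record owns its tags list; appending to one does not affect the others.
records[0].tags.append("reviewed")

print(csv_names)        # ['a.csv', 'c.csv']
print(dict(by_suffix))  # {'.csv': ['a.csv', 'c.csv'], '.txt': ['b.txt']}
print(top)              # [('.csv', 2)]
print(records[1].tags)  # []
```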
Concurrency: Choosing the Right Model
| Model | Best For | Typical Use | Caveat |
|---|---|---|---|
| threading / ThreadPoolExecutor | I/O-bound (network, file I/O) | Concurrent API calls, batch file processing | GIL blocks true CPU parallelism |
| multiprocessing | CPU-bound (image processing, heavy math) | Parallel image compression | High IPC overhead, slow startup |
| asyncio | Massive concurrent I/O (100+ simultaneous) | High-volume web scraping | Requires async-native libraries |
The standard pattern for I/O-bound automation: use ThreadPoolExecutor as a context manager, submit tasks with executor.submit(fn, arg), collect results with as_completed(futures). Set max_workers to 4-16 for local file I/O, 10-50 for network I/O. Larger is not always better — excess threads add context-switch overhead.
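The pattern just described can be sketched as follows. The function fetch_length is a hypothetical stand-in for a real I/O-bound call (an HTTP request, a file copy); it sleeps briefly to imitate waiting on the network. The URLs are placeholders and are never contacted.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_length(url: str) -> int:
    """Hypothetical I/O-bound task: pretend to fetch, return a number."""
    time.sleep(0.05)  # simulated network wait; no real request is made
    return len(url)


urls = [
    "https://example.com/a",
    "https://example.com/bb",
    "https://example.com/ccc",
]

results = {}
# The context manager waits for all submitted tasks before exiting.
with ThreadPoolExecutor(max_workers=4) as executor:
    # Map each future back to its input so results can be labelled.
    futures = {executor.submit(fetch_length, u): u for u in urls}
    # as_completed yields futures as they finish, not in submission order.
    for future in as_completed(futures):
        url = futures[future]
        try:
            results[url] = future.result()  # re-raises the task's exception, if any
        except Exception as exc:
            results[url] = exc  # record the failure, keep the batch going

print(len(results))  # 3
```

Catching exceptions per future is a design choice for unattended batch runs: one failed item is recorded instead of aborting the whole batch.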
Type Hints: Why Automation Scripts Need Them Too
Automation scripts run unattended. A wrong parameter type can silently corrupt an entire batch run. Type hints enable your IDE and mypy to catch these errors before execution.
Python 3.10+ key rules: use str | None instead of Optional[str]; use built-in list[str], dict[str, int], tuple[Path, Path] instead of typing.List etc.; use Sequence[Path] when you want to accept both lists and tuples. Always annotate function signatures; annotate local variables only when the type isn't obvious.
Enforce types before deployment: run mypy your_script.py (install with pip install mypy) to catch type errors statically. VS Code + Pylance highlights problems in real time as you type.