Chapter 14

GUI Automation: Controlling Desktop Applications with pyautogui and pywinauto

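Not every application exposes an API. When you face a legacy ERP with no interface, a government system that can only be operated by clicking, or local desktop software, GUI automation is the last resort. This chapter covers mouse and keyboard control with pyautogui, Windows control manipulation with pywinauto, and image recognition and OCR, then ties everything together in a complete automated form-filling project.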

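When to Use GUI Automation

Before writing any code, be clear about one thing: GUI automation is a last resort, not a first choice.

Comparing the Options

Approach	Best suited for	Stability	Speed
Direct API / SDK calls	Systems with an official interface	Very high	Very fast
Playwright / Selenium	Web applications and browser tasks	High	Fast
pywinauto	Native Windows desktop applications (with a control tree)	Medium-high	Medium
pyautogui + image recognition	Any visible interface; last resort	Medium-low	Slow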

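Good reasons to use GUI automation:

A legacy ERP / MES system with no API, only a desktop client
A government or institutional system that must be operated through specific software
The feature you need has no keyboard shortcut or command-line interface
Third-party software that does not support extension or customization

The main risks of GUI automation:

Fragility: small interface changes (resolution, theme, font size) can break a script
Slowness: the script has to wait for the interface to respond and cannot run actions concurrently
Hard to debug: failures are not obvious and take time to track down
Cross-platform limits: pywinauto supports Windows only; pyautogui is cross-platform but limited in features

**Evaluate before you start:** If the target system has a web version, prefer Playwright. If it is a native Windows application with standard controls, prefer pywinauto. Only when neither is feasible should you consider pyautogui with image recognition.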
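
The selection order above can be written down as a small decision function. This is only a sketch of the rule of thumb from this section; the function name and its three boolean parameters are made up for illustration.

```python
def choose_automation_approach(has_api: bool, is_web_app: bool,
                               has_control_tree: bool) -> str:
    """Return the preferred tool, checking the most stable option first."""
    if has_api:
        return "API / SDK"
    if is_web_app:
        return "Playwright / Selenium"
    if has_control_tree:
        return "pywinauto"
    return "pyautogui + image recognition"


# A legacy desktop client with standard Windows controls and no API
print(choose_automation_approach(has_api=False, is_web_app=False,
                                 has_control_tree=True))  # prints: pywinauto
```

The checks run from most to least stable, so the image-based approach is reached only when every other option is ruled out.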

pyautogui Basics

pyautogui is a cross-platform GUI automation library that controls any visible interface by simulating mouse and keyboard input.

pip install pyautogui pillow

Mouse Control

import pyautogui
import time

# Get the screen size
width, height = pyautogui.size()
print(f"Screen resolution: {width} x {height}")

# Get the current mouse position (very useful when debugging)
x, y = pyautogui.position()
print(f"Current mouse position: ({x}, {y})")

# Move the mouse (without clicking)
pyautogui.moveTo(500, 300, duration=0.5)   # glide to absolute coordinates over 0.5 s
pyautogui.moveRel(100, 0, duration=0.3)    # move 100 px right of the current position

# Clicks
pyautogui.click(500, 300)                  # left click
pyautogui.doubleClick(500, 300)            # left double-click
pyautogui.rightClick(500, 300)             # right click
pyautogui.middleClick(500, 300)            # middle click

# Dragging
pyautogui.dragTo(700, 400, duration=1.0)   # drag to the target coordinates
pyautogui.dragRel(200, 0, duration=0.5)    # drag relative to the current position

# Scrolling
pyautogui.scroll(3)    # scroll up 3 notches
pyautogui.scroll(-3)   # scroll down 3 notches

Keyboard Input

import pyautogui

# Type text (typewrite cannot type Chinese characters!)
pyautogui.typewrite('Hello World', interval=0.05)  # 0.05 s between characters

# Workaround for Chinese: copy to the clipboard, then paste
import pyperclip
pyperclip.copy('你好世界')
pyautogui.hotkey('ctrl', 'v')

# Single keys
pyautogui.press('enter')
pyautogui.press('tab')
pyautogui.press('escape')
pyautogui.press('f5')

# Key combinations
pyautogui.hotkey('ctrl', 'a')          # select all
pyautogui.hotkey('ctrl', 'c')          # copy
pyautogui.hotkey('ctrl', 'z')          # undo
pyautogui.hotkey('alt', 'f4')          # close the window
pyautogui.hotkey('ctrl', 'shift', 's') # three-key combination

# Hold / release (for more complex combinations)
pyautogui.keyDown('shift')
pyautogui.press('left')  # Shift+Left selects one character
pyautogui.keyUp('shift')

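**The Chinese input problem:** pyautogui's typewrite method simulates individual key events and cannot produce Chinese characters. The standard workaround is to write the text to the clipboard with pyperclip and paste it with Ctrl+V. This requires pip install pyperclip.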

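Screenshots and Image Recognition

import pyautogui

# Capture the full screen
screenshot = pyautogui.screenshot()
screenshot.save('screen.png')

# Capture a region (left, top, width, height)
region = pyautogui.screenshot(region=(0, 0, 800, 600))

# Image recognition: find where an image appears on the screen.
# Save a cropped screenshot of the target button/element as button.png beforehand.
# Depending on the installed version, a miss either returns None or raises
# pyautogui.ImageNotFoundException, so handle both.
try:
    location = pyautogui.locateOnScreen('button.png', confidence=0.9)
except pyautogui.ImageNotFoundException:
    location = None

if location:
    print(f"Found target at: {location}")
    center = pyautogui.center(location)  # center point of the match
    pyautogui.click(center)
else:
    print("Target image not found")

# Wait for an image to appear (keep searching for up to 10 seconds)
try:
    location = pyautogui.locateOnScreen('loading_done.png',
                                         confidence=0.85,
                                         minSearchTime=10)
except pyautogui.ImageNotFoundException:
    location = None

if location:
    pyautogui.click(pyautogui.center(location))
else:
    print("Timed out: target image did not appear")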

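**Image recognition depends on opencv-python:** The confidence parameter requires pip install opencv-python. Values range from 0 to 1, and 0.85 to 0.95 is a reasonable range. Too low a value causes false matches; too high a value makes the search fail on machines with a different resolution.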
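
Waiting for an image to appear is a polling loop at heart. The sketch below shows that loop in isolation, with the locator passed in as a callable so the helper does not depend on any particular pyautogui version. The name wait_for and its not_found parameter are illustrative, not part of pyautogui.

```python
import time
from typing import Callable, Optional, Tuple, Type

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def wait_for(locate: Callable[[], Optional[Box]],
             timeout: float = 10.0,
             interval: float = 0.5,
             not_found: Tuple[Type[BaseException], ...] = ()) -> Optional[Box]:
    """Poll locate() until it returns a box or the timeout expires.

    A miss may be signalled by returning None or by raising one of the
    exception types listed in not_found; both mean "not there yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            box = locate()
        except not_found:
            box = None
        if box is not None:
            return box
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Demo with a stand-in locator that "finds" the image on the third poll
attempts = {"count": 0}

def fake_locate() -> Optional[Box]:
    attempts["count"] += 1
    return (100, 200, 80, 24) if attempts["count"] >= 3 else None

found = wait_for(fake_locate, timeout=2.0, interval=0.01)
print(found, attempts["count"])  # prints: (100, 200, 80, 24) 3
```

With pyautogui installed, the call would look something like wait_for(lambda: pyautogui.locateOnScreen('loading_done.png', confidence=0.85), not_found=(pyautogui.ImageNotFoundException,)).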

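Safety Measure: FAILSAFE

import pyautogui

# pyautogui enables FAILSAFE by default.
# Moving the mouse to the top-left corner of the screen (0, 0) raises
# pyautogui.FailSafeException and stops the script immediately.
# This is the safety mechanism that keeps a runaway script under control.

pyautogui.FAILSAFE = True   # already on by default; keep it that way
pyautogui.PAUSE = 0.5       # pause 0.5 s after every pyautogui call (library default: 0.1 s)

# When testing, add a 3-second delay at the start so you can switch to the target window
import time
print("Starting in 3 seconds, switch to the target window...")
time.sleep(3)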

pywinauto (Windows Only)

pywinauto drives an application's control tree directly through Windows UI Automation or the Win32 API, which makes it far more stable than coordinate-based pyautogui.

pip install pywinauto

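Connecting to an Application Window

from pywinauto.application import Application
from pywinauto import Desktop
import time

# Option 1: start a new application
app = Application(backend='uia').start('notepad.exe')
time.sleep(1)  # wait for the application to start

# Option 2: connect to a running application (by executable name)
app = Application(backend='uia').connect(path='notepad.exe')

# Option 3: connect to a running application (by window title;
# '记事本' is the Notepad title on a Chinese-language Windows)
app = Application(backend='uia').connect(title_re='.*记事本.*')

# Get the main window
window = app.top_window()
print(window.window_text())  # print the window title

# Print the control tree (the most useful debugging tool)
window.print_control_identifiers()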

Working with Controls

from pywinauto.application import Application

app = Application(backend='uia').start('notepad.exe')
window = app.top_window()

# Locate a control by type and text
edit = window.child_window(control_type='Edit')
edit.click_input()
edit.type_keys('Hello pywinauto{ENTER}', with_spaces=True)

# Locate by auto_id or title (taken from print_control_identifiers output);
# the titles below are the labels shown on a Chinese-language UI
btn = window.child_window(title='保存', control_type='Button')  # "Save"
btn.click()

# Menu selection
window.menu_select('文件->另存为')  # "File -> Save As"

# Combo box selection
combo = window.child_window(control_type='ComboBox', found_index=0)
combo.select('UTF-8')

Complete Example: Exporting Data from a Legacy ERP System

"""
Scenario: a legacy ERP system has no API; reports can only be exported
through the desktop client.
Goal: open the export feature, set the date range, click Export, and save the file.
"""
from pywinauto.application import Application
import time

def export_erp_report(start_date: str, end_date: str, save_path: str):
    """
    Export the report for the given date range from the ERP system.
    start_date / end_date: format "YYYY-MM-DD"
    Menu paths, titles, and auto_id values must match the actual application.
    """
    # Connect to the ERP client that is already running
    app = Application(backend='uia').connect(title_re='.*ERP.*')
    main_win = app.top_window()

    # Open the report feature through the menu
    # ("Reports -> Daily Sales -> Export by Date")
    main_win.menu_select('报表->销售日报->按日期导出')
    time.sleep(1.5)

    # Find the dialog that just opened (title contains "导出", "Export")
    export_dlg = app.window(title_re='.*导出.*')
    export_dlg.wait('ready', timeout=10)

    # Fill in the start date
    start_edit = export_dlg.child_window(auto_id='startDate', control_type='Edit')
    start_edit.set_edit_text(start_date)

    # Fill in the end date
    end_edit = export_dlg.child_window(auto_id='endDate', control_type='Edit')
    end_edit.set_edit_text(end_date)

    # Click the Export button
    export_btn = export_dlg.child_window(title='导出', control_type='Button')
    export_btn.click()

    # Wait for the "Save As" dialog ("另存为") to appear
    save_dlg = app.window(title='另存为')
    save_dlg.wait('exists', timeout=15)

    # Enter the save path
    file_name_edit = save_dlg.child_window(auto_id='1148', control_type='Edit')
    file_name_edit.set_edit_text(save_path)

    # Click Save ("保存")
    save_dlg.child_window(title='保存', control_type='Button').click()
    time.sleep(2)
    print(f"Export finished: {save_path}")

# Usage example
export_erp_report(
    start_date='2024-01-01',
    end_date='2024-01-31',
    save_path=r'C:\Reports\sales_2024_01.xlsx'
)
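
The example above reports success after a fixed two-second sleep, without checking that the file was actually written. One possible check, sketched here with the standard library only, is to wait until the file exists, is non-empty, and has stopped growing. The helper name wait_for_export is hypothetical, and the size-stability test is a heuristic rather than a guarantee that the ERP client has finished writing.

```python
import os
import time


def wait_for_export(path: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once path exists, is non-empty, and its size is stable."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not there yet (or briefly locked)
        # Two consecutive polls with the same non-zero size count as "done"
        if size > 0 and size == last_size:
            return True
        last_size = size
        time.sleep(interval)
    return False
```

A call such as wait_for_export(save_path, timeout=60) placed after the Save click could replace the fixed sleep, and a False result could be turned into an error instead of printing a success message.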

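Image-Recognition Automation

When pywinauto cannot see the controls (as with some Java Swing or Qt applications), or the interface elements carry no accessibility identifiers, OpenCV template matching can locate elements instead.

OpenCV Template Matching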

import cv2
import numpy as np
import pyautogui

def find_element_on_screen(template_path: str, threshold: float = 0.8):
    """
    Look for a template image in a screenshot of the current screen.
    Returns (center_x, center_y, score), or None if no match reaches the threshold.
    threshold: match score between 0 and 1; 0.8 to 0.95 is recommended
    """
    # Capture the current screen
    screenshot = pyautogui.screenshot()
    screen_np = np.array(screenshot)
    screen_gray = cv2.cvtColor(screen_np, cv2.COLOR_RGB2GRAY)

    # Load the template image (cv2.imread returns None if the file cannot be read)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    if template is None:
        raise FileNotFoundError(f"Cannot read template image: {template_path}")
    h, w = template.shape

    # Run template matching
    result = cv2.matchTemplate(screen_gray, template, cv2.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

    if max_val >= threshold:
        # Compute the center of the matched area
        center_x = max_loc[0] + w // 2
        center_y = max_loc[1] + h // 2
        return center_x, center_y, max_val
    return None

# Usage example
result = find_element_on_screen('submit_button.png', threshold=0.85)
if result:
    x, y, confidence = result
    print(f"Found element, score: {confidence:.2f}, position: ({x}, {y})")
    pyautogui.click(x, y)
else:
    print("Target element not found")
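
To make the score returned by cv2.matchTemplate less of a black box, here is a pure-Python sketch of the normalized correlation coefficient that TM_CCOEFF_NORMED is based on: the template slides over every position, both patches are mean-centered, and their correlation is divided by the product of their norms. It is far too slow for real screenshots and is meant only to illustrate the arithmetic; the function name and the tiny test image are made up.

```python
import math
from typing import List, Tuple

Gray = List[List[float]]  # grayscale image as rows of pixel values


def match_template(image: Gray, template: Gray) -> Tuple[float, Tuple[int, int]]:
    """Return (best_score, (x, y)) using the normalized correlation coefficient."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_flat = [v for row in template for v in row]
    t_mean = sum(t_flat) / len(t_flat)
    t_dev = [v - t_mean for v in t_flat]
    t_norm = sum(d * d for d in t_dev)

    best_score, best_loc = -1.0, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            window = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            w_mean = sum(window) / len(window)
            w_dev = [v - w_mean for v in window]
            denom = math.sqrt(t_norm * sum(d * d for d in w_dev))
            # A completely flat window has zero variance; score it as no match
            score = sum(a * b for a, b in zip(t_dev, w_dev)) / denom if denom else 0.0
            if score > best_score:
                best_score, best_loc = score, (x, y)
    return best_score, best_loc


# A 4x5 "screen" with a distinctive 2x2 patch at column 2, row 1
screen = [
    [10, 10, 10, 10, 10],
    [10, 10, 200, 50, 10],
    [10, 10, 50, 200, 10],
    [10, 10, 10, 10, 10],
]
patch = [[200, 50], [50, 200]]
score, (x, y) = match_template(screen, patch)
print(round(score, 3), (x, y))  # prints: 1.0 (2, 1)
```

Because both patches are mean-centered and normalized, a uniformly brighter or higher-contrast copy of the template still scores close to 1, which is why this method tolerates small theme or brightness differences but not changes in size.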

Handling Different Resolutions and Scaling

import pyautogui
import ctypes

def get_dpi_scale():
    """Return the system DPI scale factor (Windows only), e.g. 1.5 for 150%."""
    try:
        awareness = ctypes.c_int()
        ctypes.windll.shcore.GetProcessDpiAwareness(0, ctypes.byref(awareness))
        dpi = ctypes.windll.user32.GetDpiForWindow(ctypes.windll.user32.GetForegroundWindow())
        return dpi / 96.0  # 96 DPI corresponds to 100% scaling
    except Exception:
        return 1.0  # fall back to no scaling (non-Windows, or the API is unavailable)

def screenshot_region_scaled(region: tuple):
    """Take a region screenshot that accounts for DPI scaling."""
    scale = get_dpi_scale()
    # Convert the region to physical pixels
    scaled_region = (
        int(region[0] * scale),
        int(region[1] * scale),
        int(region[2] * scale),
        int(region[3] * scale),
    )
    return pyautogui.screenshot(region=scaled_region)
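
The conversion sometimes has to run in the opposite direction as well: a match found in a screenshot is expressed in physical pixels, while the click may expect logical coordinates. Whether this applies depends on the DPI awareness of the Python process and on the platform, so treat the following as a sketch of the arithmetic under that assumption; physical_to_logical is a hypothetical helper.

```python
from typing import Tuple


def physical_to_logical(x: int, y: int, scale: float) -> Tuple[int, int]:
    """Convert screenshot (physical) pixel coordinates to logical coordinates."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return round(x / scale), round(y / scale)


# At 150% scaling, a match centered at physical pixel (1200, 450)
print(physical_to_logical(1200, 450, 1.5))  # prints: (800, 300)
```

A quick way to tell whether a conversion is needed is to compare pyautogui.size() with the size of a full pyautogui.screenshot(): if the two differ, screenshot coordinates and click coordinates are on different scales.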

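Scheduled Screenshots and Monitoring

Capturing the Screen on a Schedule and Detecting Problems

import pyautogui
import schedule
import time
from datetime import datetime
from pathlib import Path

SCREENSHOT_DIR = Path("./screenshots")
SCREENSHOT_DIR.mkdir(exist_ok=True)

def locate_or_none(image: str, confidence: float):
    """Return the match box, or None if the image is not on screen.

    Newer pyautogui versions raise ImageNotFoundException on a miss
    instead of returning None, so both cases are handled here.
    """
    try:
        return pyautogui.locateOnScreen(image, confidence=confidence)
    except pyautogui.ImageNotFoundException:
        return None

def capture_and_check():
    """Take a screenshot and check for an abnormal state."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = SCREENSHOT_DIR / f"screen_{timestamp}.png"

    # Capture the screen
    screenshot = pyautogui.screenshot()
    screenshot.save(filename)

    # Check for a specific state (for example, an error dialog)
    error_dialog = locate_or_none('error_dialog.png', confidence=0.85)
    if error_dialog:
        print(f"[{timestamp}] Error dialog detected! Sending alert...")
        # Hook in the email notifications from Chapter 10 or the push notifications from Chapter 11 here
        send_alert(f"Error dialog appeared; screenshot saved to {filename}")

        # Close the error dialog automatically
        close_btn = locate_or_none('close_button.png', confidence=0.9)
        if close_btn:
            pyautogui.click(pyautogui.center(close_btn))

def send_alert(message: str):
    """Alert function (connect this to a notification system)."""
    print(f"ALERT: {message}")
    # See Chapter 11 on push notifications for a real integration

# Run every 5 minutes
schedule.every(5).minutes.do(capture_and_check)

print("Screen monitoring started, press Ctrl+C to stop...")
while True:
    schedule.run_pending()
    time.sleep(1)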
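
At one screenshot every 5 minutes, the monitor above writes 288 files per day, so some form of cleanup is usually worth adding. A minimal sketch, assuming the screen_YYYYMMDD_HHMMSS.png naming used above (which sorts chronologically by name); prune_screenshots and its keep parameter are illustrative names.

```python
from pathlib import Path


def prune_screenshots(directory: Path, keep: int = 100) -> int:
    """Delete the oldest screen_*.png files, keeping the newest `keep`.

    Returns the number of files deleted.
    """
    # Timestamped names sort chronologically, so the oldest come first
    files = sorted(directory.glob("screen_*.png"))
    stale = files[:-keep] if keep > 0 else files
    for path in stale:
        path.unlink()
    return len(stale)
```

Calling prune_screenshots(SCREENSHOT_DIR, keep=288) at the end of capture_and_check would retain roughly the last day of screenshots.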

Extracting On-Screen Text with OCR

"""
Use pytesseract to extract text from a screenshot.
Prerequisites:
  1. Install the Tesseract OCR engine: https://github.com/tesseract-ocr/tesseract
  2. pip install pytesseract pillow
  3. Chinese recognition requires the chi_sim.traineddata language pack
"""
import pytesseract
import pyautogui
from PIL import Image, ImageFilter

# On Windows, point pytesseract at the Tesseract install path
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

def extract_text_from_region(region: tuple, lang: str = 'chi_sim+eng') -> str:
    """
    Extract text from a region of the screen.
    region: (left, top, width, height)
    lang: language; 'chi_sim' is Simplified Chinese, 'eng' is English; they can be combined
    """
    screenshot = pyautogui.screenshot(region=region)

    # Preprocess to improve recognition accuracy
    img = screenshot.convert('L')              # convert to grayscale
    img = img.filter(ImageFilter.SHARPEN)      # sharpen
    img = img.point(lambda x: 0 if x < 140 else 255)  # binarize

    text = pytesseract.image_to_string(img, lang=lang, config='--psm 6')
    return text.strip()

# Usage example: read a numeric value from a region of the screen
value_text = extract_text_from_region(region=(850, 320, 200, 40), lang='eng')
print(f"Recognized value: {value_text}")

# Check the system status ('错误' means "error")
status_text = extract_text_from_region(region=(0, 0, 400, 30))
if '错误' in status_text or 'error' in status_text.lower():
    print("Error state detected!")
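
OCR output for a numeric field is still a string, and it often contains thousands separators or letters misread for digits. The sketch below turns such a string into a float. It assumes the captured region contains only the number, and the substitution table is an illustrative guess at common confusions, not a list taken from Tesseract.

```python
import re
from typing import Optional

# Letters that are easily misread for digits (illustrative, not exhaustive)
_OCR_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def parse_ocr_number(text: str) -> Optional[float]:
    """Parse OCR output that should contain a single number.

    Returns None when the cleaned text is not purely numeric, so stray
    words are rejected instead of being silently turned into digits.
    """
    cleaned = text.strip().translate(_OCR_FIXES).replace(",", "").replace(" ", "")
    match = re.fullmatch(r"-?\d+(?:\.\d+)?", cleaned)
    return float(match.group()) if match else None


print(parse_ocr_number("1,2O5.5O"))   # prints: 1205.5
print(parse_ocr_number("N/A"))        # prints: None
```

Requiring the whole string to match keeps the letter substitutions from inventing numbers out of ordinary words such as a label captured by accident.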

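Hands-On Project: An Automated Form-Filling System

Scenario: every day, the company has to enter the rows of an Excel sheet one by one into a legacy desktop filing system (no API, manual entry only). The sheet has more than 50 records, and doing it by hand takes 2 hours. We will use pyautogui and pywinauto to fill in the forms automatically.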

"""
Automated form-filling system: complete implementation
Dependencies: openpyxl, pyautogui, pywinauto, pyperclip
"""
import time
import pyautogui
import pyperclip
import openpyxl
from pathlib import Path
from dataclasses import dataclass
from typing import Optional
from pywinauto.application import Application

pyautogui.FAILSAFE = True
pyautogui.PAUSE = 0.3

@dataclass
class FormRecord:
    """One record to be entered."""
    company_name: str
    tax_id: str
    amount: float
    category: str
    remark: str

def load_records_from_excel(file_path: str) -> list[FormRecord]:
    """Read the records to be entered from an Excel file."""
    wb = openpyxl.load_workbook(file_path)
    ws = wb.active
    records = []
    for row in ws.iter_rows(min_row=2, values_only=True):
        if not row[0]:  # skip empty rows
            continue
        records.append(FormRecord(
            company_name=str(row[0]),
            tax_id=str(row[1]),
            amount=float(row[2]),
            category=str(row[3]),
            remark=str(row[4]) if row[4] else ''
        ))
    print(f"Loaded {len(records)} records")
    return records

def type_chinese(text: str):
    """Enter text via the clipboard (needed for Chinese and other non-ASCII text)."""
    pyperclip.copy(text)
    pyautogui.hotkey('ctrl', 'v')
    time.sleep(0.2)

def fill_form_field(field_pos: tuple, value: str, clear_first: bool = True):
    """Click a field and fill in its value."""
    pyautogui.click(*field_pos)
    time.sleep(0.2)
    if clear_first:
        pyautogui.hotkey('ctrl', 'a')  # select existing content so it gets replaced
        time.sleep(0.1)
    if value.isascii():
        pyautogui.typewrite(value, interval=0.04)
    else:
        # Chinese characters, full-width punctuation, and any other non-ASCII
        # text cannot be typed by typewrite, so paste it instead
        type_chinese(value)

class FormFiller:
    """Automatic form filler for the filing system."""

    # Screen coordinates of interface elements (adjust to the actual system)
    POSITIONS = {
        'new_record_btn': (120, 85),     # "New" button
        'company_name':   (280, 220),    # company name field
        'tax_id':         (280, 265),    # tax ID field
        'amount':         (280, 310),    # amount field
        'category_combo': (280, 355),    # category drop-down
        'remark':         (280, 400),    # remark field
        'submit_btn':     (400, 480),    # submit button
        'confirm_btn':    (380, 340),    # OK button of the confirmation dialog
    }

    def __init__(self, app_title: str):
        self.app = Application(backend='uia').connect(title_re=app_title)
        self.window = self.app.top_window()
        print(f"Connected to application: {self.window.window_text()}")

    @staticmethod
    def _locate(image: str, confidence: float):
        """Return the match box, or None when the image is not on screen."""
        try:
            return pyautogui.locateOnScreen(image, confidence=confidence)
        except pyautogui.ImageNotFoundException:
            return None

    def wait_for_ready(self, timeout: int = 5):
        """Wait until the interface is ready."""
        time.sleep(0.5)
        # A visible submit button means the form has finished loading
        deadline = time.time() + timeout
        while time.time() < deadline:
            if self._locate('submit_btn.png', confidence=0.85):
                return True
            time.sleep(0.3)
        return False

    def fill_one_record(self, record: FormRecord) -> bool:
        """Fill in one record; return whether it succeeded."""
        try:
            # Click "New"
            pyautogui.click(*self.POSITIONS['new_record_btn'])
            time.sleep(0.8)

            # Fill in each field
            fill_form_field(self.POSITIONS['company_name'], record.company_name)
            pyautogui.press('tab')
            fill_form_field(self.POSITIONS['tax_id'], record.tax_id)
            pyautogui.press('tab')
            fill_form_field(self.POSITIONS['amount'], f"{record.amount:.2f}")
            pyautogui.press('tab')

            # Choose the category in the drop-down
            pyautogui.click(*self.POSITIONS['category_combo'])
            time.sleep(0.3)
            type_chinese(record.category)
            pyautogui.press('enter')
            time.sleep(0.2)

            # Remark (optional)
            if record.remark:
                fill_form_field(self.POSITIONS['remark'], record.remark)

            # Submit
            pyautogui.click(*self.POSITIONS['submit_btn'])
            time.sleep(1.0)

            # Handle the confirmation dialog if it appears
            if self._locate('confirm_dialog.png', confidence=0.9):
                pyautogui.click(*self.POSITIONS['confirm_btn'])
                time.sleep(0.5)

            return True

        except pyautogui.FailSafeException:
            # Emergency stop requested by the operator: do not swallow it
            raise
        except Exception as e:
            print(f"Failed to fill: {record.company_name} - {e}")
            return False

def run_auto_fill(excel_path: str, app_title: str):
    """Main function: read data from Excel and fill in the forms automatically."""
    records = load_records_from_excel(excel_path)
    filler = FormFiller(app_title)

    success_count = 0
    failed_records = []

    print(f"\nStarting automatic entry, {len(records)} records in total...")
    print("Move the mouse to the top-left corner of the screen for an emergency stop\n")

    for i, record in enumerate(records, 1):
        print(f"[{i}/{len(records)}] Entering: {record.company_name}", end=' ')
        if filler.fill_one_record(record):
            print("OK")
            success_count += 1
        else:
            print("FAILED")
            failed_records.append(record)
        time.sleep(0.5)  # short pause between records

    print(f"\nDone: {success_count} succeeded, {len(failed_records)} failed")
    if failed_records:
        print("Failed records:")
        for r in failed_records:
            print(f"  - {r.company_name} ({r.tax_id})")

# Start the script, giving the operator 3 seconds to switch to the filing system
if __name__ == '__main__':
    print("Starting in 3 seconds; make sure the filing system is open on its main screen...")
    time.sleep(3)
    run_auto_fill(
        excel_path='./data/申报数据.xlsx',   # the Excel file with the data to enter
        app_title='.*申报系统.*'             # regex matching the filing system's window title
    )

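**Tip for finding coordinates:** Run python -c "import pyautogui; import time; time.sleep(3); print(pyautogui.position())" and move the mouse over the target control within 3 seconds to read its screen coordinates. Measure every field this way and enter the values in the POSITIONS dictionary.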
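
The run_auto_fill function above only prints the records that failed. For a daily job it may be more practical to write them to a file that can be corrected and fed back in. The sketch below saves them as CSV with the standard library; save_failed_records is a hypothetical helper, and FormRecord is repeated here only so the snippet runs on its own.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class FormRecord:  # same fields as the FormRecord defined in the project
    company_name: str
    tax_id: str
    amount: float
    category: str
    remark: str


def save_failed_records(records: List[FormRecord], path: str) -> int:
    """Write failed records to a CSV file for a later retry; returns rows written."""
    # utf-8-sig adds a BOM so that Excel detects the encoding of Chinese text
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(FormRecord)])
        writer.writeheader()
        for record in records:
            writer.writerow(asdict(record))
    return len(records)
```

In run_auto_fill, a call such as save_failed_records(failed_records, 'failed.csv') could follow the summary output, so the next run can start from the failures only.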
