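Description

Interactive line-by-line PDF reader. Use this skill when the user wants to read a PDF document, control reading progress (next page, previous page, jump to page X), search content, add bookmarks, or organize a list of PDFs. It supports natural-language commands such as 「开始读」 (start reading), 「下一句」 (next), 「去第 X 页」 (go to page X), 「搜索」 (search), and 「书签」 (bookmarks). It suits chunked reading of long documents, locating specific chapters, and keyword search.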

Usage (SKILL.md)

Deep Reading Skill

Name: Book Walker
Author: youngandsure

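Interactive line-by-line PDF reader with block-level and line-level paging, jumping, search, and bookmarks.

Triggers

Use this skill when the user wants to read a PDF document and control reading progress through commands.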

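Features

📖 Open PDF: load a PDF file and start reading
➡️ Next (下一句): read the next block/line
⬅️ Previous (上一句): go back to the previous block/line
🔄 Reread: read the current content again
📑 Jump: jump by page number or block number
🔍 Search: full-text keyword search
🔖 Bookmarks: add and manage bookmarks
📊 Progress: show reading progress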

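Command list

Command	Description	Example
`开始读 <file>`	Open a PDF file	`开始读 /path/to/file.pdf`
`模式 text` / `模式 ocr`	Switch extraction mode	`模式 ocr`
`下一句` / `继续`	Read the next block	`下一句`
`上一句` / `后退`	Go back to the previous block	`上一句`
`重读` / `再念一遍`	Reread the current block	`重读`
`去第X页`	Jump to a given page	`去第10页`
`跳到第X块`	Jump to a given block	`跳到第5块`
`跳到第X行`	Jump to a given line	`跳到第50行`
`搜索 <keyword>`	Search for a keyword	`搜索 机器学习`
`书签`	Show the bookmark list	`书签`
`书签添加 <note>`	Add a bookmark	`书签添加 重要`
`模板列表`	List available templates	`模板列表`
`模板使用 <name>`	Switch the active template	`模板使用 原文翻译解读`
`模板定义 <name> <content>`	Define or overwrite a template	`模板定义 简洁 请逐句翻译`
`列出PDF` / `列表`	Scan all PDFs under the workspace and build an index	`列出PDF`
`进度`	Show reading progress	`进度`
`暂停`	Pause reading	`暂停`
`关闭`	Close the current PDF	`关闭`
`帮助`	Show help	`帮助`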
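To illustrate how commands like these could be recognized, here is a minimal sketch of a dispatcher that maps fixed phrases and parameterized phrases to actions. It is not the skill's actual parser: the `Command` type, the action names (`next`, `goto_page`, and so on), and the choice to tolerate a missing space before an argument are all assumptions made for illustration. `模板定义` is left out to keep the sketch short.

```python
import re
from typing import NamedTuple, Optional


class Command(NamedTuple):
    action: str                 # hypothetical action name, not from the skill
    arg: Optional[str] = None   # raw argument text, if the command takes one


# Fixed phrases without arguments; aliases map to the same action.
_FIXED = {
    "下一句": "next", "继续": "next",
    "上一句": "prev", "后退": "prev",
    "重读": "reread", "再念一遍": "reread",
    "书签": "list_bookmarks",
    "模板列表": "list_templates",
    "列出PDF": "list_pdfs", "列表": "list_pdfs",
    "进度": "progress",
    "暂停": "pause",
    "关闭": "close",
    "帮助": "help",
}

# Phrases that carry an argument, tried in order.
_PATTERNS = [
    (re.compile(r"^去第\s*(\d+)\s*页$"), "goto_page"),
    (re.compile(r"^跳到第\s*(\d+)\s*块$"), "goto_block"),
    (re.compile(r"^跳到第\s*(\d+)\s*行$"), "goto_line"),
    (re.compile(r"^开始读\s+(.+)$"), "open"),
    (re.compile(r"^模式\s+(text|ocr)$"), "set_mode"),
    (re.compile(r"^搜索\s*(.+)$"), "search"),
    (re.compile(r"^书签添加\s*(.+)$"), "add_bookmark"),
    (re.compile(r"^模板使用\s*(.+)$"), "use_template"),
]


def parse_command(text: str) -> Optional[Command]:
    """Return the parsed command, or None if the text is not recognized."""
    text = text.strip()
    if text in _FIXED:
        return Command(_FIXED[text])
    for pattern, action in _PATTERNS:
        match = pattern.match(text)
        if match:
            return Command(action, match.group(1).strip())
    return None
```

Checking the fixed phrases first means `书签` is never mistaken for a prefix of `书签添加 <note>`; unrecognized input returns `None` so the caller can fall back to the help text.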

Usage example

User: 列出PDF
Assistant: 📚 Readable PDF index for the current workspace
           Path: /path/to/workspace-e
           ━━━━━━━━━━━━━━━━━━━━━━━━
           3 PDFs in total:
            1. docs/report.pdf
            2. papers/paper.pdf
           💡 Open one with 「开始读 1」 or 「开始读 report.pdf」

User: 开始读 /home/docs/report.pdf
Assistant: 📄 Page 1 · Block 1/50 · [██░░░░░░░] 4%
           ━━━━━━━━━━━━━━━━━━━━━━━━
           Chapter 1 Project Overview
           
           This report mainly introduces...

User: 下一句
Assistant: 📄 Page 1 · Block 2/50 · [██░░░░░░░] 6%
           ━━━━━━━━━━━━━━━━━━━━━━━━
           
           1.1 Project Background...

User: 去第5页
Assistant: 📄 Page 5 · Block 20/50 · [████░░░░░] 40%
           ━━━━━━━━━━━━━━━━━━━━━━━━
           Chapter 2 Technical Approach...

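Implementation details

Core modules

pdf-reader/
├── __init__.py         # main entry point
├── reader/
│   ├── types.py        # type definitions
│   ├── exceptions.py   # exception classes
│   ├── blocks.py       # data classes
│   ├── parser.py       # PDF parsing engine
│   ├── state.py        # state management
│   └── cache.py        # cache management
├── commands/
│   └── navigation.py   # navigation commands
└── ui/
    └── formatter.py    # output formatting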

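MVP scope

✅ Searchable PDFs (not scanned)
✅ Single-column layout
✅ No page limit (pages are parsed on demand and persisted to disk)
✅ Block-level paging
✅ Page-number jumps
✅ Progress display
✅ Bookmarks (persisted per PDF)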
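The "parsed on demand and persisted to disk" item can be sketched as a read-through cache: a page is parsed only the first time it is requested, and later requests read the saved JSON. This is an illustrative sketch, not the skill's `cache.py`. The function name `load_page_blocks`, the injected `parse_page` callable, and the choice to store each page as a JSON list of strings are assumptions; the `p{N}.json` file name follows the storage layout described in this document.

```python
import json
from pathlib import Path
from typing import Callable, List


def load_page_blocks(
    cache_dir: Path,
    page_no: int,
    parse_page: Callable[[int], List[str]],
) -> List[str]:
    """Return the blocks of one page, parsing it only on first access."""
    page_file = cache_dir / f"p{page_no}.json"
    if page_file.exists():
        # Cache hit: skip the (possibly slow) PDF parse entirely.
        return json.loads(page_file.read_text(encoding="utf-8"))

    blocks = parse_page(page_no)  # hypothetical parser supplied by the caller
    cache_dir.mkdir(parents=True, exist_ok=True)
    page_file.write_text(
        json.dumps(blocks, ensure_ascii=False), encoding="utf-8"
    )
    return blocks
```

Because only the requested page is parsed, the cost of opening a document does not grow with its page count, which is what makes a "no page limit" claim plausible.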

Dependencies

pdfplumber: text/table extraction
PyMuPDF: image extraction

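Templates and agent-side LLM processing

Block-level navigation: 下一句 advances one block (paragraph) at a time
Templates: users can define and select templates with 模板定义 and 模板使用; the skill itself never calls an LLM
Structured payload: when a non-default template is active, the skill appends a [PDF_READER_TEMPLATE_PAYLOAD] block to the end of its output, containing template_prompt, original, page, and block_id
Agent responsibility: after parsing the payload, the agent calls an LLM to process original according to template_prompt (for example original text / translation / commentary) and presents the LLM output to the user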
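On the agent side, handling the payload might look like the sketch below. The marker string and the four field names come from the description above; everything else is an assumption, in particular that the payload body is a JSON object following the marker, and the helper names `split_payload` and `build_llm_prompt`. The actual serialization is not specified here, so treat this as one possible reading.

```python
import json
from typing import Optional, Tuple

MARKER = "[PDF_READER_TEMPLATE_PAYLOAD]"


def split_payload(output: str) -> Tuple[str, Optional[dict]]:
    """Split skill output into (visible_text, payload), payload may be None."""
    visible, sep, tail = output.partition(MARKER)
    if not sep:
        return output, None  # default template: nothing to process
    try:
        # Assumption: the text after the marker is a single JSON object.
        payload = json.loads(tail.strip())
    except json.JSONDecodeError:
        return output, None  # unknown format: leave the output untouched
    return visible.rstrip(), payload


def build_llm_prompt(payload: dict) -> str:
    """Combine the template instruction with the original block text."""
    return f"{payload['template_prompt']}\n\n{payload['original']}"
```

The agent would then send the result of `build_llm_prompt` to its LLM and show the response to the user; `page` and `block_id` remain available for labeling the answer.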

Storage layout

Each PDF has its own directory (~/.cache/pdf-reader/{hash}/), containing:

index.json - index (total_pages, page_offsets)
p1.json, p2.json ... - block data for each page
state.json - reading progress
bookmarks.json - bookmarks
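The sketch below shows one way this layout could support page jumps. Two things are assumed because the document does not define them: that `{hash}` is a digest of the PDF path (SHA-256 here, truncated to 16 hex characters), and that `page_offsets[i]` holds the global index of the first block on page `i + 1`. The function names are hypothetical.

```python
import bisect
import hashlib
import os
from pathlib import Path
from typing import List


def cache_dir_for(pdf_path: str, root: str = "~/.cache/pdf-reader") -> Path:
    """Map a PDF path to its own cache directory."""
    # Assumption: the real hashing scheme is not documented.
    digest = hashlib.sha256(pdf_path.encode("utf-8")).hexdigest()[:16]
    return Path(os.path.expanduser(root)) / digest


def first_block_of_page(page_offsets: List[int], page_no: int) -> int:
    """Global block index to land on for a 'go to page X' command (1-based)."""
    if not 1 <= page_no <= len(page_offsets):
        raise ValueError(f"page {page_no} out of range 1..{len(page_offsets)}")
    return page_offsets[page_no - 1]


def page_of_block(page_offsets: List[int], block_index: int) -> int:
    """Inverse lookup: the 1-based page that contains a global block index."""
    if block_index < 0:
        raise ValueError("block_index must be non-negative")
    # bisect_right counts how many pages start at or before this block.
    return bisect.bisect_right(page_offsets, block_index)
```

With offsets stored once in index.json, a page jump is a list lookup and needs no page to be re-parsed.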

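Notes

The MVP does not support scanned PDFs (these require OCR)
The MVP does not support multi-column layouts
Large files may take a while to load initially
Bookmarks, parsed results, and reading progress are all stored in per-PDF directories, so switching PDFs switches directories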

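PDF text extraction

Extraction modes

PDF Reader supports two text extraction modes:

Mode	Command	Description
Text mode	`模式 text`	Default; extracts PDF text directly with pdfplumber
OCR mode	`模式 ocr`	Renders each page to an image, then recognizes it with tesseract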

Text mode

Best for: most ordinary PDFs
Pros: fast
Known issue: in some PDFs (generated by older LaTeX versions) the text stream contains no spaces
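When the text stream has no space characters, word boundaries can often still be recovered from character positions. The sketch below rebuilds one line from character boxes and inserts a space wherever the horizontal gap is wide relative to the previous glyph. It is an illustration, not the skill's `enhance_text()`, whose algorithm is not documented here. The function name and the `gap_ratio` threshold are assumptions; the `text`, `x0`, and `x1` keys match the character dictionaries that pdfplumber exposes via `page.chars`.

```python
from typing import Dict, List


def join_chars_with_spaces(chars: List[Dict], gap_ratio: float = 0.25) -> str:
    """Rebuild a single text line, adding spaces at wide horizontal gaps."""
    out: List[str] = []
    prev = None
    for ch in sorted(chars, key=lambda c: c["x0"]):
        if prev is not None:
            gap = ch["x0"] - prev["x1"]        # empty space between glyphs
            width = prev["x1"] - prev["x0"]    # width of the previous glyph
            already_spaced = out[-1].isspace() or ch["text"].isspace()
            if gap > gap_ratio * width and not already_spaced:
                out.append(" ")
        out.append(ch["text"])
        prev = ch
    return "".join(out)
```

pdfplumber's `extract_text()` also accepts an `x_tolerance` argument that controls a similar gap threshold, so tuning that value may be enough before resorting to custom post-processing.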

OCR mode

Best for: scanned PDFs and PDFs whose text extraction is unreliable
Pros: accurate recognition
Cons: slower

Requires tesseract to be installed:

# macOS
brew install tesseract

# Ubuntu
sudo apt install tesseract-ocr
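Before switching to OCR mode it can help to confirm that the binary is reachable. The sketch below checks the PATH and runs tesseract on a single page image through its command-line interface, where `stdout` as the output base makes it print the recognized text. The helper names are hypothetical and this is not the skill's `_extract_blocks_ocr()`; the `lang` value must match a language data file that is installed on the system.

```python
import shutil
import subprocess
from typing import List


def tesseract_available() -> bool:
    """True if a tesseract executable can be found on PATH."""
    return shutil.which("tesseract") is not None


def build_ocr_command(image_path: str, lang: str = "eng") -> List[str]:
    """Command line for: tesseract <image> stdout -l <lang>."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run tesseract on one image file and return the recognized text."""
    if not tesseract_available():
        raise RuntimeError("tesseract not found on PATH; install it first")
    result = subprocess.run(
        build_ocr_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell interpretation of file names that contain spaces or special characters.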

Usage example

User: 模式 ocr
Assistant: ✅ Switched to ocr mode

User: 开始读 xxx.pdf
Assistant: (extracting text with OCR...)

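Technical implementation

text_enhance.py: text post-processing module
- enhance_text() - intelligently inserts missing spaces
- is_scanned_pdf() - detects whether a PDF is a scan
parser.py:
- _extract_blocks() - text extraction
- _extract_blocks_ocr() - OCR extraction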
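A common heuristic for scan detection is to sample a few pages and check how many yield almost no extractable text. The sketch below applies that idea to text that has already been extracted (for example with pdfplumber's `page.extract_text()`). It is not the skill's `is_scanned_pdf()`; the function name `looks_scanned` and both thresholds are assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def looks_scanned(
    page_texts: Sequence[Optional[str]],
    min_chars: int = 20,
    max_empty_ratio: float = 0.8,
) -> bool:
    """Guess whether a PDF is a scan from the text of sampled pages."""
    if not page_texts:
        return False  # nothing sampled, so no evidence of a scan
    # A page counts as "empty" if extraction returned little or no text.
    empty = sum(
        1 for text in page_texts if len((text or "").strip()) < min_chars
    )
    return empty / len(page_texts) >= max_empty_ratio
```

A caller could use the result to suggest `模式 ocr` to the user instead of silently returning blank blocks.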

Security recommendations

What to consider before installing:
- The skill will create a Python virtual environment and pip-install packages (pdfplumber and pypdfium2) into the skill directory; review those packages if you want to avoid installing new packages.
- The SKILL.md/README mention PyMuPDF but the install command installs pypdfium2 and the code only imports pdfplumber. This mismatch is likely harmless but worth noting (OCR mode also requires a system tesseract binary if you use OCR).
- The skill will scan your workspace for PDFs (it checks OPENCLAW_WORKSPACE or WORKSPACE_ROOT if set, otherwise guesses a workspace path) and will write per-PDF cache/state under ~/.cache/deep-reading (and create a .venv under the skill path). If you prefer different locations, inspect/modify the code before use.
- The skill intentionally returns a structured payload marker (PDF_READER_TEMPLATE_PAYLOAD) for the agent to call an LLM; the skill itself does not call external LLMs or network endpoints. The agent may invoke an LLM on that payload, so ensure you are comfortable with your agent's LLM usage and data flow.
- No network endpoints, secrets, or unrelated credentials are requested by the skill. If you want extra caution, run the skill in an isolated environment or inspect the full omitted files for anything unexpected before enabling it.

Functional analysis

Type: OpenClaw Skill
Name: book-walker
Version: 2.0.2

The book-walker skill is a legitimate interactive PDF reader providing text extraction, OCR capabilities via Tesseract, and a templated LLM processing system. The code is well-structured, using standard libraries like pdfplumber and pypdfium2, and handles subprocess calls for OCR safely by using temporary files. While it includes a design pattern where the skill provides instructions to the OpenClaw agent to process text via specific LLM prompts (found in SKILL.md and formatter.py), this is a transparently documented feature for translation and summarization rather than a malicious prompt-injection attack. No evidence of data exfiltration, unauthorized execution, or persistence was found.

Capability assessment

✓ Purpose & Capability

Name/description match the code: code parses PDFs, provides navigation, search, bookmarks, persistent per-PDF cache/state, and returns a structured payload for optional LLM processing by the agent.

✓ Instruction Scope

SKILL.md instructions align with code behavior: commands like '开始读', '下一句', '搜索', '书签' map to functions. The skill scans the workspace for PDFs (it reads OPENCLAW_WORKSPACE or WORKSPACE_ROOT if set, otherwise infers a workspace path), and it stores cache/state under a per-skill cache directory.

ℹ Install Mechanism

Installation is a simple venv creation + pip install (pdfplumber, pypdfium2) invoked via a shell command in SKILL.md. This is moderate risk (running pip installs). Small inconsistency: SKILL.md and README mention PyMuPDF as a dependency but the provided install command installs pypdfium2; the code imports pdfplumber (present) but not pypdfium2/PyMuPDF. The install command hardcodes the workspace path (~/.openclaw/workspace-e/skills/book-walker) which may differ from where the platform places skills.

✓ Credentials

The skill does not request environment variables or credentials. It does optionally read OPENCLAW_WORKSPACE or WORKSPACE_ROOT to locate the workspace; this is reasonable for a workspace-scanning PDF tool.

✓ Persistence & Privilege

always=false and agent-invocable defaults are normal. The skill creates a virtualenv in the skill folder and uses ~/.cache/deep-reading for per-PDF caches and state; this is expected for a local reader but the locations and write behavior are persistent and will create files under your home directory.

Version history

v2.0.2

book-walker 2.0.2 Changelog
- Migrated dependency installation to Python venv; now uses a shell command to create a virtual environment and install dependencies.
- Updated SKILL.md metadata to remove direct pip requirements, improving environment isolation.
- Documentation reflects the new installation method.
- No changes in command features or PDF reading behavior.

v2.0.1

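- Added the pdf_reader/template/manager.py and __init__.py template-management module, introducing custom templates.
- Added pdf_reader/storage/state.json state persistence to save and restore reading progress.
- Added a config.yaml configuration file for easier parameter management and extensibility.
- Refined the storage and template structure to lay the groundwork for future features.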

v2.0.0

Major refactor: Deep file/module restructuring and codebase cleanup.
- Migrated documentation and code structure from "book-walker" to "pdf-reader", updating directory and cache naming.
- Removed 17 files related to core functionality and documentation, likely in preparation for architectural changes or as part of deprecation.
- The only remaining file is SKILL.md, now reflecting the new project naming and adjusted internal module paths.
- PDF reading commands and usage remain similar, but implementation paths are now under "pdf-reader/" instead of "book-walker/" or "deep-reading/".
- No changes to end-user commands or features indicated.

v1.0.1

Fix directory names in documentation

v1.0.0

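- First release of the interactive line-by-line PDF reader, with Chinese natural-language commands.
- Page/block/line reading progress control, including 「下一句」, 「上一句」, 「去第 X 页」, and 「跳到第X块」.
- Keyword search and bookmarks, with progress and marks managed independently for each PDF.
- Scans all PDFs under the workspace and quickly builds a reading index.
- Text and OCR extraction modes to suit different kinds of PDFs.
- Built-in templates and a structured payload mechanism for working with an LLM backend on translation, summarization, and other processing.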

Metadata

Slug book-walker

Version 2.0.2

License MIT-0

Total installs 0

Current installs 0

Historical versions 5

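FAQ

What is Book Walker?

An interactive line-by-line PDF reader. Use this skill when the user wants to read a PDF document, control reading progress (next page, previous page, jump to page X), search content, add bookmarks, or organize a list of PDFs. It supports natural-language commands such as 「开始读」, 「下一句」, 「去第 X 页」, 「搜索」, and 「书签」. It suits chunked reading of long documents, locating specific chapters, and keyword search. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 276 times so far.

How do I install Book Walker?

Run the command 「/install book-walker」 in an OpenClaw or Claude Code chat to install it in one step; no extra configuration is needed.

Is Book Walker free?

Yes. Book Walker is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Book Walker support?

Book Walker is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Book Walker?

It is developed and maintained by YoungAndSure (@youngandsure). The current version is v2.0.2.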
