zhengbin1973

Pdf Highlight Extractor

by zhengbin1973 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ · Security Clean
48 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install pdf-highlight-extractor
Description
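Identifies the highlight annotations (highlighter marks) in a PDF sent by the user, extracts all highlighted text, and compiles it into a Markdown file with YAML Front Matter (the title, date, and tags trio). The title and tags are generated by the AI from the meaning of the content; the Markdown contains the 「摘录原文」 (Excerpts) and 「内容总结」 (Summary) sections...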
README (SKILL.md)


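# PDF Highlight Extraction → Markdown Skill

## Goal

Extract all highlighted (highlighter-pen) annotation text from a PDF file provided by the user, compile it, and generate a Markdown document with YAML Front Matter.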

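## Workflow

### Step 1: Check dependencies

On first use, run the install script to make sure pymupdf is installed:

```bash
<python> scripts/install_deps.py
```

Replace `<python>` with the Python interpreter path of the current environment (prefer the managed version).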
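The contents of `scripts/install_deps.py` are not shown here; the Usage Guidance on this page only says it invokes pip for the fixed package pymupdf. A minimal sketch of such an installer, assuming it checks for `fitz` (PyMuPDF's import name) before installing; the function names are hypothetical:

```python
import importlib.util
import subprocess
import sys

PACKAGE = "pymupdf"  # the fixed pip package named by the skill
MODULE = "fitz"      # PyMuPDF's import name


def is_installed(module_name: str) -> bool:
    """True if the module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package: str) -> list[str]:
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_pymupdf() -> None:
    """Install PyMuPDF with pip only if it is not importable yet (hypothetical helper)."""
    if not is_installed(MODULE):
        subprocess.run(pip_install_command(PACKAGE), check=True)
```

Using `sys.executable -m pip` rather than a bare `pip` installs the package into the same environment as the `<python>` that later runs the extraction script, which is the point of choosing the interpreter path explicitly.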
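
### Step 2: Extract highlights (JSON mode)

Run the extraction script in JSON mode to get structured highlight data for the AI to process in the next steps:

```bash
<python> scripts/extract_highlights.py "<pdf_path>" --json
```

- `<pdf_path>`: absolute path of the PDF provided by the user
- The script prints JSON containing each highlight's `page` (page number), `color` (color name), and `text` (content)
- To extract only one color, add `--color yellow` (supported: yellow/green/red/blue/pink/orange/purple/cyan)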
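The extraction script itself is not reproduced in this README. As a rough sketch of how `page`/`color`/`text` records could be produced with PyMuPDF, assuming the highlights are standard Highlight annotations whose quad points mark the highlighted lines, and assuming a hypothetical RGB palette for the eight color names (`nearest_color_name`, `extract_highlights`, and `to_json` are illustrative names, not the script's API):

```python
import json
import math
from typing import Optional

# Assumed reference colors (RGB floats in 0..1) for the eight supported names.
# The real script's palette is not documented; these values are illustrative.
PALETTE = {
    "yellow": (1.0, 1.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "red": (1.0, 0.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
    "pink": (1.0, 0.75, 0.8),
    "orange": (1.0, 0.65, 0.0),
    "purple": (0.5, 0.0, 0.5),
    "cyan": (0.0, 1.0, 1.0),
}


def nearest_color_name(rgb) -> str:
    """Name of the palette color closest (Euclidean distance) to an RGB triple."""
    return min(PALETTE, key=lambda name: math.dist(rgb, PALETTE[name]))


def extract_highlights(pdf_path: str, color: Optional[str] = None) -> list[dict]:
    """Collect Highlight annotations as {page, color, text} records."""
    import fitz  # PyMuPDF; imported here so the pure helpers work without it

    records = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for annot in page.annots(types=[fitz.PDF_ANNOT_HIGHLIGHT]):
                # Assume an RGB stroke color; fall back to yellow if none is set.
                stroke = annot.colors.get("stroke") or (1.0, 1.0, 0.0)
                name = nearest_color_name(stroke)
                if color is not None and name != color:
                    continue
                points = annot.vertices or []
                # Quad points come in groups of 4, one quad per highlighted line.
                parts = [
                    page.get_textbox(fitz.Quad(points[i:i + 4]).rect).strip()
                    for i in range(0, len(points) - 3, 4)
                ]
                text = " ".join(part for part in parts if part)
                if text:
                    records.append({"page": page.number + 1, "color": name, "text": text})
    return records


def to_json(records: list[dict]) -> str:
    """Serialize records as JSON, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)
```

For example, `print(to_json(extract_highlights("/path/to/file.pdf", color="yellow")))` would print only the yellow highlights. Matching by nearest palette color rather than exact equality tolerates the slightly different shades PDF readers use for the same highlighter; text clipped from the quad rectangles can still pick up stray characters from neighbouring lines.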
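
### Step 3: AI generates the title and tags

Analyze the meaning of all highlighted text and generate:

- **title**: 3 to 10 characters summarizing the core topic of the highlights; Chinese preferred
- **tags**: 3 to 6 tags covering the subject area, document type, and key concepts; all lowercase, in Chinese or English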
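The skill states these constraints only as prose for the AI to follow. A small sketch of how the generated values could be checked before writing the file; counting title length without whitespace is an assumption, and `check_title` and `normalize_tags` are hypothetical names:

```python
import re

TITLE_MIN, TITLE_MAX = 3, 10  # title length in characters
TAGS_MIN, TAGS_MAX = 3, 6     # number of tags


def check_title(title: str) -> bool:
    """True if the title has 3 to 10 characters, ignoring whitespace (assumption)."""
    length = len(re.sub(r"\s+", "", title))
    return TITLE_MIN <= length <= TITLE_MAX


def normalize_tags(tags: list[str]) -> list[str]:
    """Lowercase, trim, and de-duplicate tags in order; enforce the 3 to 6 range."""
    cleaned: list[str] = []
    for tag in tags:
        value = tag.strip().lower()  # lower() leaves Chinese text unchanged
        if value and value not in cleaned:
            cleaned.append(value)
    if not TAGS_MIN <= len(cleaned) <= TAGS_MAX:
        raise ValueError(f"expected {TAGS_MIN}-{TAGS_MAX} tags, got {len(cleaned)}")
    return cleaned
```

De-duplicating after lowercasing means `AI` and `ai` count as one tag, so the 3 to 6 range is checked against the tags that will actually appear in the Front Matter.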

### Step 4: Generate the Markdown file

Using the template below, create `<pdf-filename>_highlights.md` in **the same directory as the PDF** (the headings 摘录原文, 第 N 页, and 内容总结 mean "Excerpts", "Page N", and "Summary"):

```markdown
---
title: "<AI-generated title>"
date: <today's date, YYYY-MM-DD>
tags:
  - <tag1>
  - <tag2>
  - ...
---

# <title>

## 摘录原文

### 第 N 页

- Highlight text 1
- Highlight text 2

### 第 M 页

- ...

---

## 内容总结

<AI-written overall summary of all highlights, 200 to 400 characters, distilling the core ideas, key data, and important conclusions>
```
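A minimal sketch of filling this template in Python, assuming the highlight records have the `page`/`color`/`text` shape from Step 2 with a 1-based integer `page`; `output_path` and `render_markdown` are hypothetical names:

```python
import json
from collections import defaultdict
from datetime import date
from pathlib import Path
from typing import Optional


def output_path(pdf_path: str) -> Path:
    """<pdf filename>_highlights.md, next to the PDF."""
    pdf = Path(pdf_path)
    return pdf.with_name(f"{pdf.stem}_highlights.md")


def render_markdown(title: str, tags: list[str], records: list[dict],
                    summary: str, today: Optional[date] = None) -> str:
    """Fill the Step 4 template from {page, color, text} highlight records."""
    today = today or date.today()
    by_page = defaultdict(list)
    for rec in records:
        # Collapse internal line breaks so each highlight stays one list item.
        by_page[rec["page"]].append(" ".join(rec["text"].split()))

    # json.dumps escapes quotes; a JSON string is also a valid YAML double-quoted scalar.
    lines = ["---", f"title: {json.dumps(title, ensure_ascii=False)}",
             f"date: {today.isoformat()}", "tags:"]
    lines += [f"  - {tag}" for tag in tags]
    lines += ["---", "", f"# {title}", "", "## 摘录原文", ""]
    for page in sorted(by_page):
        lines += [f"### 第 {page} 页", ""]
        lines += [f"- {text}" for text in by_page[page]]
        lines.append("")
    lines += ["---", "", "## 内容总结", "", summary, ""]
    return "\n".join(lines)
```

Writing the result with `output_path(pdf).write_text(markdown, encoding="utf-8")` keeps Chinese text intact on systems whose default encoding is not UTF-8. Tags are written unquoted as in the template, so a tag containing YAML-special characters such as `:` would need quoting.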

### Step 5: Confirm the output

Tell the user:
- the path of the generated Markdown file
- how many highlights were extracted, and from how many pages
- a brief preview of the YAML Front Matter
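The counts and the preview can be derived from the Step 2 records and the rendered file. A sketch under the same record-shape assumption, with hypothetical helper names and an English message format chosen only for illustration:

```python
def extraction_stats(records: list[dict]) -> tuple[int, int]:
    """(number of highlights, number of distinct pages they come from)."""
    return len(records), len({rec["page"] for rec in records})


def front_matter(markdown: str) -> str:
    """YAML lines between the opening '---' and the next '---' line."""
    lines = markdown.splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("document does not start with YAML Front Matter")
    end = lines.index("---", 1)  # first closing delimiter after the opening one
    return "\n".join(lines[1:end])


def confirmation_message(md_path: str, records: list[dict], markdown: str) -> str:
    """Step 5 text: output path, counts, and a Front Matter preview."""
    count, pages = extraction_stats(records)
    return (f"Saved to {md_path}\n"
            f"Extracted {count} highlights from {pages} pages\n"
            f"{front_matter(markdown)}")
```

Counting distinct pages with a set means several highlights on one page count that page once.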
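
## Notes

- If the script reports "no highlight annotations found" (未找到任何高亮标注), the PDF may be an image scan rather than text with highlight annotations, or the highlights may be handwritten or non-standard annotations; tell the user this plainly.
- If the PDF path contains Chinese characters or spaces, wrap it in double quotes.
- Write the summary only after actually reading all the excerpts; do not just restate the title.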
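When the script is launched from Python instead of a shell, quoting stops being an issue: passing an argument list to `subprocess.run` hands the path to the script as a single argument, spaces and Chinese characters included. A sketch with hypothetical helper names, assuming the script path is relative to the skill directory as in Step 2:

```python
import subprocess
import sys
from typing import Optional

SUPPORTED_COLORS = {"yellow", "green", "red", "blue", "pink", "orange", "purple", "cyan"}


def build_extract_command(pdf_path: str, python: str = sys.executable,
                          color: Optional[str] = None) -> list[str]:
    """Argument list for scripts/extract_highlights.py in JSON mode."""
    cmd = [python, "scripts/extract_highlights.py", pdf_path, "--json"]
    if color is not None:
        if color not in SUPPORTED_COLORS:
            raise ValueError(f"unsupported color: {color}")
        cmd += ["--color", color]
    return cmd


def run_extraction(pdf_path: str, color: Optional[str] = None) -> str:
    """Run the script without a shell, so the path needs no quoting."""
    result = subprocess.run(build_extract_command(pdf_path, color=color),
                            capture_output=True, encoding="utf-8", check=True)
    return result.stdout
```

Rejecting unknown colors before launching the process gives a clearer error than whatever the script would report for an unsupported `--color` value.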
Usage Guidance
Before installing, consider that the skill will read the selected PDF's highlighted text, ask the AI to summarize it, install PyMuPDF if needed, and create a Markdown file in the same directory as the PDF. Avoid using it on confidential PDFs in shared, synced, or version-controlled folders unless that output location is acceptable.
Capability Assessment
Purpose & Capability
The scripts and instructions are coherent with the stated purpose: read a specified PDF, extract highlight annotations using PyMuPDF, and format the results as Markdown.
Instruction Scope
The activation text includes broader phrases like generating reading notes, but it is bounded by the user providing a PDF and indicating highlight or annotation-related intent.
Install Mechanism
First use runs a small dependency installer that invokes pip for the fixed package pymupdf; this is disclosed and purpose-aligned, with no dynamic package names or hidden commands.
Credentials
Read and write access are proportionate for processing a local PDF and creating an output file, though the extracted text may contain sensitive content.
Persistence & Privilege
No background process, credential access, privilege escalation, destructive action, or ongoing persistence is present; the only durable output is the Markdown file and the installed Python dependency.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install pdf-highlight-extractor
  3. After installation, invoke the skill by name or use /pdf-highlight-extractor
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
- Initial release: extracts all highlight annotations from user-provided PDF files and generates a Markdown summary with YAML Front Matter (title, date, tags).
- Automatically analyzes the highlights to generate a concise title and relevant tags.
- The Markdown output includes both the original excerpts, organized by page, and an AI-written summary of the highlighted content.
- The Markdown file is saved in the same directory as the PDF, named `<pdf filename>_highlights.md`.
- Reports extraction results to the user, including the number of highlights, the number of pages, and the YAML metadata.
- Handles cases where no highlights are found or the highlights are not in an extractable format.
Metadata
Slug pdf-highlight-extractor
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Pdf Highlight Extractor?

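Identifies the highlight annotations (highlighter marks) in a PDF sent by the user, extracts all highlighted text, and compiles it into a Markdown file with YAML Front Matter (the title, date, and tags trio). The title and tags are generated by the AI from the meaning of the content; the Markdown contains the 「摘录原文」 (Excerpts) and 「内容总结」 (Summary) sections... It is an AI Agent Skill for Claude Code / OpenClaw, with 48 downloads so far.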

How do I install Pdf Highlight Extractor?

Run "/install pdf-highlight-extractor" in the OpenClaw or Claude Code chat to install it in one step. On first use, the skill also installs its Python dependency, PyMuPDF, if it is missing.

Is Pdf Highlight Extractor free?

Yes, Pdf Highlight Extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Pdf Highlight Extractor support?

Pdf Highlight Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Pdf Highlight Extractor?

It is built and maintained by zhengbin1973 (@zhengbin1973); the current version is v1.0.0.
