
Pdf Highlight Extractor

Name: Pdf Highlight Extractor
Author: zhengbin1973

by zhengbin1973 · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install pdf-highlight-extractor

Description

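Detects highlight annotations (highlighter marks) in a PDF document sent by the user, extracts all highlighted text, and compiles it into a Markdown file with YAML Front Matter (the trio of title, date, and tags). The title and tags are generated automatically by the AI from the semantics of the content; the Markdown contains two parts, "Original Excerpts" and "Content Summary"...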

README (SKILL.md)


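# PDF Highlight Extraction → Markdown Skill

## Goal

Extract all highlighted (highlighter-pen) text from a PDF file provided by the user and compile it into a Markdown document with YAML Front Matter.

## Workflow

### Step 1: Confirm dependencies

On first use, run the install script to make sure pymupdf is installed:

```bash
<python> scripts/install_deps.py
```

Replace `<python>` with the path to the current environment's Python interpreter (prefer the managed version).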
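
The contents of `scripts/install_deps.py` are not shown on this page. As a rough sketch of what such a check could look like, assuming PyMuPDF is importable as `pymupdf` or `fitz` and is installed from pip under the name `pymupdf` (the helper names below are hypothetical):

```python
import importlib.util
import subprocess
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(name) is not None


def build_pip_command(package: str) -> list[str]:
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_pymupdf() -> bool:
    """Install pymupdf only when it is missing; return True if pip was run."""
    if module_available("pymupdf") or module_available("fitz"):
        return False
    # Assumption: the package is installed from PyPI under the name "pymupdf".
    subprocess.run(build_pip_command("pymupdf"), check=True)
    return True
```

Calling pip through `sys.executable -m pip` keeps the install tied to the same interpreter that will later run the extraction script.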
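### Step 2: Extract highlights (JSON mode)

Run the extraction script in JSON mode to get structured highlight data for the AI to process in later steps:

```bash
<python> scripts/extract_highlights.py "<pdf_path>" --json
```

- `<pdf_path>`: absolute path to the PDF provided by the user
- The script outputs JSON containing each highlight's `page` (page number), `color` (color name), and `text` (content)
- To extract only a specific color, add `--color yellow` (supported: yellow/green/red/blue/pink/orange/purple/cyan)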
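
The JSON layout is not specified beyond the three fields. A minimal sketch of consuming the output, assuming it is a top-level array of objects with `page`, `color`, and `text` keys (the sample data and the `group_by_page` helper are made up for illustration):

```python
import json
from collections import defaultdict
from typing import Optional

# Hypothetical sample of the script's JSON output (assumed shape).
SAMPLE_OUTPUT = """
[
  {"page": 2, "color": "yellow", "text": "First highlighted sentence."},
  {"page": 1, "color": "green", "text": "A green note."},
  {"page": 2, "color": "yellow", "text": "Second highlighted sentence."}
]
"""


def group_by_page(raw_json: str, color: Optional[str] = None) -> dict[int, list[str]]:
    """Group highlight texts by page number, optionally keeping one color."""
    pages: dict[int, list[str]] = defaultdict(list)
    for item in json.loads(raw_json):
        if color is not None and item["color"] != color:
            continue
        pages[item["page"]].append(item["text"].strip())
    # Return a plain dict ordered by page number.
    return dict(sorted(pages.items()))


grouped = group_by_page(SAMPLE_OUTPUT)
yellow_only = group_by_page(SAMPLE_OUTPUT, color="yellow")
```

Grouping by page up front matches the per-page layout that the Markdown output uses later.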
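### Step 3: AI generates the title and tags

Analyze the semantics of all highlighted text and generate:

- **title**: 3 to 10 characters summarizing the core topic of the highlights, Chinese preferred
- **tags**: 3 to 6 tags covering the subject area, document type, and key concepts; all lowercase; either Chinese or English is acceptable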
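
These limits can be checked mechanically before the file is written. The sketch below is one possible check, not part of the skill itself; it assumes "characters" means Unicode code points as counted by `len()`, and both function names are hypothetical:

```python
def normalize_tags(tags: list[str]) -> list[str]:
    """Trim, lowercase, and de-duplicate tags while keeping their order."""
    seen: set[str] = set()
    result: list[str] = []
    for tag in tags:
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def check_metadata(title: str, tags: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the constraints hold."""
    problems: list[str] = []
    if not 3 <= len(title.strip()) <= 10:
        problems.append("title should be 3 to 10 characters long")
    if not 3 <= len(normalize_tags(tags)) <= 6:
        problems.append("there should be 3 to 6 distinct tags")
    return problems
```

For example, `check_metadata("量子计算入门", ["physics", "Notes", "quantum"])` returns an empty list, since the title has six characters and the three tags are distinct after lowercasing.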
### Step 4: Generate the Markdown file

Generate `<pdf_filename>_highlights.md` in the **same directory as the PDF**, following this template:

```markdown
---
title: "<AI-generated title>"
date: <today's date, YYYY-MM-DD>
tags:
  - <tag1>
  - <tag2>
  - ...
---

# <title>

## Original Excerpts

### Page N

- Highlight 1
- Highlight 2

### Page M

- ...

---

## Content Summary

<A 200 to 400 character summary written by the AI from all highlighted content, distilling the core points, key data, and important conclusions>
```
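
The template can be filled in with a few lines of string assembly. This is a sketch under assumptions rather than the skill's actual code: `output_path` and `render_markdown` are hypothetical names, the highlights are assumed to be already grouped by page, and the section headings are rendered in English here (swap in the template's own wording as needed).

```python
from datetime import date
from pathlib import Path


def output_path(pdf_path: str) -> Path:
    """Place `<pdf stem>_highlights.md` next to the source PDF."""
    pdf = Path(pdf_path)
    return pdf.with_name(f"{pdf.stem}_highlights.md")


def render_markdown(title: str, tags: list[str],
                    pages: dict[int, list[str]], summary: str,
                    today: date) -> str:
    """Fill in the template: front matter, per-page excerpts, summary."""
    # Escape backslashes and double quotes so the YAML title stays valid.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = ["---", f'title: "{safe_title}"', f"date: {today.isoformat()}", "tags:"]
    lines += [f"  - {tag}" for tag in tags]
    lines += ["---", "", f"# {title}", "", "## Original Excerpts", ""]
    for page in sorted(pages):
        lines.append(f"### Page {page}")
        lines.append("")
        lines += [f"- {text}" for text in pages[page]]
        lines.append("")
    lines += ["---", "", "## Content Summary", "", summary, ""]
    return "\n".join(lines)
```

Writing the result is then a single call such as `output_path(pdf).write_text(markdown, encoding="utf-8")`; stating UTF-8 explicitly matters when the highlights contain Chinese text.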
### Step 5: Confirm the output

Tell the user:
- The path of the generated Markdown file
- How many highlights were extracted, and from how many pages
- A brief look at the YAML Front Matter content
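
The counts in this report follow directly from the grouped highlights. A small illustrative sketch, assuming a page-number-to-texts mapping and a hypothetical `confirmation_message` helper:

```python
def confirmation_message(md_path: str, pages: dict[int, list[str]],
                         front_matter: str) -> str:
    """Summarize the run: output path, counts, and the front matter."""
    total = sum(len(texts) for texts in pages.values())
    return (
        f"Markdown file: {md_path}\n"
        f"Extracted {total} highlight(s) from {len(pages)} page(s)\n"
        f"Front matter:\n{front_matter}"
    )


# Made-up example values, for illustration only.
message = confirmation_message(
    "/docs/report_highlights.md",
    {1: ["a"], 3: ["b", "c"]},
    '---\ntitle: "Example"\ndate: 2024-01-02\ntags:\n  - demo\n---',
)
```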
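## Notes

- If the script reports "no highlight annotations found" (「未找到任何高亮标注」), the PDF may be a scanned image rather than a document with text-based highlights, or the highlights may be handwritten or non-standard annotations; in that case, tell the user so plainly
- If the PDF path contains Chinese characters or spaces, make sure the path is wrapped in double quotes
- The summary must be written after actually reading all of the excerpts; it must not merely restate the title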
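
When the script is launched from Python rather than typed into a shell, passing the arguments as a list sidesteps the quoting issue entirely. A sketch, assuming the flags described in Step 2; `build_extract_command` is a hypothetical helper and the example path is made up:

```python
import shlex
import sys
from typing import Optional


def build_extract_command(pdf_path: str, color: Optional[str] = None,
                          python: str = sys.executable) -> list[str]:
    """Build the Step 2 command as an argument list (no shell quoting needed)."""
    cmd = [python, "scripts/extract_highlights.py", pdf_path, "--json"]
    if color is not None:
        cmd += ["--color", color]
    return cmd


# A path with both Chinese characters and a space stays one argument.
cmd = build_extract_command("/home/user/我的 文档/paper.pdf", color="yellow")
# For display or for pasting into a POSIX shell, shlex.join adds the quoting.
printable = shlex.join(cmd)
```

The list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` as is; `shlex.join` follows POSIX shell rules, so the printable form is not meant for the Windows command prompt.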

Usage Guidance

Before installing, consider that the skill will read the selected PDF's highlighted text, ask the AI to summarize it, install PyMuPDF if needed, and create a Markdown file in the same directory as the PDF. Avoid using it on confidential PDFs in shared, synced, or version-controlled folders unless that output location is acceptable.

Capability Assessment

✓ Purpose & Capability

The scripts and instructions are coherent with the stated purpose: read a specified PDF, extract highlight annotations using PyMuPDF, and format the results as Markdown.

ℹ Instruction Scope

The activation text includes broader phrases like generating reading notes, but it is bounded by the user providing a PDF and indicating highlight or annotation-related intent.

ℹ Install Mechanism

First use runs a small dependency installer that invokes pip for the fixed package pymupdf; this is disclosed and purpose-aligned, with no dynamic package names or hidden commands.

ℹ Credentials

Read and write access are proportionate for processing a local PDF and creating an output file, though the extracted text may contain sensitive content.

✓ Persistence & Privilege

No background process, credential access, privilege escalation, destructive action, or ongoing persistence is present; the only durable output is the Markdown file and the installed Python dependency.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install pdf-highlight-extractor
After installation, invoke the skill by name or use /pdf-highlight-extractor
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release: Extracts all highlight annotations from user-provided PDF files and generates a Markdown summary with YAML Front Matter (title, date, tags).
- Automatically analyzes the highlights to generate a concise title and relevant tags.
- Markdown output includes both original excerpts, organized by page, and an AI-written summary of the highlighted content.
- Produced Markdown is saved in the same directory as the PDF, with a clear naming convention.
- Informs users of extraction results, including number of highlights, pages, and YAML metadata.
- Handles scenarios where no highlights are found or highlights are not in extractable format.

Metadata

Slug pdf-highlight-extractor

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Pdf Highlight Extractor?

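Detects highlight annotations (highlighter marks) in a PDF document sent by the user, extracts all highlighted text, and compiles it into a Markdown file with YAML Front Matter (the trio of title, date, and tags). The title and tags are generated automatically by the AI from the semantics of the content; the Markdown contains two parts, "Original Excerpts" and "Content Summary"... It is an AI Agent Skill for Claude Code / OpenClaw, with 48 downloads so far.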

How do I install Pdf Highlight Extractor?

Run "/install pdf-highlight-extractor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Pdf Highlight Extractor free?

Yes, Pdf Highlight Extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Pdf Highlight Extractor support?

Pdf Highlight Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Pdf Highlight Extractor?

It is built and maintained by zhengbin1973 (@zhengbin1973); the current version is v1.0.0.
