Description

Analyze, classify, organize, summarize, and report on PDFs within a given local folder, including batch processing and auto-sorting by topic.

README (SKILL.md)


Document Analyzer

Name: document-management
Author: wyj0124

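Use this skill when the user provides a local directory containing multiple documents and wants to:

Manage the documents
Batch-extract document text
Classify the documents by topic
Produce one summary report based on a fixed template
Automatically move the documents into their category folders

---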

Token Extraction

From the user input "管理 D:\测试路径下的文档" ("manage the documents under D:\测试") → file_path = D:\测试
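
The token-extraction rule can be sketched with a regular expression. This is a minimal sketch and is not part of the skill. `extract_folder_path` is a hypothetical helper that only recognizes Windows drive-letter paths. It assumes the path ends at `路径下` ("under the path"), `下的` ("under"), whitespace, or the end of the input.

```python
import re
from typing import Optional

# Hypothetical rule: a drive-letter path, ended lazily by "路径下", "下的",
# whitespace, or the end of the input.
_PATH_RE = re.compile(r"([A-Za-z]:\\\S*?)(?:路径下|下的|\s|$)")


def extract_folder_path(user_input: str) -> Optional[str]:
    """Return the first drive-letter path found in user_input, or None."""
    match = _PATH_RE.search(user_input)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_folder_path(r"管理 D:\测试路径下的文档"))  # D:\测试
```

Paths containing spaces, UNC paths, and POSIX paths would each need a different rule.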

Skill Goal

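The goal of this skill is to process a local folder and produce one complete report.

The workflow must be split into three stages:

Extract text
Classify documents + organize files
Write the report

Where:

Stage 1 is performed by the script
Stage 2 performs the classification and moves the files
Stage 3 writes the report and must follow either the user-provided report template or the skill's built-in default template
---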

Input

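The user will provide a local directory path, for example:

D:\papers

The directory should contain one or more .pdf files.

If the user provides a single file instead of a directory, do not pretend to support directory analysis.
State clearly that this input does not match what this skill expects.

---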
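
The input rules above can be checked before any extraction runs. This is a minimal sketch: `check_input` and its messages are hypothetical, and it counts only top-level `.pdf` files, matching the extension case-insensitively.

```python
from pathlib import Path
from typing import Optional


def check_input(path_str: str) -> Optional[str]:
    """Return an error message if path_str does not fit this skill, else None."""
    path = Path(path_str)
    # A single file is rejected explicitly, never treated as a one-document folder.
    if path.is_file():
        return f"{path} is a single file; this skill expects a directory of PDFs."
    if not path.is_dir():
        return f"{path} does not exist or is not a directory."
    # Only top-level files with a .pdf extension (any case) count.
    if not any(p.is_file() and p.suffix.lower() == ".pdf" for p in path.iterdir()):
        return f"{path} contains no .pdf files."
    return None
```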

Part 1: Extract Text

This stage only extracts the raw text from the PDFs in the directory.

Script Responsibilities

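Run the script to walk through every PDF in the target directory and extract each document's plain text.

The script is only responsible for:

Verifying that the directory exists
Iterating over the PDF files in the directory
Extracting the PDF text
Returning structured results

Do not put the following logic into the Python script:
Classification
Summary generation
Report writing
Template filling

All of this work should be left to the model.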
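
This is a minimal sketch of the script's limited scope, using pypdf's `PdfReader` and `page.extract_text()`. The function name follows the filename used in SKILL.md. Apart from `documents` and `text`, the output keys (`folder`, `failed`, `file_name`, `path`, `pages`) are assumptions and do not describe the shipped script's actual format.

```python
import json
from pathlib import Path

try:
    from pypdf import PdfReader  # pip install pypdf
except ImportError:  # reported only when there is actually a PDF to read
    PdfReader = None


def extract_pdf_folder(folder: str) -> dict:
    """Extract plain text from every top-level PDF in folder.

    Only extraction happens here; classification, summaries, and reports
    are left to the model. The result layout is an assumption.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")

    pdfs = sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
    result = {"folder": str(root), "documents": [], "failed": []}
    if pdfs and PdfReader is None:
        raise RuntimeError("pypdf is required: pip install pypdf")

    for pdf in pdfs:
        try:
            reader = PdfReader(str(pdf))
            # Image-only pages yield little or no text; guard against None anyway.
            text = "\n".join(page.extract_text() or "" for page in reader.pages)
            result["documents"].append(
                {"file_name": pdf.name, "path": str(pdf), "pages": len(reader.pages), "text": text}
            )
        except Exception as exc:  # corrupt or encrypted files are recorded, not fatal
            result["failed"].append({"file_name": pdf.name, "error": str(exc)})
    return result


if __name__ == "__main__":
    import sys

    print(json.dumps(extract_pdf_folder(sys.argv[1]), ensure_ascii=False, indent=2))
```

Unreadable or encrypted PDFs go into `failed` instead of aborting the run. That collection is the list of failed files Part 3 needs.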

Script Invocation

Run the script:

```
run {baseDir}/scripts/extract_pdf_folder.py "<folder_path>"
```
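
The sketch below shows one way the agent could drive the script from Python. It assumes the script prints a single JSON object to stdout. If the real script writes its results to a file instead, read that file rather than `stdout`. `run_extractor` is a hypothetical name.

```python
import json
import os
import subprocess
import sys


def run_extractor(script: str, folder: str) -> dict:
    """Run the extraction script on folder and parse its stdout as one JSON object."""
    # Force UTF-8 output so Chinese file names survive the pipe on Windows.
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    proc = subprocess.run(
        [sys.executable, script, folder],
        capture_output=True,
        encoding="utf-8",
        env=env,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"{script} exited with {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

A non-zero exit is raised as an error instead of parsed. A missing dependency such as pypdf therefore surfaces with the script's own message.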

# Part 2: Document Classification

## Number of Topics
Scale the number of topics to the number of documents. Recommended:
- 3-6 documents: 3-4 topics
- 7-12 documents: 4-6 topics
- More than 12 documents: 6-8 topics
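
The sizing guideline above translates directly into a lookup. In this sketch, `suggested_topic_range` is hypothetical. It also caps the maximum at the document count, because three documents cannot fill four non-empty topics. The guideline does not cover fewer than 3 documents, so the function returns `None` in that case.

```python
from typing import Optional, Tuple


def suggested_topic_range(n_docs: int) -> Optional[Tuple[int, int]]:
    """Map a document count to the recommended (min, max) number of topics."""
    if n_docs < 3:
        return None  # not covered by the guideline
    if n_docs <= 6:
        low, high = 3, 4
    elif n_docs <= 12:
        low, high = 4, 6
    else:
        low, high = 6, 8
    # Never suggest more topics than there are documents.
    return (low, min(high, n_docs))
```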
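
When classifying, consider:
- **Research field** (e.g., machine learning, transportation engineering, climate science)
- **Application scenario** (e.g., prediction, detection, modeling, analysis)
- **Data type** (e.g., GPS trajectories, time series, remote-sensing data)
- **Methodology** (e.g., deep learning, statistical methods, hybrid methods)

Split primarily by research field. Within a single field, distinguish further by application scenario or methodology.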
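
## Part 2 Input
The input is the `documents` list returned by Stage 1.

## Part 2 Execution
Read each document's `text` and decide its single most central topic.
Assign the document to the corresponding topic folder.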
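
## File Organization (Moving Documents into Category Folders)

### Requirements
After classification is complete, every PDF must be moved into its corresponding category folder.

### Steps
1. Under the target directory (the directory that contains the PDFs), create a subfolder named after each topic
2. Move each PDF into the subfolder for its topic
3. Keep folder names short and clear; Chinese names are recommended (e.g., "电动汽车出行模式" [EV travel patterns], "城市货运物流" [urban freight logistics])

### Notes
- If a subfolder with the same name already exists in the target directory, use it as-is
- Before moving a file, make sure it is not in use by another program
- The report's "Documents grouped by topic" section must match the actual folder structure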
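
The skill leaves the moves to the agent, and no code in the repository implements them. The sketch below assumes `doc_to_topic` maps PDF file names to topic names; `move_into_topics` and `safe_folder_name` are hypothetical helpers. The sketch:

- reuses existing subfolders and never overwrites a file;
- refuses any file that is not directly inside the target folder;
- records files it cannot move instead of aborting, for example a file held open by another program.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# Characters that Windows does not allow in file or folder names.
_FORBIDDEN = set('<>:"/\\|?*')


def safe_folder_name(topic: str) -> str:
    """Replace forbidden characters and trim the dots/spaces Windows rejects at the end."""
    cleaned = "".join("_" if ch in _FORBIDDEN else ch for ch in topic).strip(" .")
    # Hypothetical fallback name, "未分类" ("uncategorized"), if nothing usable remains.
    return cleaned or "未分类"


def move_into_topics(folder: str, doc_to_topic: Dict[str, str]) -> List[str]:
    """Move each PDF named in doc_to_topic into <folder>/<topic>/; return error messages."""
    root = Path(folder).resolve()
    errors: List[str] = []
    for file_name, topic in doc_to_topic.items():
        src = root / file_name
        dest_dir = root / safe_folder_name(topic)
        dest = dest_dir / src.name
        # Only operate on files directly inside the user-supplied folder.
        if src.resolve().parent != root:
            errors.append(f"{file_name}: outside the target folder, skipped")
            continue
        if not src.is_file():
            errors.append(f"{file_name}: not found")
            continue
        if dest.exists():
            errors.append(f"{file_name}: already in {dest_dir.name}, not overwritten")
            continue
        try:
            dest_dir.mkdir(exist_ok=True)  # reuse the subfolder if it already exists
            shutil.move(str(src), str(dest))
        except OSError as exc:  # e.g. the file is open in another program on Windows
            errors.append(f"{file_name}: {exc}")
    return errors
```

Files that fail to move stay where they are. The returned messages can be listed with the other failed files in the report.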
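
## Part 2 Output
Produce at least two internal results:
1. A mapping from each document to its topic
2. A mapping from each topic to its set of documents (with the number of topics scaled to the document count)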
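
Deriving the second mapping from the first keeps the two consistent. In this minimal sketch, `group_by_topic` is a hypothetical helper, and the sorting only serves to make the output stable.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_topic(doc_to_topic: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a document -> topic mapping into topic -> sorted list of documents."""
    topic_to_docs: Dict[str, List[str]] = defaultdict(list)
    for doc, topic in doc_to_topic.items():
        topic_to_docs[topic].append(doc)
    return {topic: sorted(docs) for topic, docs in sorted(topic_to_docs.items())}
```

Both mappings come from the same source. That keeps the report's per-topic grouping in line with the folders created during file organization.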
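
# Part 3: Write the Report

This stage produces the final report from the following inputs:

1. The text extracted in Stage 1
2. The classification results produced in Stage 2
3. The user-provided template or the default template

## Part 3 Goal

Produce one complete summary report. Present the overall content first, then the per-document cards. Do not output a single document card on its own as the first result.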
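
## Template Location

The default report template is located at:

`references/report-template.md`

When writing the report, first read this template and follow it.

If the user supplies their own template in the current conversation, use it instead. If no template is supplied, use the default template.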
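
The template precedence rule can be sketched in Python. `load_template` is hypothetical, `base_dir` stands in for the skill's `{baseDir}`, and reading the file as UTF-8 is an assumption.

```python
from pathlib import Path
from typing import Optional

DEFAULT_TEMPLATE = Path("references") / "report-template.md"


def load_template(base_dir: str, user_template: Optional[str] = None) -> str:
    """Prefer a non-empty template supplied in the conversation; else read the default file."""
    if user_template and user_template.strip():
        return user_template
    return (Path(base_dir) / DEFAULT_TEMPLATE).read_text(encoding="utf-8")
```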
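
## Part 3 Execution Order

1. Read and understand the report template
2. Organize the report content from the Stage 1 text and the Stage 2 classification results
3. Write the folder overview first
4. Then write the topic classification results
5. Then write the documents grouped by topic
6. Then write the overall conclusions
7. Write the per-document cards last
8. If any files failed to process, list them at the end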
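
## Part 3 Output

Output only one complete report, and save it.

The report must contain at least:
- Folder overview
- Topic classification results
- Documents grouped by topic
- Overall conclusions
- Per-document cards
- Files that failed to process (if any)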
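
The sketch below assembles the report in the required order and appends failed files last. The English section names are hypothetical placeholders; in practice the template supplies the headings and structure. `assemble_report` is not part of the skill.

```python
from typing import Dict, List

# Hypothetical section names in the required order; real headings come from the template.
SECTION_ORDER = [
    "Folder overview",
    "Topic classification results",
    "Documents grouped by topic",
    "Overall conclusions",
    "Per-document cards",
]


def assemble_report(sections: Dict[str, str], failed: List[str]) -> str:
    """Join section bodies in the required order, then list failed files (if any) last."""
    missing = [name for name in SECTION_ORDER if name not in sections]
    if missing:
        raise ValueError(f"missing required sections: {missing}")
    parts = [f"## {name}\n\n{sections[name].strip()}" for name in SECTION_ORDER]
    if failed:
        parts.append("## Failed files\n\n" + "\n".join(f"- {name}" for name in failed))
    return "\n\n".join(parts) + "\n"
```

Saving the result takes a single `Path.write_text(report, encoding="utf-8")` call. The skill does not specify the report's file name or location.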
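
## Template Compliance

Do not invent a section structure that departs from the template. Do not add content such as "document comparison", "difference analysis", or "pros and cons" unless the user explicitly asks for it. If a template field cannot be clearly supported by the text, write:

`未明确提及` ("not explicitly mentioned")

Do not fabricate content just to fill the template.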
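
Here is a minimal sketch of the fallback rule. It assumes, hypothetically, that template fields are written as `{{field}}`; the default template's real syntax may differ. Any field the text does not support gets the literal `未明确提及` instead of invented content. `fill_template` is a hypothetical helper.

```python
import re
from typing import Dict

NOT_MENTIONED = "未明确提及"  # literal fallback text required by the skill
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # assumed {{field}} syntax


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Replace {{field}} placeholders; missing or blank values become the fallback text.

    Supplied values are stripped of surrounding whitespace.
    """
    def substitute(match: re.Match) -> str:
        value = values.get(match.group(1), "")
        return value.strip() or NOT_MENTIONED

    return _PLACEHOLDER.sub(substitute, template)
```

Because `\w` matches Unicode word characters in Python, Chinese field names such as `{{研究方法}}` also work.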

Usage Guidance

This skill largely does what it says: it extracts text from PDFs, classifies them, and generates a report from a template. However:

- SKILL.md and the script disagree on the script filename. SKILL.md expects extract_pdf_folder.py, but the provided file is scripts/analyze_pdf_folder.py, whose usage text still references extract_pdf_folder.py. Expect invocation failures until the names are reconciled.
- The skill explicitly requires moving your PDF files into topic subfolders. These moves are destructive if you run the skill on originals, so back up the folder or test on a copy first.
- The Python script requires the pypdf package (pip install pypdf). There are no network calls or hidden endpoints in the code.

Before installing or running: fix or confirm the script filename and command, make a backup copy of your PDFs, and make sure pypdf is available. If you need the file moves automated, confirm whether the agent will perform safe moves (only within the provided folder). Alternatively, move the files manually after classification.

Capability Analysis

Type: OpenClaw Skill
Name: document-management
Version: 1.0.0

The skill is designed for local PDF document management, including text extraction, thematic classification, and file organization into subfolders. The Python script analyze_pdf_folder.py uses the pypdf library to extract text and saves results locally without any network activity or unauthorized data access. While the instructions in SKILL.md direct the agent to move files, this behavior is consistent with the stated purpose of organizing documents and does not demonstrate malicious intent.

Capability Assessment

ℹ Purpose & Capability

Name/description align with the included files: a text-extraction script and a report template. However, SKILL.md repeatedly references a script named extract_pdf_folder.py, as do its example run commands. The repository instead provides scripts/analyze_pdf_folder.py, whose usage/help text still mentions extract_pdf_folder.py. This filename mismatch indicates the skill is likely broken or out of sync. SKILL.md also requires automatic moving of files into topic folders, but no code in the repo implements it; the agent (or operator) is expected to perform the moves.

ℹ Instruction Scope

Instructions are limited to local PDF extraction, classification, creating subfolders, moving PDFs into those subfolders, and generating a single report from a template. All referenced filesystem paths are within the user-supplied folder and the provided template. No network endpoints or unrelated system files are referenced. Caveat: the SKILL.md instructs running a script filename that doesn't match the included script, so automated invocation may fail unless corrected. Also, the classification + file-move steps are specified as required but are left for the model/agent to perform (not implemented in the script), meaning the agent will have to perform potentially destructive file operations.

✓ Install Mechanism

No install spec provided (instruction-only + one script). The script depends on the pypdf Python package (the script will exit with a message instructing pip install pypdf if missing). No downloads, remote executables, or unusual install steps are present.

✓ Credentials

The skill requires no environment variables, credentials, or config paths. The only required runtime dependency is pypdf, which is proportionate to extracting PDF text.

✓ Persistence & Privilege

always is false and model invocation is allowed (normal). The skill does not request persistent or elevated platform privileges. It will read and write files inside a user-specified folder (including creating subfolders and moving PDFs), which is expected for its purpose but is a destructive action to note.

Version History

v1.0.0

Document-Management 1.0.0
- Initial release enabling batch analysis and organization of PDF files in a specified local folder.
- Supports text extraction from PDFs, theme-based document classification, and automatic file sorting into categorized subfolders.
- Generates a comprehensive report for all documents, following a user-provided or default template.
- Ensures clear separation of responsibilities between script (text extraction) and model (classification, report generation).
- Provides guidance for categorization granularity and template adherence, including explicit handling of unsupported or missing fields.

Metadata

Slug document-management

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is document-management?

Analyze, classify, organize, summarize, and report on PDFs within a given local folder, including batch processing and auto-sorting by topic. It is an AI Agent Skill for Claude Code / OpenClaw, with 291 downloads so far.

How do I install document-management?

Run "/install document-management" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is document-management free?

Yes, document-management is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does document-management support?

document-management is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created document-management?

It is built and maintained by wyj0124 (@wyj0124); the current version is v1.0.0.
