Total downloads: 301
Favorites: 0
Current installs: 0
Versions: 3
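Install in OpenClaw
/install doc-genius
Description
Smart summarization and format conversion for PDF, Word, and Markdown files, with batch processing and progress reporting to speed up document handling.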
Usage (SKILL.md)
Doc Genius - Intelligent Document Processing Assistant
Quick Start
Smart Summarization
# Summarize a PDF
python3 scripts/doc_processor.py summarize /path/to/document.pdf
# Summarize a Word document
python3 scripts/doc_processor.py summarize /path/to/document.docx
# Summarize a Markdown file, with JSON output
python3 scripts/doc_processor.py summarize /path/to/document.md --format json
Format Conversion
# PDF → Markdown
python3 scripts/doc_processor.py convert /path/to/document.pdf --output markdown
# Word → Markdown
python3 scripts/doc_processor.py convert /path/to/document.docx --output markdown
# Markdown → HTML
python3 scripts/doc_processor.py convert /path/to/document.md --output html
Batch Processing
# Batch-convert a folder
python3 scripts/doc_processor.py batch /path/to/folder --output markdown
# Batch-summarize a folder
python3 scripts/doc_processor.py batch /path/to/folder --action summarize
Output Formats
JSON (default)
{
  "file": "document.pdf",
  "type": "pdf",
  "summary": "An intelligent summary of the document...",
  "keywords": ["keyword1", "keyword2"],
  "word_count": 5000,
  "pages": 12
}
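To consume the JSON output from another program, the fields shown above can be loaded into a typed record. The following is a minimal sketch, not part of the skill: `SummaryRecord` and `parse_summary` are hypothetical names, and treating `keywords` and `pages` as optional (for example, for inputs without pages) is an assumption.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SummaryRecord:
    """Typed view of one summary object (field names taken from the sample)."""
    file: str
    type: str
    summary: str
    keywords: List[str]
    word_count: int
    pages: int


def parse_summary(raw: str) -> SummaryRecord:
    """Parse one JSON summary object into a SummaryRecord."""
    data = json.loads(raw)
    return SummaryRecord(
        file=data["file"],
        type=data["type"],
        summary=data["summary"],
        # Assumption: keywords and pages may be missing for some inputs.
        keywords=list(data.get("keywords", [])),
        word_count=int(data["word_count"]),
        pages=int(data.get("pages", 0)),
    )


SAMPLE = """
{
  "file": "document.pdf",
  "type": "pdf",
  "summary": "An intelligent summary of the document...",
  "keywords": ["keyword1", "keyword2"],
  "word_count": 5000,
  "pages": 12
}
"""

record = parse_summary(SAMPLE)
print(f"{record.file}: {record.word_count} words, {len(record.keywords)} keywords")
```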
Markdown
python3 scripts/doc_processor.py summarize document.pdf --format markdown
Core Features
1. Smart Summarization
Supported formats:
- ✅ PDF (PyPDF2)
- ✅ Word (.docx)
- ✅ Markdown
- ✅ Plain text
Summarization methods:
- Local summary (TextRank; fast)
- AI summary (OpenAI API; higher quality)
Examples:
# Local summary
python3 scripts/doc_processor.py summarize document.pdf --method local
# AI summary (requires an API key)
export OPENAI_API_KEY="sk-xxx"
python3 scripts/doc_processor.py summarize document.pdf --method ai
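For readers unfamiliar with TextRank, here is a generic, standard-library sketch of extractive summarization in that style. It is not the code in `scripts/doc_processor.py`: the function names, the English-only sentence splitter, and the parameter defaults are all assumptions chosen for illustration. Sentences are nodes, edge weights are word overlap normalized by the log of the sentence lengths, and scores come from a PageRank-style iteration.

```python
import math
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> Set[str]:
    """Lowercased set of alphanumeric words in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a: Set[str], b: Set[str]) -> float:
    """Word overlap normalized by log sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_summary(text: str, top_n: int = 2,
                     damping: float = 0.85, iterations: int = 50) -> List[str]:
    """Return the top_n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences
    tokens = [tokenize(s) for s in sentences]
    # Symmetric weight matrix with a zero diagonal.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(weights[j][i] / out_sums[j] * scores[j]
                       for j in range(n) if out_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


TEXT = ("Cats chase mice in the barn. Mice hide from cats in the barn. "
        "The barn cats sleep after they chase mice. Quantum physics is hard.")
for sentence in textrank_summary(TEXT, top_n=2):
    print(sentence)
```

A sentence that shares no words with the rest of the text only receives the baseline score, so it is ranked last; that is the property that makes the approach usable without any external API.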
2. Format Conversion
Conversion matrix:
| Input format | Output format | Status |
|---|---|---|
| PDF | Markdown | ✅ |
| PDF | HTML | ⚠️ Experimental |
| Word | Markdown | ✅ |
| Word | HTML | ✅ |
| Markdown | HTML | ✅ |
| Markdown | Word | 🔜 Planned |
Examples:
# PDF → Markdown (recommended)
python3 scripts/doc_processor.py convert report.pdf --output markdown
# Word → HTML
python3 scripts/doc_processor.py convert report.docx --output html
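The conversion matrix can be expressed as a lookup table, which is handy for checking a conversion before invoking the tool. This is a sketch under stated assumptions: the names `Status`, `CONVERSIONS`, and `conversion_status` are hypothetical, the extension mapping is a guess, and the first two matrix rows are read as PDF input, as the PDF → Markdown examples suggest.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class Status(Enum):
    SUPPORTED = "supported"
    EXPERIMENTAL = "experimental"
    PLANNED = "planned"


# (input format, output format) -> status, mirroring the conversion matrix.
CONVERSIONS = {
    ("pdf", "markdown"): Status.SUPPORTED,
    ("pdf", "html"): Status.EXPERIMENTAL,
    ("word", "markdown"): Status.SUPPORTED,
    ("word", "html"): Status.SUPPORTED,
    ("markdown", "html"): Status.SUPPORTED,
    ("markdown", "word"): Status.PLANNED,
}

# Assumption: input format is inferred from the file extension.
EXTENSIONS = {".pdf": "pdf", ".docx": "word", ".md": "markdown"}


def conversion_status(path: str, output: str) -> Optional[Status]:
    """Return the matrix status for converting `path` to `output`, or None."""
    source = EXTENSIONS.get(Path(path).suffix.lower())
    if source is None:
        return None
    return CONVERSIONS.get((source, output.lower()))


print(conversion_status("report.pdf", "markdown"))
```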
3. Batch Processing
Features:
- Folder scanning
- Concurrent processing
- Progress reporting
- Error logging
Examples:
# Batch conversion (default concurrency: 5)
python3 scripts/doc_processor.py batch /path/to/docs --output markdown
# Set the number of workers
python3 scripts/doc_processor.py batch /path/to/docs --output markdown --workers 10
# Generate a report
python3 scripts/doc_processor.py batch /path/to/docs --action summarize --report report.json
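The four batch features (folder scan, concurrency, progress, error log) fit a common pattern that can be sketched with a thread pool. This is an illustration only, not the skill's implementation: `run_batch` and the shape of the returned report are hypothetical, and the real `--report` file format is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, List


def run_batch(folder: str, process: Callable[[Path], None],
              workers: int = 5) -> Dict[str, List[str]]:
    """Apply `process` to every file in `folder` and collect the outcomes."""
    # Folder scan: top-level files only (an assumption; no recursion).
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    report: Dict[str, List[str]] = {"succeeded": [], "failed": []}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process, path): path for path in files}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                future.result()
                report["succeeded"].append(path.name)
            except Exception as exc:
                # Error log: one failure does not stop the whole batch.
                report["failed"].append(f"{path.name}: {exc}")
            # Progress report after each completed file.
            print(f"[{done}/{len(files)}] {path.name}")
    return report
```

Threads suit this workload when the per-file work is dominated by I/O; CPU-heavy parsing would benefit more from a process pool.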
4. Structured Extraction (experimental)
Extracted content:
- Heading hierarchy
- Table of contents
- Key information (dates, amounts, names)
Example:
python3 scripts/doc_processor.py extract document.pdf --fields title,toc,dates
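As a rough idea of what extracting `title`, `toc`, and `dates` could involve, the sketch below pulls them from Markdown text with regular expressions. It is an assumption-laden illustration: `extract_fields` is a hypothetical helper, only ATX-style headings and ISO `YYYY-MM-DD` dates are recognized, and the skill's actual extraction logic may differ.

```python
import re
from typing import Dict, List, Optional, Union

# ATX headings such as "## Scope"; capture the level markers and the title.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)
# ISO-style dates only (an assumption; other date formats are ignored).
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_fields(markdown_text: str) -> Dict[str, Union[Optional[str], List[str]]]:
    """Return the first level-1 title, an indented outline, and all dates."""
    headings = [(len(m.group(1)), m.group(2))
                for m in HEADING_RE.finditer(markdown_text)]
    title = next((text for level, text in headings if level == 1), None)
    toc = ["  " * (level - 1) + text for level, text in headings]
    dates = DATE_RE.findall(markdown_text)
    return {"title": title, "toc": toc, "dates": dates}


DOC = "# Report\n\nSigned on 2026-03-07.\n\n## Scope\nDue 2026-04-01.\n"
print(extract_fields(DOC))
```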
Advanced Usage
Using AI Summarization
# Configure the API key
export OPENAI_API_KEY="sk-xxx"
# AI summary (higher quality), with an explicit model
python3 scripts/doc_processor.py summarize document.pdf --method ai --model gpt-4
Custom Output
# Specify the output file
python3 scripts/doc_processor.py convert document.pdf --output markdown --out-file output.md
# Specify the output directory
python3 scripts/doc_processor.py batch /path/to/docs --output-dir /path/to/output
Filtering
# Process PDF files only
python3 scripts/doc_processor.py batch /path/to/docs --filter "*.pdf"
# Exclude files
python3 scripts/doc_processor.py batch /path/to/docs --exclude "temp_*"
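The `--filter` and `--exclude` patterns look like shell-style globs. Assuming that is how they are matched (the source does not say), the selection logic could be sketched with the standard `fnmatch` module; `select_files` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_files(names: Iterable[str],
                 include: Optional[str] = None,
                 exclude: Optional[str] = None) -> List[str]:
    """Keep names matching `include` (if given) and not matching `exclude`."""
    selected = []
    for name in names:
        if include and not fnmatchcase(name, include):
            continue
        if exclude and fnmatchcase(name, exclude):
            continue
        selected.append(name)
    return selected


NAMES = ["a.pdf", "temp_b.pdf", "c.docx"]
print(select_files(NAMES, include="*.pdf", exclude="temp_*"))
```

`fnmatchcase` is used instead of `fnmatch` so matching is case-sensitive on every platform.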
Technical Details
Dependencies
PyPDF2==3.0.1 # PDF processing
python-docx==1.1.0 # Word processing
markdown==3.5.1 # Markdown processing
beautifulsoup4==4.12.2 # HTML parsing
aiofiles==23.2.1 # Asynchronous file handling
Installing Dependencies
pip install PyPDF2 python-docx markdown beautifulsoup4 aiofiles
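Before running the scripts, it can help to confirm that the dependencies are importable. The sketch below maps each pip package to its import name (for example, `python-docx` is imported as `docx` and `beautifulsoup4` as `bs4`) and reports what is missing; `missing_packages` is a hypothetical helper, and it does not check the pinned versions.

```python
from importlib.util import find_spec
from typing import Dict, List

# pip package name -> top-level module name used in `import` statements.
REQUIRED_MODULES: Dict[str, str] = {
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "markdown": "markdown",
    "beautifulsoup4": "bs4",
    "aiofiles": "aiofiles",
}


def missing_packages(modules: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [package for package, module in modules.items()
            if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages: pip install " + " ".join(missing))
else:
    print("All dependencies are available.")
```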
Performance Tuning
Concurrency
- Default workers: 5
- Maximum workers: 20
- Recommendation: adjust to the number of CPU cores
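The limits above suggest a simple rule for picking a worker count. The sketch below is one way to apply it: `clamp_workers` and `suggest_workers` are hypothetical helpers, and whether the tool itself clamps or rejects values above 20 is not stated in the source.

```python
import os

DEFAULT_WORKERS = 5   # documented default
MAX_WORKERS = 20      # documented maximum


def clamp_workers(requested: int) -> int:
    """Force a requested worker count into the range 1..MAX_WORKERS."""
    return max(1, min(requested, MAX_WORKERS))


def suggest_workers() -> int:
    """Suggest a count from the CPU core count, falling back to the default."""
    cores = os.cpu_count()  # may return None on some platforms
    return clamp_workers(cores if cores else DEFAULT_WORKERS)


print(f"Suggested workers: {suggest_workers()}")
```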
Memory
- Streams large files (>10 MB)
- Processes content in chunks to avoid running out of memory
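Streaming and chunking usually mean reading a file in fixed-size pieces rather than loading it whole. A minimal sketch of that idea follows; the 10 MB threshold comes from the text above, while `read_chunks`, `should_stream`, and the 64 KiB chunk size are assumptions for illustration.

```python
from typing import Iterator

STREAM_THRESHOLD = 10 * 1024 * 1024  # 10 MB, per the threshold above


def should_stream(size_in_bytes: int) -> bool:
    """Decide whether a file is large enough to be streamed."""
    return size_in_bytes > STREAM_THRESHOLD


def read_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file's bytes in pieces of at most `chunk_size`."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Because `read_chunks` is a generator, only one chunk is held in memory at a time regardless of the file size.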
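Error Handling
Common Errors
| Error | Cause | Solution |
|---|---|---|
| FileNotFoundError | File does not exist | Check the path |
| PermissionError | Insufficient permissions | Check the file permissions |
| UnsupportedFormat | Format not supported | See the list of supported formats |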
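When calling document-loading code from Python, the three errors in the table can be mapped to their suggested fixes. This is a sketch, not the skill's code: `FileNotFoundError` and `PermissionError` are Python built-ins, while the `UnsupportedFormat` class, the suffix list, and both helper functions are hypothetical stand-ins.

```python
from pathlib import Path
from typing import Optional


class UnsupportedFormat(Exception):
    """Hypothetical stand-in for the tool's unsupported-format error."""


# Assumption: extensions corresponding to PDF, Word, Markdown, plain text.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".md", ".txt"}


def load_document(path: str) -> bytes:
    """Read a document's bytes, rejecting unsupported extensions first."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise UnsupportedFormat(f"unsupported format: {suffix or '(none)'}")
    return Path(path).read_bytes()


def explain_failure(path: str) -> Optional[str]:
    """Return a hint for a failed load, or None if the load succeeds."""
    try:
        load_document(path)
    except FileNotFoundError:
        return "File not found: check the path"
    except PermissionError:
        return "Permission denied: check the file permissions"
    except UnsupportedFormat:
        return "Unsupported format: see the list of supported formats"
    return None


print(explain_failure("archive.zip"))
```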
Log Level
# Debug mode
python3 scripts/doc_processor.py summarize document.pdf --log-level debug
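Best Practices
1. Large Files
# Process in chunks
python3 scripts/doc_processor.py summarize large.pdf --chunk-size 1000
2. Batch Tuning
# Use a sensible worker count (here, the number of CPU cores)
python3 scripts/doc_processor.py batch /path/to/docs --workers $(nproc)
3. Choosing an Output Format
| Scenario | Recommended format |
|---|---|
| Content analysis | JSON |
| Human reading | Markdown |
| Web display | HTML |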
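Use Cases
1. Researchers
- Skim large numbers of papers quickly
- Extract key information
- Generate literature summaries
2. Content Creators
- Convert formats (PDF→Markdown)
- Extract source material
- Summarize content
3. Enterprise Users
- Batch-process contracts
- Standardize document formats
- Build knowledge bases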
Working with Other Skills
scrapling-fetch
# Fetch a web page as text → save it as Markdown → summarize
python3 scrapling-fetch/scripts/fetch.py "https://example.com/article" --text > temp.md
python3 doc-genius/scripts/doc_processor.py summarize temp.md
Changelog
v1.0.0 (2026-03-07)
- ✅ Initial release
- ✅ PDF/Word/Markdown summarization
- ✅ Format conversion
- ✅ Batch processing
Feedback and Support
- GitHub Issues: [to be added]
- ClawHub: https://clawhub.com/skill/doc-genius
- Email: [to be added]
Doc Genius - Smarter document processing 📄✨
Security Recommendations
This skill's core document-processing code appears legitimate, but there are important red flags you should address before installing or running it: (1) The package includes a 'paid' script that contacts an external billing service (skillpay.me) and contains a hard-coded billing API key — treat that key as sensitive and avoid running that script until you confirm its legitimacy. (2) The registry metadata does not declare environment variables (OPENAI_API_KEY) referenced in the docs; expect to provide your OpenAI key if you plan to use AI summarization. (3) Run the code in a restricted environment (container or sandbox) and inspect or remove the paid script if you do not intend to use billing. (4) Ask the author for provenance: where the hard-coded billing key came from, why billing is bundled but undocumented, and for a version without embedded secrets. If you cannot verify the source or the billing integration, do not run the paid script and consider rejecting this skill.
Functional Analysis
Type: OpenClaw Skill
Name: doc-genius
Version: 1.2.0
The skill bundle provides legitimate document processing, summarization, and format conversion capabilities for PDF, Word, and Markdown files. The code in scripts/doc_processor.py and its variants (v2 and paid) uses standard libraries like PyPDF2 and python-docx for text extraction and offers both local TextRank-based and OpenAI-based summarization. While scripts/doc_processor_paid.py contains a hardcoded API key for a billing service (skillpay.me) and an absolute local file path (/Users/gaolei/...), these appear to be developer oversights or configuration remnants rather than intentional malicious behavior. No evidence of data exfiltration, unauthorized command execution, or prompt injection was found.
Capability Assessment
Purpose & Capability
The name/description and most code files implement PDF/Word/Markdown summarization and conversion, which is coherent. However, a bundled 'paid' variant (scripts/doc_processor_paid.py) includes SkillPay billing integration (skillpay.me) and an embedded BILLING_API_KEY and SKILL_ID that are not mentioned in SKILL.md or registry metadata. The presence of billing code in a tool advertised as a free document processor is unexpected and should be justified.
Instruction Scope
SKILL.md instructs running scripts/doc_processor.py and references OPENAI_API_KEY for AI summarization. It does not mention the paid script or any billing/remote calls. The codebase includes additional scripts (doc_processor_paid.py and v2) that import 'requests' and contact external endpoints; this expands runtime actions beyond the documented instructions and the user-visible examples.
Install Mechanism
No install spec downloads arbitrary code; this is an instruction-and-source bundle. Dependencies are local Python packages (PyPDF2, python-docx, markdown, beautifulsoup4) and no external installers or archive downloads are used.
Credentials
Registry metadata declares no required environment variables, but SKILL.md and code reference OPENAI_API_KEY for AI summarization (expected). More concerning: scripts/doc_processor_paid.py hard-codes a BILLING_API_KEY and a user-specific VENV_PYTHON path. A billing API key embedded in code is disproportionate and sensitive; the skill also performs network calls to billing endpoints without documenting them in metadata or instructions.
Persistence & Privilege
The skill does not request always:true, does not claim to modify other skills, and appears to run as user-invoked scripts. No elevated persistence or automatic always-on behavior is present in the metadata.
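How to Use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install doc-genius
- Once installation finishes, invoke the skill by name or trigger it with /doc-genius.
- Provide the inputs described in the skill's parameter documentation to get structured output.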
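Version History
v1.2.0
Improvements: full TextRank algorithm, smart fallback strategy, enhanced keyword extraction, better error handling
v1.1.0
Added a paid version that integrates the SkillPay billing system, with automatic charging and top-up link generation
v1.0.0
Initial release: smart summarization, format conversion, batch processing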
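FAQ
What is Doc Genius?
Doc Genius provides smart summarization and format conversion for PDF, Word, and Markdown files, with batch processing and progress reporting to speed up document handling. It is an AI agent skill plugin for Claude Code / OpenClaw, with 301 total downloads to date.
How do I install Doc Genius?
Run the command "/install doc-genius" in the OpenClaw or Claude Code chat box for a one-step install. Installation itself needs no extra configuration; AI summarization additionally requires OPENAI_API_KEY, as described in SKILL.md.
Is Doc Genius free?
The listing describes Doc Genius as completely free (free and open source) to download, install, and use. Note that the version history records a paid variant with SkillPay billing, added in v1.1.0.
Which platforms does Doc Genius support?
Doc Genius is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Doc Genius?
It is developed and maintained by imgolye (@imgolye). The current version is v1.2.0.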