longjf25

DocHub

Author: juanfenglong · GitHub · v1.4.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 163
Favorites: 0
Current installs: 0
Versions: 6
Install in OpenClaw
/install dochub
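Description
All-in-one document management: batch convert to Markdown, auto-categorize, full-text search, and intelligent output. An all-round document management skill that combines document lifecycle management with intelligent retrieval. Trigger: init...
Usage instructions (SKILL.md)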

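# dochub / Document Workbench

## Overview

dochub is a document knowledge-base management skill. It initializes raw documents into standard Markdown, produces a knowledge-base summary and index, and answers user questions by analyzing and summarizing the retrieved content.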

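## Supported document formats

| Format | Status | Notes |
|------|------|------|
| .docx | ✅ Supported | Modern Word format |
| .xlsx | ✅ Supported | Modern Excel format |
| .doc / .xls / .pdf / .pptx / other | ❌ Not supported | Convert to .docx or .xlsx first |

Note: dochub processes only .docx and .xlsx files. Documents in any other format are skipped and the user is notified.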
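
The format rule above can be sketched as a simple extension filter. This is a minimal illustration, not the skill's actual code: the function name `partition_by_support` is hypothetical, and matching the extension case-insensitively is an assumption.

```python
from pathlib import Path

# The only two formats the SKILL.md lists as supported.
SUPPORTED_SUFFIXES = {".docx", ".xlsx"}


def partition_by_support(paths):
    """Split paths into (supported, skipped) lists by file extension.

    Extension matching is case-insensitive here, which is an assumption.
    """
    supported, skipped = [], []
    for p in map(Path, paths):
        if p.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(p)
        else:
            skipped.append(p)
    return supported, skipped
```

The `skipped` list is what would be shown to the user before conversion starts.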

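## Core features

### 1. Initialization (init)

Converts the raw documents in the workspace into the standard knowledge-base format:

```
Raw documents → raw/ → safety confirmation → unsupported-format check → MD conversion (.docx/.xlsx) → knowledge-base summary and index
```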
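
**Steps:**
1. **Safety confirmation**: asks whether the documents have been desensitized (no sensitive personal information, confidential data, etc.); initialization continues only after the user confirms.
2. **Move raw documents**: moves all raw documents into the `raw/` folder at the workspace root.
3. **Filename normalization**: keeps only Chinese characters, English letters, digits, and the hyphen `-`; every other character is replaced with `-`.
4. **Unsupported-format check**: scans for files other than .docx/.xlsx; any that are found are listed and the user is told they will be skipped.
5. **MD conversion**: uses `markitdown` to convert documents to MD, preserving the original directory structure.
   - Supported formats only: .docx, .xlsx
   - Checks whether the target file already exists before converting
   - On the first conflict, asks the user to choose "skip" or "overwrite"; that choice is applied automatically to later conflicts
6. **Generate the knowledge-base summary and index**: writes `_docs_knowledge_base.md`, which contains:
   - Document statistics overview (total count, size, number of categories)
   - Category directory tree (visual structure)
   - Tag cloud of high-frequency keywords
   - Detailed document index (file list organized by category)
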
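The filename normalization rule in step 3 can be sketched with a single regular expression. This is an illustration under stated assumptions, not the skill's own code: `normalize_name` is a hypothetical name, "Chinese characters" is taken to mean the basic CJK range U+4E00 to U+9FFF, the file extension is left untouched, and runs of replaced characters are not collapsed. The v1.3.0 release note also mentions underscores as allowed; this sketch follows the step 3 wording instead.

```python
import re
from pathlib import Path

# Anything that is not a CJK ideograph (basic block), an ASCII letter,
# a digit, or a hyphen. The exact character ranges are an assumption.
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9-]")


def normalize_name(filename: str) -> str:
    """Replace every disallowed character in the stem with '-'.

    The extension is preserved as-is (an assumption; the source does
    not say how extensions are treated).
    """
    p = Path(filename)
    return _DISALLOWED.sub("-", p.stem) + p.suffix


print(normalize_name("年度 报告(final)_v2.docx"))  # 年度-报告-final--v2.docx
```
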
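### 2. Incremental update

Place new documents in the `update/` directory, then run an incremental update:

- **Safety confirmation**: the documents must again be confirmed as desensitized first
- **Detect new or changed documents**
- **Unsupported-format check**: lists files other than .docx/.xlsx and reports that they will be skipped
- **Convert only files that are unconverted or modified**
- **Update the knowledge-base summary and index**
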
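One common way to decide whether a file is "unconverted or modified" is to compare modification times of the source and its Markdown output. The listing does not say how dochub detects changes, so the sketch below is an assumption, and `needs_conversion` is a hypothetical helper name.

```python
from pathlib import Path


def needs_conversion(source: Path, target: Path) -> bool:
    """Return True if the target is missing or older than the source.

    Uses file modification times; a content hash would be a stricter
    alternative. Which method the skill uses is not stated.
    """
    if not target.exists():
        return True
    return source.stat().st_mtime > target.stat().st_mtime
```

A caller would run this for each supported file under `update/` and convert only those for which it returns `True`.
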
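### 3. Search and Q&A

The following retrieval modes are supported:

- **Full-text search**: searches the MD documents for keywords
- **Category search**: finds documents by category
- **Semantic Q&A**: answers user questions based on document content
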
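Full-text search over the converted Markdown files can be approximated with a plain line scan. This is a minimal sketch, not the skill's implementation: `search_markdown` is a hypothetical name, and case-insensitive substring matching and UTF-8 encoding are assumptions.

```python
from pathlib import Path


def search_markdown(root, keyword):
    """Yield (path, line_number, line) for every line containing keyword.

    Matching is a case-insensitive substring test; files are read as
    UTF-8 with undecodable bytes replaced.
    """
    needle = keyword.casefold()
    for md in sorted(Path(root).rglob("*.md")):
        text = md.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                yield md, number, line.strip()
```

For example, `list(search_markdown("_docs_md", "contract"))` would return every matching line together with its file and line number.
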
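## Usage

### Initialize the knowledge base

```
Use the dochub skill to initialize the document knowledge base
```

Or specify a workspace:

```
Use the dochub skill to initialize the document knowledge base at [path]
```

### Incremental update

```
Use the dochub skill to incrementally update the documents
```

### Search documents

```
Use the dochub skill to search for [keyword]
```

### Q&A

```
Use the dochub skill to answer: [question]
```
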
## Directory structure

```
workspace/
├── raw/                        # Raw documents
├── _docs_md/                   # MD output directory
├── _docs_knowledge_base.md     # Knowledge-base summary and index (merged document)
└── update/                     # Incremental update directory
```

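Since conversion preserves the original directory structure, the output path for a document can be derived from its path under `raw/`. The helper below is a sketch under that reading of the layout; `markdown_target` is a hypothetical name, and replacing the extension with `.md` is an assumption.

```python
from pathlib import Path


def markdown_target(workspace: Path, source: Path) -> Path:
    """Map workspace/raw/<sub>/<name>.<ext> to workspace/_docs_md/<sub>/<name>.md.

    Raises ValueError if source is not located under workspace/raw.
    """
    relative = source.relative_to(workspace / "raw")
    return workspace / "_docs_md" / relative.with_suffix(".md")
```

With this mapping, `raw/hr/policy.docx` would land at `_docs_md/hr/policy.md`.
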
## Dependencies

- **markitdown** 0.1.5+: core MD conversion tool
- **python-docx**: Word document handling
- **openpyxl**: Excel document handling

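## Notes

1. **Format restriction**: dochub supports only .docx and .xlsx; other formats (.doc/.xls/.pdf/.pptx, etc.) are skipped and the user is notified
2. **Safety confirmation**: before every initialization or incremental update, you must confirm that the documents have been desensitized
3. **Back up raw documents**: initialization changes file names and the directory structure, so back up beforehand
4. **Skip/overwrite choice**: asked on the first duplicate file, then applied automatically afterwards
5. **Raw documents retained**: raw documents are kept as a backup after conversion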
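
The "ask once, then reuse the answer" behavior for duplicate files can be modeled with a small stateful object. This is an illustrative sketch only: `ConflictPolicy`, `should_write`, and the `ask` callback are hypothetical names, and the two answer strings are assumptions.

```python
class ConflictPolicy:
    """Remember the user's first skip/overwrite answer and reuse it."""

    def __init__(self, ask):
        # ask: zero-argument callable returning "skip" or "overwrite".
        self._ask = ask
        self._choice = None

    def should_write(self, target_exists: bool) -> bool:
        """Return True if the converter should write the target file."""
        if not target_exists:
            return True  # no conflict, nothing to ask
        if self._choice is None:
            answer = self._ask()
            if answer not in ("skip", "overwrite"):
                raise ValueError(f"unexpected answer: {answer!r}")
            self._choice = answer
        return self._choice == "overwrite"
```

In interactive use, `ask` could wrap `input()`; in automated runs it could return a fixed value.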
Security recommendations
This skill is a local document converter and indexer. It moves and renames items inside your workspace and executes local Python modules (including markitdown). Before installing or running:
1. Back up your workspace (the scripts move and rename files).
2. Note that WORKBUDDY_WORKSPACE is read by the code but not declared. Set this environment variable deliberately, or run tests from a safe test directory.
3. Review the scripts and, if needed, run them on a small sample to confirm they move the files you expect (init.step1 moves directories only; single files at the workspace root may be skipped).
4. Inspect and verify the markitdown package you install (pip install), since the conversion step runs it as 'python -m markitdown'.
5. If you need non-interactive automation, be aware that the --yes/--force/--skip-conflict flags bypass interactive confirmations (including the desensitization confirmation).
6. If anything seems unexpected (extra libraries for PDF/XLS support, broad renaming behavior), ask the publisher for clarification or run in an isolated environment first.
Capability assessment
Purpose & Capability
The name/description promise local conversion and knowledge-base generation, which the included scripts implement; however there are mismatches: the SKILL.md states all original documents will be moved into workspace/raw/, but init.step1 only collects and moves directories (it explicitly skips top-level files), so individual files at the workspace root may be ignored. Also requirements.txt includes libraries for PDF/XLS legacy handling (pdfplumber, xlrd, pywin32) despite the skill claiming it only supports .docx/.xlsx—this is disproportionate to the stated narrow format support and may indicate leftover code or scope creep.
Instruction Scope
SKILL.md instructs interactive safety confirmation and moving/renaming original documents, which the scripts perform. But the implementation differs: init.py moves directories only (not files), normalize_names renames many filesystem entries, and convert scripts spawn subprocesses to run markitdown. The code will modify filesystem structure (move/rename directories and files) and can operate non-interactively via --yes/--force flags, so automated runs could change many files if invoked without caution. There are no instructions or code that exfiltrate data or call external network endpoints, but external dependencies (markitdown) are executed as a Python module and could contain network behavior—this is not visible here.
Install Mechanism
No install spec is present (instruction-only), so nothing is downloaded by default. A requirements.txt is included, meaning a developer/operator might pip-install the listed packages; those packages come from PyPI (markitdown required). The skill executes markitdown via 'python -m markitdown' which will run whatever markitdown is installed in the environment—verify that package before installing. No suspicious download URLs or extract operations are present.
Credentials
The SKILL metadata declares no required environment variables, but the code checks WORKBUDDY_WORKSPACE to determine the workspace path. That undeclared env var is used to change which files are processed. No sensitive credentials are requested, but reliance on an undocumented env var is an incoherence and could cause the skill to operate on an unexpected directory. Other environment usage is limited to standard Python execution and subprocess invocation.
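
One way to make the workspace choice explicit is to resolve it in a single place with a fixed precedence. The sketch below assumes the order "explicit argument, then WORKBUDDY_WORKSPACE, then current directory"; the skill's real precedence is not shown in this listing, and `resolve_workspace` is a hypothetical name.

```python
import os
from pathlib import Path


def resolve_workspace(explicit=None, env=None):
    """Pick the workspace directory with an assumed precedence.

    Order: explicit argument, WORKBUDDY_WORKSPACE, current directory.
    `env` defaults to os.environ and exists to make testing easy.
    """
    env = os.environ if env is None else env
    raw = explicit or env.get("WORKBUDDY_WORKSPACE") or os.getcwd()
    return Path(raw).resolve()
```

Printing the resolved path before any move or rename gives the operator a chance to catch an unexpected directory.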
Persistence & Privilege
The skill does not request permanent always-enabled status and does not modify other skills or global agent configuration. Its main privileges are filesystem changes within the determined workspace (moving and renaming files), which are consistent with a document-management tool's needs.
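How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the dialog: /install dochub
  3. Once installed, call the Skill by name or trigger it with /dochub
  4. Provide the inputs described in the Skill's parameters to get structured output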
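Version history
v1.4.0
Improved progress display, fixed a path calculation issue, improved performance
v1.3.0
v1.3.0: Naming rules enforced: init/run/process normalize directory and file names throughout (only Chinese/English/digits/underscore/hyphen allowed); fixed init timeout; init now cleans up the old _docs_md automatically before rebuilding
v1.2.3
Added directory-name normalization: automatically strips spaces and special characters, resolving Windows path encoding issues
v1.2.2
Fixed directory-structure preservation, added working knowledge-base generation, streamlined the initialization flow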
v1.2.1
v1.2.1: bilingual display name
v1.2.0
v1.2.0: i18n SKILL.md (EN/ZH bilingual), security hardening (path traversal, subprocess safety, log fault-tolerance, MAX_TEXT_LENGTH), English directory naming (_docs_md, _convert_log.txt, _index.md)
Metadata
Slug dochub
Version 1.4.0
License MIT-0
Total installs 0
Current installs 0
Versions 6
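FAQ

What is DocHub?

All-in-one document management: batch convert to Markdown, auto-categorize, full-text search, and intelligent output. An all-round document management skill that combines document lifecycle management with intelligent retrieval. Trigger: init... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 163 total downloads to date.

How do I install DocHub?

Run the command "/install dochub" in the OpenClaw or Claude Code dialog to install it in one step; no additional configuration is needed.

Is DocHub free?

Yes. DocHub is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does DocHub support?

DocHub is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops DocHub?

It is developed and maintained by juanfenglong (@longjf25); the current version is v1.4.0.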
