Total downloads: 86
Favorites: 1
Current installs: 0
Versions: 1
Install in OpenClaw
/install markitdown-file-converter
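Description
Converts PDF, Word (docx/doc), Excel (xlsx/xls), PPT (pptx/ppt), images, and other files to Markdown or JSON in one step. Three built-in engines: pandoc (strongest for DOCX tables, emoji, and formulas), markitdown (Microsoft open source; Excel, PPT, image OCR), mammoth...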
Security recommendations
This skill mostly does what it claims (local conversion + OCR), but it will by default send images/documents to a remote 'PaddleOCR Cloud' endpoint because the code ships with a non-empty default API URL and token. Before installing or running:
1) Do not run it on sensitive documents without first verifying or disabling the cloud OCR: set environment variables PADDLEOCR_DOC_PARSING_API_URL="" and/or PADDLEOCR_ACCESS_TOKEN="" to disable the cloud path, or remove/patch scripts/ocr/paddleocr.py so is_configured() returns False unless you explicitly configure it.
2) If you must use cloud OCR, replace the default endpoint/token with a known, trusted service and a token you control, and inspect network traffic to confirm the destination.
3) Consider running conversions with the pix2tex / RapidOCR local engines only (they are present), or audit/modify the code so it never calls external endpoints automatically.
4) If unsure, run the skill in an isolated environment (offline or sandboxed) and review/grep the repository for other hard-coded endpoints or secrets before use.
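The review step of grepping the repository for hard-coded endpoints could be sketched roughly as follows. This is an illustrative helper, not part of the skill: the function name `find_hardcoded_urls`, the regular expression, and the list of file suffixes are all assumptions, and a plain URL match will not catch secrets or endpoints assembled at runtime.

```python
import re
from pathlib import Path

# Assumption: a simple pattern for http(s) URLs; it will not catch
# endpoints built by string concatenation or stored in encoded form.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>)]+")


def find_hardcoded_urls(root, suffixes=(".py", ".md", ".json", ".yaml", ".yml", ".toml")):
    """Return (path, line_number, url) tuples for every URL found under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only inspect regular files with a suffix we chose to audit.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in URL_PATTERN.finditer(line):
                hits.append((str(path), lineno, match.group(0)))
    return hits
```

Calling `find_hardcoded_urls("path/to/skill")` on the unpacked skill directory gives a list to review by hand; any host you do not recognize is worth checking before running the skill on real documents.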
Functional analysis
Type: OpenClaw Skill
Name: markitdown-file-converter
Version: 1.0.0
The skill bundle implements a document converter with high-risk capabilities, including automated system-level software installation and data transmission to external endpoints. It uses subprocess calls to execute 'pip install', 'winget', and 'apt-get' for dependency management (scripts/utils/deps.py, scripts/backends/pandoc.py), and it sends document data to a hardcoded third-party API (https://c474r929pea0qa6c.aistudio-app.com/layout-parsing) for OCR processing (scripts/ocr/paddleocr.py). While these behaviors are documented as features, the combination of automated environment modification and document exfiltration to a specific external service meets the threshold for suspicious activity.
Capability tags
Capability assessment
Purpose & Capability
The skill is a document-to-Markdown/JSON converter and its files and CLI match that purpose. However, the code includes a 'PaddleOCR Cloud' integration with a default API URL and hard-coded access token so the skill will call an external cloud API by default. Requiring a remote OCR service is not inherently wrong, but the registry metadata declared no required env vars/credentials and the README implied cloud OCR is used only if configured — the code contradicts that by enabling the cloud path via non-empty defaults.
Instruction Scope
SKILL.md describes local installs and optional 'PaddleOCR Cloud' usage 'if configured'. But the runtime instructions and code will attempt to call PaddleOCR Cloud automatically because defaults are present. The skill will read images/files and POST them to an external HTTP endpoint (ocr/paddleocr.py -> httpx.post). That behavior (sending potentially sensitive document contents to a third-party endpoint) is not clearly documented as enabled by default in SKILL.md and thus expands the instruction scope unexpectedly.
Install Mechanism
There is no platform install spec in registry metadata (instruction-driven). The scripts run pip installs at runtime (subprocess pip install) and pandoc download logic uses GitHub releases or winget/brew/apt — these are expected for this functionality. No obfuscated installers or unusual download hosts are present except the single hard-coded PaddleOCR API endpoint for runtime calls (not an installer).
Credentials
Registry metadata lists no required environment variables or credentials, but the code reads PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN (in scripts/ocr/paddleocr.py). Worse, these have non-empty default values in the code, so the cloud OCR path is considered 'configured' even if the user sets nothing. A document conversion skill should not transmit document contents to a remote service without explicit, declared credentials or opt-in.
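To illustrate why non-empty defaults make the cloud path count as configured, here is a minimal sketch contrasting the pattern described above with an opt-in variant. It is not the skill's actual code: the two function names and the placeholder default values are hypothetical, and only the environment variable names come from the review. It also assumes the real code reads the variables with a plain fallback lookup, which is what makes setting them to an empty string disable the path.

```python
import os

# Hypothetical placeholders standing in for the defaults the review says
# are hard-coded in scripts/ocr/paddleocr.py.
DEFAULT_URL = "https://example.invalid/layout-parsing"
DEFAULT_TOKEN = "placeholder-token"


def is_configured_with_defaults(env=None):
    """Pattern described in the review: fallbacks make an unset env 'configured'."""
    env = os.environ if env is None else env
    url = env.get("PADDLEOCR_DOC_PARSING_API_URL", DEFAULT_URL)
    token = env.get("PADDLEOCR_ACCESS_TOKEN", DEFAULT_TOKEN)
    return bool(url and token)


def is_configured_opt_in(env=None):
    """Opt-in variant: no fallbacks, so the user must set both values."""
    env = os.environ if env is None else env
    url = env.get("PADDLEOCR_DOC_PARSING_API_URL", "").strip()
    token = env.get("PADDLEOCR_ACCESS_TOKEN", "").strip()
    return bool(url and token)
```

With an empty environment the first function returns True and the second returns False, which is the difference between cloud OCR being on by default and being an explicit choice.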
Persistence & Privilege
The skill does not request permanent inclusion (always=false), does not modify other skills, and does not persist credentials or change global agent configuration. It runs installs in the current environment, which is expected for a utility script.
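How to use
- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box:
/install markitdown-file-converter
- After installation, invoke the Skill by name or trigger it with /markitdown-file-converter
- Provide the required inputs according to the Skill's parameter description to get structured output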
Version history
v1.0.0
markitdown-file-converter 1.0.0
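- Initial release. Converts PDF, Word, Excel, PPT, images, and other common file types to Markdown or JSON in one step.
- Integrates three engines (pandoc, markitdown, mammoth) and detects and installs dependencies automatically, with no manual configuration.
- Supports converting math formulas to LaTeX, converting tables to standard Markdown tables, automatic image extraction with OCR for text and formulas, and automatic decoding of base64 images.
- Adds batch conversion, JSON output structured by heading, timeout control, and detailed progress feedback.
- Supports one-step conversion from the command line and batch processing of directories, and automatically selects the best backend for each file type.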
Metadata
FAQ
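What is Markitdown File Converter?
Converts PDF, Word (docx/doc), Excel (xlsx/xls), PPT (pptx/ppt), images, and other files to Markdown or JSON in one step. Three built-in engines: pandoc (strongest for DOCX tables, emoji, and formulas), markitdown (Microsoft open source; Excel, PPT, image OCR), mammoth... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 86 total downloads to date.
How do I install Markitdown File Converter?
Run the command "/install markitdown-file-converter" in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.
Is Markitdown File Converter free?
Yes. Markitdown File Converter is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does Markitdown File Converter support?
Markitdown File Converter runs cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Markitdown File Converter?
Developed and maintained by SQLSkills (@sqlskills); the current version is v1.0.0.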