mapleshadow

MarkItDown Document Conversion (Chinese Edition)

by mapleshadow · GitHub · v1.0.3 · MIT-0
Cross-platform ✓ · Security: Clean
104 Downloads · 1 Star · 0 Active Installs · 3 Versions
Install in OpenClaw
/install markitdown-zh
Description
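Converts many document formats (PDF, DOCX, PPTX, XLSX, XLS, CSV, JSON, TXT, EPUB, HTML, etc.) to Markdown using Microsoft's markitdown library. Supports batch conversion, format preservation, image extraction, and more. Use cases: (1) "Convert this PDF to Markdown", (2) "Batch-convert the documents in this folder", (...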
README (SKILL.md)

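MarkItDown Document Conversion Skill

Converts various document formats to Markdown using Microsoft's markitdown library.

Supported Formats

  • PDF (.pdf)
  • Word documents (.docx)
  • PowerPoint presentations (.pptx)
  • Excel spreadsheets (.xlsx, .xls)
  • HTML files (.html, .htm)
  • Plain-text files (.txt, .rtf, .xml, .csv, .json)
  • E-books (.epub)
  • And more...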

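Quick Start

Single-File Conversion

from markitdown import MarkItDown

md = MarkItDown()  # one converter instance can be reused for many files
result = md.convert("document.pdf")  # the input format is detected automatically
print(result.text_content)  # the converted Markdown text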

Using the Provided Scripts

# Convert a single file
python3 scripts/convert.py input.pdf output.md

# Batch-convert a folder
python3 scripts/batch_convert.py input_folder/ output_folder/

# Extract images from a document
python3 scripts/extract_images.py document.pdf images_folder/

Detailed Usage

Single-File Conversion

Using scripts/convert.py

python3 scripts/convert.py <input_file> [output_file]

If no output file is specified, a .md file is generated automatically.
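The README does not show the source of convert.py, so the following is only a minimal sketch of the behavior described above, assuming the default output is the input path with its extension replaced by .md. The names default_output_path, _markitdown_convert and convert_file are hypothetical; only MarkItDown().convert(...).text_content comes from the Quick Start example.

```python
from pathlib import Path
from typing import Callable, Optional


def default_output_path(input_path: Path) -> Path:
    """Hypothetical default: same folder and base name, extension replaced by .md."""
    return input_path.with_suffix(".md")


def _markitdown_convert(path: str) -> str:
    """Convert one file with markitdown (imported lazily so tests can run without it)."""
    from markitdown import MarkItDown

    return MarkItDown().convert(path).text_content


def convert_file(
    input_path: str,
    output_path: Optional[str] = None,
    convert: Callable[[str], str] = _markitdown_convert,
) -> Path:
    """Write the Markdown for input_path to output_path (or the default) and return that path."""
    src = Path(input_path)
    dst = Path(output_path) if output_path else default_output_path(src)
    dst.parent.mkdir(parents=True, exist_ok=True)  # allow output into a new folder
    dst.write_text(convert(str(src)), encoding="utf-8")
    return dst
```

With markitdown installed, convert_file("input.pdf") would write input.md next to the source. Passing the converter as a parameter keeps the file handling testable without the library.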

Batch Conversion

Using scripts/batch_convert.py

python3 scripts/batch_convert.py <input_directory> <output_directory>

Recursively processes every supported file in the directory.
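A rough sketch of recursive batch conversion, assuming batch_convert.py mirrors the input folder's subdirectory layout under the output folder. The extension set is copied from the Supported Formats list and, like the function names find_supported_files and batch_convert, is an assumption rather than the script's actual code.

```python
from pathlib import Path
from typing import Callable, Iterator, List

# Assumed extension set, taken from the "Supported Formats" list above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".xls", ".html", ".htm",
    ".txt", ".rtf", ".xml", ".csv", ".json", ".epub",
}


def find_supported_files(root: Path) -> Iterator[Path]:
    """Yield every supported file under root, recursing into subdirectories."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            yield path


def batch_convert(input_dir: str, output_dir: str, convert: Callable[[str], str]) -> List[Path]:
    """Convert each supported file, mirroring the input tree under output_dir."""
    src_root, dst_root = Path(input_dir), Path(output_dir)
    written = []
    for src in find_supported_files(src_root):
        # input_folder/sub/a.pdf -> output_folder/sub/a.md
        dst = (dst_root / src.relative_to(src_root)).with_suffix(".md")
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(convert(str(src)), encoding="utf-8")
        written.append(dst)
    return written
```

One way to call it: create md = MarkItDown() once, then batch_convert("input_folder", "output_folder", lambda p: md.convert(p).text_content). Note that two sources differing only by extension (a.pdf and a.docx in the same folder) both map to a.md, so the later one in sorted order overwrites the earlier.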

Image Extraction

Using scripts/extract_images.py

python3 scripts/extract_images.py <input_file> <output_directory>

Extracts all images from the document and saves them to the specified directory.
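How extract_images.py works internally is not documented here. As one possible technique (not necessarily the script's), Office Open XML documents (.docx, .pptx, .xlsx) are ZIP archives that store embedded images under word/media/, ppt/media/ and xl/media/, so the standard library's zipfile module can copy them out. PDFs are not ZIP-based and would need a separate PDF library, which this sketch does not cover; the function name extract_office_images is hypothetical.

```python
import posixpath
import zipfile
from pathlib import Path
from typing import List

# Folders inside .docx / .pptx / .xlsx archives that hold embedded media.
MEDIA_FOLDERS = {"word/media", "ppt/media", "xl/media"}


def extract_office_images(document: str, output_dir: str) -> List[Path]:
    """Copy embedded media out of a .docx/.pptx/.xlsx file; return the written paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(document) as archive:
        for name in archive.namelist():
            folder, filename = posixpath.split(name)  # ZIP names always use "/"
            # Only direct children of a media folder; this also rejects "../" tricks.
            if folder in MEDIA_FOLDERS and filename not in ("", ".", ".."):
                target = out / filename
                target.write_bytes(archive.read(name))
                written.append(target)
    return written
```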

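Scripts

  • scripts/convert.py - single-file conversion script
  • scripts/batch_convert.py - batch conversion script
  • scripts/extract_images.py - image extraction script

Each script accepts a --help option that lists its detailed parameters.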
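The scripts' actual options are not listed in this README. As a sketch, a CLI with the documented convert.py signature could be declared with argparse, which generates the --help output automatically; the parser below and its help strings are hypothetical and may not match the real script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `convert.py <input_file> [output_file]`."""
    parser = argparse.ArgumentParser(
        prog="convert.py",
        description="Convert a single document to Markdown with markitdown.",
    )
    parser.add_argument("input_file", help="document to convert, e.g. input.pdf")
    parser.add_argument(
        "output_file",
        nargs="?",  # optional positional argument
        default=None,
        help="Markdown file to write (defaults to the input name with a .md suffix)",
    )
    return parser
```

Running build_parser().parse_args() inside the script would then make python3 scripts/convert.py --help print the usage line and both argument descriptions.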

Installing Dependencies

Python Version Requirement

markitdown requires Python 3.10 or later.
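A small guard a script could run before importing markitdown, assuming the 3.10 minimum stated above. It uses typing.Optional rather than the newer X | None syntax so that the check itself still runs on the older interpreters it is meant to reject; the name python_is_supported is hypothetical.

```python
import sys
from typing import Optional, Sequence

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def python_is_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if `version` (default: the running interpreter) is at least 3.10."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON
```

For example, a script could start with: if not python_is_supported(): sys.exit("markitdown requires Python 3.10+").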

Check your Python version:

python3.12 --version  # or python3.11, python3.13

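Installing markitdown

Install with Python 3.10+:

# Using Python 3.12 (recommended)
# Install pipx (Debian/Ubuntu); each package installed with pipx automatically gets its own virtual environment
sudo apt-get install pipx
# Install markitdown with pipx; the [all] extra enables support for all formats.
# pipx exposes the markitdown command, but the library is then not importable from
# python3.12 itself; use the virtual-environment option for `from markitdown import ...`
pipx install 'markitdown[all]'

# Or use a virtual environment (suitable for non-root users, e.g. the node user)
python3.12 -m venv markitdown-env
source markitdown-env/bin/activate
pip install "markitdown[all]"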

Optional: System Dependencies

Some format conversions may require additional system dependencies:

  • PDF processing: brew install poppler (macOS) or sudo apt-get install poppler-utils (Linux)
  • OCR: brew install tesseract (macOS) or sudo apt-get install tesseract-ocr (Linux)
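To see which of these optional tools are already installed, a script could look up their executables on PATH with shutil.which. The executable names (pdftotext from poppler, tesseract from Tesseract) are an assumption about what such a check should look for; this README does not say which binaries, if any, the scripts invoke. The name find_optional_tools is hypothetical.

```python
import shutil
from typing import Dict, Optional

# Assumed executable names: poppler-utils ships pdftotext, tesseract-ocr ships tesseract.
OPTIONAL_TOOLS = {"poppler": "pdftotext", "tesseract": "tesseract"}


def find_optional_tools() -> Dict[str, Optional[str]]:
    """Map each optional dependency to the full path of its executable, or None if absent."""
    return {name: shutil.which(exe) for name, exe in OPTIONAL_TOOLS.items()}
```

For instance, [n for n, p in find_optional_tools().items() if p is None] lists the packages still to install.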

Verify the Installation

python3.12 -c "from markitdown import MarkItDown; print('Installation successful!')"
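The one-liner above raises ImportError when markitdown is missing. Below is a sketch of a non-raising check using only importlib from the standard library. Run it with the same interpreter that will run the scripts (for example, inside the activated virtual environment), because a pipx installation is isolated and not importable from the system Python. The name check_install is hypothetical.

```python
import importlib.util
from importlib import metadata
from typing import Optional, Tuple


def check_install(module: str = "markitdown", dist: str = "markitdown") -> Tuple[bool, Optional[str]]:
    """Return (importable, installed_version) for the interpreter running this code."""
    importable = importlib.util.find_spec(module) is not None
    try:
        version: Optional[str] = metadata.version(dist)
    except metadata.PackageNotFoundError:
        version = None  # no distribution metadata for this name
    return importable, version
```

Usage: importable, version = check_install(), then report version if importable, otherwise point the user to the install steps.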

Running the Scripts

Every script can be run with a specific Python version:

# Run with Python 3.12
python3.12 scripts/convert.py input.pdf output.md
python3.12 scripts/batch_convert.py input_folder/ output_folder/
python3.12 scripts/extract_images.py document.pdf images_folder/
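When the scripts are launched from another Python program rather than a shell, passing sys.executable as the interpreter makes them run under the same Python (and therefore the same virtual environment) as the caller. A minimal sketch; script_command and run_script are hypothetical helper names.

```python
import subprocess
import sys
from typing import List, Sequence


def script_command(script: str, args: Sequence[str], python: str = sys.executable) -> List[str]:
    """Build an argv that runs `script` with a specific interpreter (default: the current one)."""
    return [python, script, *args]


def run_script(script: str, *args: str) -> int:
    """Run a helper script and return its exit code; its output goes to this process's console."""
    return subprocess.run(script_command(script, args), check=False).returncode
```

For example, run_script("scripts/convert.py", "input.pdf", "output.md") mirrors the first shell command above, using whichever interpreter is running the caller.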

See Also

  • Upstream library: Microsoft markitdown (https://github.com/microsoft/markitdown)

Usage Guidance
This skill appears to be a straightforward wrapper around the markitdown Python library. Before installing: (1) verify you trust the markitdown package source (pip/pypi or the linked GitHub); (2) run installs inside a virtual environment (avoid sudo for pip installs) to limit system impact; (3) only point the scripts at directories/files you trust — the scripts will read any file under the provided input path and write output files to the specified output path; (4) installing optional system deps (poppler, tesseract) requires package manager privileges — confirm those commands on your OS. If you need extra assurance, review the upstream Microsoft markitdown repository and inspect the package you will install.
Capability Analysis
Type: OpenClaw Skill · Name: markitdown-zh · Version: 1.0.3
The skill is a legitimate wrapper for Microsoft's MarkItDown library, designed to convert various document formats (PDF, DOCX, etc.) into Markdown. The provided Python scripts (convert.py, batch_convert.py, and extract_images.py) perform standard file processing and image extraction tasks consistent with the tool's description. No evidence of data exfiltration, malicious command execution, or prompt injection was found.
Capability Assessment
Purpose & Capability
Name/description match the included scripts and SKILL.md: the skill wraps Microsoft's markitdown library to convert many document formats. The declared supported formats and example use cases align with the scripts' behavior.
Instruction Scope
SKILL.md and scripts only instruct installing markitdown and running local conversion/extraction scripts. The scripts read files from user-supplied input paths and write conversion output to specified directories — behavior that matches the stated purpose. There are no instructions to read unrelated system files, environment variables, or send data to external endpoints.
Install Mechanism
There is no automated install spec; installation guidance uses standard tools: pipx or virtualenv with pip and normal system package managers (apt/brew) for optional dependencies (poppler, tesseract). No downloads from obscure URLs or extracted archives are present in the skill package.
Credentials
The skill requires no environment variables or credentials. Optional system packages are appropriate for PDF/OCR support. No excessive or unrelated secrets/config paths are requested.
Persistence & Privilege
The always flag is false, and the skill does not attempt to persist or modify agent-wide settings. Scripts operate only on user-supplied file system paths and do not alter other skills' configurations.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install markitdown-zh
  3. After installation, invoke the skill by name or use /markitdown-zh
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
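v1.0.3
- Updated the description to explicitly list the supported document formats, such as CSV, JSON, TXT, EPUB, and HTML.
- Removed the USAGE-GUIDE.md and reference.md entries from "See Also", keeping only the upstream library link.
- Everything else is unchanged.
v1.0.2
- Referenced and rewritten.
v1.0.1
markitdown-zh features:
- Converts many document types (PDF, Word, PowerPoint, Excel, images with OCR, audio, HTML, YouTube) to Markdown using Microsoft's MarkItDown library.
- Provides setup instructions for both the command-line interface and the Python API.
- Includes usage examples and a batch-conversion guide.
- Provides a troubleshooting section for common issues.
- Ships additional documentation and helper scripts for batch operations.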
Metadata
Slug: markitdown-zh
Version: 1.0.3
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 3
Frequently Asked Questions

What is MarkItDown Document Conversion (Chinese Edition)?

It is an AI agent skill for Claude Code / OpenClaw that converts many document formats (PDF, DOCX, PPTX, XLSX, XLS, CSV, JSON, TXT, EPUB, HTML, etc.) to Markdown using Microsoft's markitdown library, with support for batch conversion, format preservation, and image extraction. It has 104 downloads so far.

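How do I install MarkItDown Document Conversion (Chinese Edition)?

Run "/install markitdown-zh" in the OpenClaw or Claude Code chat to install the skill in one step. Its scripts still need Python 3.10+ and the markitdown library, installed as described in the README.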

Is MarkItDown Document Conversion (Chinese Edition) free?

Yes, MarkItDown Document Conversion (Chinese Edition) is completely free and licensed under MIT-0. You can download, install, and use it at no cost.

Which platforms does MarkItDown Document Conversion (Chinese Edition) support?

MarkItDown Document Conversion (Chinese Edition) is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created MarkItDown Document Conversion (Chinese Edition)?

It is built and maintained by mapleshadow (@mapleshadow); the current version is v1.0.3.
