yanweiliang323868-del

docx-md

Author: yanweiliang323868-del · GitHub · v1.0.1
cross-platform ⚠ suspicious
Total downloads: 663
Favorites: 0
Current installs: 1
Versions: 2
Install in OpenClaw
/install docx-md
Description
Low-level docx format tool for AI document review. Three operations: (1) read docx → output compact Markdown or JSON; (2) apply edits JSON back to docx (trac...
Usage (SKILL.md)

Word DOCX (OOXML) – docx-md

Overview

Three entry points: Read – output compact Markdown (default, token-efficient) or full JSON; Modify – apply AI-returned edits to the docx; Finalize – accept all revisions and remove all comments. Implemented via OOXML (ZIP + XML). No commercial Word libraries required.

Workflow

Goal | Action
Get document for AI | Read: run read script → Markdown (default) or JSON. Markdown includes <!-- b:N --> blockIndex markers for edit targeting.
Apply AI edits to docx | Modify: run apply script with docx + edits JSON → new docx with track changes and comments.
Deliver final version | Finalize: run finalize script → new docx with no revisions/comments.

LLM-oriented pipeline

  1. Read – Parse the docx; output Markdown (default) or JSON. Markdown prefixes each block with <!-- b:N -->; revisions appear as {+inserted+} and {-deleted-}; comments appear as [comment: text].
  2. Send the output plus the task prompt to the model; require the model to output only the edit JSON, with the fields blockIndex, originalContent, content, and basis.
  3. Modify – The script infers the operation from blockIndex, originalContent, content, and basis; converts it to OOXML (w:ins / w:del / comment anchors); then writes the result back to Word.
  4. Finalize – When the user confirms, run finalize to accept all revisions and remove all comments.

See references/llm-pipeline.md for the Markdown format, JSON schema, and edit format.
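The block markers described in the pipeline can be split back into indexed blocks before building a prompt. The following is a minimal sketch, assuming each block starts with a `<!-- b:N -->` marker followed by its text; the authoritative format lives in references/llm-pipeline.md, and the function name `split_blocks` and the sample text are hypothetical.

```python
import re

# Assumed layout: a marker such as "<!-- b:3 -->" precedes each block's text.
# The regex is an illustrative guess, not taken from the skill's scripts.
BLOCK_MARKER = re.compile(r"<!--\s*b:(\d+)\s*-->")


def split_blocks(markdown: str) -> dict[int, str]:
    """Map each blockIndex to the text that follows its marker."""
    blocks: dict[int, str] = {}
    matches = list(BLOCK_MARKER.finditer(markdown))
    for i, match in enumerate(matches):
        # A block runs until the next marker, or to the end of the input.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        blocks[int(match.group(1))] = markdown[match.end():end].strip()
    return blocks


# Hypothetical read output containing one revision pair and one comment.
sample = (
    "<!-- b:0 --> # Title\n"
    "<!-- b:1 --> The {-old-}{+new+} clause applies. [comment: check wording]\n"
)
blocks = split_blocks(sample)
```

Keeping the index alongside the text makes it straightforward to check that a model's returned blockIndex refers to a block that actually exists.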

1. Read

  • Parse word/document.xml (w:body only) and word/comments.xml.
  • Output Markdown (default) or JSON. Markdown is compact and token-efficient.

Script: scripts/read_docx.py

# Default: Markdown output (token-efficient)
python3 skills/docx-md/scripts/read_docx.py document.docx
python3 skills/docx-md/scripts/read_docx.py document.docx -o result.md

# JSON output (full structure)
python3 skills/docx-md/scripts/read_docx.py document.docx -f json -o result.json

Options:

  • -o, --output – Output path (default: stdout)
  • -f, --format – md (default) or json
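To illustrate what "OOXML (ZIP + XML)" means for the Read step, here is a rough sketch that opens word/document.xml inside the archive and collects paragraph text from w:body using only the standard library. It is not the code in read_docx.py (which, per the requirements, relies on lxml); the helper names are hypothetical, and the demo archive holds only document.xml, so Word itself would not open it.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_texts(docx_bytes: bytes) -> list[str]:
    """Return the visible text of each top-level paragraph in w:body."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find("w:body", NS)
    if body is None:
        return []
    texts = []
    for para in body.findall("w:p", NS):
        # w:t holds run text; deleted text lives in w:delText and is skipped.
        texts.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return texts


def make_minimal_docx(paragraphs: list[str]) -> bytes:
    """Build a demo ZIP that contains only word/document.xml."""
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


texts = paragraph_texts(make_minimal_docx(["First paragraph", "Second paragraph"]))
```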

2. Modify

  • Input: docx path + edit JSON { modifications: [{ blockIndex, originalContent, content, basis }] } (same blockIndex as read output).
  • Flow: Convert JSON to OOXML (w:ins / w:del / comments), then write back to Word.

Script: scripts/apply_edits_docx.py. Use - as edits file to read JSON from stdin.

python3 skills/docx-md/scripts/apply_edits_docx.py document.docx edits.json -o output.docx
python3 skills/docx-md/scripts/apply_edits_docx.py document.docx - -o output.docx  # stdin

Options: --author (default: "Review")
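An edits file for the Modify step can be assembled and checked before it is handed to the apply script. This sketch uses the documented shape { modifications: [{ blockIndex, originalContent, content, basis }] }; the strict validation (all four keys present, integer blockIndex) is an assumption of this example rather than a documented rule, and `build_edits` is a hypothetical helper.

```python
import json

# The four fields named in the edit format.
REQUIRED_KEYS = ("blockIndex", "originalContent", "content", "basis")


def build_edits(modifications: list[dict]) -> str:
    """Validate each modification and serialize the payload as JSON."""
    for index, mod in enumerate(modifications):
        missing = [key for key in REQUIRED_KEYS if key not in mod]
        if missing:
            raise ValueError(f"modification {index} is missing {missing}")
        if not isinstance(mod["blockIndex"], int):
            raise ValueError(f"modification {index}: blockIndex must be an integer")
    return json.dumps({"modifications": modifications}, ensure_ascii=False, indent=2)


# Hypothetical edit targeting block 1 of a previously read document.
edits_json = build_edits([
    {
        "blockIndex": 1,
        "originalContent": "The old clause applies.",
        "content": "The new clause applies.",
        "basis": "Wording updated for consistency.",
    }
])
```

The resulting string can be saved as edits.json or piped to the apply script, which reads stdin when - is given as the edits file.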

3. Finalize

  • Accept all revisions (flatten to final text), remove all comments. Save as new docx.
  • Uses docx-revisions to accept revisions (preserves encoding), then removes comment markup via regex on raw bytes.

Script: scripts/finalize_docx.py

Requires: pip install docx-revisions (see requirements.txt)

python3 skills/docx-md/scripts/finalize_docx.py input.docx -o output.docx
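For readers unfamiliar with the markup involved, the sketch below shows what "accept all revisions and remove all comments" means at the XML level. It deliberately uses a different technique from finalize_docx.py: a tree walk with the standard library instead of docx-revisions plus regex. It covers only w:ins, w:del, and the three comment anchor elements; moves, formatting changes, and comments.xml itself are out of scope, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag: str) -> str:
    """Qualify a local tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


COMMENT_TAGS = {w("commentRangeStart"), w("commentRangeEnd"), w("commentReference")}


def flatten(element: ET.Element) -> None:
    """Accept insertions, drop deletions, and strip comment anchors in place."""
    kept = []
    for child in list(element):
        if child.tag == w("del") or child.tag in COMMENT_TAGS:
            continue  # deleted text and comment anchors are discarded
        flatten(child)
        if child.tag == w("ins"):
            kept.extend(list(child))  # unwrap: keep the inserted runs
        else:
            kept.append(child)
    for child in list(element):
        element.remove(child)
    element.extend(kept)


# Hypothetical paragraph with one deletion, one insertion, and one comment.
xml = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:commentRangeStart w:id="0"/>'
    '<w:r><w:t>Keep </w:t></w:r>'
    '<w:del w:id="1"><w:r><w:delText>old</w:delText></w:r></w:del>'
    '<w:ins w:id="2"><w:r><w:t>new</w:t></w:r></w:ins>'
    '<w:commentRangeEnd w:id="0"/>'
    '<w:r><w:commentReference w:id="0"/></w:r>'
    '</w:p>'
)
para = ET.fromstring(xml)
flatten(para)
final_text = "".join(t.text or "" for t in para.iter(w("t")))
```

The run that carried the comment reference is left behind as an empty w:r, which a fuller implementation would probably prune.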

Resources

scripts/

  • read_docx.py – Read: python3 scripts/read_docx.py document.docx [-o out.md] [-f md|json]
  • apply_edits_docx.py – Modify: python3 scripts/apply_edits_docx.py document.docx edits.json -o output.docx
  • finalize_docx.py – Finalize: python3 scripts/finalize_docx.py input.docx -o output.docx

references/

  • ooxml.md – OOXML layout (document.xml, comments.xml, revisions, comments)
  • llm-pipeline.md – Pipeline: read → Markdown/JSON → model edits → modify; defines Markdown format, JSON shape (blockIndex, originalContent, content, basis)
Security recommendations
This package appears coherent and implements what it advertises. Before installing or running: (1) review and test on copies of documents (the finalize script uses regex on XML which can be fragile); (2) be aware the code is GPL-3.0 — incorporating it into proprietary code may have license implications; (3) install the required Python packages (lxml, docx-revisions) in an isolated environment; (4) because the source is 'unknown', if you need high assurance consider auditing the scripts (they are included) or running them in a sandbox; and (5) always supply explicit file paths — the scripts operate on files you give them and do not attempt network communication or secret collection.
Functional analysis
Type: OpenClaw Skill Name: docx-md Version: 1.0.1 The skill's purpose is legitimate, and there is no evidence of malicious intent or prompt injection attempts against the AI agent in the markdown files. However, the Python scripts (`apply_edits_docx.py`, `finalize_docx.py`, `read_docx.py`) are vulnerable to path traversal, as they accept user-controlled file paths for input and output without explicit sanitization, potentially allowing arbitrary file read/write. Additionally, `finalize_docx.py` uses regex for XML modification to remove comments, which is a brittle approach that could lead to document corruption or unexpected behavior.
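One way a caller could guard against the path traversal concern noted above is to resolve every user-supplied path against an allowed working directory before passing it to the scripts. This is a sketch of a wrapper-side check, not something the skill provides; `resolve_inside` and the directory name `workdir` are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path and refuse anything that lands outside base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check below
    # also rejects absolute paths that point elsewhere.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"{user_path!r} escapes {base}")
    return candidate


# Hypothetical usage: only paths under ./workdir are accepted.
safe = resolve_inside("workdir", "docs/report.docx")
```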
Capability assessment
Purpose & Capability
The name/description (docx → compact markdown/JSON → apply edits → finalize) match the actual artifacts: three scripts (read, apply, finalize), requirements (lxml, docx-revisions), and documentation. There are no requested environment variables, binaries, or external credentials unrelated to DOCX processing.
Instruction Scope
SKILL.md and the scripts limit actions to reading a supplied .docx, producing Markdown/JSON, applying edits to a supplied .docx, and finalizing (accept changes/remove comments). All file IO is explicitly on user-provided paths. One implementation detail to note: finalize_docx removes comment markup by decoding document.xml as UTF-8 and applying regex replacements on raw XML bytes (fragile approach that can corrupt edge-case documents), but this is a scope/robustness issue rather than extraneous or malicious behavior.
Install Mechanism
There is no platform install spec (instruction-only install). The bundled requirements.txt lists lxml and docx-revisions (both reasonable for OOXML manipulation). No downloads from arbitrary URLs or archive extraction are present.
Credentials
The skill requests no environment variables or secrets. The only runtime inputs are file paths supplied by the user; dependencies are standard Python packages relevant to the stated functions.
Persistence & Privilege
The skill does not request always:true, does not modify other skills or global agent configuration, and does not require ongoing background presence. It performs one-shot file operations when invoked.
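How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. Enter the install command in the chat box: /install docx-md
  3. Once installed, invoke the Skill by name or trigger it with /docx-md
  4. Provide the required inputs according to the Skill's parameter description to get structured output.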
Version history
v1.0.1
- LICENSE.txt removed and replaced with LICENSE.
- Documentation updated to specify that the "finalize" script now uses the docx-revisions package to accept revisions (preserving encoding) and removes comments with a regex on raw bytes.
- Added requirement for the docx-revisions package in the finalize step, with install guidance in the documentation.
v1.0.0
- Initial release of docx-md: a low-level DOCX tool for AI document workflows.
- Supports three operations: read (DOCX to compact Markdown or JSON), modify (apply edits as tracked changes/comments), and finalize (accept revisions and remove comments).
- Token-efficient Markdown exports, with edit markers for AI targeting.
- Fully open, no reliance on commercial Word libraries.
- Includes documentation for LLM integration and script usage examples.
Metadata
Slug docx-md
Version 1.0.1
License
Total installs 1
Current installs 1
Versions 2
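FAQ

What is docx-md?

Low-level docx format tool for AI document review. Three operations: (1) read docx → output compact Markdown or JSON; (2) apply edits JSON back to docx (trac... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 663 total downloads to date.

How do I install docx-md?

Run the command /install docx-md in the OpenClaw or Claude Code chat box to install it in one step. No additional OpenClaw configuration is needed, though the finalize script requires the docx-revisions Python package.

Is docx-md free?

Yes. docx-md is free and open source, and can be downloaded, installed, and used freely.

Which platforms does docx-md support?

docx-md is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops docx-md?

docx-md is developed and maintained by yanweiliang323868-del (@yanweiliang323868-del). The current version is v1.0.1.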
