multimodal-parser

Name: multimodal-parser
Author: ayalili

Author: Ayalili · GitHub · v1.0.1 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 630

Current installs: 3

Versions: 2

Install in OpenClaw

/install multimodal-parser

Description

Unified multi-modal content parser for images, PDF, DOCX, and audio, with automatic OCR/transcription; outputs structured text for LLM processing.

Security recommendations

What to consider before installing or using:

- Trust & origin: the package has no homepage and an unknown source; run it only if you trust the author or after reviewing the code (you have the code).
- Permissions & sandboxing: the skill uses Deno to run subprocesses and read files. Grant it only the minimal filesystem and subprocess permissions, or run it in a sandbox/container.
- Dependencies: it requires external CLI tools (tesseract, pdftotext/poppler, pandoc, whisper, ffmpeg). Install those from official package repositories to avoid malicious binaries.
- Network/supply-chain: the code imports zod from deno.land at runtime. This fetch is expected, but it is still a supply-chain/network fetch; if you need offline assurance, vendor the dependency or audit the fetched module.
- Data sensitivity: the skill processes user-provided files locally and does not appear to transmit results externally, but avoid testing on highly sensitive files until you confirm runtime permissions and behavior in your environment.
- Sanity checks: test on non-sensitive sample files first; verify the produced outputs and any error messages. If you need stronger assurance, run the code in an isolated VM and/or review and pin remote dependency versions.
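To make the "minimal permissions" advice concrete, here is a small sketch that assembles a least-privilege `deno run` command line. The helper name `denoRunArgs`, the `SandboxSpec` type, and the `./samples` directory are illustrative assumptions; `--allow-read` and `--allow-run` are Deno's standard permission flags, and the tool list is taken from the dependency note above.

```typescript
// Hypothetical helper (not part of the skill): builds a least-privilege
// `deno run` argument list from an explicit allow-list.
interface SandboxSpec {
  entry: string;       // script to run, e.g. "index.ts"
  readPaths: string[]; // directories the skill is allowed to read
  tools: string[];     // binaries the skill is allowed to spawn
}

function denoRunArgs(spec: SandboxSpec): string[] {
  return [
    "run",
    `--allow-read=${spec.readPaths.join(",")}`,
    `--allow-run=${spec.tools.join(",")}`,
    spec.entry,
  ];
}

// Example: read access limited to one sample directory (an assumed path).
const sandboxArgs = denoRunArgs({
  entry: "index.ts",
  readPaths: ["./samples"],
  tools: ["tesseract", "pdftotext", "pandoc", "whisper", "ffmpeg"],
});

console.log(["deno", ...sandboxArgs].join(" "));
```

Depending on how the external tools write their output, and on whether the remote zod module is already cached, additional write or network permissions may be needed; the listing does not say, so treat this allow-list as a starting point.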

Functional analysis

Type: OpenClaw Skill
Name: multimodal-parser
Version: 1.0.1

The skill is a legitimate multi-modal content parser that uses standard open-source tools (Tesseract, Poppler, Pandoc, and Whisper) to process images, PDFs, Word documents, and audio files. The implementation in `index.ts` uses `Deno.Command` with argument arrays to execute these local binaries, which is a standard and relatively safe practice. There is no evidence of data exfiltration, malicious execution, or prompt injection; the code's behavior aligns perfectly with its stated purpose in `SKILL.md`.
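The reason argument arrays are considered safer can be shown with a short sketch. The function names and the exact flags below are assumptions, since the listing does not show what `index.ts` actually passes; the flags themselves (`tesseract <image> stdout -l <lang>`, `pdftotext -f/-l/-layout`) are standard options of those tools.

```typescript
// Illustrative only: not the skill's actual code.
interface Invocation {
  cmd: string;
  args: string[];
}

function ocrInvocation(imagePath: string, lang: string = "eng"): Invocation {
  // `tesseract <image> stdout -l <lang>` prints the recognized text to stdout.
  return { cmd: "tesseract", args: [imagePath, "stdout", "-l", lang] };
}

function pdfInvocation(pdfPath: string, firstPage?: number, lastPage?: number): Invocation {
  const args: string[] = ["-layout"];
  if (firstPage !== undefined) args.push("-f", String(firstPage));
  if (lastPage !== undefined) args.push("-l", String(lastPage));
  args.push(pdfPath, "-"); // "-" sends the extracted text to stdout
  return { cmd: "pdftotext", args };
}

// A hostile file name stays a single argv element; no shell ever parses it.
const risky = ocrInvocation("scan; rm -rf ~.png");
console.log(risky.args.length); // 4

// Under Deno, an Invocation could then be executed along these lines:
//   const out = await new Deno.Command(inv.cmd, {
//     args: inv.args, stdout: "piped", stderr: "piped",
//   }).output();
```

Because the path is never interpolated into a shell string, characters such as `;` or spaces in a file name cannot start a second command.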

Capability assessment

✓ Purpose & Capability

Name/description match the implementation: the code implements OCR, PDF/docx conversion and audio transcription via tesseract/pdftotext/pandoc/whisper. The SKILL.md's suggested dependency list aligns with what the code invokes.
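As a rough picture of how one entry point can cover four kinds of input, the sketch below maps a file extension to the tool named in the assessment above. The extension list and the `toolFor` function are assumptions for illustration; the listing does not say how the skill detects file types.

```typescript
// Hypothetical dispatch table: extension -> external tool.
type Tool = "tesseract" | "pdftotext" | "pandoc" | "whisper";

const TOOL_BY_EXTENSION = new Map<string, Tool>([
  ["png", "tesseract"],
  ["jpg", "tesseract"],
  ["jpeg", "tesseract"],
  ["pdf", "pdftotext"],
  ["docx", "pandoc"],
  ["mp3", "whisper"],
  ["wav", "whisper"],
  ["m4a", "whisper"],
]);

function toolFor(path: string): Tool {
  const dot = path.lastIndexOf(".");
  // Compare case-insensitively so "Report.PDF" and "report.pdf" behave the same.
  const ext = dot >= 0 ? path.slice(dot + 1).toLowerCase() : "";
  const tool = TOOL_BY_EXTENSION.get(ext);
  if (tool === undefined) {
    throw new Error(`Unsupported file type: "${ext}"`);
  }
  return tool;
}

console.log(toolFor("Report.PDF"));
console.log(toolFor("voice.m4a"));
```

Failing fast on an unknown extension keeps unsupported files from ever reaching a subprocess.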

ℹ Instruction Scope

Runtime instructions and README ask you to install system packages; the code runs those external CLI tools on a user-supplied file path and reads file metadata. It does not attempt to read unrelated system files, access credentials, or send data to remote endpoints, but it will require filesystem read permissions and the ability to spawn subprocesses.

✓ Install Mechanism

No automated install spec is provided (instruction-only for installing system packages). The code imports zod from deno.land at runtime (remote module fetch), which is normal for Deno but is a supply-chain/network fetch to be aware of.
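When auditing the remote import, one quick check is whether the URL carries an explicit version. The sketch below is a hypothetical audit helper, not something the skill ships; the zod version in the example is only a placeholder, and the URL the skill actually uses is not shown in this listing.

```typescript
// Returns true when a deno.land import URL has an explicit @version segment,
// e.g. https://deno.land/x/<module>@<version>/<file>.
function isVersionPinned(url: string): boolean {
  return /^https:\/\/deno\.land\/(?:x\/[^\/@]+|std)@[^\/]+\//.test(url);
}

// Unpinned: resolves to whatever the latest published version is at fetch time.
console.log(isVersionPinned("https://deno.land/x/zod/mod.ts"));

// Pinned: the fetched code cannot change silently between runs.
console.log(isVersionPinned("https://deno.land/x/zod@v3.22.4/mod.ts"));
```

A pinned URL narrows the supply-chain exposure but does not remove the network fetch; vendoring the module is the stronger option if offline assurance is required.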

✓ Credentials

The skill declares no environment variables, no credentials, and no config paths. The code does not reference any hidden env vars or secrets.

✓ Persistence & Privilege

The skill sets `always:false` and uses default invocation settings. It does not persist or modify other skills or global configuration; it only executes when invoked and uses local subprocesses/IO.

How to use

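Make sure OpenClaw is installed (locally or via Docker).
In the chat box, enter the install command: /install multimodal-parser
Once installation finishes, call the Skill by name or trigger it with /multimodal-parser
Provide the required inputs according to the Skill's parameter description to get structured output.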

Version history

v1.0.1

- Removed the skill.yaml file to streamline configuration.
- Updated SKILL.md: moved metadata (name, slug, description) into frontmatter.
- Cleaned up documentation structure by removing version, author, license, keywords, runtime, and entry fields from SKILL.md frontmatter.

v1.0.0

multimodal-parser v1.0.0: Initial Release

- Unified API for parsing images, PDFs, DOCX files, and audio into structured text.
- Built-in OCR for images, transcription for audio, and document parsing with zero configuration required.
- Supports multiple output formats: plain text, Markdown, and structured JSON for LLM-ready processing.
- Helpful error messages with suggested dependency install commands.
- Customizable parameters: file type, output format, OCR language, audio model, and PDF page range.
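The customizable parameters listed above could be modeled roughly as follows. All field names and default values here (`"text"`, `"eng"`, `"base"`) are assumptions for illustration; the listing names the parameters but not their identifiers or defaults, and the real skill validates input with zod rather than hand-written checks.

```typescript
// Hypothetical option shape; not the skill's actual schema.
type FileType = "image" | "pdf" | "docx" | "audio";
type OutputFormat = "text" | "markdown" | "json";

interface PageRange {
  first: number;
  last: number;
}

interface ParseOptions {
  fileType?: FileType;
  outputFormat?: OutputFormat;
  ocrLanguage?: string;
  audioModel?: string;
  pageRange?: PageRange;
}

interface ResolvedOptions {
  fileType: FileType | undefined;   // undefined means "detect from the file"
  outputFormat: OutputFormat;
  ocrLanguage: string;
  audioModel: string;
  pageRange: PageRange | undefined; // undefined means "all pages"
}

function resolveOptions(opts: ParseOptions = {}): ResolvedOptions {
  const range = opts.pageRange;
  if (range !== undefined) {
    const valid = Number.isInteger(range.first) &&
      Number.isInteger(range.last) &&
      range.first >= 1 &&
      range.last >= range.first;
    if (!valid) {
      throw new Error("pageRange must satisfy 1 <= first <= last");
    }
  }
  return {
    fileType: opts.fileType,
    outputFormat: opts.outputFormat ?? "text", // assumed default
    ocrLanguage: opts.ocrLanguage ?? "eng",    // assumed default
    audioModel: opts.audioModel ?? "base",     // assumed default
    pageRange: range,
  };
}

const resolved = resolveOptions({ outputFormat: "json", pageRange: { first: 2, last: 5 } });
console.log(resolved.outputFormat, resolved.ocrLanguage);
```

Resolving defaults in one place means every downstream tool invocation sees a complete, validated set of options.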

Metadata

Slug multimodal-parser

Version 1.0.1

License MIT-0

Total installs 3

Current installs 3

Version count 2

FAQ

What is multimodal-parser?

A unified multi-modal content parser for images, PDF, DOCX, and audio, with automatic OCR/transcription, that outputs structured text for LLM processing. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has 630 total downloads to date.

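How do I install multimodal-parser?

Run the command "/install multimodal-parser" in the OpenClaw or Claude Code chat box to install the skill in one step. The skill itself needs no extra configuration, but the external CLI tools it calls (tesseract, pdftotext/poppler, pandoc, whisper, ffmpeg) must be installed on the system separately.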

Is multimodal-parser free?

Yes. multimodal-parser is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does multimodal-parser support?

multimodal-parser is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed multimodal-parser?

It is developed and maintained by Ayalili (@ayalili); the current version is v1.0.1.