OpenDataLoader PDF

Name: OpenDataLoader PDF
Author: zmy1006-sudo

By mingyuan · GitHub · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install opendataloader-pdf-zmy

Description

Parse PDFs into Markdown, JSON, or HTML with OCR, table extraction, and AI-enriched descriptions for building RAG pipelines and knowledge bases.

Safe usage recommendations

The skill looks like a legitimate PDF parser, but verify before you install or run anything:

1) Confirm the package source: find the opendataloader-pdf project on PyPI/GitHub and inspect the repository and release artifacts (the registry metadata currently lists no homepage/source).
2) Expect to run pip install, which will fetch third-party code. Only install from a trusted upstream and review the code if possible.
3) The hybrid backend opens a local port (default 5002). Run it in a sandbox or controlled environment and ensure it does not inadvertently expose files or network access.
4) Be prepared to supply environment variables (JAVA_HOME, OPENDATALOADER_HYBRID_URL, and likely an LLM API key such as OPENAI_API_KEY). Treat those keys as sensitive and only provide them if you trust the package.
5) If you need higher assurance, ask the publisher for the canonical repository URL, versioned releases, and checksums, or run the package in an isolated VM/container and audit its network activity and files.
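If the publisher does supply checksums, comparing them against a downloaded release artifact can be sketched as follows. This is a minimal illustration using only the Python standard library. The function names are hypothetical, and it assumes the publisher provides a SHA-256 hex digest, which the listing does not confirm.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a checksum obtained from the publisher.

    Assumption: the published value is a SHA-256 hex string. Surrounding
    whitespace and letter case are normalized before comparing.
    """
    actual = sha256_of(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A mismatch only shows that the file differs from what the checksum describes. It says nothing about whether the checksum itself came from a trustworthy source.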

Functional analysis

Type: OpenClaw Skill
Name: opendataloader-pdf-zmy
Version: 1.0.0

The skill bundle provides documentation and instructions for 'opendataloader-pdf', a utility for converting PDF documents into Markdown, JSON, and HTML for use in RAG pipelines. The content across SKILL.md and the reference files is consistent with its stated purpose, offering standard Python API examples, CLI usage, and LangChain integrations. There are no signs of malicious intent, data exfiltration, or harmful prompt injection; the tool even includes a '--sanitize' flag to mitigate potential injection risks within source PDFs.

Capability assessment

ℹ Purpose & Capability

Name/description match the provided instructions and examples (PDF→Markdown/JSON/HTML, OCR, table extraction, hybrid AI backend). However, the registry metadata lists 'source: unknown' and no homepage, while SKILL.md claims a GitHub repo and pip package names. This mismatch reduces verifiability of the package origin.

⚠ Instruction Scope

SKILL.md instructs installing and running a pip package and a hybrid backend (opendataloader-pdf-hybrid) that listens on a port, and examples use local file system operations (expected). But the docs reference environment variables and services (JAVA_HOME, OPENDATALOADER_HYBRID_URL, and example use of OpenAIEmbeddings) that are not declared in the skill metadata — the agent may rely on secrets or network endpoints not surfaced to the registry.

ℹ Install Mechanism

This is an instruction-only skill (no install spec in registry). The SKILL.md explicitly tells users to pip install opendataloader-pdf and related packages; that will fetch third-party code from PyPI (or another index) at runtime. While normal for a library, the registry provides no pinned source or checksum and the registry metadata doesn't link to the claimed GitHub repo, so verifying the package before installation requires manual checking.
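Part of that manual checking can be done after installing the package into an isolated environment, by reading the URLs the installed distribution declares about itself. The sketch below uses the standard importlib.metadata module. The helper name declared_urls is hypothetical, and the declared URLs are self-reported by the package, so they are a starting point for verification rather than proof of origin.

```python
from importlib import metadata
from typing import Optional


def declared_urls(dist_name: str) -> Optional[dict]:
    """Return the version and self-declared URLs of an installed distribution.

    Returns None if no distribution with that name is installed.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "version": meta.get("Version"),
        "home_page": meta.get("Home-page"),
        # Project-URL may appear several times, e.g. "Source, https://..."
        "project_urls": meta.get_all("Project-URL") or [],
    }
```

For example, declared_urls("opendataloader-pdf") could be compared against the GitHub repository that SKILL.md claims, assuming the pip package name given there is the one actually installed.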

⚠ Credentials

The skill declares no required env vars or credentials in registry metadata, but the documentation references JAVA_HOME, OPENDATALOADER_HYBRID_URL, and examples call OpenAIEmbeddings (which typically requires an API key). This is a mismatch: sensitive environment variables or API keys may be needed in practice but are not declared, making it unclear what secrets the agent or user must provide.
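To make the undeclared requirements visible before running anything, a small preflight check along these lines may help. It is a sketch using only the standard library. The variable names come from the documentation as summarized above, and OPENAI_API_KEY is an assumption inferred from the OpenAIEmbeddings example rather than a declared requirement. The report deliberately never includes the values.

```python
import os
from typing import Mapping

# JAVA_HOME and OPENDATALOADER_HYBRID_URL are named in the skill's docs.
# OPENAI_API_KEY is an assumption based on the OpenAIEmbeddings example.
EXPECTED_VARS = ("JAVA_HOME", "OPENDATALOADER_HYBRID_URL", "OPENAI_API_KEY")


def env_report(env: Mapping[str, str] = os.environ) -> dict:
    """Map each expected variable to 'set' or 'missing' without exposing values."""
    return {name: ("set" if env.get(name) else "missing") for name in EXPECTED_VARS}
```

Calling env_report() with no argument inspects the current process environment. Passing a plain dict makes it easy to review what a sandboxed run would see.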

✓ Persistence & Privilege

The 'always' setting is false and no install hooks are declared. The skill does instruct starting a hybrid backend that listens on a port (network exposure), but it does not request permanent agent-level privileges in the registry metadata.
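One way to limit the network exposure noted here is to confirm that the configured backend address points at the local machine before using it. The sketch below assumes OPENDATALOADER_HYBRID_URL holds an ordinary http(s) URL, which the listing does not confirm, and the helper name is hypothetical. It checks only the client-side setting, not which interface the backend process actually binds to.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name is treated as non-local; it is not resolved here.
        return False
```

For instance, is_loopback_url("http://127.0.0.1:5002") is true, while a URL using 0.0.0.0 or a public hostname is reported as non-local and deserves a closer look.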

How to use

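1) Make sure OpenClaw is installed (locally or via Docker).
2) Enter the install command in the chat box: /install opendataloader-pdf-zmy
3) Once installation completes, invoke the Skill by name or trigger it with /opendataloader-pdf-zmy
4) Provide the inputs described in the Skill's parameter documentation to get structured output.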

Version history

v1.0.0

Initial release: AI-ready PDF parser with Markdown/JSON/HTML output, OCR support, table extraction with bounding boxes, and LangChain integration.

Metadata

Slug opendataloader-pdf-zmy

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Released versions 1

FAQ

What is OpenDataLoader PDF?

Parse PDFs into Markdown, JSON, or HTML with OCR, table extraction, and AI-enriched descriptions for building RAG pipelines and knowledge bases. It is an AI agent Skill plugin for Claude Code / OpenClaw and has been downloaded 97 times to date.

How do I install OpenDataLoader PDF?

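Run the command /install opendataloader-pdf-zmy in the OpenClaw or Claude Code chat box for a one-step install. The install command itself needs no extra configuration; the pip packages and environment variables that the skill's documentation references are set up separately.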

Is OpenDataLoader PDF free?

Yes. OpenDataLoader PDF is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does OpenDataLoader PDF support?

OpenDataLoader PDF is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops OpenDataLoader PDF?

It is developed and maintained by mingyuan (@zmy1006-sudo). The current version is v1.0.0.