
OpenDataLoader PDF

Name: OpenDataLoader PDF
Author: zmy1006-sudo

by mingyuan · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install opendataloader-pdf-zmy

Description

Parse PDFs into Markdown, JSON, or HTML with OCR, table extraction, and AI-enriched descriptions for building RAG pipelines and knowledge bases.

Usage Guidance

The skill looks like a legitimate PDF parser, but verify before you install or run anything:

1) Confirm the package source: find the opendataloader-pdf project on PyPI/GitHub and inspect the repository and release artifacts (the registry metadata currently lists no homepage/source).
2) Expect to run pip install, which will fetch third-party code. Only install from a trusted upstream and review the code if possible.
3) The hybrid backend opens a local port (default 5002). Run it in a sandbox or controlled environment and ensure it does not inadvertently expose files or network access.
4) Be prepared to supply environment variables (JAVA_HOME, OPENDATALOADER_HYBRID_URL, and likely an LLM API key such as OPENAI_API_KEY). Treat those keys as sensitive and only provide them if you trust the package.
5) If you need higher assurance, ask the publisher for the canonical repository URL, versioned releases, and checksums, or run the package in an isolated VM/container and audit its network activity and files.
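Point 5 mentions checksums. As a minimal sketch, assuming the publisher supplies a SHA-256 digest for a release artifact (the registry currently lists none), the comparison could be done with the Python standard library alone. The function names here are hypothetical helpers, not part of the package.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a publisher-supplied hex digest.

    Surrounding whitespace and letter case in expected_hex are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A downloaded wheel or source archive would be passed as the path, and the digest would come from whatever channel the publisher uses to announce releases. A mismatch means the file should not be installed.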

Capability Analysis

Type: OpenClaw Skill
Name: opendataloader-pdf-zmy
Version: 1.0.0

The skill bundle provides documentation and instructions for 'opendataloader-pdf', a utility for converting PDF documents into Markdown, JSON, and HTML for use in RAG pipelines. The content across SKILL.md and the reference files is consistent with its stated purpose, offering standard Python API examples, CLI usage, and LangChain integrations. There are no signs of malicious intent, data exfiltration, or harmful prompt injection; the tool even includes a '--sanitize' flag to mitigate potential injection risks within source PDFs.

Capability Assessment

ℹ Purpose & Capability

Name/description match the provided instructions and examples (PDF→Markdown/JSON/HTML, OCR, table extraction, hybrid AI backend). However the registry metadata lists 'source: unknown' and no homepage while SKILL.md claims a GitHub repo and pip package names — this mismatch reduces verifiability of the package origin.

⚠ Instruction Scope

SKILL.md instructs installing and running a pip package and a hybrid backend (opendataloader-pdf-hybrid) that listens on a port, and examples use local file system operations (expected). But the docs reference environment variables and services (JAVA_HOME, OPENDATALOADER_HYBRID_URL, and example use of OpenAIEmbeddings) that are not declared in the skill metadata — the agent may rely on secrets or network endpoints not surfaced to the registry.

ℹ Install Mechanism

This is an instruction-only skill (no install spec in registry). The SKILL.md explicitly tells users to pip install opendataloader-pdf and related packages; that will fetch third-party code from PyPI (or another index) at runtime. While normal for a library, the registry provides no pinned source or checksum and the registry metadata doesn't link to the claimed GitHub repo, so verifying the package before installation requires manual checking.

⚠ Credentials

The skill declares no required env vars or credentials in registry metadata, but the documentation references JAVA_HOME, OPENDATALOADER_HYBRID_URL, and examples call OpenAIEmbeddings (which typically requires an API key). This is a mismatch: sensitive environment variables or API keys may be needed in practice but are not declared, making it unclear what secrets the agent or user must provide.
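Because these variables are undeclared, it can help to see which of them are present in the environment an agent would inherit. The sketch below reports only whether each name is set and never prints a value. The variable list is taken from this assessment; OPENAI_API_KEY is assumed as the name of the API key, and the helper name is hypothetical.

```python
import os

# Names referenced by the skill's documentation per this assessment.
# OPENAI_API_KEY is an assumption: the docs only imply an OpenAI key.
EXPECTED_VARS = ("JAVA_HOME", "OPENDATALOADER_HYBRID_URL", "OPENAI_API_KEY")


def report_env(names=EXPECTED_VARS, environ=None):
    """Map each name to 'set' or 'missing' without exposing any value."""
    env = os.environ if environ is None else environ
    return {name: ("set" if env.get(name) else "missing") for name in names}


if __name__ == "__main__":
    for name, status in report_env().items():
        print(f"{name}: {status}")
```

Running this inside the sandbox or container before starting the skill shows which secrets it could read, which is useful when deciding what to withhold.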

✓ Persistence & Privilege

The skill's always flag is false and no install hooks are declared. The skill does instruct starting a hybrid backend that listens on a port (network exposure), but it does not request permanent agent-level privileges in the registry metadata.
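One way to gauge the network exposure is to probe the backend's port from different addresses once it is running. This is a minimal sketch using only the standard library; the helper name is hypothetical, and 5002 is the default port cited in the usage guidance.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Probe the loopback address; repeat with the machine's LAN address
    # to see whether the backend is reachable from other hosts.
    print("127.0.0.1:5002 open:", port_is_open("127.0.0.1", 5002))
```

If the port answers on the loopback address but not on the machine's external address, the backend is bound locally only. If it answers on both, other hosts on the network can reach it.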

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install opendataloader-pdf-zmy
After installation, invoke the skill by name or use /opendataloader-pdf-zmy
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release: AI-ready PDF parser with Markdown/JSON/HTML output, OCR support, table extraction with bounding boxes, LangChain integration

Metadata

Slug opendataloader-pdf-zmy

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is OpenDataLoader PDF?

OpenDataLoader PDF parses PDFs into Markdown, JSON, or HTML with OCR, table extraction, and AI-enriched descriptions for building RAG pipelines and knowledge bases. It is an AI Agent Skill for Claude Code / OpenClaw, with 97 downloads so far.

How do I install OpenDataLoader PDF?

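Run "/install opendataloader-pdf-zmy" in the OpenClaw or Claude Code chat to install the skill in one step. The skill is instruction-only, so using it still involves running pip install for opendataloader-pdf and supplying any environment variables its documentation references.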

Is OpenDataLoader PDF free?

Yes, OpenDataLoader PDF is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does OpenDataLoader PDF support?

OpenDataLoader PDF is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created OpenDataLoader PDF?

It is built and maintained by mingyuan (@zmy1006-sudo); the current version is v1.0.0.
