pdf-extract-skill
by Lucas Moyano · v0.0.10 · MIT-0
190 Downloads · 0 Stars · 0 Active Installs · 10 Versions
Install in OpenClaw
/install pdf-extract-skill
Description
OpenClaw PDF extraction skill using OpenDataLoader. Use when the user wants to extract and process PDF content for RAG, embeddings, or coordinate-based citat...
Usage Guidance
This skill appears to be a coherent CLI guide for the opendataloader-pdf tool and its hybrid backend, and it contains reasonable security advice (use a venv or container, bind the hybrid backend to localhost, verify package provenance). Before installing or running anything: (1) verify that the opendataloader-pdf project on PyPI/GitHub matches an official source (the registry entry for this skill has no homepage/source URL), (2) install only into an isolated environment (venv/container/VM) and use a pinned version, (3) inspect the package/repository and its dependencies for unexpected network behavior (especially if you enable image-description or hybrid flags), (4) run the hybrid backend with --host 127.0.0.1 and verify listeners before processing sensitive PDFs. If you need higher assurance, ask the skill author to provide an explicit install spec and source URL.
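Point (4) can be partly checked before the backend is launched. The following is a minimal Python sketch, not part of the skill: `is_loopback_host` is a hypothetical helper name, and it only validates the address you intend to pass to `--host`. It does not inspect what the backend actually binds, so checking the live listeners is still needed.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if `host` is a loopback address such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat as not verifiably local.
        return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "0.0.0.0", "::1"):
        print(candidate, is_loopback_host(candidate))
```

Rejecting anything that is not an explicit loopback literal is a deliberately conservative choice: a hostname could resolve to a non-local interface.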
Capability Analysis
Type: OpenClaw Skill
Name: pdf-extract-skill
Version: 0.0.10
The pdf-extract-skill is a well-documented bundle for using the OpenDataLoader CLI tool to extract text and metadata from PDFs. It includes detailed instructions for the AI agent (SKILL.md) and comprehensive security guidance (docs/security-before-install.md) that encourages the use of virtual environments, pinned package versions, and local-only execution. No evidence of malicious intent, data exfiltration, or harmful prompt injection was found; the skill focuses entirely on local PDF processing and RAG preparation.
Capability Assessment
Purpose & Capability
Name/description, required binaries (java, python3, opendataloader-pdf), and the SKILL.md all describe running the OpenDataLoader CLI and hybrid backend — this is coherent and expected for a PDF extraction skill. The skill does not request unrelated services or credentials.
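A quick way to confirm that the required binaries named above are reachable is to look them up on PATH. This is an illustrative sketch only; `missing_binaries` and `REQUIRED_BINARIES` are hypothetical names, and the check confirms presence, not versions or provenance.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Binaries listed in the capability assessment above.
REQUIRED_BINARIES = ("java", "python3", "opendataloader-pdf")


def missing_binaries(
    names: Iterable[str] = REQUIRED_BINARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    print("missing:", missing if missing else "none")
```

The `which` parameter exists so the lookup can be replaced in tests; by default it uses `shutil.which` from the standard library.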
Instruction Scope
SKILL.md only instructs running local CLI commands, starting a local hybrid backend, and using flags for OCR, pages, formats, etc. It does not ask to read unrelated files, access unrelated environment variables, or send data to external endpoints. It explicitly recommends binding the hybrid backend to localhost and contains a security checklist.
Install Mechanism
There is no install spec (instruction-only), which limits the surface written to disk — low intrinsic risk. However, because opendataloader-pdf must be installed by the user, the lack of an author-provided pinned install command or source/homepage in the registry is notable; the docs do advise verifying PyPI/GitHub metadata and using pinned installs in an isolated environment.
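Because the install command is left to the user, it can help to check a requirement string and the current interpreter before running pip. The sketch below is an assumption-laden illustration, not an author-provided install spec: `is_pinned` and `in_virtualenv` are hypothetical helpers, the regular expression is a simplified approximation of an exact `==` pin, and `example-pkg==1.2.3` is a placeholder rather than a real release.

```python
import re
import sys

# Simplified pattern: "name==version", with an optional extra such as "[hybrid]".
# Wildcards like "==1.*" are intentionally rejected because they are not exact pins.
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9._+!-]+$")


def is_pinned(requirement: str) -> bool:
    """Return True if `requirement` pins one exact version with '=='."""
    return _PINNED.match(requirement.strip()) is not None


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(is_pinned("example-pkg==1.2.3"))   # placeholder requirement
    print(is_pinned("opendataloader-pdf"))   # unpinned package name
    print("virtualenv active:", in_virtualenv())
```

A pinned string only fixes the version; it does not replace verifying the publisher and repository, as the docs advise.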
Credentials
The skill requests no environment variables, no credentials, and no config paths. This is proportionate to its stated CLI-only purpose.
Persistence & Privilege
The skill is not flagged always:true, is user-invocable, and contains no instructions to persistently modify agent/system configuration or other skills. Autonomous invocation is allowed by default but is not combined with other concerning flags.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install pdf-extract-skill
- After installation, invoke the skill by name or use /pdf-extract-skill
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.0.10
- Improved and streamlined the skill description for clarity and focus on PDF processing scenarios.
- Reworked and expanded the metadata section to include required binaries and runtimes for OpenClaw and ClawDBot.
- Updated installation and quick-start instructions; removed inline package install policy in favor of referencing security-before-install.md.
- Simplified examples and hybrid backend instructions, removing explicit localhost binding and clarifying usage steps.
- Added reminders to keep safety filters enabled and manage outputs in run-specific folders for traceability.
- Retained all modular helper documents and troubleshooting steps, ensuring continued maintainability and usability.
v0.0.9
- Added skill metadata section with homepage, repository, and package links for easy reference.
- No changes to functionality or CLI usage.
- Documentation is otherwise unchanged except for minor metadata improvements.
v0.0.8
- Added explicit compatibility information and license to the SKILL.md ("Requires Java 11+ and Python 3.10+" and Apache-2.0).
- Clarified installation method (pip: opendataloader-pdf and [hybrid] extra for hybrid mode).
- No changes to core workflow or usage instructions.
- No code or file structure changes detected; documentation improved for clarity.
v0.0.7
- Added explicit YAML frontmatter for structured metadata (name, description, requirements, homepage, docs).
- Clarified prerequisite install and provenance-verification instructions; now reference docs/security-before-install.md directly.
- Updated hybrid backend launch commands to always bind to localhost (127.0.0.1) for improved security.
- Strengthened best practices and recommendations for predictable usage, output storage, and content safety.
- No changes to core CLI usage or operational architecture.
v0.0.6
- Package provenance and install validation instructions clarified: users must manually verify publisher, homepage, and official references before installing opendataloader-pdf.
- Section 4 ("Robust Prerequisites") expanded with step-by-step provenance verification and linked official documentation sources.
- No structural or architectural changes; all command profiles and flows unchanged.
- Overall emphasis on enhanced dependency security and due diligence for users.
v0.0.5
- Added explicit metadata for runtime requirements and install mechanism.
- Clarified required binaries: java (11+), opendataloader-pdf, and opendataloader-pdf-hybrid.
- Noted that this is an instruction-only skill: all dependencies must be installed manually by the user.
- No functional or CLI profile changes.
- General documentation extended to emphasize verification of prerequisites before use.
v0.0.4
- Improved installation and security advice: new guidance to avoid unpinned installs, prefer isolated environments, and verify package sources (see docs/security-before-install.md).
- Updated prerequisites checking: added pip index and pip show commands for validation.
- All other content and workflows remain unchanged.
v0.0.3
- Removed the internal roadmap file (internal/roadmap-mejoras-interno.md) from the repository.
- No functional or interface changes to the skill itself.
- Documentation remains up-to-date and unchanged in public-facing files.
v0.0.2
**Summary:** Modularized the documentation and added topic-specific guides.
- Added helper documents (.md) covering installation, usage profiles, security, OCR/hybrid, RAG/citations, and troubleshooting.
- SKILL.md now delegates common or advanced tasks to these files, improving maintainability and navigation.
- No changes to the logic of the usage flows or to the main commands.
- Improved clarity on when to consult each topic document.
v0.0.1
- Initial release: integrates advanced PDF extraction for OpenClaw using OpenDataLoader PDF.
- Fully local processing for privacy and high-quality extraction (columns, tables, structure, OCR).
- Simple CLI, with no MCP or wrappers; direct commands for the client and the hybrid backend.
- Ready for RAG and LLM workflows: produces JSON and Markdown output prepared for embeddings and citations.
- Includes usage profiles, robust parameters, troubleshooting, and best practices for OpenClaw.
- Requires Java 11+ and Python 3.10+; installation and verification instructions included.
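The Java 11+ and Python 3.10+ requirement stated in the version history can be checked programmatically. The following is a hedged sketch with hypothetical helper names (`parse_java_major`, `python_ok`). It assumes you have already captured the text printed by `java -version` (which the JDK writes to stderr) and that the text contains the usual `version "..."` token.

```python
import re
import sys
from typing import Optional, Sequence


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from captured `java -version` text."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_301" denotes Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def python_ok(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the interpreter version is 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


if __name__ == "__main__":
    sample = 'openjdk version "17.0.2" 2022-01-18'  # illustrative sample text
    major = parse_java_major(sample)
    print("java ok:", major is not None and major >= 11)
    print("python ok:", python_ok())
```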
Frequently Asked Questions
What is pdf-extract-skill?
OpenClaw PDF extraction skill using OpenDataLoader. Use when the user wants to extract and process PDF content for RAG, embeddings, or coordinate-based citat... It is an AI Agent Skill for Claude Code / OpenClaw, with 190 downloads so far.
How do I install pdf-extract-skill?
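Run "/install pdf-extract-skill" in the OpenClaw or Claude Code chat to install the skill in one step. The skill is instruction-only, so its prerequisites (java, python3, opendataloader-pdf) must still be installed manually by the user.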
Is pdf-extract-skill free?
Yes, pdf-extract-skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does pdf-extract-skill support?
pdf-extract-skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created pdf-extract-skill?
It is built and maintained by Lucas Moyano (@secondport); the current version is v0.0.10.