Pdfreader
by Ivan Cetta · GitHub · v1.0.3
643 Downloads · 2 Stars · 4 Active Installs · 4 Versions
Install in OpenClaw
/install pdfreader
Description
Extract text and metadata from PDF files using PyMuPDF, supporting large files and outputting results in JSON format.
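A minimal sketch of the kind of extraction the description refers to. It assumes PyMuPDF's classic `fitz` module name. The function names `extract_pdf` and `build_result` and the JSON field names are hypothetical, not taken from pdf_reader.py. The document is read page by page and the metadata dictionary is copied as-is, with empty fields dropped.

```python
from typing import Any


def build_result(path: str, metadata: dict[str, Any], pages: list[str]) -> dict[str, Any]:
    """Assemble a JSON-serializable result (hypothetical schema)."""
    return {
        "file": path,
        "page_count": len(pages),
        # Drop empty values, e.g. a missing title or author.
        "metadata": {k: v for k, v in metadata.items() if v},
        "pages": [{"number": i + 1, "text": text} for i, text in enumerate(pages)],
    }


def extract_pdf(path: str) -> dict[str, Any]:
    """Open a PDF with PyMuPDF and collect its metadata plus per-page text."""
    import fitz  # PyMuPDF; imported here so build_result works without it installed

    with fitz.open(path) as doc:
        pages = [page.get_text() for page in doc]  # one text string per page
        metadata = dict(doc.metadata or {})
    return build_result(path, metadata, pages)
```

For example, `extract_pdf("report.pdf")["page_count"]` returns the number of pages once `pip install pymupdf` has been run.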
Usage Guidance
This skill appears to do what it claims: extract text and metadata from PDFs using PyMuPDF. Before installing or running it, consider the following:
1) Run pip install pymupdf in an isolated environment (virtualenv or container), because the PyMuPDF package from PyPI ships compiled code.
2) The script enforces the 'within current working directory' rule but allows subdirectories and does not resolve symlinks; do not place untrusted symlinks inside the working directory, or they can be used to escape it.
3) Because the source/homepage is unknown, prefer running the script in a sandbox and review the code yourself (or run it only on non-sensitive PDFs) before giving it access to important files.
If you need stricter confinement (for example, disallowing subdirectories or protecting against symlinks), request a code change that checks paths with os.path.realpath against a configurable safe directory.
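The stricter check recommended in the guidance could look roughly like this sketch. It is not the skill's code: `resolve_inside`, `safe_dir`, and `allow_subdirs` are hypothetical names. `os.path.realpath` follows symlinks and collapses `..` before `os.path.commonpath` compares the result with the safe directory, so a symlink pointing outside is rejected; `allow_subdirs=False` further limits access to the top level, and the extension is checked on the resolved path.

```python
import os


def resolve_inside(path: str, safe_dir: str, suffix: str, allow_subdirs: bool = True) -> str:
    """Return the resolved path if it stays inside safe_dir and ends with suffix.

    Hypothetical helper; raises ValueError on any violation. Pass suffix in
    lowercase, e.g. ".pdf" for input files and ".json" for output files.
    """
    root = os.path.realpath(safe_dir)
    # realpath follows symlinks and collapses "..", unlike abspath.
    target = os.path.realpath(os.path.join(root, path))
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"{path!r} resolves outside {root!r}")
    if not allow_subdirs and os.path.dirname(target) != root:
        raise ValueError(f"{path!r} is not directly inside {root!r}")
    # Check the extension of the resolved target, not just the name given.
    if not target.lower().endswith(suffix):
        raise ValueError(f"{path!r} does not end with {suffix!r}")
    return target
```

Making `safe_dir` a parameter rather than hard-coding `os.getcwd()` is what makes the confinement configurable; a caller could still pass `os.getcwd()` to keep today's behavior.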
Capability Analysis
Type: OpenClaw Skill
Name: pdfreader
Version: 1.0.3
The OpenClaw skill bundle is designed to extract text from PDF files using PyMuPDF. The `SKILL.md` documentation provides clear, non-malicious instructions and explicitly states security restrictions. The `pdf_reader.py` script implements robust path validation (`is_safe_input_path`, `is_safe_output_path`) to prevent path traversal and restrict file operations to the current working directory and specific file types (.pdf for input, .json for output). There are no signs of data exfiltration, malicious execution, persistence, or prompt injection attempts against the agent. The code is well-contained and aligns with its stated purpose and security measures.
Capability Assessment
Purpose & Capability
Name/description match the files and instructions. The code uses PyMuPDF (fitz) to open PDFs, extract text and metadata, and produce JSON — exactly what the description promises. No extraneous binaries, credentials, or services are requested.
Instruction Scope
SKILL.md usage aligns with the script's behavior (pip install pymupdf; run python pdf_reader.py ...). SKILL.md states that files must be 'within the current working directory' and forbids '../' traversal; the script enforces this by checking that absolute paths fall inside os.getcwd(). However, the script also accepts files in subdirectories of the current working directory (contrary to the implication that only the top level of the cwd is allowed), and it uses os.path.abspath rather than os.path.realpath, so a symlink inside the cwd that points outside it could bypass the directory restriction. This is an implementation caveat rather than evidence of malicious behavior.
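The caveat can be reproduced with a check modeled on this description: an `os.path.abspath` prefix test against `os.getcwd()` plus an extension test. The helper `abspath_confined` is a hypothetical reconstruction, not code copied from pdf_reader.py. Because `abspath` only normalizes the path string, it accepts any depth of subdirectory and never looks at where a symlink points.

```python
import os


def abspath_confined(path: str, suffix: str = ".pdf") -> bool:
    """Hypothetical abspath-based check: inside os.getcwd() and ending in suffix.

    Symlinks are not resolved, which is what allows an escape.
    """
    cwd = os.getcwd()
    full = os.path.abspath(path)  # collapses ".." but leaves symlinks untouched
    # Any depth below cwd is accepted, so subdirectories pass as well.
    inside = full.startswith(cwd + os.sep)
    return inside and full.lower().endswith(suffix)
```

With a symlink `innocent.pdf` in the working directory that points to a file elsewhere, `abspath_confined("innocent.pdf")` returns True, while `os.path.realpath("innocent.pdf")` reveals the outside target; `"../a.pdf"` and `"notes.txt"` are still rejected.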
Install Mechanism
No install spec is embedded (instruction-only install guidance in SKILL.md recommends 'pip install pymupdf'). That is low-risk from the skill bundle perspective. Note: installing PyMuPDF via pip will run compiled extension code from PyPI — treat pip installs from unknown sources with standard care.
Credentials
The skill requests no environment variables, credentials, or config paths. The functionality does not require additional secrets. The code does not read environment variables or access unrelated system configuration.
Persistence & Privilege
The `always` flag is false, and the skill does not request persistent or auto-included privileges. It does not modify other skills or system-wide settings. Autonomous invocation remains the platform default but is not combined with other concerning privileges here.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install pdfreader
- After installation, invoke the skill by name or use /pdfreader
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.3
Fixed instruction mismatch: Separated input (.pdf) and output (.json) validation. Added security documentation to SKILL.md
v1.0.2
Security fix: Added .pdf extension validation to prevent arbitrary file read (CVE-like vulnerability)
v1.0.1
Security fix: Added path validation to prevent arbitrary file write (CVE-like vulnerability)
v1.0.0
Initial release of PDF Reader Skill for OpenClaw:
- Extracts text from any PDF using PyMuPDF.
- Supports large and multi-page PDF files.
- Outputs extracted content in JSON for AI reading compatibility.
- Handles text encoding issues.
- Displays PDF metadata (title, author, etc.).
- Includes clear installation and usage instructions.
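The v1.0.0 notes mention JSON output and handling of text encoding issues without saying how. One common approach, sketched here as an assumption rather than the skill's documented behavior, is to replace characters that UTF-8 cannot encode (such as lone surrogates) and then write the JSON as UTF-8 with `ensure_ascii=False`, so accented and non-Latin text stays readable. `clean_text`, `clean`, and `write_json` are hypothetical names.

```python
import json
from typing import Any


def clean_text(text: str) -> str:
    """Replace characters UTF-8 cannot encode (e.g. lone surrogates) with '?'."""
    return text.encode("utf-8", errors="replace").decode("utf-8")


def clean(obj: Any) -> Any:
    """Recursively apply clean_text to every string in a JSON-like structure."""
    if isinstance(obj, str):
        return clean_text(obj)
    if isinstance(obj, dict):
        return {clean(k): clean(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [clean(v) for v in obj]
    return obj


def write_json(result: Any, out_path: str) -> None:
    """Write the result as human-readable UTF-8 JSON."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(clean(result), fh, ensure_ascii=False, indent=2)
```

Without the cleaning step, a single lone surrogate in extracted text would make the UTF-8 write fail with `UnicodeEncodeError`.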
Frequently Asked Questions
What is Pdfreader?
Extract text and metadata from PDF files using PyMuPDF, supporting large files and outputting results in JSON format. It is an AI Agent Skill for Claude Code / OpenClaw, with 643 downloads so far.
How do I install Pdfreader?
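Run "/install pdfreader" in the OpenClaw or Claude Code chat to install the skill. The bundled script also needs PyMuPDF, which SKILL.md says to install with pip install pymupdf (preferably in an isolated environment).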
Is Pdfreader free?
Yes, Pdfreader is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Pdfreader support?
Pdfreader is cross-platform: it runs anywhere OpenClaw / Claude Code is available, provided Python and PyMuPDF are installed.
Who created Pdfreader?
It is built and maintained by Ivan Cetta (@nantes); the current version is v1.0.3.