Description

Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders.

README (SKILL.md)

PyMuPDF PDF

Name: PyMuPDF PDF Parser Clawdbot Skill
Author: kesslerio

Overview

Parse PDFs locally using PyMuPDF for fast, lightweight extraction into Markdown by default, with optional JSON and image/table outputs in a per-document directory.
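As a rough illustration of the default Markdown path, the sketch below reads each page's plain text with PyMuPDF and joins the pages into one Markdown string. It is not the skill's actual script: the function names `pages_to_markdown` and `pdf_to_markdown` and the per-page "## Page N" headings are assumptions made for this example.

```python
def pages_to_markdown(page_texts):
    """Join per-page text into one Markdown string, one heading per page.

    The "## Page N" heading layout is an assumption for this sketch.
    """
    parts = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(f"## Page {number}\n\n{text.strip()}\n")
    return "\n".join(parts)


def pdf_to_markdown(pdf_path):
    """Extract plain text from every page of a PDF and format it as Markdown."""
    # PyMuPDF is imported lazily so pages_to_markdown stays usable without it.
    import fitz

    with fitz.open(pdf_path) as doc:
        # get_text("text") returns the page's plain text in reading order.
        return pages_to_markdown(page.get_text("text") for page in doc)
```

Keeping the formatting step separate from the PyMuPDF call makes the Markdown layout easy to check without a PDF on hand.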

Prereqs / when to read references

If you hit import errors (PyMuPDF not installed) or Nix libstdc++ issues, read:

references/pymupdf-notes.md
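Before running the parser, a quick check along these lines can tell you whether PyMuPDF is importable in the current environment. This is a minimal sketch, not part of the skill; the helper name `pymupdf_available` is hypothetical, and the check only confirms that a module named `pymupdf` or `fitz` can be located, not that it loads cleanly.

```python
import importlib.util


def pymupdf_available() -> bool:
    """Return True if a module named `pymupdf` or `fitz` can be located."""
    # PyMuPDF has traditionally been imported as `fitz`;
    # newer releases also expose the `pymupdf` module name.
    return any(
        importlib.util.find_spec(name) is not None
        for name in ("pymupdf", "fitz")
    )


if __name__ == "__main__":
    if pymupdf_available():
        print("PyMuPDF found")
    else:
        print("PyMuPDF not found; see references/pymupdf-notes.md")
```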

Quick start (single PDF)

# Run from the skill directory
./scripts/pymupdf_parse.py /path/to/file.pdf \
  --format md \
  --outroot ./pymupdf-output

Options

--format md|json|both (default: md)
--images to extract images
--tables to extract a simple line-based table JSON (quick/rough)
--outroot DIR to change output root
--lang adds a language hint into JSON output metadata
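The options above map naturally onto an `argparse` definition. The sketch below shows one way such a command line could be declared; it is an assumption about the shape of the interface, not the contents of `scripts/pymupdf_parse.py`, and the `build_parser` name and help strings are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a CLI mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Parse a single PDF with PyMuPDF")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--format", choices=["md", "json", "both"], default="md",
                        help="output format (default: md)")
    parser.add_argument("--images", action="store_true", help="extract images")
    parser.add_argument("--tables", action="store_true",
                        help="extract rough line-based tables as JSON")
    parser.add_argument("--outroot", default="./pymupdf-output",
                        help="output root directory")
    parser.add_argument("--lang", default=None,
                        help="language hint stored in JSON metadata")
    return parser


args = build_parser().parse_args(["/path/to/file.pdf", "--format", "both", "--images"])
print(args.format, args.images, args.tables, args.outroot)
# prints: both True False ./pymupdf-output
```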

Output conventions

Create ./pymupdf-output/<pdf-basename>/ by default.
Markdown output: output.md
JSON output: output.json (includes lang)
Images: images/ subdir
Tables: tables.json (rough line-based)
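The layout described above can be computed from the PDF path and the chosen options. The following sketch shows one way to derive the expected paths with `pathlib`; the `planned_outputs` helper and its dictionary keys are hypothetical and only mirror the conventions listed here, assuming the PDF basename means the file name without its extension.

```python
from pathlib import Path


def planned_outputs(pdf_path, outroot="./pymupdf-output", fmt="md",
                    images=False, tables=False):
    """Return the output paths a run would be expected to produce."""
    # Per-document directory: <outroot>/<pdf-basename>/
    doc_dir = Path(outroot) / Path(pdf_path).stem
    outputs = {"dir": doc_dir}
    if fmt in ("md", "both"):
        outputs["markdown"] = doc_dir / "output.md"
    if fmt in ("json", "both"):
        outputs["json"] = doc_dir / "output.json"
    if images:
        outputs["images"] = doc_dir / "images"
    if tables:
        outputs["tables"] = doc_dir / "tables.json"
    return outputs


for name, path in planned_outputs("/path/to/file.pdf", fmt="both", images=True).items():
    print(name, path)
```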

Notes

PyMuPDF is fast but less robust on complex PDFs.
For more robust parsing, use a heavy-duty OCR parser (e.g., MinerU) if installed.

Usage Guidance

This skill appears safe for its stated local PDF parsing purpose. Use it with PDFs you intend to process, choose an output directory you are comfortable writing to, and install PyMuPDF from a trusted Python environment.

Capability Analysis

Type: OpenClaw Skill
Description: OpenClaw Agent Skill

The OpenClaw skill bundle provides a local PDF parsing utility using PyMuPDF. The `scripts/pymupdf_parse.py` script correctly implements the stated functionality, reading a PDF and writing extracted content (Markdown, JSON, images, tables) to a local output directory. There is no evidence of network communication, access to sensitive system files or environment variables, external command execution, or obfuscation. The `SKILL.md` and `README.md` files contain only descriptive information and standard usage/installation instructions, with no prompt injection attempts aiming for malicious actions.

Capability Assessment

✓ Purpose & Capability

The README, SKILL.md, and script all align around locally parsing a user-provided PDF into Markdown, JSON, images, or simple table output.

✓ Instruction Scope

Instructions are limited to user-directed parsing commands and documented options; there are no hidden autonomous workflows, credential requests, or unrelated actions.

ℹ Install Mechanism

The skill is listed as instruction-only but the README asks users to install PyMuPDF with an unpinned pip command. This is expected for the purpose, but users should install it from a trusted environment.

✓ Credentials

Local PDF reads and local output writes are proportionate to the stated parsing purpose and are scoped to the provided PDF path and output directory.

✓ Persistence & Privilege

The script creates output files and folders only; it does not request credentials, elevated privileges, background persistence, or ongoing access.

Version History

v1.0.0

Initial release of PyMuPDF PDF parsing skill.

- Fast local PDF extraction using PyMuPDF (fitz) with Markdown output by default.
- Supports optional JSON, image extraction, and simple table outputs.
- Designed for single-PDF runs with per-document output directories.
- Includes command-line options for output format, images, tables, and language metadata.
- Recommended as a quick alternative or fallback when heavier PDF parsers are unavailable.

Metadata

Slug pymupdf-pdf-parser-clawdbot-skill

Version 1.0.0

License —

All-time Installs 41

Active Installs 39

Total Versions 1

Frequently Asked Questions

What is PyMuPDF PDF Parser Clawdbot Skill?

Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders. It is an AI Agent Skill for Claude Code / OpenClaw, with 5382 downloads so far.

How do I install PyMuPDF PDF Parser Clawdbot Skill?

Run "/install pymupdf-pdf-parser-clawdbot-skill" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is PyMuPDF PDF Parser Clawdbot Skill free?

Yes, PyMuPDF PDF Parser Clawdbot Skill is completely free (open-source). You can download, install and use it at no cost.

Which platforms does PyMuPDF PDF Parser Clawdbot Skill support?

PyMuPDF PDF Parser Clawdbot Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created PyMuPDF PDF Parser Clawdbot Skill?

It is built and maintained by kesslerio (@kesslerio); the current version is v1.0.0.
