mzlzyca

Extract Tables From Pdf

by mzlzyCA · GitHub ↗ · v0.4.0 · MIT-0
cross-platform ✓ Security Clean
253 Downloads · 0 Stars · 0 Active Installs · 6 Versions
Install in OpenClaw
/install extract-tables-from-pdf
Description
Extract tables from PDF documents using MinerU's table detection engine. Identifies and extracts structured table data from both native and scanned PDFs. Fea...
README (SKILL.md)

Extract Tables From Pdf

Convert and extract content from .pdf using MinerU (mineru-open-api).

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract tables from PDF (requires token)
mineru-open-api extract report.pdf -o ./out/

# With explicit table flag and OCR for scanned docs
mineru-open-api extract scanned.pdf --ocr --table -o ./out/

Authentication

A token is required for the extract and crawl commands:

mineru-open-api auth            # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
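For scripted or CI use, where the token comes from the environment rather than the interactive auth setup, a small guard can fail fast before any file is sent. This is a minimal sketch, not part of the skill: require_mineru_token is a hypothetical helper name, and it assumes MINERU_TOKEN is the only token source in that context, so it will wrongly fail if you rely on a token saved by mineru-open-api auth instead.

```bash
# Hypothetical helper: succeed only if MINERU_TOKEN is set and non-empty.
# Assumption: the token is supplied via the environment; a token stored
# by `mineru-open-api auth` is not visible to this check.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; run 'mineru-open-api auth' or export it" >&2
    return 1
  fi
  return 0
}

# Example: only call the CLI when a token is present.
# require_mineru_token && mineru-open-api extract report.pdf -o ./out/
```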

Capabilities

  • Supports local files and URLs
  • Requires token (mineru-open-api auth or MINERU_TOKEN env)
  • Supported input: .pdf
  • Language hint with --language (default: ch, use en for English)
  • Page range with --pages (where applicable)
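The options listed above can be combined in a loop to process a folder of PDFs. This is a minimal sketch, not part of the skill: it uses only the extract subcommand with the documented --language and -o options, passes en instead of the ch default on the assumption that the documents are in English, and introduces a hypothetical MINERU_BIN variable (defaulting to mineru-open-api) so the loop can be dry-run with echo. --pages is left out because its value format is not documented here.

```bash
# Hypothetical override: set MINERU_BIN=echo for a dry run.
MINERU_BIN="${MINERU_BIN:-mineru-open-api}"

# Hypothetical helper: extract every *.pdf in IN_DIR into OUT_ROOT/<name>/.
# Usage: extract_dir IN_DIR OUT_ROOT [LANG]
# LANG defaults to en here; the CLI's own default is ch.
extract_dir() {
  local in_dir="$1" out_root="$2" lang="${3:-en}"
  local pdf name
  for pdf in "$in_dir"/*.pdf; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$pdf" ] || continue
    name="$(basename "$pdf" .pdf)"
    mkdir -p "$out_root/$name"
    # Stop at the first failure instead of repeating the same error per file.
    "$MINERU_BIN" extract "$pdf" --language "$lang" -o "$out_root/$name/" || return 1
  done
}
```

For example, extract_dir ./reports ./out makes one extract call per PDF; add --ocr to the command inside the loop for scanned documents.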

Notes

  • Table recognition requires the extract command with a token; flash-extract does NOT support tables. The --table flag is enabled by default, so passing it explicitly is optional.
  • Output goes to stdout by default; use -o <dir> to save it to files in that directory
  • Binary formats (such as docx) require the -o flag because they cannot be streamed to stdout
  • All progress/status messages go to stderr
  • MinerU is an open-source project by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
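Because extracted content goes to stdout and progress messages go to stderr, the two streams can be captured separately. A minimal sketch, assuming a text output format (binary formats such as docx need -o instead, per the notes above); extract_to_files and MINERU_BIN are hypothetical names, and the output file name is up to you since the default stdout format is not specified here.

```bash
# Hypothetical override so the function can be exercised without the real CLI.
MINERU_BIN="${MINERU_BIN:-mineru-open-api}"

# Hypothetical helper: write extracted content (stdout) to OUT_FILE and
# progress/status messages (stderr) to LOG_FILE.
# Usage: extract_to_files PDF OUT_FILE LOG_FILE
extract_to_files() {
  local pdf="$1" out_file="$2" log_file="$3"
  if ! "$MINERU_BIN" extract "$pdf" >"$out_file" 2>"$log_file"; then
    # stderr was redirected, so surface the end of the log on failure.
    echo "extract failed for $pdf; last log lines:" >&2
    tail -n 20 "$log_file" >&2
    return 1
  fi
}
```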
Usage Guidance
This skill appears to be what it says: a wrapper around the mineru-open-api CLI that requires a MINERU_TOKEN. Before installing or using it:
  1. Confirm that mineru.net and the GitHub repository are legitimate, and review their privacy and security documentation.
  2. Assume that PDFs you process may be uploaded to MinerU servers. Do not send sensitive or regulated data unless you trust the service or have an on-premises or self-hosted alternative.
  3. Prefer installing in an isolated environment (container or VM), and inspect the npm/Go package source if you need higher assurance.
  4. Limit and rotate MINERU_TOKEN, and avoid storing it in shared shells.
  5. If you need purely local processing, verify that the CLI actually supports a local-only mode, or use an offline tool.
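One way to follow the advice about not keeping MINERU_TOKEN in shared shells is to read it from a private file and pass it only to the single CLI process. This is a sketch, not part of the skill: run_with_token_file and MINERU_BIN are hypothetical names, the token-file path is your choice (keep it at mode 600), and the child process still receives the token in its environment; the point is only that it is never exported into your interactive shell.

```bash
# Hypothetical override for dry runs; defaults to the real CLI.
MINERU_BIN="${MINERU_BIN:-mineru-open-api}"

# Hypothetical helper: run the CLI with MINERU_TOKEN set for that one command.
# Usage: run_with_token_file TOKEN_FILE [CLI ARGS...]
run_with_token_file() {
  local token_file="${1:-}" token
  if [ ! -r "$token_file" ]; then
    echo "cannot read token file: $token_file" >&2
    return 1
  fi
  shift
  token="$(cat "$token_file")"    # $(...) strips the trailing newline
  if [ -z "$token" ]; then
    echo "token file is empty: $token_file" >&2
    return 1
  fi
  # The VAR=value prefix puts the token in the child's environment only;
  # it is not exported into the calling shell.
  MINERU_TOKEN="$token" "$MINERU_BIN" "$@"
}
```

For example: run_with_token_file ~/.mineru_token extract report.pdf -o ./out/ (the path is illustrative).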
Capability Analysis
Type: OpenClaw Skill
Name: extract-tables-from-pdf
Version: 0.4.0
The skill is a legitimate wrapper for the MinerU document intelligence engine (developed by Shanghai AI Lab) used to extract tables from PDF files. It utilizes the 'mineru-open-api' CLI tool and requires a standard API token (MINERU_TOKEN) for its cloud-based extraction services. No evidence of malicious intent, data exfiltration beyond the stated purpose, or prompt injection was found in SKILL.md or _meta.json.
Capability Assessment
Purpose & Capability
Name/description match what is required: the skill requires the mineru-open-api binary and a MINERU_TOKEN, which are exactly what a MinerU-based PDF table extractor would need.
Instruction Scope
SKILL.md instructs the agent to run mineru-open-api commands, authenticate with MINERU_TOKEN, and operate on local files or URLs. This is within scope, but the instructions imply that the CLI uses the token to contact MinerU services, so PDF contents may be transmitted to an external service. That is expected behavior, but it matters for privacy.
Install Mechanism
Installers are standard: npm package and go install from a GitHub repo. Both are reasonable for a CLI tool. Note: installing npm packages can run lifecycle scripts, so review the package or install in a controlled environment if you need extra safety.
Credentials
Only a single service credential (MINERU_TOKEN) is required and is declared as the primary credential. That is proportional to the described remote-API usage. The skill does not request unrelated credentials or system paths.
Persistence & Privilege
The skill does not request always:true or elevated platform persistence. It is user-invocable and can run autonomously (the platform default), which is normal for skills of this type.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install extract-tables-from-pdf
  3. After installation, invoke the skill by name or use /extract-tables-from-pdf
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.1
SEO optimization v0.2.1
v0.2.0
SEO optimization v0.2.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
Extract Tables from PDF - extract tables from PDF documents using MinerU. Use when a PDF contains da
Metadata
Slug extract-tables-from-pdf
Version 0.4.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 6
Frequently Asked Questions

What is Extract Tables From Pdf?

Extract tables from PDF documents using MinerU's table detection engine. Identifies and extracts structured table data from both native and scanned PDFs. Fea... It is an AI Agent Skill for Claude Code / OpenClaw, with 253 downloads so far.

How do I install Extract Tables From Pdf?

Run "/install extract-tables-from-pdf" in the OpenClaw or Claude Code chat to install the skill in one step. The skill also needs the mineru-open-api CLI and a MINERU_TOKEN (see Install and Authentication above).

Is Extract Tables From Pdf free?

Yes. Extract Tables From Pdf is licensed under MIT-0, so you can download, install, and use the skill at no cost. It does rely on the MinerU API, which requires a token.

Which platforms does Extract Tables From Pdf support?

Extract Tables From Pdf is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Extract Tables From Pdf?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
