
Extract Tables From Pdf

Name: Extract Tables From Pdf
Author: mzlzyca

Author mzlzyCA · GitHub · v0.4.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 253

Install in OpenClaw

/install extract-tables-from-pdf

Description

Extract tables from PDF documents using MinerU's table detection engine. Identifies and extracts structured table data from both native and scanned PDFs. Fea...

Usage (SKILL.md)

Extract Tables From Pdf

Convert and extract content from .pdf using MinerU (mineru-open-api).

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract tables from PDF (requires token)
mineru-open-api extract report.pdf -o ./out/

# With explicit table flag and OCR for scanned docs
mineru-open-api extract scanned.pdf --ocr --table -o ./out/

Authentication

Token required for extract and crawl:

mineru-open-api auth            # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
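
When the token is supplied through the environment, a small guard can fail early instead of letting an extract call run without credentials. The following is a minimal sketch; `require_mineru_token` is a hypothetical helper, not part of mineru-open-api, and it only covers the environment-variable route (a token stored by the interactive `mineru-open-api auth` setup would not be visible to it).

```shell
# Hypothetical helper (not part of mineru-open-api): succeed only when
# MINERU_TOKEN is set to a non-empty value in the current shell.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    # Report on stderr so stdout stays free for extracted content.
    echo "MINERU_TOKEN is not set; run 'mineru-open-api auth' or export it" >&2
    return 1
  fi
}

# Usage: run the extraction only when the check passes.
# require_mineru_token && mineru-open-api extract report.pdf -o ./out/
```

Keeping the check in a function means it can be reused in front of any command that needs the token.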

Capabilities

Supports local files and URLs
Requires token (mineru-open-api auth or MINERU_TOKEN env)
Supported input: .pdf
Language hint with --language (default: ch, use en for English)
Page range with --pages (where applicable)
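
Because local paths and URLs are both accepted as input, several sources can be processed with the same loop. The sketch below is illustrative only: `extract_each` and the `MINERU_BIN` override are hypothetical names introduced here, the example file name and URL are placeholders, and writing each input to its own numbered subdirectory is an assumption made to avoid guessing how the CLI names files when outputs share a directory.

```shell
# extract_each LANG OUTROOT INPUT...
# Runs one extract call per input (local path or URL), each into its own
# numbered subdirectory of OUTROOT. MINERU_BIN is a hypothetical override used
# only so the binary can be swapped out; it is not a documented variable.
extract_each() {
  local lang="$1" outroot="$2" n=0 input
  shift 2
  for input in "$@"; do
    n=$((n + 1))
    mkdir -p "$outroot/$n" || return 1
    # Stop at the first failing extraction.
    "${MINERU_BIN:-mineru-open-api}" extract "$input" \
      --language "$lang" -o "$outroot/$n/" || return 1
  done
}

# Example call (placeholder file name and URL):
# extract_each en ./out report.pdf https://example.com/paper.pdf
```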

Notes

Table recognition requires the extract command with a token; flash-extract does NOT support tables. The --table flag is enabled by default, so passing it explicitly is optional.
Output goes to stdout by default; use -o <dir> to save to a file
Binary formats (docx) require -o flag (cannot stream to stdout)
All progress/status messages go to stderr
MinerU is an open-source project by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
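
Since extracted content goes to stdout and progress messages go to stderr, the two streams can be captured separately with ordinary shell redirection. A minimal sketch, assuming no `-o` flag is passed; `extract_to_files` and `MINERU_BIN` are hypothetical names, and the format of what arrives on stdout is not specified in the notes above.

```shell
# extract_to_files PDF OUT LOG
# Writes the extracted content (stdout) to OUT and progress/status messages
# (stderr) to LOG. MINERU_BIN is a hypothetical override, not a documented
# variable; it defaults to the real CLI name.
extract_to_files() {
  local pdf="$1" out="$2" log="$3"
  "${MINERU_BIN:-mineru-open-api}" extract "$pdf" >"$out" 2>"$log"
}

# Example call (placeholder file names):
# extract_to_files report.pdf report.out extract.log
```

Keeping the log separate means the content file can be fed to later steps without filtering out status lines.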

Security recommendations

This skill appears to be what it says: a wrapper around the mineru-open-api CLI that requires a MINERU_TOKEN. Before installing or using it:
1) Confirm that mineru.net and the GitHub repo look legitimate, and review their privacy/security docs.
2) Assume PDFs you process may be uploaded to MinerU servers. Do not send sensitive or regulated data unless you trust the service or have an on-prem/self-hosted alternative.
3) Prefer installing in an isolated environment (container or VM), and inspect the npm/go package source if you require higher assurance.
4) Limit and rotate the MINERU_TOKEN, and avoid storing it in shared shells.
5) If you need purely local-only processing, verify that the CLI actually supports a local-only mode, or find an offline tool.

Functional analysis

Type: OpenClaw Skill
Name: extract-tables-from-pdf
Version: 0.4.0

The skill is a legitimate wrapper for the MinerU document intelligence engine (developed by Shanghai AI Lab) used to extract tables from PDF files. It utilizes the 'mineru-open-api' CLI tool and requires a standard API token (MINERU_TOKEN) for its cloud-based extraction services. No evidence of malicious intent, data exfiltration beyond the stated purpose, or prompt injection was found in SKILL.md or _meta.json.

Capability assessment

✓ Purpose & Capability

Name/description match what is required: the skill requires the mineru-open-api binary and a MINERU_TOKEN, which are exactly what a MinerU-based PDF table extractor would need.

ℹ Instruction Scope

SKILL.md instructs the agent to run mineru-open-api commands, authenticate with MINERU_TOKEN, and operate on local files or URLs. This is within scope, but the instructions imply the CLI will use the token to contact MinerU services — meaning PDF contents may be transmitted to an external service; that is expected but important for privacy.

✓ Install Mechanism

Installers are standard: npm package and go install from a GitHub repo. Both are reasonable for a CLI tool. Note: installing npm packages can run lifecycle scripts, so review the package or install in a controlled environment if you need extra safety.

✓ Credentials

Only a single service credential (MINERU_TOKEN) is required and is declared as the primary credential. That is proportional to the described remote-API usage. The skill does not request unrelated credentials or system paths.

✓ Persistence & Privilege

Skill does not request always:true or elevated platform persistence. It is user-invocable and can run autonomously (platform default), which is normal for skills of this type.

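How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install extract-tables-from-pdf
Once installed, call the Skill by name or trigger it with /extract-tables-from-pdf
Provide the inputs described in the Skill's parameter documentation to get structured output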

Version history

v0.4.0

SEO: expand description for better ClawHub vector search discovery

v0.3.0

Rollback to original version

v0.2.1

SEO optimization v0.2.1

v0.2.0

SEO optimization v0.2.0

v1.0.1

Fix: declare MINERU_TOKEN credential in metadata

v1.0.0

Extract Tables from PDF - extract tables from PDF documents using MinerU. Use when a PDF contains da

Metadata

Slug extract-tables-from-pdf

Version 0.4.0

License MIT-0

Total installs 0

Current installs 0

Version count 6

FAQ