
Extract Tables From Pdf

Author: mzlzyCA · GitHub · v0.4.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 253
Favorites: 0
Current installs: 0
Versions: 6
Install in OpenClaw
/install extract-tables-from-pdf
Description
Extract tables from PDF documents using MinerU's table detection engine. Identifies and extracts structured table data from both native and scanned PDFs. Fea...
Usage (SKILL.md)

Extract Tables From Pdf

Convert and extract content from .pdf using MinerU (mineru-open-api).

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract tables from PDF (requires token)
mineru-open-api extract report.pdf -o ./out/

# With explicit table flag and OCR for scanned docs
mineru-open-api extract scanned.pdf --ocr --table -o ./out/

Authentication

A token is required for the extract and crawl commands:

mineru-open-api auth            # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create a token at: https://mineru.net/apiManage/token
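
For non-interactive runs such as scripts or CI, where the token is supplied through MINERU_TOKEN rather than the interactive mineru-open-api auth setup, it can help to fail early when the variable is missing. The following is a minimal sketch: require_mineru_token is a hypothetical helper name, not part of the CLI, and it only checks the environment variable (it does not detect a token saved by mineru-open-api auth).

```shell
# Hypothetical helper (not part of mineru-open-api): fail early when
# MINERU_TOKEN is unset or empty. It checks the environment variable only.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; run 'mineru-open-api auth' or export it first" >&2
    return 1
  fi
}

# Usage: run extract only once the token check passes.
# require_mineru_token && mineru-open-api extract report.pdf -o ./out/
```

The helper returns a non-zero status instead of exiting, so it can be combined with && in an interactive shell without closing the session.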

Capabilities

  • Supports local files and URLs
  • Requires token (mineru-open-api auth or MINERU_TOKEN env)
  • Supported input: .pdf
  • Language hint with --language (default: ch, use en for English)
  • Page range with --pages (where applicable)

Notes

  • Table recognition requires the extract command with a token; flash-extract does NOT support tables. The --table flag is enabled by default, so passing it explicitly is optional.
  • Output goes to stdout by default; use -o <dir> to save to file
  • Binary formats (docx) require -o flag (cannot stream to stdout)
  • All progress/status messages go to stderr
  • MinerU is an open-source project by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
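
Because extracted output goes to stdout and progress/status messages go to stderr, the two streams can be redirected separately. A minimal sketch follows: run_split is a hypothetical helper (not part of mineru-open-api) that writes a command's stdout to one file and its stderr to another.

```shell
# Hypothetical helper: run a command, saving stdout and stderr to separate files.
# Usage: run_split <stdout-file> <stderr-file> <command> [args...]
run_split() {
  out_file=$1
  log_file=$2
  shift 2
  # Remaining arguments are the command to run; quoting preserves spaces.
  "$@" >"$out_file" 2>"$log_file"
}

# Example (assumes a token is configured and report.pdf exists):
# run_split report.out extract.log mineru-open-api extract report.pdf
```

Keeping the log in its own file means the saved output contains only what the command wrote to stdout, which matters when that output is piped into another tool.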
Security Recommendations
This skill appears to be what it says: a wrapper around the mineru-open-api CLI that requires a MINERU_TOKEN. Before installing or using it:
  1. Confirm that mineru.net and the GitHub repo look legitimate, and review their privacy/security docs.
  2. Assume PDFs you process may be uploaded to MinerU servers; do not send sensitive or regulated data unless you trust the service or have an on-prem/self-hosted alternative.
  3. Prefer installing in an isolated environment (container or VM), and inspect the npm/go package source if you require higher assurance.
  4. Limit and rotate the MINERU_TOKEN, and avoid storing it in shared shells.
  5. If you need purely local-only processing, verify that the CLI actually supports a local-only mode, or find an offline tool.
Functional Analysis
Type: OpenClaw Skill
Name: extract-tables-from-pdf
Version: 0.4.0

The skill is a legitimate wrapper for the MinerU document intelligence engine (developed by Shanghai AI Lab) used to extract tables from PDF files. It utilizes the 'mineru-open-api' CLI tool and requires a standard API token (MINERU_TOKEN) for its cloud-based extraction services. No evidence of malicious intent, data exfiltration beyond the stated purpose, or prompt injection was found in SKILL.md or _meta.json.
Capability Assessment
Purpose & Capability
Name/description match what is required: the skill requires the mineru-open-api binary and a MINERU_TOKEN, which are exactly what a MinerU-based PDF table extractor would need.
Instruction Scope
SKILL.md instructs the agent to run mineru-open-api commands, authenticate with MINERU_TOKEN, and operate on local files or URLs. This is within scope, but the instructions imply the CLI will use the token to contact MinerU services — meaning PDF contents may be transmitted to an external service; that is expected but important for privacy.
Install Mechanism
Installers are standard: npm package and go install from a GitHub repo. Both are reasonable for a CLI tool. Note: installing npm packages can run lifecycle scripts, so review the package or install in a controlled environment if you need extra safety.
Credentials
Only a single service credential (MINERU_TOKEN) is required and is declared as the primary credential. That is proportional to the described remote-API usage. The skill does not request unrelated credentials or system paths.
Persistence & Privilege
Skill does not request always:true or elevated platform persistence. It is user-invocable and can run autonomously (platform default), which is normal for skills of this type.
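How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install extract-tables-from-pdf
  3. Once installed, invoke the Skill by name or trigger it with /extract-tables-from-pdf
  4. Provide the required inputs according to the Skill's parameter description to get structured output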
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.1
SEO optimization v0.2.1
v0.2.0
SEO optimization v0.2.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
Extract Tables from PDF - extract tables from PDF documents using MinerU. Use when a PDF contains da
Metadata
Slug: extract-tables-from-pdf
Version: 0.4.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 6
FAQ

What is Extract Tables From Pdf?

Extract tables from PDF documents using MinerU's table detection engine. Identifies and extracts structured table data from both native and scanned PDFs. Fea... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 253 total downloads to date.

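How do I install Extract Tables From Pdf?

Run the command /install extract-tables-from-pdf in the OpenClaw or Claude Code chat box to install it in one step. The skill itself needs no extra configuration, but extraction still requires the mineru-open-api CLI and a MINERU_TOKEN.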

Is Extract Tables From Pdf free?

Yes. Extract Tables From Pdf is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does Extract Tables From Pdf support?

Extract Tables From Pdf is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Extract Tables From Pdf?

It is developed and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
