jackkuo666

Sci Data Extractor

Author: JackKuo666 · GitHub · v0.1.0
cross-platform ⚠ suspicious
Total downloads: 463
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install sci-data-extractor
Description
AI-powered tool for extracting structured data from scientific literature PDFs
Security recommendations
What to check before installing or running this skill:
- Origin: The skill's Source/Homepage are unknown; prefer code from a trusted repository. If you got this from an external repo, inspect the repo and maintainer reputation.
- API keys: The code will send extracted text (potentially entire PDF contents) to external LLMs/Mathpix. Only use API keys with limited scope or billing controls, and avoid uploading sensitive or private documents.
- Registry mismatch: The registry lists no required env vars, but the SKILL.md and code require EXTRACTOR_API_KEY (or API_KEY), EXTRACTOR_BASE_URL and optionally Mathpix keys. Do not provide secrets until you confirm how they are used and where traffic goes.
- LLM/provider inconsistency: The README defaults to a Claude model name, but the code uses the openai Python client and a configurable base_url. Verify that the client and base_url will actually work with your provider; otherwise keys might be misdirected or requests might fail.
- Avoid running curl | sh blindly: The installer suggests running an external script (https://astral.sh/uv/install.sh). Do not run it unless you trust the source; prefer to install uv/venv tooling via a package manager, or inspect the script first.
- Sandbox test: Run the tool in a disposable environment (VM/container) first, with a throwaway API key and non-sensitive PDFs. Monitor network requests during a test run to confirm the endpoints and the data sent.
- Code review focus: The key network actions are in extractor.py (requests to Mathpix and the OpenAI client usage). Confirm there are no hidden endpoints or telemetry sending keys elsewhere.
If you are not comfortable, do not provide production API keys.
Functional analysis
Type: OpenClaw Skill
Name: sci-data-extractor
Version: 0.1.0
The skill is classified as suspicious due to several critical vulnerabilities that could be exploited by a malicious user, rather than intentional malicious behavior by the skill author. The most significant risks are:
- An arbitrary file write vulnerability in `extractor.py` and `batch_extract.py`, where user-controlled output paths could lead to writing to sensitive system files (e.g., `/etc/passwd`, `~/.ssh/authorized_keys`).
- A potential Server-Side Request Forgery (SSRF) via the `EXTRACTOR_BASE_URL` configuration, allowing LLM API calls to be redirected to internal network resources.
- Possible Local File Inclusion (LFI) through user-controlled PDF paths in `PDFProcessor.extract_text_pymupdf`.
- A supply chain risk from the installation instructions in `SKILL.md` and `README.md`, which use `curl | sh` to install `uv`.
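One common way to close off the arbitrary file write described above is to resolve every user-supplied output path against a fixed output directory and reject anything that lands outside it. The sketch below illustrates that idea only; `resolve_output_path` and its parameters are hypothetical names, not functions from `extractor.py` or `batch_extract.py`.

```python
from pathlib import Path


def resolve_output_path(base_dir: str, user_path: str) -> Path:
    """Return user_path resolved inside base_dir, or raise ValueError.

    Hypothetical helper for illustration; not part of sci-data-extractor.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base entirely, so the
    # containment check below also rejects inputs such as /etc/passwd.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise ValueError(f"output path escapes {base}: {user_path!r}")
    return candidate
```

A caller would pass the user's `--output` style argument through this helper before opening the file for writing, so that `../` sequences and absolute paths fail early instead of overwriting files elsewhere on disk.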
Capability assessment
Purpose & Capability
Name/description match the code and docs: the project extracts text from PDFs (PyMuPDF or Mathpix) and sends content to an LLM to produce structured outputs. That capability legitimately requires an LLM API key and optionally Mathpix credentials. However, the registry metadata declares no required environment variables or primary credential while the SKILL.md and code clearly expect EXTRACTOR_API_KEY (or API_KEY), EXTRACTOR_BASE_URL, and optional MATHPIX_APP_ID / MATHPIX_APP_KEY — the missing declaration in registry is an inconsistency and reduces transparency.
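Because the registry does not declare these variables, it can help to check which of them are present in a sandbox before running the tool. The following is a minimal sketch: the variable names come from the assessment above, while `check_env` is a hypothetical helper and the grouping into required and optional reflects this review's reading rather than the skill's own code.

```python
from typing import List, Mapping, Tuple

# Names taken from the assessment above; either key name satisfies the first group.
REQUIRED_GROUPS = [("EXTRACTOR_API_KEY", "API_KEY"), ("EXTRACTOR_BASE_URL",)]
OPTIONAL_NAMES = ("MATHPIX_APP_ID", "MATHPIX_APP_KEY")


def check_env(env: Mapping[str, str]) -> Tuple[List[str], List[str]]:
    """Return (missing required groups, optional names that are set).

    Only variable names are reported, never their values.
    """
    missing = [
        " or ".join(group)
        for group in REQUIRED_GROUPS
        if not any(env.get(name) for name in group)
    ]
    optional_set = [name for name in OPTIONAL_NAMES if env.get(name)]
    return missing, optional_set
```

Calling `check_env(os.environ)` (after `import os`) in a throwaway environment shows at a glance whether a key is about to be handed to the tool and whether the Mathpix path is enabled.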
Instruction Scope
Runtime instructions and code will read local PDFs and .env, upload PDFs to Mathpix if chosen, and send extracted text to an external LLM endpoint. That is coherent with the stated purpose, but it does mean entire document content (potentially sensitive or copyrighted material) is transmitted to third-party services. The SKILL.md also suggests running external install scripts (see next dimension).
Install Mechanism
There is no formal install spec in the registry, but the SKILL.md recommends installing the 'uv' tool via curl -LsSf https://astral.sh/uv/install.sh | sh which runs a remote install script — a higher-risk pattern. The README also suggests adding the skill via npx or cloning a GitHub repo. Running an arbitrary curl|sh should be treated cautiously; the project otherwise relies on pip packages listed in requirements.txt (reasonable).
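If the install script is used at all, a safer pattern than piping it straight into a shell is to download it to a file, read it, and run only the exact copy that was reviewed. The sketch below assumes such a workflow: it records the SHA-256 digest of the reviewed file and checks later copies against it. The helper names are hypothetical, and no officially published checksum is assumed.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reviewed_digest(path: str, reviewed_digest: str) -> bool:
    """True if the file is byte-identical to the copy that was reviewed."""
    return sha256_of_file(path) == reviewed_digest.strip().lower()
```

The digest would be noted once after a manual read-through of the saved script; any later download that does not match should be reviewed again before it is executed.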
Credentials
The code requires an LLM API key and optionally Mathpix credentials (EXTRACTOR_API_KEY or API_KEY, EXTRACTOR_BASE_URL, MATHPIX_APP_ID/KEY). Those are proportionate for an extractor. The problem: the registry metadata lists no required env vars, creating a transparency gap. Also the README/SKILL.md default model is a Claude model name while the code uses the openai.OpenAI client and accepts EXTRACTOR_BASE_URL — this mismatch (client vs declared model/provider) is suspicious and should be verified before supplying keys.
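Before supplying a key, the configured base URL can be checked against a short allowlist so that requests cannot be pointed at an internal address. This is a minimal sketch under stated assumptions: `is_acceptable_base_url` is a hypothetical helper, the allowlist contents are the operator's choice, and a hostname allowlist alone does not defend against every SSRF variant (for example, DNS changes after the check).

```python
from typing import Set
from urllib.parse import urlsplit


def is_acceptable_base_url(url: str, allowed_hosts: Set[str]) -> bool:
    """Accept only https URLs whose hostname is explicitly allowlisted."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises on malformed input such as an unclosed IPv6 bracket.
        return False
    # hostname is lowercased by urlsplit and is None when no host is present.
    return parts.scheme == "https" and parts.hostname in allowed_hosts
```

For instance, with `allowed_hosts={"api.openai.com"}`, the value `https://api.openai.com/v1` passes, while `http://127.0.0.1:8080/v1` and `https://169.254.169.254/` are rejected.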
Persistence & Privilege
The skill does not request always:true and does not claim to modify other skills or persistent system settings. It's a user-invoked tool and its runtime behavior is limited to reading local PDFs, optional .env, and making network calls to configured LLM/Mathpix endpoints.
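How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install sci-data-extractor
  3. Once installation finishes, invoke the Skill by name or trigger it with /sci-data-extractor
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.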
Version history
v0.1.0
Initial release of Sci-Data-Extractor.
- Extracts structured data from scientific paper PDFs using LLMs and OCR methods.
- Supports formula and table recognition with Mathpix OCR or PyMuPDF.
- Outputs Markdown tables or CSV files.
- Provides preset extraction templates for enzyme kinetics, experiments, and literature reviews.
- Allows custom prompts for flexible data extraction needs.
- Includes installation instructions for Python/pip, uv, or conda, with API key configuration guidance.
Metadata
Slug: sci-data-extractor
Version: 0.1.0
License: (not listed)
Total installs: 1
Current installs: 1
Versions: 1
FAQ

What is Sci Data Extractor?

An AI-powered tool for extracting structured data from scientific literature PDFs. It is an AI agent Skill plugin for Claude Code / OpenClaw, with 463 total downloads to date.

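How do I install Sci Data Extractor?

Run the command "/install sci-data-extractor" in the OpenClaw or Claude Code chat box to install it in one step. The install step itself needs no extra configuration, although per the capability assessment the skill expects an LLM API key (EXTRACTOR_API_KEY or API_KEY) and EXTRACTOR_BASE_URL at run time.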

Is Sci Data Extractor free?

Yes. Sci Data Extractor is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Sci Data Extractor support?

Sci Data Extractor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Sci Data Extractor?

It is developed and maintained by JackKuo666 (@jackkuo666); the current version is v0.1.0.
