
Paper Close-Reading Tool (文献精读小工具)

Name: 文献精读小工具 (Paper Close-Reading Tool)
Author: mxingchtongaelofficial2568

Author: Tiandoufayale · GitHub · v0.0.2 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 268

Install in OpenClaw

/install llm-paper-review-generator

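Description

A workflow skill that batch-processes paper PDFs into Chinese close-reading summary reports. It suits the scenario "extract text with PaddleOCR or pdfplumber, then summarize the paper with a large language model". At run time it reads only config.json and prompt.md in the skill directory and runs the required scripts under scripts. Users can define their research topic and research direction in prompt.md.

Usage instructions (SKILL.md)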


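paper-review-generator

1) Configuration file constraints

Use only two files in the current skill directory:
- config.json: contains everything needed to run (whether to use OCR, OCR parameters, the summarization model's base_url/model/api_key, the visible-window switch, and the thread count)
- prompt.md: the summarization prompt template
api_key can be written in two ways:
- Specified directly in config.json
- Environment variable reference: ${ENV_VAR}; the script reads the corresponding environment variable at run time
Secrets in other directories are not read.
No api_key or token is echoed in logs or exception messages.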
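
The two api_key forms described above could be resolved along these lines. This is a minimal sketch, not the skill's actual code: the function name `resolve_api_key` and the rule that a reference must occupy the whole value are assumptions made for illustration.

```python
import os
import re

# Assumption: a reference is recognised only when the whole value is ${NAME}.
_ENV_REF = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve_api_key(raw: str) -> str:
    """Return a literal key unchanged, or look up a ${ENV_VAR} reference."""
    match = _ENV_REF.match(raw.strip())
    if match is None:
        return raw  # literal key written directly in config.json
    name = match.group(1)
    value = os.environ.get(name)
    if not value:
        # Report the variable name only, never the secret itself.
        raise ValueError(f"environment variable {name} is not set or is empty")
    return value
```

Keeping the error message limited to the variable name is one way to stay consistent with the rule that keys and tokens never appear in logs or exceptions.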

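2) Entry point

You must switch to this skill's root directory before running the scripts (i.e. .../paper-review-generator); otherwise the relative paths to config.json / prompt.md will not resolve.
The user must explicitly supply the input and output paths:
- --pdf (repeatable; supports multiple files)
- --dir (repeatable; supports multiple folders)
- --output-dir (optional; when omitted, output defaults to a 总结 ("summary") folder in the same directory as each input PDF)
Examples (the agent fills in the paths according to the user's request):
- Single file: python scripts/run_pipeline.py --pdf "{pdf_path}" --output-dir "{output_dir}"
- Multiple files: python scripts/run_pipeline.py --pdf "{pdf_path_1}" --pdf "{pdf_path_2}" --output-dir "{output_dir}"
- Single folder: python scripts/run_pipeline.py --dir "{pdf_dir}" --output-dir "{output_dir}"
- Multiple folders: python scripts/run_pipeline.py --dir "{pdf_dir_1}" --dir "{pdf_dir_2}" --output-dir "{output_dir}"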
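
A command line with these three options can be modelled with argparse as sketched below. Only the flag names and the default 总结 folder come from the text above. The helper names and the `dirs` attribute are hypothetical, and the real run_pipeline.py may be organised differently.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run_pipeline.py")
    # action="append" lets each flag be given several times.
    parser.add_argument("--pdf", action="append", default=[],
                        help="path to one PDF; repeatable")
    parser.add_argument("--dir", action="append", default=[], dest="dirs",
                        help="folder containing PDFs; repeatable")
    parser.add_argument("--output-dir", default=None,
                        help="optional; defaults to a folder next to each PDF")
    return parser


def output_dir_for(pdf: Path, output_dir: Optional[str]) -> Path:
    """Use --output-dir when given, else a 总结 folder beside the PDF."""
    return Path(output_dir) if output_dir else pdf.parent / "总结"
```

For example, `build_parser().parse_args(["--pdf", "a.pdf", "--pdf", "b.pdf"])` yields both paths in `args.pdf` and leaves `args.output_dir` as `None`, so each report would land beside its own PDF.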

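3) Dispatch logic

Read config.json.use_paddleocr:
- true: call extract_paddleocr.py to extract text (JSON Lines output, not written to disk).
- false: call extract_pdfplumber.py to extract text (JSON Lines output, not written to disk).
Then call summarize_reports.py: it reads prompt.md and the extracted text passed through the pipe, and calls the model configuration specified by summarizer.provider to generate *_研读报告.md ("study report").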
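
The branch on use_paddleocr and the JSON Lines hand-off can be pictured as follows. This is a sketch under stated assumptions: the script names come from the text above, while the helper names are hypothetical, the extractor's own arguments are left to the caller because they are not documented here, and nothing is assumed about the fields inside each JSON record.

```python
import json
import sys
from pathlib import Path
from typing import Dict, List


def pick_extractor(config: Dict) -> str:
    """Choose the extraction script from the use_paddleocr switch."""
    if config.get("use_paddleocr"):
        return "extract_paddleocr.py"
    return "extract_pdfplumber.py"


def extractor_command(config: Dict, extra_args: List[str]) -> List[str]:
    """Build an argument list (not a shell string) for subprocess."""
    script = Path("scripts") / pick_extractor(config)
    return [sys.executable, str(script), *extra_args]


def parse_json_lines(text: str) -> List[Dict]:
    """Decode one JSON object per non-empty line of extractor output."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Passing the command as a list keeps file paths with spaces or shell metacharacters from being interpreted by a shell, which matches the argument-list style the audit notes describe.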

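4) Environment checks and security rules

Before running, check that Python is available (3.10+ recommended):
- If Python is not installed on the user's machine, explicitly prompt the user to install it before continuing.
Before running, check dependencies:
- If dependency packages are missing, the agent should install them from scripts/requirements.txt in the skill root directory:
  - pip install -r scripts/requirements.txt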
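
The two pre-flight checks above could be automated roughly as follows. The module list is taken from the requirements named in the install-mechanism notes on this page (openai, requests, pdfplumber); the function names are hypothetical and the actual scripts may check differently.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Assumption: import names match the package names in requirements.txt.
REQUIRED_MODULES = ("openai", "requests", "pdfplumber")


def python_ok(min_version: Tuple[int, int] = (3, 10)) -> bool:
    """True when the running interpreter meets the suggested minimum."""
    return sys.version_info >= min_version


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the modules that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

If `missing_modules()` returns a non-empty list, the agent would then run the pip command shown above, ideally inside a virtual environment.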
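Before first use, you must review the endpoints:
- Keep only the providers you trust; delete or blank out any unused base_url/model
- For sensitive documents, prefer self-hosted or intranet OCR and LLM endpoints
Send requests only to OCR/LLM endpoints the user has explicitly confirmed.
If configuration is missing (e.g. api_key/token/model/base_url), fail immediately and tell the user which fields to fill in.
Logs and exceptions must be redacted: never output a raw Authorization header, API key, or token, or a complete remote response body.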
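
The missing-field check and the redaction rule can be sketched as below. The audit notes mention a redact function in the scripts, but its real behaviour is not shown on this page, so the names, the `***` mask, and the Bearer pattern here are all illustrative assumptions.

```python
import re
from typing import Dict, Iterable, List

# Assumption: these three fields are mandatory for a summarizer provider.
REQUIRED_FIELDS = ("base_url", "model", "api_key")

_BEARER = re.compile(r"\b(bearer)\s+[A-Za-z0-9._~+/=-]+", re.IGNORECASE)


def missing_fields(provider: Dict) -> List[str]:
    """List required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(provider.get(f) or "").strip()]


def redact(text: str, secrets: Iterable[str] = ()) -> str:
    """Mask known secret values and any Bearer token before logging."""
    for secret in secrets:
        if secret:
            text = text.replace(secret, "***")
    return _BEARER.sub(r"\1 ***", text)
```

A caller would pass every message through `redact` before logging it or raising it, and would report `missing_fields` by field name only.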

Security recommendations

This skill appears to do what it says: extract text from PDFs (either locally via pdfplumber or by submitting to a PaddleOCR job endpoint) and send the extracted text to an LLM endpoint to produce Chinese review reports. Before installing/running:
- Inspect config.json and prompt.md in the skill folder. Confirm the 'summarizer.providers.*.base_url' and 'paddleocr.job_url' values; by default they point to third-party services. Replace them with your trusted endpoints or leave them empty to avoid accidental uploads.
- Provide API keys deliberately: either place them in config.json or (preferably) use environment variable references like ${MY_KEY}. The script will error if base_url/api_key/model are missing.
- Do not run this on sensitive or confidential PDFs unless you are sure the configured OCR/LLM endpoints are trusted and compliant with your data policy. The skill will transmit extracted text or upload PDFs to the configured service when OCR is used.
- The SKILL.md recommends running 'pip install -r scripts/requirements.txt' if dependencies are missing. Run that in a controlled Python environment (venv/conda) to limit installation scope.
- The scripts attempt to redact API keys in logs and errors, but you should still avoid printing or logging raw config content to shared consoles.
If you want stronger assurance: run the scripts in an isolated VM, set the provider base_url to a self-hosted endpoint, or remove/blank external endpoints in config.json before use.

Functional analysis

Type: OpenClaw Skill
Name: paper-review-generator
Version: 0.0.2

The paper-review-generator skill is a legitimate workflow for extracting text from PDF files and generating summaries using LLMs. It supports local extraction via pdfplumber or remote OCR via the PaddleOCR API, with results sent to configurable OpenAI-compatible endpoints. The code includes proactive security measures such as API key redaction in logs (via the redact function in multiple scripts) and uses safe subprocess execution with argument lists rather than shell strings. No evidence of data exfiltration beyond the stated purpose or malicious prompt injection was found.

Capability assessment

✓ Purpose & Capability

Name/description (batch-convert PDFs to Chinese review reports) match the included scripts: pdfplumber extraction, optional PaddleOCR remote job submission, and an LLM-based summarizer. The code paths, CLI arguments, and prompt.md align with the described workflow. The presence of multiple provider entries in config.json is reasonable (the skill supports different providers) though users must pick/authorize one.

ℹ Instruction Scope

SKILL.md and the scripts constrain runtime to reading only config.json and prompt.md in the skill folder, taking explicit --pdf/--dir inputs, and writing reports to an output folder. The scripts read environment variables only when they are referenced in config.json (syntax ${ENV_VAR}). However, the workflow will upload extracted text or PDFs to the external endpoints configured in config.json (the OCR job_url and the chosen summarizer base_url). This is expected but important: user data (extracted text or PDFs) will be transmitted to whichever external provider is configured.

ℹ Install Mechanism

There is no automated install spec in the registry entry (instruction-only). SKILL.md instructs installing Python packages via 'pip install -r scripts/requirements.txt' if missing. The requirements (openai, requests, pdfplumber) are appropriate for the task but installing PyPI packages carries standard supply-chain risk; the skill does not download arbitrary archives or run opaque installers.

ℹ Credentials

The skill does not declare required env vars in registry metadata, but the code supports placing API keys directly in config.json or using ${ENV_VAR} references to environment variables. Requesting API keys for OCR/LLM providers is proportionate to the purpose. Users should be aware that config.json contains default base_url values (third-party endpoints) and must supply API keys (or env var names) to enable calls. The scripts attempt to redact keys in logs/errors.

✓ Persistence & Privilege

always is false and the skill does not request persistent or elevated platform privileges. It writes output reports to the specified output directory (or a '总结' folder next to each PDF) and may create a visible Tkinter progress window. It does not modify other skills or system-wide agent settings.

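How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install llm-paper-review-generator
Once installation finishes, invoke the Skill by name or trigger it with /llm-paper-review-generator
Provide the required inputs according to the Skill's parameter description to get structured output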

Version history

v0.0.2

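- Updated security rules: logs and exceptions must both be redacted; outputting raw sensitive information or complete remote response bodies is strictly forbidden.
- Added endpoint security requirements: before first use the user must confirm the OCR/LLM service endpoints in use; for sensitive applications, intranet or self-hosted endpoints are recommended.
- Narrowed the scope of key reading: secrets in other directories are no longer read.
- Improved configuration error messages: a missing api_key/token/model/base_url now produces an explicit error with guidance.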

v0.0.1

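For detailed usage instructions, see: https://github.com/mxingchtongaelofficial2568/openclaw-skills/blob/main/paper-review-generator/README.md

A skill that batch-processes paper PDFs into Chinese close-reading summary reports. It suits the scenario "extract text with PaddleOCR or pdfplumber, then summarize the paper with a large language model". At run time it reads only config.json and prompt.md in the skill directory and runs the required scripts under scripts. Users can define their research topic and research direction in prompt.md. config.json ships with base_url values for several providers.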

Metadata

Slug llm-paper-review-generator

Version 0.0.2

License MIT-0

Total installs 0

Current installs 0

Historical versions 2

FAQ