xukp20

Arxiv Paper Processor

By xukp20 · GitHub · v0.1.1
cross-platform ⚠ suspicious
1842 total downloads · 1 favorite · 12 current installs · 2 versions
Install in OpenClaw
/install arxiv-paper-processor
Description
A tool for manual, per-paper arXiv paper processing: batch, source, or PDF download, followed by model-driven full-text reading and writing of summary.md in a chosen language.
Usage (SKILL.md)


# ArXiv Paper Processor

Use this skill for per-paper manual summarization, with optional batch artifact download.

- Single-paper mode: process one paper directory (e.g. `<run_dir>/<arxiv_id>/`).
- Batch predownload mode: process many paper directories under one run directory before writing summaries.
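Batch mode first has to discover the paper directories under the run directory. The sketch below rests on a hypothetical rule: a paper directory is any non-hidden subdirectory that contains the `metadata.md` produced upstream. The helper name `list_paper_dirs` is illustrative and is not part of the skill's scripts.

```python
from pathlib import Path


def list_paper_dirs(run_dir: str) -> list[Path]:
    """Return per-paper directories under run_dir, sorted by name.

    Hypothetical rule: a paper directory is a non-hidden subdirectory holding
    metadata.md, so helper folders such as .runtime/ are ignored.
    """
    return sorted(
        p for p in Path(run_dir).iterdir()
        if p.is_dir()
        and not p.name.startswith(".")
        and (p / "metadata.md").is_file()
    )
```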

## Language Parameter


- Use a workflow language parameter (for example, English or Chinese) and apply it manually.
- The per-paper `summary.md` must be written in the selected language.
- If the download scripts are called directly, pass `--language <LANG>` for traceability.

## Core Principle

Scripts only fetch artifacts. The model performs the reading and writing.

## Non-negotiable Constraint


- Do not generate `summary.md` by script-based snippet extraction, regex harvesting, or template autofill.
- Do not use Python/shell scripts to auto-compose section text from abstract/introduction fragments.
- Scripts in this skill are only for artifact download (source/PDF) and trace logs.
- The final `summary.md` must come from model-side reading and synthesis of the paper content.

## Optional Batch Artifact Download (Many Papers)

Use this first when Stage B has many papers:

```bash
python3 scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language English
```

Key behavior:

- Supports `--artifact source`, `--artifact pdf`, or `--artifact source_then_pdf` (default).
- Supports concurrency (`--max-workers`) and safe throttling/retry (`--min-interval-sec`, retry args).
- Uses run-local throttle state by default (`<run_dir>/.runtime/arxiv_download_state.json`) to reduce the risk of HTTP 429 (Too Many Requests) responses.
- Skips papers that already have usable `source/source_extract/*.tex` or an existing `source/paper.pdf` (unless `--force` is set).
- Resume-friendly: if a paper already has a completed `summary.md`, you can skip that paper's summary-writing step.
- Writes the batch log to `<run_dir>/download_batch_log.json` by default.
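The throttling described above can be sketched as a minimum-interval gate backed by the run-local state file. This is a minimal sketch, not the batch script's actual code, and several details are assumptions:

- The JSON layout (`{"last_request_ts": ...}`) is hypothetical.
- The function name `wait_for_slot` is hypothetical.
- The in-process lock only coordinates worker threads. Separate processes would need a file lock instead.

```python
import json
import threading
import time
from pathlib import Path

# Serializes worker threads inside one process; separate processes would need a file lock.
_slot_lock = threading.Lock()


def wait_for_slot(state_path: Path, min_interval_sec: float) -> float:
    """Sleep until min_interval_sec has passed since the last recorded request.

    Records the new request time in state_path and returns the seconds slept.
    The {"last_request_ts": <epoch seconds>} layout is a hypothetical stand-in.
    """
    with _slot_lock:
        last = 0.0
        if state_path.exists():
            try:
                last = float(json.loads(state_path.read_text())["last_request_ts"])
            except (ValueError, KeyError, TypeError):
                last = 0.0  # unreadable state: do not block on it
        delay = max(0.0, last + min_interval_sec - time.time())
        if delay > 0:
            time.sleep(delay)
        state_path.parent.mkdir(parents=True, exist_ok=True)
        state_path.write_text(json.dumps({"last_request_ts": time.time()}))
        return delay
```

Each worker would call `wait_for_slot` right before sending a request. Because the lock is held while sleeping, requests from all worker threads in one process go out at most once per interval, even with `--max-workers 3`.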

## Step 1: Download Source (Preferred)

```bash
python3 scripts/download_arxiv_source.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```

This writes:

- `source/source_bundle.bin`
- `source/source_extract/`
- `source/download_source_log.json`

If usable source already exists and `--force` is not set, the script reuses the local artifacts.
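Extracting the bundle into `source/source_extract/` is the point where archive contents reach the filesystem. It should therefore refuse any member that would escape the target directory. A minimal sketch follows, under these assumptions:

- The bundle is a tar archive, possibly gzipped.
- The name `safe_extract_bundle` is hypothetical.
- Single-file and PDF bundles, which arXiv can also serve, are not handled.

```python
import tarfile
from pathlib import Path


def safe_extract_bundle(bundle: Path, dest: Path) -> list[str]:
    """Extract a (possibly compressed) tar bundle into dest, refusing unsafe members.

    Returns the sorted names of extracted .tex files. Non-tar bundles (for
    example a single gzipped .tex file or a PDF) are out of scope here.
    """
    if not tarfile.is_tarfile(bundle):
        raise ValueError(f"{bundle} is not a tar archive")
    dest = dest.resolve()
    with tarfile.open(bundle, "r:*") as tar:
        members = tar.getmembers()
        # First pass: reject absolute paths and ../ traversal before writing anything.
        for m in members:
            target = (dest / m.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {m.name}")
        # Second pass: write only directories and regular files (no links or devices).
        dest.mkdir(parents=True, exist_ok=True)
        tex_files = []
        for m in members:
            target = (dest / m.name).resolve()
            if m.isdir():
                target.mkdir(parents=True, exist_ok=True)
            elif m.isfile():
                target.parent.mkdir(parents=True, exist_ok=True)
                with tar.extractfile(m) as src:
                    target.write_bytes(src.read())
                if target.suffix == ".tex":
                    tex_files.append(m.name)
    return sorted(tex_files)
```

On Python 3.12 and later, `TarFile.extractall(path, filter="data")` offers comparable built-in protection.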

## Step 2: If Needed, Download PDF

```bash
python3 scripts/download_arxiv_pdf.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```

This writes:

- `source/paper.pdf`
- `source/download_pdf_log.json`

If the PDF already exists and `--force` is not set, the script reuses the local artifacts.
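Keeping URL construction narrow makes it easier to audit where downloads go. The sketch below is a hypothetical helper, `artifact_url`, that only accepts new-style IDs. It rests on two assumptions this page does not confirm:

- The scripts use arXiv's public endpoints `https://arxiv.org/e-print/<id>` (source bundle) and `https://arxiv.org/pdf/<id>` (PDF).
- The scripts take the ID from the paper directory name, as in `/path/to/run/2602.00528`.

```python
import re
from pathlib import Path

# New-style arXiv IDs only (e.g. 2602.00528, optionally with a version such as v2).
ARXIV_ID_RE = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")

# Assumed endpoint templates; confirm against the real scripts before relying on them.
ENDPOINTS = {
    "source": "https://arxiv.org/e-print/{id}",
    "pdf": "https://arxiv.org/pdf/{id}",
}


def artifact_url(paper_dir: str, artifact: str) -> str:
    """Build the download URL for one artifact from a directory named after its arXiv ID."""
    arxiv_id = Path(paper_dir).name
    if not ARXIV_ID_RE.fullmatch(arxiv_id):
        raise ValueError(f"directory name is not a new-style arXiv ID: {arxiv_id!r}")
    if artifact not in ENDPOINTS:
        raise ValueError(f"unknown artifact type: {artifact!r}")
    return ENDPOINTS[artifact].format(id=arxiv_id)
```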

## Step 3: Model Reads and Summarizes

1. If `summary.md` already exists and follows the required format, skip this paper and mark it complete.
2. Read `metadata.md` first.
3. If `source/source_extract/` already exists with readable `.tex` files, use it directly.
4. Otherwise, if `source/paper.pdf` already exists, use the PDF directly.
5. If neither exists, run the download scripts (the single-paper scripts or the batch script) first.
6. Manually write `summary.md` in the same paper directory, in the selected language.
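The decision order in Step 3 is mechanical enough to express as a small helper that picks which artifact the model should open. The helper only chooses; it does not read or summarize anything. A sketch, with the hypothetical name `next_action`: it treats any non-empty `summary.md` as complete and leaves the format check as a separate step.

```python
from pathlib import Path


def next_action(paper_dir: str) -> str:
    """Mirror the Step 3 decision order for one paper directory.

    Returns "skip", "read_source", "read_pdf", or "download". "skip" only
    checks that summary.md exists and is non-empty, not that it is well formed.
    """
    p = Path(paper_dir)
    summary = p / "summary.md"
    if summary.is_file() and summary.stat().st_size > 0:
        return "skip"
    extract_dir = p / "source" / "source_extract"
    if extract_dir.is_dir() and any(extract_dir.glob("*.tex")):
        return "read_source"
    if (p / "source" / "paper.pdf").is_file():
        return "read_pdf"
    return "download"
```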

Do not rely on rule-based auto-summarization.
Do not rely on auto-extracted snippets as the primary basis for writing.

## Quality Requirement

- Every section should include paper-specific details that are traceable to full-text reading.
- Sections 4, 5, and 10 should reflect concrete method and evaluation details, not generic wording.
- If key details are unclear in the source, explicitly note the uncertainty instead of guessing.
- Match the level of detail shown in `references/summary-example-en.md` and `references/summary-example-zh.md`.
- If your draft is clearly shorter or less specific than the examples, expand it before finishing.

## Required Output

- `<paper_dir>/summary.md` in the fixed section format.
- Pay special attention to section `## 10. Brief Conclusion`: write a 3-4 sentence mini-conclusion that covers the contribution, method, evaluation setup, and results with paper-specific details.
- In section `## 1. Paper Snapshot`, use these exact keys: `ArXiv ID`, `Title`, `Authors`, `Publish date`, `Primary category`, `Reading basis`.
- Do not use key variants such as `Reading source`, `Author list`, `Published on`, or lowercase key names.

See `references/summary-format.md` for the exact section requirements.
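A format check on the snapshot keys can catch variants such as `Author list` before a summary is marked complete. It only validates structure and writes no summary text. The sketch assumes, hypothetically, that snapshot entries are written as `- Key: value` lines. Adapt `KEY_LINE_RE` to the layout actually specified in `references/summary-format.md`.

```python
import re

REQUIRED_KEYS = ["ArXiv ID", "Title", "Authors", "Publish date",
                 "Primary category", "Reading basis"]
# Hypothetical entry shape inside the snapshot section: "- Key: value".
KEY_LINE_RE = re.compile(r"^\s*-\s*([^:]+?)\s*:")


def check_snapshot_keys(summary_text: str) -> tuple[list[str], list[str]]:
    """Return (missing_keys, unexpected_keys) for section '## 1. Paper Snapshot'.

    Matching is case-sensitive, so lowercase variants count as unexpected.
    """
    lines = summary_text.splitlines()
    start = next((i + 1 for i, line in enumerate(lines)
                  if line.strip() == "## 1. Paper Snapshot"), None)
    if start is None:
        return list(REQUIRED_KEYS), []
    found = []
    for line in lines[start:]:
        if line.startswith("## "):
            break  # reached the next section
        m = KEY_LINE_RE.match(line)
        if m:
            found.append(m.group(1))
    missing = [k for k in REQUIRED_KEYS if k not in found]
    unexpected = [k for k in found if k not in REQUIRED_KEYS]
    return missing, unexpected
```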

## Related Skills

This skill is a sub-skill of `arxiv-summarizer-orchestrator`.

Pipeline position:

1. Step 1 (upstream): `arxiv-search-collector` produces the selected paper directories and metadata.
2. Step 2 (this skill): `arxiv-paper-processor` downloads artifacts and writes one `summary.md` per paper.
3. Step 3 (downstream): `arxiv-batch-reporter` uses these per-paper summaries to generate the final collection report.

Use this skill together with Step 1 and Step 3 for full end-to-end execution.
Security Recommendations
This skill appears internally consistent: it downloads arXiv source/PDF artifacts and asks the model to manually read those artifacts and write summary.md files. Before installing or using it, perform the following checks:
  1. Open the full scripts (the prompt contained truncated files) and confirm that all network requests go to legitimate arXiv endpoints (e.g., arxiv.org) and not to unknown third-party URLs.
  2. Run the scripts in an isolated workspace (or container) so that downloads and extracted files stay within the intended run directories.
  3. The scripts write logs and extracted files under the run/paper directories; make sure those are the directories you expect.
  4. No credentials are required, so never add secrets to make it "work".
  5. If you allow the agent to invoke this skill autonomously, be aware that it can perform network downloads and write files; if you need stricter controls, keep autonomous invocation disabled or sandbox its execution.
If you want higher confidence, provide the untruncated full source so that URL construction and any remaining code paths can be fully audited.
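For check 1, a quick first pass is to pull URL literals out of the scripts and flag any that do not point at arXiv over HTTPS. A minimal sketch follows. The allowlisted hosts are an assumption. URLs assembled at runtime from variables will not be caught by a literal scan, so a manual read of the URL-building code is still needed.

```python
import re
from urllib.parse import urlsplit

# Hosts treated as legitimate arXiv endpoints for this audit (an assumption; adjust as needed).
ALLOWED_HOSTS = {"arxiv.org", "export.arxiv.org"}
# Matches http(s) URL literals up to a quote, whitespace, or closing bracket.
URL_RE = re.compile(r"""https?://[^\s'"<>)]+""")


def non_arxiv_urls(urls: list[str]) -> list[str]:
    """Return URLs that are not HTTPS or whose host is not in ALLOWED_HOSTS."""
    flagged = []
    for url in urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if parts.scheme != "https" or host not in ALLOWED_HOSTS:
            flagged.append(url)
    return flagged


def flagged_urls_in_source(source_text: str) -> list[str]:
    """Scan script text for URL literals and return the suspicious ones."""
    return non_arxiv_urls(URL_RE.findall(source_text))
```

Because the host is compared exactly, look-alike hosts such as `arxiv.org.example.net` are flagged rather than accepted.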
Functional Analysis
Type: OpenClaw Skill · Name: arxiv-paper-processor · Version: 0.1.1
The skill bundle 'arxiv-paper-processor' is designed for downloading and summarizing arXiv papers. The `SKILL.md` explicitly instructs the AI agent to perform model-driven summarization and forbids script-based content generation. The Python scripts (`download_arxiv_pdf.py`, `download_arxiv_source.py`) handle network requests to `arxiv.org` and local file operations, including safe tar extraction, without evidence of malicious intent or data exfiltration. However, `scripts/download_papers_batch.py` accepts arguments such as `--python-bin`, `--source-script`, and `--pdf-script`. Although `subprocess.run` is called safely with a list of arguments, an untrusted caller (e.g., an AI agent with poor input sanitization, or a malicious user) could supply arbitrary commands or paths for these arguments and achieve arbitrary code execution. Because of this potential critical vulnerability, the skill is classified as suspicious.
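One way to close the gap described above is to validate `--python-bin` and the worker script paths before they reach `subprocess.run`. This is a hypothetical hardening sketch, not code from the skill. Its assumed policy accepts only the interpreter that is already running, and only the two named download scripts inside a single scripts directory. The function name `build_worker_argv` is also hypothetical.

```python
import sys
from pathlib import Path

# Worker script names taken from this page; the directory check is a hypothetical policy.
EXPECTED_SCRIPTS = {"download_arxiv_source.py", "download_arxiv_pdf.py"}


def build_worker_argv(python_bin: str, script: str, scripts_dir: str,
                      paper_dir: str, language: str) -> list[str]:
    """Build an argv list for one download worker after validating its inputs."""
    # Only the interpreter that is running the batch script is accepted.
    if Path(python_bin).resolve() != Path(sys.executable).resolve():
        raise ValueError(f"unexpected interpreter: {python_bin}")
    script_path = Path(script).resolve()
    if (script_path.parent != Path(scripts_dir).resolve()
            or script_path.name not in EXPECTED_SCRIPTS):
        raise ValueError(f"unexpected worker script: {script}")
    # A list (no shell=True) keeps paper_dir and language as single arguments.
    return [python_bin, str(script_path),
            "--paper-dir", paper_dir, "--language", language]
```

The returned list can be passed directly to `subprocess.run(argv, check=True)`. Because no shell is involved, a value such as the language string cannot smuggle in additional commands.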
Capability Assessment
Purpose & Capability
Name/description match the included artifacts: three downloader scripts and a batch orchestrator. The files and SKILL.md describe downloading arXiv source/PDF, local throttling, extraction, and asking the model to manually produce summary.md. There are no unrelated environment variables, binaries, or config paths requested.
Instruction Scope
SKILL.md instructs the agent to only use the scripts for artifact download and to perform model-driven reading and manual summary writing. The instructions reference only per-paper directories, metadata files, extracted source, and PDFs. They explicitly forbid using scripts or regex-based extraction to auto-generate summaries. Note: parts of the code in the prompt were truncated, so I could not fully confirm every URL construction; verify that network requests target arXiv endpoints only.
Install Mechanism
There is no install spec (instruction-only skill with bundled scripts). This is the lowest-risk option from an install perspective: the skill does not download remote install artifacts at install time. The included Python scripts are run by the user or agent at runtime.
Credentials
The skill declares no required environment variables, credentials, or config paths. The scripts perform HTTP requests and write local files under per-paper directories; this is proportionate to the stated purpose.
Persistence & Privilege
Flags show `always: false`, and normal autonomous invocation is allowed. The skill does not request permanent system-wide presence or modify other skills. Its runtime behavior is limited to writing artifacts and logs in the provided run/paper directories.
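How to Use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install arxiv-paper-processor
  3. After installation, call the Skill by name or trigger it with /arxiv-paper-processor.
  4. Provide the required inputs as described in the Skill's parameter documentation to get structured output.
Version History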
v0.1.1
Document cross-skill relationships in all SKILL.md files
v0.1.0
Initial release: supports manual, model-driven summarization of arXiv papers with optional batch artifact downloading.
  • Provides scripts for downloading source files or PDFs for arXiv papers, with batch and single-paper modes.
  • Enforces manual summary writing by the model in a specified language (no script-based summarization).
  • Batch download supports concurrency, safe throttling, resume, and skipping of already-processed papers.
  • Output summaries must follow a strict, detailed, sectioned format, as in the provided examples, with concrete paper-specific detail.
  • Scripts are for fetching artifacts only; summarization is always based on model-side reading and synthesis of the paper.
Metadata
Slug arxiv-paper-processor
Version 0.1.1
License
Total installs 12
Current installs 12
Version count 2
FAQ

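What is Arxiv Paper Processor?

A tool for manual, per-paper arXiv paper processing: batch, source, or PDF download, followed by model-driven full-text reading and writing of summary.md in a chosen language. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 1842 times so far.

How do I install Arxiv Paper Processor?

Run the command "/install arxiv-paper-processor" in the OpenClaw or Claude Code chat box for one-step installation; no additional configuration is required.

Is Arxiv Paper Processor free?

Yes. Arxiv Paper Processor is completely free (open source) and can be freely downloaded, installed, and used.

Which platforms does Arxiv Paper Processor support?

Arxiv Paper Processor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Arxiv Paper Processor?

It is developed and maintained by xukp20 (@xukp20); the current version is v0.1.1.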
