/install arxiv-paper-processor

# ArXiv Paper Processor

Use this skill for per-paper manual summarization, with optional batch artifact download.

- Single-paper mode: process one paper directory (e.g. `<run_dir>/<arxiv_id>/`).
- Batch predownload mode: process many paper directories under one run dir before writing summaries.

## Language Parameter

- Use a workflow language parameter (for example `English` or `Chinese`) and apply it manually.
- The per-paper `summary.md` must be written in the selected language.
- If download scripts are called directly, pass `--language <LANG>` for traceability.

## Core Principle

Scripts only fetch artifacts. The model performs reading and writing.

## Non-negotiable Constraint

- Do not generate `summary.md` by script-based snippet extraction, regex harvesting, or template autofill.
- Do not use Python/shell scripts to auto-compose section text from abstract/introduction fragments.
- Scripts in this skill are only for artifact download (`source`/`pdf`) and trace logs.
- The final `summary.md` must come from model-side reading and synthesis of the paper content.

## Optional Batch Artifact Download (Many Papers)

Use this first when Stage B has many papers:

```bash
python3 scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language English
```

Key behavior:

- Supports `--artifact source`, `--artifact pdf`, or `--artifact source_then_pdf` (default).
- Supports concurrency (`--max-workers`) and safe throttling/retry (`--min-interval-sec`, retry args).
- Uses run-local throttle state by default (`<run_dir>/.runtime/arxiv_download_state.json`) to reduce 429 risk.
- Skips papers that already have usable `source/source_extract/*.tex` or an existing `source/paper.pdf` (unless `--force` is set).
- Resume-friendly: if a paper already has a completed `summary.md`, you can skip that paper's summary-writing step.
- Writes the batch log to `<run_dir>/download_batch_log.json` by default.

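The run-local throttle described above can be pictured with a minimal sketch. It is not the batch script's actual implementation: the helper name `wait_for_slot`, the `last_request_ts` field, and the state-file layout are assumptions made for illustration. Only the minimum-interval idea and the state file location come from the list above.

```python
import json
import time
from pathlib import Path


def wait_for_slot(state_path: Path, min_interval_sec: float,
                  now=time.time, sleep=time.sleep) -> float:
    """Block until min_interval_sec has passed since the last recorded request.

    Hypothetical helper: the state layout {"last_request_ts": float} is an
    assumption, not the format written by the real download scripts.
    Returns the number of seconds slept.
    """
    last_ts = 0.0
    if state_path.exists():
        try:
            state = json.loads(state_path.read_text())
            last_ts = float(state.get("last_request_ts", 0.0))
        except (ValueError, TypeError, AttributeError):
            last_ts = 0.0  # unreadable state: behave as if nothing was requested yet

    # Time still owed before the next request is allowed.
    wait = max(0.0, last_ts + min_interval_sec - now())
    if wait > 0:
        sleep(wait)

    # Record the time of this request for the next caller.
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps({"last_request_ts": now()}))
    return wait
```

Keeping the timestamp in a file under the run directory lets a restarted run respect the interval left over from the previous one. With several workers, a real implementation would also need a lock around the read-modify-write step, which this sketch omits.
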
## Step 1: Download Source (Preferred)

```bash
python3 scripts/download_arxiv_source.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```

This writes:

- `source/source_bundle.bin`
- `source/source_extract/`
- `source/download_source_log.json`

If usable source already exists and `--force` is not set, the script reuses local artifacts.

## Step 2: If Needed, Download PDF

```bash
python3 scripts/download_arxiv_pdf.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```

This writes:

- `source/paper.pdf`
- `source/download_pdf_log.json`

If the PDF already exists and `--force` is not set, the script reuses local artifacts.

## Step 3: Model Reads and Summarizes

1. If `summary.md` already exists and follows the required format, skip this paper and mark it complete.
2. Read `metadata.md` first.
3. If `source/source_extract/` already exists with readable `.tex` files, use it directly.
4. Otherwise, if `source/paper.pdf` already exists, use the PDF directly.
5. If neither exists, run the download scripts (single-paper scripts or the batch script) first.
6. Manually write `summary.md` in the same paper directory, in the selected language.

Do not rely on rule-based auto summarization.
Do not rely on auto-extracted snippets as the primary writing basis.

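The artifact precedence in the numbered steps can be sketched as a small check. This only selects which input to read; it writes no summary text. The function name `choose_reading_basis` and its return labels are hypothetical, and the check tests for file presence only, so judging whether an existing `summary.md` follows the required format, or whether the `.tex` files are actually readable, still falls to the reader.

```python
from pathlib import Path


def choose_reading_basis(paper_dir: Path) -> str:
    """Return a label for the step that applies to paper_dir.

    Illustrative sketch; the labels are assumptions, not values used by the skill.
    """
    if (paper_dir / "summary.md").is_file():
        return "summary_exists"   # step 1: candidate for skipping

    extract_dir = paper_dir / "source" / "source_extract"
    if extract_dir.is_dir() and any(extract_dir.rglob("*.tex")):
        return "source"           # step 3: read the extracted LaTeX source

    if (paper_dir / "source" / "paper.pdf").is_file():
        return "pdf"              # step 4: fall back to the PDF

    return "download_needed"      # step 5: run the download scripts first
```

For example, `choose_reading_basis(Path("/path/to/run/2602.00528"))` returns `"download_needed"` for a paper directory that holds only `metadata.md`.
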
## Quality Requirement

- Every section should include paper-specific details that are traceable to full-text reading.
- Sections 4, 5, and 10 should reflect concrete method and evaluation details, not generic wording.
- If key details are unclear in the source, explicitly note uncertainty instead of guessing.
- Match the detail level shown in `references/summary-example-en.md` and `references/summary-example-zh.md`.
- If your draft is clearly shorter or less specific than the examples, expand it before finishing.

## Required Output

- `<paper_dir>/summary.md` in fixed section format.
- Pay special attention to section `## 10. Brief Conclusion`: write a 3-4 sentence mini-conclusion that covers contribution, method, evaluation setup, and results with paper-specific details.
- In section `## 1. Paper Snapshot`, use exact keys: `ArXiv ID`, `Title`, `Authors`, `Publish date`, `Primary category`, `Reading basis`.
- Do not use key variants such as `Reading source`, `Author list`, `Published on`, or lowercase key names.

See `references/summary-format.md` for exact section requirements.

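The exact-key rule for `## 1. Paper Snapshot` can be checked after the summary is written. The sketch below is a format check only and produces no summary text. It assumes each key sits on its own bullet line shaped like `- Key: value` (optionally `- **Key**: value`); that line shape is a guess, since the authoritative layout lives in `references/summary-format.md`. The name `check_snapshot_keys` is hypothetical.

```python
import re

REQUIRED_KEYS = ["ArXiv ID", "Title", "Authors", "Publish date",
                 "Primary category", "Reading basis"]
FORBIDDEN_VARIANTS = ["Reading source", "Author list", "Published on"]

# Assumed line shape: "- Key: value" or "- **Key**: value".
KEY_LINE = re.compile(r"^\s*[-*]\s*(?:\*\*)?([^:*]+?)(?:\*\*)?\s*:")


def check_snapshot_keys(summary_text: str) -> list[str]:
    """Return a list of problems; an empty list means the keys look right."""
    found = []
    for line in summary_text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            found.append(match.group(1).strip())

    lowered = {key.lower() for key in REQUIRED_KEYS}
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in found]
    for key in found:
        if key in FORBIDDEN_VARIANTS:
            problems.append(f"forbidden variant: {key}")
        elif key not in REQUIRED_KEYS and key.lower() in lowered:
            problems.append(f"wrong case: {key}")
    return problems
```

An empty result only says the keys are spelled as required; it says nothing about whether the values or the other sections meet the quality requirement.
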
## Related Skills

This skill is a sub-skill of `arxiv-summarizer-orchestrator`.

Pipeline position:

1. Step 1 (upstream): `arxiv-search-collector` produces the selected paper directories and metadata.
2. Step 2 (this skill): `arxiv-paper-processor` downloads artifacts and writes one `summary.md` per paper.
3. Step 3 (downstream): `arxiv-batch-reporter` uses these per-paper summaries to generate the final collection report.

Use this skill together with Step 1 and Step 3 for full end-to-end execution.

- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: `/install arxiv-paper-processor`
- After installation, invoke the skill by name or use `/arxiv-paper-processor`.
- Provide the required inputs per the skill's parameter spec and get structured output.

What is Arxiv Paper Processor?
Arxiv Paper Processor is a tool for manual per-paper arXiv processing: it downloads artifacts (batch, source, or PDF), after which the model reads the full text and writes `summary.md` in the chosen language. It is an AI Agent Skill for Claude Code / OpenClaw, with 1842 downloads so far.
How do I install Arxiv Paper Processor?
Run `/install arxiv-paper-processor` in the OpenClaw or Claude Code chat to install it in one step. No extra setup is required.
Is Arxiv Paper Processor free?
Yes, Arxiv Paper Processor is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Arxiv Paper Processor support?
Arxiv Paper Processor is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Arxiv Paper Processor?
It is built and maintained by xukp20 (@xukp20); the current version is v0.1.1.