Description

Search arXiv by keyword, filter by submitted date range, fetch arXiv papers from an arXiv ID or URL, convert papers into Markdown and PDF files in the worksp...

README (SKILL.md)

arXiv Paper Reader

Name: Arxiv Paper Reader
Author: elio040208

Use the bundled Python scripts before reasoning about arXiv content. They handle:

searching arXiv by keyword
filtering keyword results by submitted date range
downloading arXiv metadata and paper content
converting papers to Markdown and PDF in the workspace
syncing configured topics into daily archive folders

Inputs

Accept raw arXiv IDs like 1706.03762 or URLs such as https://arxiv.org/abs/1706.03762.
Only accept raw IDs or HTTPS arXiv URLs on arxiv.org, www.arxiv.org, or export.arxiv.org.
Accept keyword searches such as transformer, diffusion, or computer vision.
Accept optional submitted-date windows using YYYY-MM-DD.
Do not use category filters or alias-based domain shortcuts; search is intentionally keyword-only.
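
The input rules above can be sketched as a small pre-check that runs before anything is handed to the scripts. This is an illustrative sketch, not the skill's own validation code: the function name is_acceptable_input and the ID pattern are assumptions, and the pattern covers only new-style IDs such as 1706.03762 (optionally with a version suffix).

```python
import re
from urllib.parse import urlparse

# Hosts named in the Inputs list above.
ALLOWED_HOSTS = {"arxiv.org", "www.arxiv.org", "export.arxiv.org"}

# Assumed pattern for new-style IDs such as 1706.03762 or 1706.03762v5.
# Old-style IDs (for example hep-th/9901001) are not covered by this sketch.
NEW_STYLE_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def is_acceptable_input(value: str) -> bool:
    """Return True for a raw arXiv ID or an HTTPS URL on an allowed arXiv host."""
    value = value.strip()
    if NEW_STYLE_ID.match(value):
        return True
    parsed = urlparse(value)
    # Compare the exact hostname so look-alike domains are rejected.
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

Comparing the parsed hostname against a fixed set, rather than searching the string for "arxiv.org", keeps hosts such as arxiv.org.evil.example from slipping through.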

Search workflow

Pick a Python command:
- Prefer python
- Fall back to python3
If the user wants search results or the latest papers for a topic, run:

python {baseDir}/scripts/search_arxiv.py --query "<keywords>" --limit <n>

Read search_results.md and search_results.json.
Use {baseDir}/references/search-usage.md to present the results.
If the user asks for the latest papers matching a keyword, pass --sort submittedDate.
If the user wants the default best-match ranking, omit --sort and let the script use relevance order.
If the user gives a date window, add --start-date YYYY-MM-DD --end-date YYYY-MM-DD.
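
The search steps above can be sketched as a helper that assembles the command as an argument list. The flag names come from this README; the helper names pick_python and build_search_argv are hypothetical, and adding the date flags only when both dates are present is an assumption of this sketch rather than documented script behavior.

```python
import shutil
from typing import List, Optional


def pick_python() -> str:
    """Prefer `python`, fall back to `python3`, in the order given above."""
    return shutil.which("python") or shutil.which("python3") or "python"


def build_search_argv(
    base_dir: str,
    query: str,
    limit: int,
    latest: bool = False,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    python_cmd: Optional[str] = None,
) -> List[str]:
    """Build the search command as a list so the query stays one argument."""
    argv = [
        python_cmd or pick_python(),
        f"{base_dir}/scripts/search_arxiv.py",
        "--query", query,
        "--limit", str(limit),
    ]
    if latest:
        # Latest-first ordering; omitting --sort keeps relevance order.
        argv += ["--sort", "submittedDate"]
    if start_date and end_date:
        # Assumption of this sketch: the window is passed only when complete.
        argv += ["--start-date", start_date, "--end-date", end_date]
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True) without shell=True, so a multi-word query such as "computer vision" reaches the script as a single argument.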

Topic sync workflow

Tell the user to maintain {rootDir}/topics.json, or seed it from {baseDir}/references/topics.example.json.
For recurring daily updates, run:

python {baseDir}/scripts/sync_arxiv_topics.py --daily --root-dir <root-dir>

For manual backfill, run:

python {baseDir}/scripts/sync_arxiv_topics.py --start-date YYYY-MM-DD --end-date YYYY-MM-DD --root-dir <root-dir>

Read <root-dir>/runs/<capture-date>/run_manifest.md first.
Each captured paper lives at topics/<topic-slug>/<capture-date>/<paper-id>__<title-slug>/.
Expect each paper directory to contain paper.pdf, paper.md, metadata.json, and summary.md.
The batch summary is template-based and grounded in the abstract plus converted Markdown; treat it as a review aid, not a substitute for reading the paper.
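
The archive layout described above can be checked with a short sketch. It assumes the topics/ tree sits directly under the root directory, which the README implies but does not state; the helper names paper_dir and missing_files are hypothetical.

```python
from pathlib import Path
from typing import List

# Files each paper directory is expected to contain, per the list above.
EXPECTED_FILES = ("paper.pdf", "paper.md", "metadata.json", "summary.md")


def paper_dir(root_dir: str, topic_slug: str, capture_date: str,
              paper_id: str, title_slug: str) -> Path:
    """Compose topics/<topic-slug>/<capture-date>/<paper-id>__<title-slug>/."""
    # Assumption: the topics/ tree is rooted at root_dir.
    return (Path(root_dir) / "topics" / topic_slug / capture_date
            / f"{paper_id}__{title_slug}")


def missing_files(directory: Path) -> List[str]:
    """List the expected files that are absent from a paper directory."""
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]
```

An empty list from missing_files means the capture looks complete; anything else is worth flagging before relying on the batch summary.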

Fetch workflow

Choose an output directory:
- If the user gives one, use it.
- Otherwise write to ./artifacts/arxiv/<paper-id>/ in the current workspace.
Run the converter:

python {baseDir}/scripts/arxiv_to_md.py <paper-id-or-url> --output-dir <target-dir>

Read the generated paper.pdf, paper.md, and metadata.json.
Summarize the paper in Markdown.
Save the summary to <target-dir>/summary.md if the user asked for files. Otherwise return the summary directly in chat.
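
The output-directory choice and converter call above might be wired together as follows. This is a sketch under stated assumptions: resolve_output_dir and build_fetch_argv are hypothetical helper names, and the paper ID is assumed to have been extracted already when the input was a URL.

```python
from pathlib import Path
from typing import List, Optional


def resolve_output_dir(paper_id: str, user_dir: Optional[str] = None) -> Path:
    """Use the user's directory if given, else ./artifacts/arxiv/<paper-id>/."""
    if user_dir:
        return Path(user_dir)
    return Path("artifacts") / "arxiv" / paper_id


def build_fetch_argv(base_dir: str, paper_id_or_url: str, target_dir: Path,
                     python_cmd: str = "python") -> List[str]:
    """Build the converter command as an argument list (no shell involved)."""
    return [
        python_cmd,
        f"{base_dir}/scripts/arxiv_to_md.py",
        paper_id_or_url,
        "--output-dir", str(target_dir),
    ]
```

For example, build_fetch_argv(base_dir, "1706.03762", resolve_output_dir("1706.03762")) yields a list suitable for subprocess.run without shell=True.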

Summary format

Use the headings in {baseDir}/references/summary-format.md.

Keep the summary grounded in the generated Markdown. If the conversion falls back to abstract-only mode, say so explicitly in the summary.

Safety

Pass IDs, URLs, and keywords as single CLI arguments. Do not splice untrusted text into shell pipelines.
Only pass raw arXiv IDs or HTTPS arXiv URLs; reject arbitrary third-party URLs.
TLS verification is strict. If requests fail because your machine lacks a valid CA bundle, install certifi or fix the system trust store.
arXiv source archives are processed in-memory, only .tex members are read, and suspicious paths plus oversized payloads are rejected before parsing.
Date windows use arXiv submittedDate and inclusive YYYY-MM-DD boundaries.
Do not invent claims that are not supported by paper.md or search_results.md.
Do not reintroduce hardcoded category or alias mappings; keep search behavior keyword-only.
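
The source-archive rule above (in-memory processing, .tex members only, suspicious paths and oversized payloads rejected) can be illustrated with the standard tarfile module. This is a minimal sketch of the general technique, not the code in arxiv_to_md.py; the function name and both size limits are hypothetical values chosen for illustration.

```python
import io
import tarfile
from pathlib import PurePosixPath
from typing import Dict

MAX_MEMBER_BYTES = 5 * 1024 * 1024   # hypothetical per-file cap
MAX_TOTAL_BYTES = 50 * 1024 * 1024   # hypothetical cap across all .tex files


def read_tex_members(archive_bytes: bytes) -> Dict[str, str]:
    """Read .tex files from an in-memory tar archive without writing to disk."""
    texts: Dict[str, str] = {}
    total = 0
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar:
            path = PurePosixPath(member.name)
            # Skip directories, links, and anything that is not a .tex file.
            if not member.isfile() or path.suffix.lower() != ".tex":
                continue
            # Reject absolute paths and parent-directory traversal.
            if path.is_absolute() or ".." in path.parts:
                raise ValueError(f"suspicious path: {member.name}")
            # Check declared sizes before reading any member data.
            total += member.size
            if member.size > MAX_MEMBER_BYTES or total > MAX_TOTAL_BYTES:
                raise ValueError("oversized payload")
            handle = tar.extractfile(member)
            if handle is not None:
                texts[member.name] = handle.read().decode("utf-8", errors="replace")
    return texts
```

Because nothing is extracted to the filesystem, a hostile member name cannot overwrite local files; the path check still rejects such archives outright rather than silently reading them.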

Usage Guidance

This skill appears coherent and limited to arXiv interactions, but exercise normal caution:
- Run it in a controlled workspace; it will create artifacts/ and monitor topic folders.
- Inspect the bundled scripts yourself before running them; they will execute arbitrary Python locally.
- Ensure your environment has a proper CA bundle; the code enforces TLS and references certifi.
- Do not point it at non-arXiv URLs. The code enforces allowed hosts, but still follow the SKILL.md rule to pass only raw arXiv IDs or arXiv HTTPS URLs.
If you need higher assurance, run the scripts in an isolated environment (container or VM) and review the remaining truncated code paths before granting broad autonomous invocation.

Capability Analysis

Type: OpenClaw Skill
Name: arxiv-paper-reader
Version: 1.0.3

The arxiv-paper-reader skill bundle is designed to search, download, and convert arXiv papers into Markdown and PDF formats. The Python scripts (arxiv_api.py, arxiv_to_md.py) include appropriate security measures, such as restricting network requests to official arXiv domains and implementing defenses against path traversal and resource exhaustion (zip bombs) during source archive extraction. The instructions in SKILL.md are well-aligned with the stated purpose and do not contain any malicious prompt-injection attempts.

Capability Assessment

✓ Purpose & Capability

Name/description (search/fetch/convert arXiv papers) matches the code and declared requirements. The scripts only require a Python interpreter and interact with arXiv endpoints (export.arxiv.org and arxiv.org). No unrelated binaries, credentials, or config paths are requested.

✓ Instruction Scope

SKILL.md instructs the agent to run the included Python scripts, read/write files under workspace directories (artifacts/, topics.json, runs/), and to only accept raw arXiv IDs or arXiv HTTPS URLs. The instructions emphasize safety (pass args as single CLI args, strict TLS) and tell the agent to read the generated search_results.md/json and produced paper files. There is no instruction to read unrelated system files or environment variables.

✓ Install Mechanism

No install spec — instruction-only with bundled scripts. This is low-risk from an installer perspective because nothing is downloaded or installed automatically. The only runtime requirement is a local Python interpreter.

✓ Credentials

No environment variables, credentials, or config paths are required. The scripts write outputs to workspace subdirectories (artifacts/arxiv* and a configurable root-dir) which is appropriate for the described functionality.

✓ Persistence & Privilege

The always setting is false and the skill does not request persistent platform privileges. It does create and update files under user-specified or default workspace directories (artifacts, runs, sync_state.json). That file I/O is expected for an archiving/syncing tool.

Version History

v1.0.3

arxiv-paper-reader 1.0.3 changelog
- Added support for filtering keyword searches by submitted date range.
- Introduced recurring topic sync with daily archive folders and summary generation.
- Added references/topics.example.json and scripts/sync_arxiv_topics.py to support topic management and sync features.
- Fetch workflow now produces both Markdown and PDF files for each paper.
- Updated documentation to reflect new inputs, workflows, and file outputs.

v1.0.2

arxiv-paper-reader 1.0.2
- Search is now keyword-only: category filters and alias-based shortcuts removed.
- Only raw arXiv IDs and HTTPS arXiv URLs are accepted for fetching papers; stricter URL validation.
- Improved safety: third-party URLs are rejected, and stricter TLS/CA handling enforced.
- arXiv source archives are processed securely: only `.tex` files are read and extra checks are applied.
- Search command and workflows updated to reflect new restrictions and clearer sorting logic.

v1.0.1

Major update: Adds arXiv search and browsing, improves workflows for both papers and searches.
- New: Search arXiv by keyword or category, and list the latest papers in a domain using new scripts.
- Added support for category codes and common AI/CS aliases (e.g., nlp, cv, ml).
- Clearer workflows: separate paths for searching/browsing and for fetching specific papers.
- Includes new documentation on search usage and updated summary and safety instructions.
- Maintains safe CLI argument handling for all new inputs.

v1.0.0

Initial release of arXiv Paper Reader.
- Fetches arXiv papers by ID or URL and converts them to Markdown in the workspace.
- Automatically writes a concise summary based on the converted paper.
- Supports multiple paper inputs, storing each in a separate directory.
- Ensures summaries remain grounded in the content provided; explicitly notes if only the abstract is available.
- Designed for safe usage by handling untrusted inputs securely and avoiding unsupported claims.

Metadata

Slug arxiv-paper-reader

Version 1.0.3

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 4

Frequently Asked Questions

What is Arxiv Paper Reader?

Search arXiv by keyword, filter by submitted date range, fetch arXiv papers from an arXiv ID or URL, convert papers into Markdown and PDF files in the worksp... It is an AI Agent Skill for Claude Code / OpenClaw, with 179 downloads so far.

How do I install Arxiv Paper Reader?

Run "/install arxiv-paper-reader" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Arxiv Paper Reader free?

Yes, Arxiv Paper Reader is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Arxiv Paper Reader support?

Arxiv Paper Reader is cross-platform and runs anywhere OpenClaw / Claude Code is available (win32, linux, darwin).

Who created Arxiv Paper Reader?

It is built and maintained by elio040208 (@elio040208); the current version is v1.0.3.
