willoscar

Arxiv Search

Author: WILLOSCAR · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads 403
Favorites 0
Current installs 8
Versions 1
Install in OpenClaw
/install arxiv-search
Description
Retrieve paper metadata from arXiv using keyword queries and save results as JSONL (`papers/papers_raw.jsonl`). **Trigger**: arXiv, arxiv, paper search, meta...
Usage instructions (SKILL.md)

arXiv Search (metadata-first)

Collect an initial paper set with enough metadata to support downstream ranking, taxonomy building, and citation generation.

When online, prefer rich arXiv metadata (categories, arxiv_id, pdf_url, published/updated, etc.). When offline, accept an export and convert it cleanly.

Load Order

Always read:

  • references/domain_pack_overview.md — how domain packs drive topic-specific behavior

Domain packs (loaded by topic match):

  • assets/domain_packs/llm_agents.json — pinned IDs, query rewrite rules for LLM agent topics

Script Boundary

Use scripts/run.py only for:

  • arXiv API retrieval and XML parsing
  • offline export conversion (CSV/JSON/JSONL normalization)
  • metadata enrichment via id_list backfill

Do not treat run.py as the place for:

  • hardcoded topic detection or query rewriting (use domain packs)
  • domain-specific pinned paper lists (externalize to assets/domain_packs/)
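
As a rough illustration of the first allowed responsibility above (arXiv API retrieval and XML parsing), the sketch below parses an Atom feed of the shape the arXiv API returns into flat records. It is not the actual scripts/run.py: `parse_feed`, `_text`, `NS`, and `SAMPLE` are hypothetical names, and the sample entry is made-up data.

```python
import xml.etree.ElementTree as ET

# XML namespaces used in arXiv Atom responses.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def _text(node, path):
    """Return whitespace-collapsed text for a child element, or ''."""
    return " ".join((node.findtext(path, default="", namespaces=NS) or "").split())


def parse_feed(xml_text):
    """Hypothetical helper: turn an arXiv Atom feed into a list of dicts."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", NS):
        url = _text(entry, "atom:id")
        published = _text(entry, "atom:published")
        primary = entry.find("arxiv:primary_category", NS)
        pdf_links = [
            link.get("href")
            for link in entry.findall("atom:link", NS)
            if link.get("title") == "pdf"
        ]
        papers.append({
            "title": _text(entry, "atom:title"),
            "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", NS)],
            "year": int(published[:4]) if published[:4].isdigit() else None,
            "url": url,
            "abstract": _text(entry, "atom:summary"),
            # Keeps any version suffix (e.g. "v1") exactly as the feed reports it.
            "arxiv_id": url.rsplit("/abs/", 1)[-1],
            "pdf_url": pdf_links[0] if pdf_links else None,
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
            "primary_category": primary.get("term") if primary is not None else None,
            "published": published,
        })
    return papers


# Made-up sample entry, shaped like an arXiv API response.
SAMPLE = """
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2025-01-02T00:00:00Z</published>
    <title>An Example
      Title</title>
    <summary>Example abstract.</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <link title="pdf" href="http://arxiv.org/pdf/0000.00000v1" rel="related" type="application/pdf"/>
    <arxiv:primary_category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cs.CL" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>
"""

papers = parse_feed(SAMPLE.strip())
print(papers[0]["title"], papers[0]["categories"])
```

Keeping the parser free of topic-specific logic is consistent with the boundary above: query rewriting and pinned IDs would stay in the domain packs.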

Input

  • queries.md (keywords, excludes, time window)

Outputs

  • papers/papers_raw.jsonl (JSONL; 1 paper per line)
    • Each record includes at least: title, authors, year, url, abstract
    • When using the arXiv API online mode, records also include helpful metadata: arxiv_id, pdf_url, categories, primary_category, published, updated, doi, journal_ref, comment
  • Convenience index (optional but generated by the script):
    • papers/papers_raw.csv
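
To make the output layout concrete, here is a minimal sketch that writes one record per line to papers/papers_raw.jsonl and a CSV index next to it. The helper name `write_outputs`, the choice of CSV columns, and the "; " separator used to flatten the authors array are assumptions for illustration, not confirmed behavior of scripts/run.py; the record is made-up data.

```python
import csv
import json
import tempfile
from pathlib import Path

REQUIRED = ["title", "authors", "year", "url", "abstract"]


def write_outputs(workspace, records):
    """Hypothetical helper: write papers_raw.jsonl plus a CSV index."""
    papers_dir = Path(workspace) / "papers"
    papers_dir.mkdir(parents=True, exist_ok=True)

    jsonl_path = papers_dir / "papers_raw.jsonl"
    with jsonl_path.open("w", encoding="utf-8") as fh:
        for rec in records:
            # One JSON object per line; keep non-ASCII characters readable.
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")

    csv_path = papers_dir / "papers_raw.csv"
    with csv_path.open("w", encoding="utf-8", newline="") as fh:
        # Extra metadata keys (arxiv_id, pdf_url, ...) are ignored in this index.
        writer = csv.DictWriter(fh, fieldnames=REQUIRED, extrasaction="ignore")
        writer.writeheader()
        for rec in records:
            row = dict(rec)
            # Assumption: the CSV flattens the authors array with "; ".
            row["authors"] = "; ".join(rec.get("authors", []))
            writer.writerow(row)
    return jsonl_path, csv_path


# Made-up record for illustration.
record = {
    "title": "An Example Title",
    "authors": ["A. Author", "B. Author"],
    "year": 2025,
    "url": "http://arxiv.org/abs/0000.00000v1",
    "abstract": "Example abstract.",
    "arxiv_id": "0000.00000v1",
}

with tempfile.TemporaryDirectory() as ws:
    jsonl_path, csv_path = write_outputs(ws, [record])
    lines = jsonl_path.read_text(encoding="utf-8").splitlines()
    with csv_path.open(encoding="utf-8", newline="") as fh:
        csv_rows = list(csv.DictReader(fh))

round_trip = json.loads(lines[0])
print(round_trip["title"], csv_rows[0]["authors"])
```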

Decision: online vs offline

  • If you have network access: run arXiv API retrieval.
  • If not: import an export the user provides (CSV/JSON/JSONL) and normalize fields.
  • Hybrid: if you import offline but still have network later, you can enrich missing fields (abstract/authors/categories) via arXiv id_list using --enrich-metadata or queries.md enrich_metadata: true.
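
The hybrid path can be sketched as two small steps: build an `id_list` request for the arXiv API, then fill only the fields an imported record lacks. The function names `build_id_list_url` and `backfill` are hypothetical, and the rule "never overwrite a non-empty field" is an assumption about what best-effort enrichment means here; the IDs and records are made-up.

```python
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"


def build_id_list_url(arxiv_ids):
    """Build an arXiv API URL that fetches specific papers by ID."""
    params = {"id_list": ",".join(arxiv_ids), "max_results": len(arxiv_ids)}
    # safe="," keeps the comma-separated ID list readable in the URL.
    return f"{API_URL}?{urlencode(params, safe=',')}"


def backfill(record, fetched, fields=("abstract", "authors", "categories")):
    """Copy a field from `fetched` only when `record` has no value for it."""
    merged = dict(record)
    for field in fields:
        if not merged.get(field) and fetched.get(field):
            merged[field] = fetched[field]
    return merged


url = build_id_list_url(["0000.00000", "0000.00001"])

# Made-up records: an offline import missing its abstract and categories.
imported = {"title": "An Example Title", "authors": ["A. Author"], "abstract": ""}
fetched = {
    "authors": ["Someone Else"],
    "abstract": "Example abstract.",
    "categories": ["cs.AI"],
}
enriched = backfill(imported, fetched)
print(url)
print(enriched)
```

Because existing values win, a user-supplied export is never silently replaced by API data; only gaps are filled.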

Workflow (heuristic)

  1. Read queries.md and expand into concrete query strings.
  2. Retrieve results (online) or import an export (offline).
  3. Normalize every record to include at least:
    • title, authors (array), year, url, abstract
  4. Keep the set broad at this stage; dedupe/ranking comes next.
  5. Apply time window and max_results if specified.
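
Steps 3 and 5 above can be sketched as a normalizer followed by a filter. This is an illustration under stated assumptions, not the script's actual logic: `to_year`, `normalize`, and `select` are hypothetical names, author strings are assumed to be ";"-separated (falling back to ","), and records without a usable year are assumed to be dropped once a time window is set. The rows are made-up.

```python
import re


def to_year(value):
    """Extract a four-digit year from values like 2023, '2023', or '2023-05-01'."""
    match = re.search(r"(?:19|20)\d{2}", str(value or ""))
    return int(match.group(0)) if match else None


def normalize(raw):
    """Hypothetical normalizer: coerce a loosely shaped row into the minimum schema."""
    authors = raw.get("authors") or []
    if isinstance(authors, str):
        # Assumption: exports separate author names with ";" (falling back to ",").
        sep = ";" if ";" in authors else ","
        authors = [name.strip() for name in authors.split(sep) if name.strip()]
    return {
        "title": " ".join(str(raw.get("title") or "").split()),
        "authors": list(authors),
        "year": to_year(raw.get("year") or raw.get("published")),
        "url": str(raw.get("url") or "").strip(),
        "abstract": str(raw.get("abstract") or "").strip(),
    }


def select(records, year_from=None, year_to=None, max_results=None):
    """Apply an inclusive year window, then cap the number of records."""
    kept = []
    for rec in records:
        year = rec["year"]
        if year_from is not None or year_to is not None:
            # Assumption: records with no usable year are dropped when a window is set.
            if year is None:
                continue
            if year_from is not None and year < year_from:
                continue
            if year_to is not None and year > year_to:
                continue
        kept.append(rec)
    return kept[:max_results] if max_results is not None else kept


# Made-up rows, shaped like an untidy offline export.
rows = [
    {"title": " Paper  A ", "authors": "A. Author; B. Author", "year": "2023",
     "url": "https://example.org/a", "abstract": "x"},
    {"title": "Paper B", "authors": ["C. Author"], "published": "2019-06-01",
     "url": "https://example.org/b", "abstract": ""},
    {"title": "Paper C", "authors": "", "year": None, "url": "", "abstract": None},
]
normalized = [normalize(row) for row in rows]
selected = select(normalized, year_from=2022, year_to=2025, max_results=10)
print([rec["title"] for rec in selected])
```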

Quality checklist

  • papers/papers_raw.jsonl exists.
  • Each line is valid JSON and contains title, authors, year, url.
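
The checklist can be automated with a few lines of Python. The sketch below is one possible checker, not part of the skill: `check_jsonl` and `REQUIRED` are hypothetical names, and the sample lines are made-up. It takes any iterable of lines, so an open file handle for papers/papers_raw.jsonl would work as input.

```python
import json

REQUIRED = ("title", "authors", "year", "url")


def check_jsonl(lines):
    """Return a list of (line_number, problem) tuples; empty means all lines pass."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [key for key in REQUIRED if key not in record]
        if missing:
            problems.append((number, "missing: " + ", ".join(missing)))
    return problems


# Made-up lines: one valid record, one with missing fields, one broken.
sample = [
    '{"title": "Example", "authors": ["A. Author"], "year": 2025, "url": "https://example.org/a"}',
    '{"title": "No year or url", "authors": []}',
    "{not json}",
]
problems = check_jsonl(sample)
for number, problem in problems:
    print(f"line {number}: {problem}")
```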

Side effects

  • Allowed: create/overwrite papers/papers_raw.jsonl; append notes to STATUS.md.
  • Not allowed: write prose sections in output/ before writing is approved.

Script

Quick Start

  • python scripts/run.py --help
  • Online: python scripts/run.py --workspace <workspace_dir> --query "<query>" --max-results 200
  • Offline import: python scripts/run.py --workspace <workspace_dir> --input <export.csv|json|jsonl>

All Options

  • --query <q>: repeatable; multiple queries are unioned
  • --exclude <term>: repeatable; excludes applied after retrieval
  • --max-results <n>: cap total retrieved
  • --input <export.*>: offline mode (CSV/JSON/JSONL)
  • --enrich-metadata: best-effort enrich via arXiv id_list (needs network)
  • queries.md also supports: keywords, exclude, time window, max_results, enrich_metadata
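
For readers wondering how repeatable flags and post-retrieval excludes fit together, here is a hedged sketch using argparse. It mirrors the option names listed above but is not the real scripts/run.py: `build_parser` and `apply_excludes` are hypothetical, treating --workspace as required is an assumption, and matching excludes case-insensitively against title and abstract is a guess at the behavior. The records are made-up.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options listed above."""
    parser = argparse.ArgumentParser(description="arXiv metadata retrieval (sketch)")
    parser.add_argument("--workspace", required=True)
    # action="append" lets a flag be repeated; values accumulate in a list.
    parser.add_argument("--query", action="append", default=[])
    parser.add_argument("--exclude", action="append", default=[])
    parser.add_argument("--max-results", type=int, default=None)
    parser.add_argument("--input", default=None)
    parser.add_argument("--enrich-metadata", action="store_true")
    return parser


def apply_excludes(records, terms):
    """Drop records whose title or abstract contains any exclude term.

    Assumption: matching is a case-insensitive substring test.
    """
    lowered = [term.lower() for term in terms]
    kept = []
    for rec in records:
        haystack = f"{rec.get('title', '')} {rec.get('abstract', '')}".lower()
        if not any(term in haystack for term in lowered):
            kept.append(rec)
    return kept


args = build_parser().parse_args([
    "--workspace", "ws",
    "--query", "LLM agent",
    "--query", "tool use",
    "--exclude", "survey",
    "--max-results", "300",
])

# Made-up records standing in for retrieved results.
records = [
    {"title": "A Survey of Agents", "abstract": ""},
    {"title": "Tool Use in Practice", "abstract": "An empirical study."},
]
filtered = apply_excludes(records, args.exclude)
print(args.query, [rec["title"] for rec in filtered])
```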

Examples

  • Online (multi-query + excludes):
    • python scripts/run.py --workspace <ws> --query "LLM agent" --query "tool use" --exclude "survey" --max-results 300
  • Fetch a single paper by arXiv ID (direct id_list fetch):
    • python scripts/run.py --workspace <ws> --query 2509.02547 --max-results 1
  • Offline auto-detect (no flags):
    • Place papers/import.csv (or .json/.jsonl) under the workspace, then run: python scripts/run.py --workspace <ws>
  • Offline import + time window (via queries.md):
    • Set - time window: { from: 2022, to: 2025 } then run offline import normally

Troubleshooting

Common Issues

Issue: papers/papers_raw.jsonl is empty

Symptom:

  • Script exits with “No results returned …” or output file is empty.

Causes:

  • Network is blocked (online mode).
  • Queries are too narrow or queries.md is empty.

Solutions:

  • Use offline import: place papers/import.csv|json|jsonl in the workspace or pass --input.
  • Broaden keywords and reduce excludes in queries.md.
  • Run with explicit --query to sanity-check the parser.

Issue: Offline-imported records are missing fields

Symptom:

  • Downstream steps fail because records are missing authors/year/abstract/url.

Causes:

  • Export columns don’t match expected fields; upstream export is incomplete.

Solutions:

  • Ensure the export contains at least title, authors, year, url, abstract.
  • If you later have network, use --enrich-metadata to backfill missing fields (best effort).

Recovery Checklist

  • Confirm queries.md has non-empty keywords (or pass --query).
  • If offline: confirm workspace has papers/import.* and rerun.
  • Spot-check 3–5 JSONL lines: valid JSON + required fields.
Security recommendations
This skill appears to do exactly what it says: query arXiv (or normalize offline exports) and write a JSONL index under the workspace. Before installing or running: (1) ensure you trust the workspace path the skill will write to (it will create/overwrite papers/papers_raw.jsonl and a CSV index); (2) verify you have Python available and are OK with network calls to export.arxiv.org/arxiv.org when running online; (3) review scripts/run.py locally if you need assurance (it contains the API calls and normalization logic); (4) if you plan to feed offline exports, only use trusted exports to avoid garbage input; (5) note the skill can be invoked autonomously by the agent (default) — if you want to restrict autonomous runs, adjust agent invocation policies accordingly.
Functional analysis
Type: OpenClaw Skill
Name: arxiv-search
Version: 1.0.0
The arxiv-search skill bundle is a legitimate tool designed for retrieving and processing research paper metadata. The primary script, `scripts/run.py`, interacts exclusively with the official arXiv API (export.arxiv.org) using standard Python libraries and includes robust logic for handling both online retrieval and offline data imports. The shared utilities in the `tooling/` directory, such as `executor.py` and `quality_gate.py`, provide necessary infrastructure for the OpenClaw agentic framework, including automated execution of sub-scripts and extensive validation of research artifacts (e.g., citation density and structural integrity). No evidence of malicious behavior, such as data exfiltration, unauthorized system access, or harmful prompt injection, was detected.
Capability assessment
Purpose & Capability
Name/description (arXiv metadata retrieval) matches the included scripts and assets. The skill only requires a Python runtime and reads/writes workspace files (queries.md, papers/*). Domain-pack JSON files and pipeline docs are coherent with retrieval/query-rewrite behavior.
Instruction Scope
SKILL.md confines actions to reading queries.md, domain packs in the repo, doing online arXiv API calls or offline import conversion, and writing papers/papers_raw.jsonl (and optional CSV index). It does not instruct the agent to read unrelated system files or to transmit data to external endpoints other than arXiv (export.arxiv.org / arxiv.org) and does not request broad discretionary data collection.
Install Mechanism
No install spec — the skill is delivered as Python scripts and documentation and expects python/python3 on PATH. No external downloads or archive extraction are specified.
Credentials
The skill declares no required environment variables or credentials. Its network access (to arXiv) is appropriate for the stated purpose. No unexpected secrets or unrelated service tokens are requested.
Persistence & Privilege
The skill is not force-enabled (always:false) and does not request modifications to other skills or system-wide configuration. Autonomous invocation is allowed (default) but that is normal and not combined with broad privileges or secret access.
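How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the dialog: /install arxiv-search
  3. Once installation finishes, invoke the Skill by name or trigger it with /arxiv-search
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.
Version history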
v1.0.0
- Initial release of arxiv-search.
- Enables retrieval of arXiv paper metadata using keyword queries.
- Supports both online (arXiv API) and offline (CSV/JSON/JSONL import) workflows.
- Outputs normalized results to `papers/papers_raw.jsonl` with key metadata fields.
- Provides optional field enrichment via arXiv `id_list` if network is available.
- Includes troubleshooting and quality guidance for smooth integration.
Metadata
Slug arxiv-search
Version 1.0.0
License MIT-0
Total installs 9
Current installs 8
Versions 1
FAQ

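What is Arxiv Search?

Retrieve paper metadata from arXiv using keyword queries and save results as JSONL (`papers/papers_raw.jsonl`). **Trigger**: arXiv, arxiv, paper search, meta... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 403 total downloads to date.

How do I install Arxiv Search?

Run the command "/install arxiv-search" in the OpenClaw or Claude Code dialog to install it in one step; no additional configuration is needed.

Is Arxiv Search free?

Yes. Arxiv Search is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Arxiv Search support?

Arxiv Search is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Arxiv Search?

It is developed and maintained by WILLOSCAR (@willoscar); the current version is v1.0.0.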
