Description

Orchestrates end-to-end arXiv paper retrieval, processing, and batch reporting with language control and parallel or serial paper handling modes.

Usage (SKILL.md)

# ArXiv Summarizer Orchestrator

Name: Arxiv Summarizer Orchestrator
Author: xukp20

Run the full pipeline by composing three sub-skills.

## Sub-skill Order

1. `arxiv-search-collector`
2. `arxiv-paper-processor`
3. `arxiv-batch-reporter`

## Workflow Parameters

- `language`: manual language parameter used by all stages. Default is English when omitted.
- `paper_processing_mode`: `subagent_parallel` or `serial`.
- `max_parallel_papers`: default `5` when `paper_processing_mode=subagent_parallel`.
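As a rough illustration, the three parameters and their documented defaults could be modelled as below. The class name `WorkflowParams`, the validation, and the `effective_parallelism` helper are assumptions made for this sketch and are not part of the skill. Only the parameter names, the two modes, and the default values come from this SKILL.md.

```python
from dataclasses import dataclass

ALLOWED_MODES = ("subagent_parallel", "serial")


@dataclass(frozen=True)
class WorkflowParams:
    """Hypothetical container for the orchestrator's workflow parameters."""

    language: str = "English"                         # default when omitted
    paper_processing_mode: str = "subagent_parallel"  # documented default mode
    max_parallel_papers: int = 5                      # only relevant in parallel mode

    def __post_init__(self) -> None:
        # Reject anything other than the two documented modes.
        if self.paper_processing_mode not in ALLOWED_MODES:
            raise ValueError(f"unknown mode: {self.paper_processing_mode!r}")
        if self.max_parallel_papers < 1:
            raise ValueError("max_parallel_papers must be at least 1")

    def effective_parallelism(self) -> int:
        """Number of papers that may be processed at the same time."""
        if self.paper_processing_mode == "serial":
            return 1
        return self.max_parallel_papers
```

With no arguments, `WorkflowParams()` describes an English run that processes up to five papers at a time.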

## Workflow

### Stage A: Collection Setup + Query Retrieval

1. Initialize one run with `arxiv-search-collector/scripts/init_collection_run.py`.
2. Model generates multiple focused queries from the original topic and writes a minimal `query_plan.json` (label + query only).
3. Run `arxiv-search-collector/scripts/fetch_queries_batch.py` with the plan file (recommended).
4. (Optional fallback) Call `arxiv-search-collector/scripts/fetch_query_metadata.py` manually to fetch queries one by one.
5. Model reads each indexed query list and decides which indexes to keep.
6. Merge the selected items with `arxiv-search-collector/scripts/merge_selected_papers.py`.
7. If relevance or coverage is still not good enough, iterate Stage A:
   - generate another query plan with new labels,
   - fetch again,
   - re-merge with `--incremental` and an updated selection-json,
   - set weak labels to an empty keep list (`[]`) to explicitly drop them.

Pass `--language <LANG>` to collector scripts so that all markdown files generated in Stage A follow the selected language.

Use serial query fetch in Stage A with conservative controls (for example `--min-interval-sec 5`, `--retry-max 4`). Default collector settings already include retries/backoff and run-local throttle state (`<run_dir>/.runtime/arxiv_api_state.json`), so manual tuning is usually unnecessary. Prefer cache reuse (no `--force`) unless query parameters changed or a data refresh is required.

Output: one run directory with per-paper metadata subdirectories.
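The query plan and the "empty keep list" convention can be sketched as follows. This document only says that the plan holds a label and a query per entry, so the top-level list layout, the label-to-indexes shape of the selection, and both helper names are assumptions. Check the collector's own documentation for the real schema.

```python
import json
from pathlib import Path


def write_query_plan(run_dir: Path, queries: dict[str, str]) -> Path:
    """Write a minimal plan with only `label` and `query` per entry (assumed layout)."""
    plan = [{"label": label, "query": query} for label, query in queries.items()]
    plan_path = run_dir / "query_plan.json"
    plan_path.write_text(json.dumps(plan, indent=2, ensure_ascii=False), encoding="utf-8")
    return plan_path


def drop_weak_labels(selection: dict[str, list[int]], weak: set[str]) -> dict[str, list[int]]:
    """Return a copy of the selection in which weak labels get an empty keep list."""
    return {label: ([] if label in weak else list(keep)) for label, keep in selection.items()}
```

Setting a label to `[]` rather than omitting it makes the drop explicit when re-merging.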

### Stage B: Per-paper Artifact Download + Manual Summary

For each paper directory, invoke sub-skill `arxiv-paper-processor` once and let that skill produce `<paper_dir>/summary.md`.

Recommended pre-step for many papers: run one batch artifact download before per-paper reading:

```bash
python3 arxiv-paper-processor/scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language <LANG>
```

Per-paper execution steps (inside `arxiv-paper-processor`):

1. If `<paper_dir>/summary.md` already exists and is complete, skip this paper.
2. If usable source (`source/source_extract/*.tex`) or PDF (`source/paper.pdf`) already exists, skip download.
3. If artifacts are missing, download source with `arxiv-paper-processor/scripts/download_arxiv_source.py`.
4. If source is unusable, download PDF with `arxiv-paper-processor/scripts/download_arxiv_pdf.py`.
5. Model reads content and manually writes `<paper_dir>/summary.md` by reference format, in `language`.
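
The decision order in the steps above can be sketched as a small function. The function name and the returned labels are hypothetical, and "complete" is approximated as "exists and is non-empty" because this document does not define a completeness check. The artifact paths are assumed to be relative to the paper directory.

```python
from pathlib import Path


def next_action(paper_dir: Path) -> str:
    """Return the next step for one paper directory, following the order above."""
    summary = paper_dir / "summary.md"
    # Step 1: "complete" is approximated as "exists and is non-empty" (assumption).
    if summary.is_file() and summary.stat().st_size > 0:
        return "skip_paper"
    # Step 2: usable artifacts are already on disk, so no download is needed.
    extract_dir = paper_dir / "source" / "source_extract"
    has_tex = extract_dir.is_dir() and any(extract_dir.glob("*.tex"))
    has_pdf = (paper_dir / "source" / "paper.pdf").is_file()
    if has_tex or has_pdf:
        return "read_and_summarize"
    # Steps 3-4: try the source download first; the PDF is the fallback.
    return "download_source_then_pdf"
```

The summary itself is still written by the model, so the sketch stops at deciding what to do next.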

Parallel strategy for many papers:

- Default: `paper_processing_mode=subagent_parallel` with `max_parallel_papers=5`.
- Optional: `paper_processing_mode=serial` to process one paper at a time.
- In parallel mode, run multiple `arxiv-paper-processor` instances in batches; concurrent papers must not exceed `max_parallel_papers`.
- Wait for one batch to finish before starting the next batch.
- In serial mode, run exactly one `arxiv-paper-processor` instance at a time.
- Subagent workers should only own one paper directory each to avoid file conflicts.
- Do not use scripts to auto-compose summary text; scripts are download-only tools.

Output: all paper directories contain `summary.md`.
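
The batching rule above (at most `max_parallel_papers` at once, and each batch finishes before the next starts) can be sketched with a thread pool standing in for subagents. This is only an illustration: `process_one` is a hypothetical callable representing one `arxiv-paper-processor` invocation, and real subagent dispatch depends on the host platform.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_batches(paper_dirs: Sequence[str], size: int) -> List[List[str]]:
    """Split the paper directories into consecutive batches of at most `size`."""
    return [list(paper_dirs[i:i + size]) for i in range(0, len(paper_dirs), size)]


def process_papers(
    paper_dirs: Sequence[str],
    process_one: Callable[[str], str],
    mode: str = "subagent_parallel",
    max_parallel_papers: int = 5,
) -> List[str]:
    """Process every paper directory, one batch at a time."""
    if mode not in ("subagent_parallel", "serial"):
        raise ValueError(f"unknown mode: {mode!r}")
    batch_size = 1 if mode == "serial" else max_parallel_papers
    results: List[str] = []
    for batch in split_into_batches(paper_dirs, batch_size):
        # One worker per paper, so each worker owns exactly one directory.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            results.extend(pool.map(process_one, batch))
        # Leaving the `with` block waits for the whole batch to finish.
    return results
```

Serial mode is treated as a batch size of one, which keeps a single code path for both modes.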

### Stage C: Bundle + Final Hierarchical Report

1. Run `arxiv-batch-reporter/scripts/collect_summaries_bundle.py --language <LANG>`.
2. Model reads `summaries_bundle.md` and writes `collection_report_template.md` in the base directory.
3. In the template, each paper leaf entry must include one standalone placeholder line: `{{ARXIV_BRIEF:<arxiv_id>}}`.
4. Run `arxiv-batch-reporter/scripts/render_collection_report.py` to generate the final `collection_report.md`.
5. Do not manually paraphrase per-paper conclusion lines in the final report; they must come from section 10 of each per-paper `summary.md` via script injection.

If `language` is non-English (for example Chinese), all intermediate markdown files and final reports should follow that language.
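
To show why the placeholder must sit on its own line, here is a minimal sketch of line-based injection. It is not the code of `render_collection_report.py`, whose internals are not shown in this document. The function name, the `briefs` mapping, and the choice to fail on a missing ID are all assumptions.

```python
import re

# Matches a line that consists only of one placeholder, e.g. {{ARXIV_BRIEF:2401.00001}}.
PLACEHOLDER = re.compile(r"^\{\{ARXIV_BRIEF:([^{}\s]+)\}\}$")


def render_template(template: str, briefs: dict[str, str]) -> str:
    """Replace each standalone placeholder line with the brief for that arXiv ID."""
    rendered = []
    for line in template.splitlines():
        match = PLACEHOLDER.match(line.strip())
        if match is None:
            rendered.append(line)  # ordinary report text is kept untouched
            continue
        arxiv_id = match.group(1)
        if arxiv_id not in briefs:
            raise KeyError(f"no brief available for {arxiv_id}")
        rendered.append(briefs[arxiv_id])
    return "\n".join(rendered)
```

In this sketch a placeholder embedded in the middle of a sentence is left as-is, which is one reason a standalone line is the safer convention.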

## Periodic Scheduling

This orchestrator is suitable for cron/scheduled execution in OpenClaw:

- Frequency examples: daily, weekly, monthly.
- For rolling windows, use lookback (`1d`, `7d`, `30d`) when initializing runs.
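
A lookback value such as `7d` can be read as "the last seven days ending now". The sketch below converts it into a date window under that assumption. How `init_collection_run.py` actually interprets lookback is not described here, and the function name is hypothetical.

```python
import re
from datetime import datetime, timedelta


def lookback_window(lookback: str, now: datetime) -> tuple[datetime, datetime]:
    """Turn a value like '7d' into a (start, end) window that ends at `now`."""
    match = re.fullmatch(r"(\d+)d", lookback)
    if match is None:
        raise ValueError(f"unsupported lookback: {lookback!r}")
    days = int(match.group(1))
    return now - timedelta(days=days), now
```

A daily schedule would pair naturally with `1d`, a weekly one with `7d`, and a monthly one with `30d`, so consecutive runs cover adjacent windows.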

## Output Layout

`<output-root>/<topic>-<timestamp>-<range>/`

- `task_meta.json`, `task_meta.md`
- `query_results/`, `query_selection/`
- `<arxiv_id>/metadata.md` + downloaded source/pdf + `summary.md`
- `summaries_bundle.md`
- `collection_report_template.md`
- final rendered collection report (e.g. `collection_report.md`)

Use `references/workflow-checklist.md` as the execution checklist.

## Related Skills

This is the top-level orchestration skill.

Before using it, install and enable these three sub-skills:

- `arxiv-search-collector`
- `arxiv-paper-processor`
- `arxiv-batch-reporter`

Execution order inside this orchestrator:

1. `arxiv-search-collector` (Stage A)
2. `arxiv-paper-processor` (Stage B)
3. `arxiv-batch-reporter` (Stage C)

Security Recommendations

This orchestrator itself appears coherent and low-risk, but it delegates all network access and downloads to three sub-skills. Before installing or scheduling this skill you should: (1) inspect the source and install metadata for arxiv-search-collector, arxiv-paper-processor, and arxiv-batch-reporter to confirm they come from trusted authors and do not exfiltrate data or call unexpected endpoints; (2) run the workflow in an isolated workspace (dedicated run_dir) with limited filesystem permissions and monitor network activity while testing; (3) verify any scheduling/cron settings, rate-limit configuration, and that language parameters are passed explicitly; (4) confirm no secrets or unrelated system files are needed by the sub-skills. If you can review the three sub-skills and are comfortable with their behavior, this orchestrator is safe to use.

Functional Analysis

Type: OpenClaw Skill
Name: arxiv-summarizer-orchestrator
Version: 0.1.1

The skill bundle is designed for a legitimate purpose (arXiv summarization orchestration). However, it repeatedly instructs the AI agent to pass user-controlled parameters (e.g., `--language <LANG>`) directly to Python scripts executed via the shell (e.g., `python3 script.py --language <LANG>`). This pattern, found in `SKILL.md` and `references/workflow-checklist.md`, creates a significant command injection vulnerability if the OpenClaw agent does not rigorously sanitize the `<LANG>` input before constructing the shell command. While there is no evidence of intentional malicious behavior from the skill's author, this high-risk vulnerability makes the skill suspicious.
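
One common way to reduce this kind of risk is to validate the value against an allow-list pattern and pass arguments as a list, so no shell ever parses them. The sketch below is a generic mitigation, not something the skill or OpenClaw is known to implement. The helper names and the pattern (letters, digits, `_`, `-`, up to 32 characters) are assumptions.

```python
import re
import subprocess

# Assumed allow-list: values such as "English", "Chinese", or "zh-CN".
LANG_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,31}")


def build_command(script: str, language: str, *extra: str) -> list[str]:
    """Build an argument list for one sub-skill script; no shell string is created."""
    if LANG_PATTERN.fullmatch(language) is None:
        raise ValueError(f"rejected language value: {language!r}")
    return ["python3", script, *extra, "--language", language]


def run_script(script: str, language: str, *extra: str) -> subprocess.CompletedProcess:
    """Run the script without a shell, so metacharacters are never interpreted."""
    return subprocess.run(build_command(script, language, *extra), check=True)
```

Because `subprocess.run` receives a list and `shell` is left at its default of `False`, a value like `en; rm -rf /` would reach the script as a single argument even if validation were skipped; the pattern check rejects it earlier.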

Capability Assessment

✓ Purpose & Capability

The skill's name and description match the runtime instructions: it orchestrates three sub-skills (collector, per-paper processor, batch reporter). It requests no env vars, binaries, or installs, which is proportionate for an instruction-only orchestrator. The dependency on the three named sub-skills is expected and coherent.

✓ Instruction Scope

SKILL.md stays within the orchestration scope: it describes how to run scripts in the sub-skills, when to skip papers, how to batch/parallelize, and how to assemble reports. The only runtime reading it asks for is project/run-directory files (per-paper metadata, downloaded source/pdf, summary.md, and runtime throttle/state in the run directory). It does not instruct the agent to read system-wide config, secrets, or unrelated files, nor to post data to unexpected external endpoints.

✓ Install Mechanism

No install spec or code is included (instruction-only), so nothing is written to disk or fetched by this skill itself. That is the lowest-risk install model and is appropriate for an orchestrator.

✓ Credentials

The skill declares no environment variables, credentials, or config paths. This is proportionate. One caveat: the orchestration assumes the three sub-skills exist and those sub-skills (not this orchestrator) may require network access or API keys; those should be inspected separately.

✓ Persistence & Privilege

The `always` setting is false, and the skill does not request persistent presence or elevated platform privileges. It does not modify other skills' configs. Autonomous invocation remains enabled (the platform default), which is expected for skills and is not combined here with other red flags.

Version History

v0.1.1

Document cross-skill relationships in all SKILL.md files

v0.1.0

arxiv-summarizer-orchestrator 0.1.0 initial release:

- Introduces end-to-end orchestration for periodic arXiv collection and reporting, integrating three sub-skills: arxiv-search-collector, arxiv-paper-processor, and arxiv-batch-reporter.
- Supports configurable manual language selection applied throughout all markdown outputs.
- Provides parallel (default: max 5) or serial paper processing strategies.
- Documents flexible workflows for query generation, artifact collection, per-paper summarization, and final report assembly.
- Designed for scheduled (e.g., cron) runs with organized output structure and workflow checklist.

Metadata

Slug: arxiv-summarizer-orchestrator

Version: 0.1.1

License: none listed

Total installs: 1

Current installs: 1

Number of versions: 2

FAQ

What is Arxiv Summarizer Orchestrator?

Orchestrates end-to-end arXiv paper retrieval, processing, and batch reporting with language control and parallel or serial paper handling modes. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 866 cumulative downloads to date.

How do I install Arxiv Summarizer Orchestrator?

Run the command `/install arxiv-summarizer-orchestrator` in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Arxiv Summarizer Orchestrator free?

Yes. Arxiv Summarizer Orchestrator is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Arxiv Summarizer Orchestrator support?

Arxiv Summarizer Orchestrator is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Arxiv Summarizer Orchestrator?

It is developed and maintained by xukp20 (@xukp20); the current version is v0.1.1.
