Arxiv Summarizer Orchestrator

by xukp20 · GitHub ↗ · v0.1.1
cross-platform ⚠ suspicious
Downloads: 866 · Stars: 0 · Active Installs: 1 · Versions: 2
Install in OpenClaw
/install arxiv-summarizer-orchestrator
Description
Orchestrates end-to-end arXiv paper retrieval, processing, and batch reporting with language control and parallel or serial paper handling modes.
README (SKILL.md)

ArXiv Summarizer Orchestrator

Run the full pipeline by composing three sub-skills.

Sub-skill Order

  1. arxiv-search-collector
  2. arxiv-paper-processor
  3. arxiv-batch-reporter

Workflow Parameters

  • language: output language, set manually and used by all stages. Defaults to English when omitted.
  • paper_processing_mode: subagent_parallel (default) or serial.
  • max_parallel_papers: maximum number of papers processed concurrently when paper_processing_mode=subagent_parallel. Defaults to 5.
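The three parameters and their defaults can be captured in a small container. This is a minimal sketch only: `WorkflowParams` and `effective_parallelism` are hypothetical names introduced for illustration and are not part of the skill.

```python
from dataclasses import dataclass

VALID_MODES = ("subagent_parallel", "serial")


@dataclass(frozen=True)
class WorkflowParams:
    """Hypothetical container for the three workflow parameters."""

    language: str = "English"
    paper_processing_mode: str = "subagent_parallel"
    max_parallel_papers: int = 5

    def __post_init__(self) -> None:
        # Reject anything other than the two documented modes.
        if self.paper_processing_mode not in VALID_MODES:
            raise ValueError(
                f"unknown paper_processing_mode: {self.paper_processing_mode!r}"
            )
        if self.max_parallel_papers < 1:
            raise ValueError("max_parallel_papers must be at least 1")

    @property
    def effective_parallelism(self) -> int:
        """Number of papers that may be processed at the same time."""
        if self.paper_processing_mode == "serial":
            return 1
        return self.max_parallel_papers
```

With no arguments, `WorkflowParams()` reflects the documented defaults; `WorkflowParams(paper_processing_mode="serial")` limits processing to one paper at a time.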

Workflow

Stage A: Collection Setup + Query Retrieval

  1. Initialize one run with arxiv-search-collector/scripts/init_collection_run.py.
  2. The model generates multiple focused queries from the original topic and writes a minimal query_plan.json (label + query only).
  3. Run arxiv-search-collector/scripts/fetch_queries_batch.py with the plan file (recommended).
  4. (Optional fallback) Call arxiv-search-collector/scripts/fetch_query_metadata.py manually to fetch queries one by one.
  5. The model reads each indexed query list and decides which indexes to keep.
  6. Merge the selected items with arxiv-search-collector/scripts/merge_selected_papers.py.
  7. If relevance or coverage is still insufficient, iterate Stage A:
    • generate another query plan with new labels,
    • fetch again,
    • re-merge with --incremental and an updated selection-json,
    • set weak labels to an empty keep list ([]) to explicitly drop them.

Pass --language <LANG> to the collector scripts so that all markdown files generated in Stage A follow the selected language.
Use serial query fetch in Stage A with conservative controls (for example --min-interval-sec 5, --retry-max 4).
The default collector settings already include retries/backoff and run-local throttle state (<run_dir>/.runtime/arxiv_api_state.json), so manual tuning is usually unnecessary.
Prefer cache reuse (no --force) unless query parameters changed or a data refresh is required.

Output: one run directory with per-paper metadata subdirectories.
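The query plan and keep-list selection described in Stage A could be prepared along these lines. This is a sketch under stated assumptions: the README says only that the plan holds "label + query", so the list-of-objects layout, the label-to-indexes selection shape, and the helper names `write_query_plan` and `build_selection` are all hypothetical. Check the collector scripts for the schema they actually expect.

```python
import json
from pathlib import Path


def write_query_plan(path, queries):
    """Write a minimal plan: one {label, query} object per focused query.

    The JSON layout is an assumption, not the collector's documented schema.
    """
    plan = [{"label": label, "query": query} for label, query in queries.items()]
    Path(path).write_text(
        json.dumps(plan, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return plan


def build_selection(keep_by_label):
    """Map each label to the result indexes to keep.

    An empty list explicitly drops a weak label (assumed selection shape).
    """
    return {label: sorted(set(indexes)) for label, indexes in keep_by_label.items()}
```

For example, `build_selection({"core": [3, 1], "weak-label": []})` keeps two results for `core` and drops everything fetched under `weak-label`.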

Stage B: Per-paper Artifact Download + Manual Summary

For each paper directory, invoke the sub-skill arxiv-paper-processor once and let that skill produce <paper_dir>/summary.md.

Recommended pre-step for many papers:

  1. Run one batch artifact download before per-paper reading:

```bash
python3 arxiv-paper-processor/scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language <LANG>
```

Per-paper execution steps (inside `arxiv-paper-processor`):

1. If `<paper_dir>/summary.md` already exists and is complete, skip this paper.
2. If usable source (`source/source_extract/*.tex`) or PDF (`source/paper.pdf`) already exists, skip download.
3. If artifacts are missing, download source with `arxiv-paper-processor/scripts/download_arxiv_source.py`.
4. If source is unusable, download PDF with `arxiv-paper-processor/scripts/download_arxiv_pdf.py`.
5. The model reads the content and manually writes `<paper_dir>/summary.md` in `language`, following the reference format.
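The skip and download decisions in steps 1 to 4 can be sketched as a single function over the paper directory. This is illustrative only: `next_action` and its return labels are hypothetical, and because the README does not define what makes a summary "complete", the sketch assumes a non-empty `summary.md` counts.

```python
from pathlib import Path


def next_action(paper_dir) -> str:
    """Decide what to do for one paper directory (hypothetical helper)."""
    paper_dir = Path(paper_dir)

    # Step 1: an existing summary means the paper is done.
    # Assumption: "complete" is approximated as "non-empty".
    summary = paper_dir / "summary.md"
    if summary.is_file() and summary.stat().st_size > 0:
        return "skip_paper"

    # Step 2: reuse artifacts that are already on disk.
    extract_dir = paper_dir / "source" / "source_extract"
    has_tex = extract_dir.is_dir() and any(extract_dir.glob("*.tex"))
    has_pdf = (paper_dir / "source" / "paper.pdf").is_file()
    if has_tex or has_pdf:
        return "read_and_summarize"

    # Step 3: nothing usable yet, so try the source download first.
    # Step 4 (PDF fallback) applies only after that attempt proves unusable.
    return "download_source"
```

The function only inspects files; running the download scripts and writing the summary remain separate steps.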

Parallel strategy for many papers:

- Default: `paper_processing_mode=subagent_parallel` with `max_parallel_papers=5`.
- Optional: `paper_processing_mode=serial` to process one paper at a time.
- In parallel mode, run multiple `arxiv-paper-processor` instances in batches; the number of concurrent papers must not exceed `max_parallel_papers`.
- Wait for one batch to finish before starting the next batch.
- In serial mode, run exactly one `arxiv-paper-processor` instance at a time.
- Each subagent worker should own exactly one paper directory to avoid file conflicts.
- Do not use scripts to auto-compose summary text; scripts are download-only tools.

Output: all paper directories contain `summary.md`.
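The batching rule above (at most `max_parallel_papers` at once, and each batch finishes before the next starts) can be sketched with a thread pool. This is a minimal illustration, not how OpenClaw dispatches subagents: `process_one` is a hypothetical callable standing in for one `arxiv-paper-processor` invocation on one paper directory.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(paper_dirs, process_one,
                       mode="subagent_parallel", max_parallel_papers=5):
    """Run process_one on every paper directory, one batch at a time."""
    size = 1 if mode == "serial" else max_parallel_papers
    results = {}
    for batch in batches(list(paper_dirs), size):
        # Each worker receives exactly one paper directory.
        with ThreadPoolExecutor(max_workers=size) as pool:
            for paper_dir, result in zip(batch, pool.map(process_one, batch)):
                results[paper_dir] = result
        # Leaving the with-block waits for the whole batch to finish.
    return results
```

Serial mode falls out as the special case of a batch size of one, so both modes share the same loop.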

### Stage C: Bundle + Final Hierarchical Report

1. Run `arxiv-batch-reporter/scripts/collect_summaries_bundle.py --language <LANG>`.
2. The model reads `summaries_bundle.md` and writes `collection_report_template.md` in the base directory.
3. In the template, each paper leaf entry must include one standalone placeholder line: `{{ARXIV_BRIEF:<arxiv_id>}}`.
4. Run `arxiv-batch-reporter/scripts/render_collection_report.py` to generate the final `collection_report.md`.
5. Do not manually paraphrase per-paper conclusion lines in the final report; they must come from section 10 of each paper's `summary.md` via script injection.
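To make the placeholder mechanism concrete, the sketch below replaces each standalone `{{ARXIV_BRIEF:<arxiv_id>}}` line with a prepared brief. It is not the code of `render_collection_report.py`: the function name `render_report` and the `briefs` mapping are hypothetical, and extracting the brief from section 10 of `summary.md` is assumed to have happened already.

```python
import re

# A placeholder counts only when it is the sole content of its line.
PLACEHOLDER = re.compile(r"^\s*\{\{ARXIV_BRIEF:([^{}\s]+)\}\}\s*$")


def render_report(template: str, briefs: dict) -> str:
    """Substitute standalone placeholder lines with per-paper briefs."""
    rendered = []
    for line in template.splitlines():
        match = PLACEHOLDER.match(line)
        if match is None:
            rendered.append(line)  # ordinary template text passes through
            continue
        arxiv_id = match.group(1)
        if arxiv_id not in briefs:
            raise KeyError(f"no brief available for {arxiv_id}")
        rendered.append(briefs[arxiv_id])
    return "\n".join(rendered) + "\n"
```

Failing loudly on a missing brief, rather than leaving the placeholder in the output, keeps an incomplete Stage B from producing a report that looks finished.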

If `language` is non-English (for example Chinese), all intermediate markdown files and final reports should follow that language.

## Periodic Scheduling

This orchestrator is suitable for cron/scheduled execution in OpenClaw:

- Frequency examples: daily, weekly, monthly.
- For rolling windows, use lookback (`1d`, `7d`, `30d`) when initializing runs.
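A lookback value such as `7d` corresponds to a rolling time window ending at the moment of the run. The sketch below shows that arithmetic; it assumes only day-based values of the form `<N>d` as in the examples above, and `lookback_window` is a hypothetical helper, not the logic inside `init_collection_run.py`.

```python
import re
from datetime import datetime, timedelta, timezone


def lookback_window(lookback: str, now=None):
    """Return (start, end) for a day-based lookback such as '7d'."""
    match = re.fullmatch(r"(\d+)d", lookback)
    if match is None:
        raise ValueError(f"unsupported lookback: {lookback!r}")
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=int(match.group(1)))
    return start, end
```

A daily schedule pairs naturally with `1d`, weekly with `7d`, and monthly with `30d`, so that consecutive runs cover adjacent windows.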

## Output Layout

`<output-root>/<topic>-<timestamp>-<range>/`

- `task_meta.json`, `task_meta.md`
- `query_results/`, `query_selection/`
- `<arxiv_id>/metadata.md` + downloaded source/pdf + `summary.md`
- `summaries_bundle.md`
- `collection_report_template.md`
- final rendered collection report (e.g. `collection_report.md`)
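A run directory can be checked against this layout before rendering the final report. The helpers below are a hypothetical sketch: they treat any subdirectory containing `metadata.md` as a paper directory, which is an assumption based on the layout listed above.

```python
from pathlib import Path

# Top-level entries named in the layout (the final report is checked separately).
TOP_LEVEL = (
    "task_meta.json",
    "task_meta.md",
    "query_results",
    "query_selection",
    "summaries_bundle.md",
    "collection_report_template.md",
)


def missing_entries(run_dir):
    """List expected top-level files or directories that do not exist yet."""
    run_dir = Path(run_dir)
    return [name for name in TOP_LEVEL if not (run_dir / name).exists()]


def papers_without_summary(run_dir):
    """List paper directories (those with metadata.md) lacking summary.md."""
    run_dir = Path(run_dir)
    return sorted(
        child.name
        for child in run_dir.iterdir()
        if child.is_dir()
        and (child / "metadata.md").is_file()
        and not (child / "summary.md").is_file()
    )
```

An empty result from `papers_without_summary` indicates that Stage B has finished for every collected paper.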

Use `references/workflow-checklist.md` as execution checklist.

## Related Skills

This is the top-level orchestration skill.

Before using it, install and enable these three sub-skills:

- `arxiv-search-collector`
- `arxiv-paper-processor`
- `arxiv-batch-reporter`

Execution order inside this orchestrator:

1. `arxiv-search-collector` (Stage A)
2. `arxiv-paper-processor` (Stage B)
3. `arxiv-batch-reporter` (Stage C)
Usage Guidance
This orchestrator itself appears coherent and low-risk, but it delegates all network access and downloads to three sub-skills. Before installing or scheduling this skill you should: (1) inspect the source and install metadata for arxiv-search-collector, arxiv-paper-processor, and arxiv-batch-reporter to confirm they come from trusted authors and do not exfiltrate data or call unexpected endpoints; (2) run the workflow in an isolated workspace (dedicated run_dir) with limited filesystem permissions and monitor network activity while testing; (3) verify any scheduling/cron settings, rate-limit configuration, and that language parameters are passed explicitly; (4) confirm no secrets or unrelated system files are needed by the sub-skills. If you can review the three sub-skills and are comfortable with their behavior, this orchestrator is safe to use.
Capability Analysis
Type: OpenClaw Skill
Name: arxiv-summarizer-orchestrator
Version: 0.1.1

The skill bundle is designed for a legitimate purpose (arXiv summarization orchestration). However, it repeatedly instructs the AI agent to pass user-controlled parameters (e.g., `--language <LANG>`) directly to Python scripts executed via the shell (e.g., `python3 script.py --language <LANG>`). This pattern, found in `SKILL.md` and `references/workflow-checklist.md`, creates a significant command injection vulnerability if the OpenClaw agent does not rigorously sanitize the `<LANG>` input before constructing the shell command. While there is no evidence of intentional malicious behavior from the skill's author, this high-risk vulnerability makes the skill suspicious.
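One common way to reduce this risk is to validate the language value and pass arguments as a list, so that no shell ever parses them. The sketch below illustrates that idea and is not part of the skill: `build_argv`, `run_script`, and the allow-list pattern are assumptions, and the pattern would need widening for language names written in non-Latin scripts.

```python
import re
import subprocess
import sys

# Assumed allow-list: a letter followed by letters, spaces, or hyphens.
LANGUAGE_RE = re.compile(r"[A-Za-z][A-Za-z \-]{0,31}")


def build_argv(script, language, extra=()):
    """Build an argument list for a sub-skill script without a shell."""
    if not LANGUAGE_RE.fullmatch(language):
        raise ValueError(f"rejected language value: {language!r}")
    # Each list element reaches the script as one literal argument.
    return [sys.executable, str(script), *extra, "--language", language]


def run_script(script, language, extra=()):
    """Run the script with shell=False (the subprocess default)."""
    return subprocess.run(build_argv(script, language, extra), check=True)
```

Because the arguments are never joined into a command string, characters such as `;` or `$()` in a value have no special meaning; the allow-list additionally rejects values that start with `-` and could be mistaken for options.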
Capability Assessment
Purpose & Capability
The skill's name and description match the runtime instructions: it orchestrates three sub-skills (collector, per-paper processor, batch reporter). It requests no env vars, binaries, or installs, which is proportionate for an instruction-only orchestrator. The dependency on the three named sub-skills is expected and coherent.
Instruction Scope
SKILL.md stays within the orchestration scope: it describes how to run scripts in the sub-skills, when to skip papers, how to batch/parallelize, and how to assemble reports. The only runtime reading it asks for is project/run-directory files (per-paper metadata, downloaded source/pdf, summary.md, and runtime throttle/state in the run directory). It does not instruct the agent to read system-wide config, secrets, or unrelated files, nor to post data to unexpected external endpoints.
Install Mechanism
No install spec or code is included (instruction-only), so nothing is written to disk or fetched by this skill itself. That is the lowest-risk install model and is appropriate for an orchestrator.
Credentials
The skill declares no environment variables, credentials, or config paths. This is proportionate. One caveat: the orchestration assumes the three sub-skills exist and those sub-skills (not this orchestrator) may require network access or API keys; those should be inspected separately.
Persistence & Privilege
The `always` setting is false, and the skill does not request persistent presence or elevated platform privileges. It does not modify other skills' configs. Autonomous invocation remains enabled (the platform default), which is expected for skills and is not combined here with other red flags.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install arxiv-summarizer-orchestrator
  3. After installation, invoke the skill by name or use /arxiv-summarizer-orchestrator
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.1
Document cross-skill relationships in all SKILL.md files
v0.1.0
arxiv-summarizer-orchestrator 0.1.0 initial release:
- Introduces end-to-end orchestration for periodic arXiv collection and reporting, integrating three sub-skills: arxiv-search-collector, arxiv-paper-processor, and arxiv-batch-reporter.
- Supports configurable manual language selection applied throughout all markdown outputs.
- Provides parallel (default: max 5) or serial paper processing strategies.
- Documents flexible workflows for query generation, artifact collection, per-paper summarization, and final report assembly.
- Designed for scheduled (e.g., cron) runs with organized output structure and workflow checklist.
Metadata
Slug arxiv-summarizer-orchestrator
Version 0.1.1
License
All-time Installs 1
Active Installs 1
Total Versions 2
Frequently Asked Questions

What is Arxiv Summarizer Orchestrator?

Orchestrates end-to-end arXiv paper retrieval, processing, and batch reporting with language control and parallel or serial paper handling modes. It is an AI Agent Skill for Claude Code / OpenClaw, with 866 downloads so far.

How do I install Arxiv Summarizer Orchestrator?

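Run "/install arxiv-summarizer-orchestrator" in the OpenClaw or Claude Code chat to install it. The three sub-skills (arxiv-search-collector, arxiv-paper-processor, and arxiv-batch-reporter) must also be installed and enabled before use.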

Is Arxiv Summarizer Orchestrator free?

Yes, Arxiv Summarizer Orchestrator is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Arxiv Summarizer Orchestrator support?

Arxiv Summarizer Orchestrator is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Arxiv Summarizer Orchestrator?

It is built and maintained by xukp20 (@xukp20); the current version is v0.1.1.
