Total downloads: 146
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install agent-survey-corpus
Description
Download a small corpus of open-access arXiv survey/review PDFs about LLM agents and extract text for style learning. **Trigger**: agent survey corpus, ref c...
Usage (SKILL.md)
Agent Survey Corpus (arXiv PDFs → text extracts)
Goal: create a small, local reference library so you can learn from real agent surveys when refining:
- C2 outline structure (paper-like sectioning)
- C4 tables/claims organization
- C5 writing style and density
This is intentionally not part of the pipeline; it is an optional, repo-level toolkit.
Inputs
ref/agent-surveys/arxiv_ids.txt
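A minimal sketch of how an ids file of this shape could be read, assuming one arXiv id per line. Skipping blank lines, `#` comments, and duplicates is an assumption made here for illustration, and `read_arxiv_ids` is a hypothetical name; the actual `scripts/run.py` may parse the file differently.

```python
from pathlib import Path


def read_arxiv_ids(path):
    """Return the arXiv ids listed in `path`, one per line, without duplicates."""
    ids = []
    seen = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        # Assumption: '#' starts a comment; SKILL.md does not specify this.
        line = raw.split("#", 1)[0].strip()
        if line and line not in seen:
            seen.add(line)
            ids.append(line)
    return ids
```

Reviewing the returned list before running the downloader is one way to confirm that only the expected papers will be fetched.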
Outputs
- ref/agent-surveys/pdfs/
- ref/agent-surveys/text/
- ref/agent-surveys/STYLE_REPORT.md (tracked; auto-generated summary)
Workflow
- Edit ref/agent-surveys/arxiv_ids.txt (one arXiv id per line).
- Run the downloader to fetch PDFs and extract the first N pages to text.
- Skim the extracted text under ref/agent-surveys/text/:
  - look at section counts (H2), subsection granularity (H3), and how they transition between chapters.
  - identify repeated rhetorical patterns you want the pipeline writer to imitate.
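The skim step can be partly automated. The sketch below counts lines that look like numbered sections and subsections in an extracted text file. It is a heuristic under stated assumptions: text extracted from a PDF carries no real H2/H3 markup, so the patterns assume headings such as "3 Methods" and "3.1 Planning". The name `count_headings` and both regular expressions are hypothetical and not part of the skill.

```python
import re

# Heuristic patterns (assumptions): "3 Methods" is a section,
# "3.1 Planning" is a subsection. Unnumbered headings are not detected.
SECTION_RE = re.compile(r"^\d{1,2}\.?\s+[A-Z][^\n]{2,80}$")
SUBSECTION_RE = re.compile(r"^\d{1,2}\.\d{1,2}\.?\s+[A-Z][^\n]{2,80}$")


def count_headings(text):
    """Count lines that look like numbered sections and subsections."""
    sections = 0
    subsections = 0
    for line in text.splitlines():
        line = line.strip()
        if SUBSECTION_RE.match(line):
            subsections += 1
        elif SECTION_RE.match(line):
            sections += 1
    return {"sections": sections, "subsections": subsections}
```

Body sentences that happen to start with a number and a capitalised word will be miscounted, so the result is best treated as a rough signal to check by eye.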
Script
Quick Start
python scripts/run.py --help
python scripts/run.py --workspace . --max-pages 20
All Options
- --workspace <dir> (use . to write into repo root)
- --inputs <semicolon-separated> (default: ref/agent-surveys/arxiv_ids.txt)
- --max-pages <N> (default: 20)
- --sleep <seconds> (default: 1.0)
- --overwrite (re-download + re-extract)
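For orientation, here is a sketch of an `argparse` parser that would accept the options listed above with the documented defaults. It is not the parser from `scripts/run.py`; `build_parser` is a hypothetical name, and making `--workspace` required is an assumption, since the listing gives no default for it.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Download arXiv survey PDFs and extract text."
    )
    # Assumption: required, because no default is documented.
    parser.add_argument("--workspace", required=True, metavar="dir",
                        help="workspace root; use . to write into repo root")
    parser.add_argument("--inputs", default="ref/agent-surveys/arxiv_ids.txt",
                        metavar="semicolon-separated",
                        help="one or more id files, separated by ';'")
    parser.add_argument("--max-pages", type=int, default=20, metavar="N")
    parser.add_argument("--sleep", type=float, default=1.0, metavar="seconds")
    parser.add_argument("--overwrite", action="store_true",
                        help="re-download + re-extract")
    return parser
```

With this shape, `build_parser().parse_args(["--workspace", ".", "--max-pages", "20"])` yields `max_pages == 20`, and `args.inputs.split(";")` gives the list of id files.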
Examples
- Download/extract into repo root ref/:
  python scripts/run.py --workspace . --max-pages 20
- Download/extract into a specific folder (treated as workspace root):
  python scripts/run.py --workspace /tmp/surveys --max-pages 30
Troubleshooting
- Download fails / timeout: rerun with a larger --sleep, or try fewer ids.
- Text extract is empty: the PDF may be scanned; try another survey or increase --max-pages.
- Files showing up in git status: PDFs/text are ignored via .gitignore (ref/**/pdfs/, ref/**/text/).
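To spot the "text extract is empty" case without opening every file, a small check like the following could be run over the text directory. It is a sketch under assumptions: extracts are taken to be `.txt` files, the 200-character threshold is arbitrary, and `find_empty_extracts` is a hypothetical helper that does not ship with the skill.

```python
from pathlib import Path


def find_empty_extracts(text_dir, min_chars=200):
    """Return names of text files whose stripped content is shorter than `min_chars`."""
    flagged = []
    # Assumption: extracts are stored as *.txt directly under `text_dir`.
    for path in sorted(Path(text_dir).glob("*.txt")):
        content = path.read_text(encoding="utf-8", errors="replace")
        if len(content.strip()) < min_chars:
            flagged.append(path.name)
    return flagged
```

Calling `find_empty_extracts("ref/agent-surveys/text")` after a run lists the candidates that are likely scanned PDFs.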
Safe usage recommendations
This skill is coherent and limited in scope, but take these practical steps before running:
1) Run it in an explicit workspace directory (e.g., a temp folder) so PDFs/text are confined and not accidentally committed to a repo; SKILL.md notes .gitignore but verify your repo ignores ref/**/pdfs/ and ref/**/text/.
2) Ensure you install required Python packages (PyMuPDF / pymupdf and any YAML libs if you use other tooling files); the skill does not provide an install step.
3) Confirm network access is acceptable and that downloading the listed arXiv IDs is permitted for your use case (arXiv PDFs are generally open-access but check licenses for reuse).
4) Inspect and control ref/agent-surveys/arxiv_ids.txt before running so you only download expected papers.
If you want stricter isolation, run the script inside a disposable container or VM.
Functional analysis
Type: OpenClaw Skill
Name: agent-survey-corpus
Version: 1.0.0
The agent-survey-corpus skill bundle is a legitimate research utility designed to download arXiv survey PDFs and extract their text for structural analysis. The core logic in `scripts/run.py` uses standard libraries (`urllib`, `xml.etree`) and `PyMuPDF` (`fitz`) to fetch metadata and content from official arXiv endpoints (arxiv.org). The extensive `tooling/` directory provides a comprehensive framework for pipeline execution, quality gating, and research ideation, all of which align with the stated purpose of assisting in academic writing. No indicators of data exfiltration, malicious execution, or prompt injection were found.
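As an illustration of the metadata step described above, the sketch below builds an arXiv export API query URL and parses titles out of the Atom feed it returns, using only `urllib` and `xml.etree`. The function names `build_query_url` and `parse_titles` are hypothetical; this is not the code in `scripts/run.py`, only one plausible way to do the same kind of lookup.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv API feed


def build_query_url(arxiv_ids):
    """Build an export.arxiv.org API URL that looks up the given ids."""
    query = urllib.parse.urlencode({
        "id_list": ",".join(arxiv_ids),
        "max_results": len(arxiv_ids),  # the API returns 10 results by default
    })
    return "https://export.arxiv.org/api/query?" + query


def parse_titles(atom_xml):
    """Map each entry's <id> URL to its whitespace-normalised <title>."""
    root = ET.fromstring(atom_xml)
    titles = {}
    for entry in root.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id", default="").strip()
        title = " ".join(entry.findtext(ATOM + "title", default="").split())
        titles[entry_id] = title
    return titles
```

The feed text would come from something like `urllib.request.urlopen(build_query_url(ids)).read()`; keeping the parsing separate from the network call makes it easy to test offline.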
Capability assessment
Purpose & Capability
Name/description (download arXiv survey PDFs and extract text for style analysis) aligns with the files and code. The downloader only targets arxiv.org export API and arxiv.org/pdf URLs and writes outputs under ref/agent-surveys/, which is consistent with the stated purpose.
Instruction Scope
SKILL.md instructs editing ref/agent-surveys/arxiv_ids.txt and running scripts/run.py with workspace and max-pages options; the script reads that id list, fetches metadata, downloads PDFs and extracts the first N pages to text. The instructions do not ask the agent to read unrelated files or secrets.
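The "extract the first N pages to text" step can be sketched with PyMuPDF as follows. This is an assumption about how such an extraction might look, not a copy of the script; `pages_to_read` and `extract_first_pages` are hypothetical names, and the default of 20 simply mirrors the documented `--max-pages` default.

```python
def pages_to_read(page_count, max_pages):
    """Number of leading pages to extract, clamped to the document length."""
    return max(0, min(page_count, max_pages))


def extract_first_pages(pdf_path, max_pages=20):
    """Return the plain text of the first `max_pages` pages of a PDF."""
    # PyMuPDF is imported lazily so pages_to_read() works without it installed.
    import fitz

    with fitz.open(pdf_path) as doc:
        n = pages_to_read(doc.page_count, max_pages)
        return "\n".join(doc.load_page(i).get_text() for i in range(n))
```

A scanned PDF has no text layer, so `get_text()` returns little or nothing for its pages; in that case the result is an empty or near-empty string rather than an error.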
Install Mechanism
There is no install spec (instruction-only install), which is low risk. However, the Python script uses third-party packages (PyMuPDF as fitz, and other tooling files reference yaml/etc.) but the skill does not declare Python dependencies or installation steps; users must install required Python packages manually before running.
Credentials
The skill requires no environment variables, no credentials, and no privileged config paths. Network access is necessary and explicitly documented; all network calls are to arxiv.org/export.arxiv.org and arxiv.org/pdf only.
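A host restriction of this kind can be expressed as a small allow-list check. The sketch below shows the idea under the assumption that only `arxiv.org` and `export.arxiv.org` should ever be contacted; `is_allowed_url` and `ALLOWED_HOSTS` are hypothetical names, and the assessment does not say the script enforces the restriction this way.

```python
from urllib.parse import urlparse

# Assumption: exactly the two hosts named in the assessment.
ALLOWED_HOSTS = {"arxiv.org", "export.arxiv.org"}


def is_allowed_url(url):
    """True only for http(s) URLs whose host is exactly an allowed arXiv host."""
    parts = urlparse(url)
    return parts.scheme in {"http", "https"} and parts.hostname in ALLOWED_HOSTS
```

Comparing the parsed hostname for an exact match, instead of searching the URL string for "arxiv.org", rejects look-alike hosts such as `arxiv.org.evil.example`.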
Persistence & Privilege
always is false and the skill does not request persistent platform privileges. It writes files to the provided workspace only and does not modify other skills or system-level agent settings.
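How to use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install agent-survey-corpus
- Once installed, call the Skill by name or use /agent-survey-corpus to trigger it.
- Provide the required inputs according to the Skill's parameter description to get structured output.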
Version history
v1.0.0
- Initial release of agent-survey-corpus skill for downloading and extracting text from arXiv survey/review PDFs about LLM agents.
- Provides a toolkit to build a local reference library for analyzing real survey structures and writing styles.
- Supports customizable workspace, page limits, and safe download (arXiv-only) with guardrails to keep large files outside git.
- Includes clear workflow and CLI script for managing PDFs and extracting text for study and style learning.
Metadata
FAQ
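What is Agent Survey Corpus?
Download a small corpus of open-access arXiv survey/review PDFs about LLM agents and extract text for style learning. **Trigger**: agent survey corpus, ref c... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 146 total downloads so far.
How do I install Agent Survey Corpus?
Run the command "/install agent-survey-corpus" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.
Is Agent Survey Corpus free?
Yes. Agent Survey Corpus is completely free and released under the MIT-0 license; you can download, install, and use it freely.
Which platforms does Agent Survey Corpus support?
Agent Survey Corpus is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Agent Survey Corpus?
It is developed and maintained by WILLOSCAR (@willoscar); the current version is v1.0.0.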