/install agent-survey-corpus
Agent Survey Corpus (arXiv PDFs → text extracts)
Goal: create a small, local reference library so you can learn from real agent surveys when refining:
- C2 outline structure (paper-like sectioning)
- C4 tables/claims organization
- C5 writing style and density
This is intentionally not part of the pipeline; it is an optional, repo-level toolkit.
Inputs
ref/agent-surveys/arxiv_ids.txt
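A minimal sketch of how the id list could be read, assuming one arXiv id per line as the workflow describes. The helper names `read_arxiv_ids` and `pdf_url` are hypothetical, and skipping blank lines, `#` comments, and duplicates is an assumption; the actual scripts/run.py may parse the file differently. The sample ids are illustrative only.

```python
def read_arxiv_ids(text: str) -> list[str]:
    """Return unique arXiv ids, one per line, in first-seen order."""
    ids: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines are skipped.
        if not line or line.startswith("#"):
            continue
        # Assumption: repeated ids are kept only once.
        if line not in ids:
            ids.append(line)
    return ids


def pdf_url(arxiv_id: str) -> str:
    """Build the public arXiv PDF URL for a given id."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


# Illustrative content for ref/agent-surveys/arxiv_ids.txt.
sample = "2308.11432\n\n# duplicate below is dropped\n2308.11432\n2309.07864\n"
for arxiv_id in read_arxiv_ids(sample):
    print(pdf_url(arxiv_id))
```

Keeping parsing separate from downloading makes it easy to check the id list before any network request is made.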
Outputs
- ref/agent-surveys/pdfs/
- ref/agent-surveys/text/
- ref/agent-surveys/STYLE_REPORT.md (tracked; auto-generated summary)
Workflow
- Edit ref/agent-surveys/arxiv_ids.txt (one arXiv id per line).
- Run the downloader to fetch PDFs and extract the first N pages to text.
- Skim the extracted text under ref/agent-surveys/text/:
  - look at section counts (H2), subsection granularity (H3), and how they transition between chapters.
  - identify repeated rhetorical patterns you want the pipeline writer to imitate.
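The skimming step can be partly automated. The following is a rough sketch, not part of the toolkit: it assumes extracted headings look like numbered lines ("3 Methods", "3.1 Planning"), which is a heuristic that will miss unnumbered headings and may misread lines that start with a year. The name `heading_depths` is hypothetical.

```python
import re
from collections import Counter

# Heuristic (assumption): a heading is a short line that starts with a
# section number such as "3", "3." or "3.1", followed by a capitalized title.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+[A-Z][^\n]{0,80}$")


def heading_depths(text: str) -> Counter:
    """Count heading-like lines by depth: 1 = section, 2 = subsection, ..."""
    depths: Counter = Counter()
    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if match:
            # "2.1" has one dot, so it is a depth-2 heading.
            depths[match.group(1).count(".") + 1] += 1
    return depths


sample = "1 Introduction\nAgents plan and act.\n2 Background\n2.1 Planning\n2.2 Memory\n"
print(dict(heading_depths(sample)))  # {1: 2, 2: 2}
```

Depth 1 roughly corresponds to the section count (H2) and depth 2 to subsection granularity (H3) mentioned above.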
Script
Quick Start
python scripts/run.py --help
python scripts/run.py --workspace . --max-pages 20
All Options
- --workspace <dir> (use . to write into repo root)
- --inputs <semicolon-separated> (default: ref/agent-surveys/arxiv_ids.txt)
- --max-pages <N> (default: 20)
- --sleep <seconds> (default: 1.0)
- --overwrite (re-download + re-extract)
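For readers who want to see how these flags fit together, here is a hedged argparse sketch that mirrors the documented options and defaults. It is not the actual source of scripts/run.py: the help strings, the `build_parser` name, and treating --workspace as required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented CLI options; defaults are taken from the list above."""
    parser = argparse.ArgumentParser(
        description="Fetch arXiv survey PDFs and extract the first N pages to text."
    )
    # Assumption: --workspace is required; the docs only say '.' targets the repo root.
    parser.add_argument("--workspace", required=True, metavar="dir",
                        help="workspace root (use . to write into repo root)")
    parser.add_argument("--inputs", default="ref/agent-surveys/arxiv_ids.txt",
                        metavar="semicolon-separated",
                        help="one or more id files, separated by ';'")
    parser.add_argument("--max-pages", type=int, default=20, metavar="N",
                        help="number of leading pages to extract")
    parser.add_argument("--sleep", type=float, default=1.0, metavar="seconds",
                        help="pause between downloads")
    parser.add_argument("--overwrite", action="store_true",
                        help="re-download and re-extract existing files")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--workspace", ".", "--max-pages", "30"])
input_files = [p for p in args.inputs.split(";") if p]
print(args.workspace, args.max_pages, args.sleep, args.overwrite, input_files)
```

Note that argparse exposes `--max-pages` as `args.max_pages`, and a semicolon-separated `--inputs` value has to be split by the caller.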
Examples
- Download/extract into repo root ref/:
python scripts/run.py --workspace . --max-pages 20
- Download/extract into a specific folder (treated as workspace root):
python scripts/run.py --workspace /tmp/surveys --max-pages 30
Troubleshooting
- Download fails / timeout: rerun with a larger --sleep, or try fewer ids.
- Text extract is empty: the PDF may be scanned; try another survey or increase --max-pages.
- Files showing up in git status: PDFs/text are ignored via .gitignore (ref/**/pdfs/, ref/**/text/).
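To spot empty extracts quickly, a small check along these lines may help. It is a sketch under stated assumptions: extracts are assumed to be `.txt` files directly under ref/agent-surveys/text/, the 200-character threshold is arbitrary, and `empty_extracts` is a hypothetical helper, not something the toolkit ships.

```python
from pathlib import Path


def empty_extracts(text_dir: Path, min_chars: int = 200) -> list[Path]:
    """Return extract files whose stripped content is shorter than min_chars."""
    flagged: list[Path] = []
    # Assumption: extracts are stored as *.txt directly inside text_dir.
    for path in sorted(text_dir.glob("*.txt")):
        content = path.read_text(encoding="utf-8", errors="replace")
        if len(content.strip()) < min_chars:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    for path in empty_extracts(Path("ref/agent-surveys/text")):
        print(f"empty or near-empty extract (possibly a scanned PDF): {path}")
```

Any file it flags is a candidate for replacing with another survey, as suggested above.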
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install agent-survey-corpus
- After installation, invoke the skill by name or use /agent-survey-corpus
- Provide required inputs per the skill's parameter spec and get structured output
What is Agent Survey Corpus?
Download a small corpus of open-access arXiv survey/review PDFs about LLM agents and extract text for style learning. **Trigger**: agent survey corpus, ref c... It is an AI Agent Skill for Claude Code / OpenClaw, with 146 downloads so far.
How do I install Agent Survey Corpus?
Run "/install agent-survey-corpus" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Agent Survey Corpus free?
Yes, Agent Survey Corpus is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Agent Survey Corpus support?
Agent Survey Corpus is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Agent Survey Corpus?
It is built and maintained by WILLOSCAR (@willoscar); the current version is v1.0.0.