isonaei

Literature Manager

Author: IsonaEi · GitHub · v1.1.1
cross-platform ⚠ suspicious
905 total downloads · 2 favorites · 2 current installs · 3 versions
Install in OpenClaw
/install literature-manager
Description
Search, download, convert, organize, and audit academic literature collections. Use when asked to find papers, build a literature library, add papers to refe...
Usage instructions (SKILL.md)

Literature Manager

Manage academic literature collections: search → download → convert → organize → verify.

Dependencies

  • pdftotext (poppler-utils): PDF text extraction
  • curl: downloading
  • python3: JSON processing in audit
  • file (the standalone file package, not coreutils): PDF validation
  • uvx markitdown[pdf] (optional): fallback PDF→MD converter. Plain uvx markitdown does NOT work for PDFs; the [pdf] extra is required. In shells such as zsh, quote it (uvx 'markitdown[pdf]') so the brackets are not treated as a glob pattern.

Quick Start

# Download a single paper by DOI
bash scripts/download.sh "10.1038/s41592-024-02200-1" output_dir/

# Convert PDF to markdown
bash scripts/convert.sh paper.pdf output.md

# Verify a single PDF+MD pair
bash scripts/verify.sh paper.pdf paper.md

# Full audit of a references/ folder
bash scripts/audit.sh /path/to/references/

Workflow

1. Search

Use web_fetch on Google Scholar:

https://scholar.google.com/scholar?q=QUERY&as_ylo=YEAR

Extract: title, authors, year, journal, DOI, PDF links.

For each result, identify the best open-access PDF source (see the source order in step 2, Download).
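A minimal Python sketch of building the search URL above with proper escaping, assuming the query is free text and YEAR is an integer start year. `scholar_url` is a hypothetical helper, not part of the skill's scripts.

```python
from typing import Optional
from urllib.parse import urlencode

SCHOLAR_BASE = "https://scholar.google.com/scholar"

def scholar_url(query: str, year_from: Optional[int] = None) -> str:
    """Build the Google Scholar search URL shown above (q=QUERY, as_ylo=YEAR)."""
    params = {"q": query}
    if year_from is not None:
        params["as_ylo"] = str(year_from)  # lower bound on publication year
    # urlencode escapes special characters and turns spaces into '+'
    return f"{SCHOLAR_BASE}?{urlencode(params)}"

print(scholar_url("grid cells path integration", 2020))
# → https://scholar.google.com/scholar?q=grid+cells+path+integration&as_ylo=2020
```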

2. Download

Run scripts/download.sh <DOI_or_URL> <output_dir/> per paper. The script tries sources in order:

  1. Direct publisher PDF (Nature, eLife, Frontiers, PNAS, bioRxiv, arXiv)
  2. EuropePMC (PMC_ID → PDF)
  3. bioRxiv/arXiv preprint
  4. Sci-Hub: https://sci-hub.box/<DOI> (used when the publisher is paywalled)
# Sci-Hub download example:
curl -L "https://sci-hub.box/10.1038/nature12345" -o paper.pdf

⚠️ Legal note: Sci-Hub may violate publisher terms of service or copyright law in some jurisdictions. Use only if you understand and accept the legal implications in your context.

If all sources fail (including Sci-Hub), flag the paper as a permanent paywall, give the user the DOI, and ask them to download it manually.
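The try-in-order fallback can be sketched in Python as follows. This is not the logic of scripts/download.sh (a shell script); `download_first_pdf` and `fetch` are hypothetical helpers, and the candidate URLs are left to the caller because the per-source URL patterns are not documented here. The check worth copying is the `%PDF-` header test: a paywalled source may answer with an HTML page rather than a PDF, so a successful HTTP status alone is not proof of a download.

```python
import urllib.request
from typing import Callable, Iterable, Optional

PDF_MAGIC = b"%PDF-"

def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the body; urlopen follows redirects (like curl -L)."""
    req = urllib.request.Request(url, headers={"User-Agent": "literature-manager-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()

def download_first_pdf(candidates: Iterable[str], out_path: str,
                       fetcher: Callable[[str], bytes] = fetch) -> Optional[str]:
    """Try each candidate URL in order; save and return the first one that yields a real PDF."""
    for url in candidates:
        try:
            body = fetcher(url)
        except Exception:
            continue  # network error, HTTP error, or timeout: try the next source
        # An HTML landing page can come back with HTTP 200, so inspect the header bytes
        if body.startswith(PDF_MAGIC):
            with open(out_path, "wb") as f:
                f.write(body)
            return url
    return None  # every source failed: treat as a permanent paywall
```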

3. Convert

Run scripts/convert.sh <input.pdf> <output.md>. It uses pdftotext (reliable) with uvx markitdown[pdf] as a fallback.

# Correct markitdown command for PDFs:
uvx markitdown[pdf] input.pdf > output.md

# ⚠️ The following will NOT work for PDFs (missing [pdf] extra):
# uvx markitdown input.pdf

Prefer uvx markitdown[pdf] over pdftotext when full fidelity (tables, figure captions) matters.
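A Python sketch of the pdftotext-then-markitdown fallback. It is an assumption, not the contents of scripts/convert.sh: the `-layout` flag, the 200-character minimum used to reject near-empty output (such as a scanned PDF with no text layer), and the `convert_pdf` name are illustrative choices. Because the command is passed as a list, no shell is involved and `markitdown[pdf]` needs no quoting.

```python
import subprocess
from pathlib import Path
from typing import Callable

def convert_pdf(pdf: str, md: str, run: Callable = subprocess.run, min_chars: int = 200) -> str:
    """Try pdftotext, then uvx markitdown[pdf]; return the tool that succeeded."""
    attempts = [
        ("pdftotext", ["pdftotext", "-layout", pdf, "-"]),  # "-" sends the text to stdout
        ("markitdown", ["uvx", "markitdown[pdf]", pdf]),     # command as given in this section
    ]
    for name, cmd in attempts:
        try:
            result = run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            continue  # tool not installed: fall through to the next converter
        # Treat a non-zero exit or near-empty output as failure
        if result.returncode == 0 and len(result.stdout.strip()) >= min_chars:
            Path(md).write_text(result.stdout, encoding="utf-8")
            return name
    raise RuntimeError(f"all converters failed for {pdf}")
```

The `run` parameter exists only so the converter calls can be swapped out in tests.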

4. Organize

Standard folder structure:

references/
├── README.md              # Human index (summaries per category)
├── index.json             # Machine index (structured metadata)
├── RESOURCES.md           # Code repos + datasets
├── resources.json         # Structured version
├── <category-1>/
│   ├── papers/            # PDFs
│   └── markdown/          # Converted text
└── <category-N>/
    ├── papers/
    └── markdown/

Categories are user-defined. Prefix category names with numbers to control sort order (e.g., 01-theoretical-frameworks/).
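Creating the category skeleton can be sketched in a few lines of Python; `init_references` is a hypothetical helper, and it only creates the papers/ and markdown/ subfolders, leaving the index files to the organize step.

```python
from pathlib import Path
from typing import Iterable, List

def init_references(root: str, categories: Iterable[str]) -> List[Path]:
    """Create <root>/<category>/papers/ and <root>/<category>/markdown/ for each category."""
    base = Path(root)
    created = []
    for cat in categories:
        for sub in ("papers", "markdown"):
            d = base / cat / sub
            d.mkdir(parents=True, exist_ok=True)  # idempotent: safe to re-run
            created.append(d)
    return created
```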

index.json schema per paper

{
  "id": "short_id",
  "title": "Full title",
  "authors": ["Author1", "Author2"],
  "year": 2024,
  "journal": "Journal Name",
  "doi": "10.xxxx/...",
  "category": "category_name",
  "subcategory": "optional",
  "pdf_path": "category/papers/filename.pdf",
  "markdown_path": "category/markdown/filename.md",
  "tags": ["tag1", "tag2"],
  "one_line_summary": "English one-liner",
  "key_concepts": ["concept1"],
  "relevance_to_project": "English description"
}
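A minimal validator for one entry, assuming every field except subcategory (the only one the schema marks optional) is required. `check_entry` and its DOI-prefix test are illustrative, not part of audit.sh.

```python
from typing import Any, Dict, List

# Expected type of each field in the index.json schema above
SCHEMA = {
    "id": str, "title": str, "authors": list, "year": int, "journal": str,
    "doi": str, "category": str, "subcategory": str, "pdf_path": str,
    "markdown_path": str, "tags": list, "one_line_summary": str,
    "key_concepts": list, "relevance_to_project": str,
}
OPTIONAL = {"subcategory"}

def check_entry(entry: Dict[str, Any]) -> List[str]:
    """Return human-readable problems for one index.json entry (empty list = OK)."""
    problems = []
    for key, typ in SCHEMA.items():
        value = entry.get(key)
        if value is None:
            if key not in OPTIONAL:
                problems.append(f"missing {key}")
        # bool is a subclass of int in Python, so reject it explicitly for "year"
        elif not isinstance(value, typ) or (typ is int and isinstance(value, bool)):
            problems.append(f"{key} should be {typ.__name__}")
    doi = entry.get("doi")
    if isinstance(doi, str) and not doi.startswith("10."):
        problems.append("doi should start with '10.'")
    return problems
```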

README.md pattern

One section per category; for each paper, list the title, authors, year, journal, DOI, and a short summary in the user's language.
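A sketch that renders this pattern from index.json entries. The Markdown layout is an assumption, and it uses the English `one_line_summary` field, which should be translated if the user works in another language. `render_readme` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

def render_readme(entries: List[Dict]) -> str:
    """One '## category (count)' section per category, one bullet per paper."""
    by_cat = defaultdict(list)
    for e in entries:
        by_cat[e["category"]].append(e)
    lines = ["# References", ""]
    for cat in sorted(by_cat):  # number-prefixed category names sort in the intended order
        lines += [f"## {cat} ({len(by_cat[cat])})", ""]
        for e in sorted(by_cat[cat], key=lambda e: (e["year"], e["title"])):
            authors = ", ".join(e["authors"])
            lines.append(f"- **{e['title']}** ({authors}, {e['year']}). *{e['journal']}*. DOI: {e['doi']}")
            lines.append(f"  {e['one_line_summary']}")
        lines.append("")
    return "\n".join(lines)
```

Putting the per-category count in each heading also gives the audit's "README.md stats match actual counts" check something concrete to compare against.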

4b. DOI-Based Filenames & Path Mapping

Downloaded files are often named after their DOI rather than in AuthorYear form:

10-1038_ncomms3018.md        # DOI: 10.1038/ncomms3018
10-1016_j-neuron-2015-03-034.md

When markdown_path entries in index.json become stale (e.g., after folder reorganization), maintain a separate mapping file:

// temp/paper_md_mapping.json
{
  "author2024_keyword": "references/new-downloads/10-1038_s41592-024-02200-1.md",
  ...
}

To build this mapping: cross-reference each paper's DOI in index.json against actual files on disk. Use find + Python to automate.
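One way to automate that cross-reference in Python, assuming index.json holds a top-level JSON list of entries shaped like the schema above and that files follow the DOI naming rule inferred from the examples. `build_md_mapping` is hypothetical; it uses `Path.rglob` in place of `find`.

```python
import json
from pathlib import Path
from typing import Dict

def doi_to_stem(doi: str) -> str:
    # Naming rule inferred from the example filenames: '/' -> '_', '.' -> '-'
    return doi.strip().replace("/", "_").replace(".", "-")

def build_md_mapping(index_path: str, search_root: str) -> Dict[str, str]:
    """Map each paper id in index.json to the markdown file on disk named after its DOI."""
    entries = json.loads(Path(index_path).read_text(encoding="utf-8"))
    # Index every .md file under search_root by lower-cased stem (DOIs are case-insensitive)
    on_disk = {p.stem.lower(): p for p in Path(search_root).rglob("*.md")}
    mapping = {}
    for e in entries:
        doi, pid = e.get("doi"), e.get("id")
        if not doi or not pid:
            continue  # cannot match without both fields
        hit = on_disk.get(doi_to_stem(doi).lower())
        if hit is not None:
            mapping[pid] = hit.as_posix()
    return mapping
```

The result can be saved with `json.dumps(mapping, indent=2)` to temp/paper_md_mapping.json; entries missing from it are candidates for re-conversion or a DOI fix.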

index.json Known Pitfalls

  • id: null corruption: If many entries have id=null and share the same pdf_path, the index was likely corrupted during a batch write. Rebuild from actual files on disk.
  • DOI errors: Verify that DOIs resolve correctly; typos in DOI fields are common (e.g., wrong suffix digits). Always cross-check against the publisher page.
  • Dead markdown_path: After restructuring folders, markdown_path in index.json often points to old locations. Use the mapping file above as the source of truth.
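The first pitfall can be detected mechanically. This sketch (hypothetical `null_id_clusters`) lists every pdf_path shared by two or more entries whose id is missing or null; any hit suggests rebuilding the index from disk. DOI typos still need the manual publisher cross-check.

```python
from collections import Counter
from typing import Dict, List

def null_id_clusters(entries: List[Dict], min_size: int = 2) -> Dict[str, int]:
    """Return {pdf_path: count} for pdf_paths shared by >= min_size entries with a missing/null id."""
    counts = Counter(e.get("pdf_path") for e in entries if e.get("id") is None)
    return {path: n for path, n in counts.items() if n >= min_size}
```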

5. Verify

Run scripts/audit.sh <references_dir/> for full verification:

  • Every PDF is valid (file -b reports "PDF document")
  • Every PDF's title matches its filename (pdftotext | head)
  • Every PDF has matching markdown (and vice versa)
  • index.json is valid, complete, paths exist, no duplicate IDs
  • README.md stats match actual counts
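A partial Python sketch of these checks, not a port of audit.sh: it swaps `file -b` for a direct test of the `%PDF-` header, pairs PDFs and markdown files by category and filename stem, and checks index.json for parse errors, duplicate IDs, and dangling paths. The title-versus-filename and README stats checks are omitted. It assumes the folder layout from step 4 and a top-level JSON list in index.json; `audit` is a hypothetical name.

```python
import json
from collections import Counter
from pathlib import Path
from typing import List

def audit(root: str) -> List[str]:
    """Return a list of problems found under a references/ folder (empty list = clean)."""
    base = Path(root)
    problems: List[str] = []
    # Key files by (category, filename stem) so each PDF can be paired with its markdown
    pdfs = {(p.parent.parent.name, p.stem): p for p in base.glob("*/papers/*.pdf")}
    mds = {(p.parent.parent.name, p.stem): p for p in base.glob("*/markdown/*.md")}
    for key, pdf in sorted(pdfs.items()):
        with pdf.open("rb") as f:
            if f.read(5) != b"%PDF-":  # well-formed PDFs normally start with this header
                problems.append(f"not a PDF: {pdf}")
        if key not in mds:
            problems.append(f"PDF without markdown: {pdf}")
    for key, md in sorted(mds.items()):
        if key not in pdfs:
            problems.append(f"markdown without PDF: {md}")
    try:
        entries = json.loads((base / "index.json").read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return problems + [f"index.json unreadable: {exc}"]
    for pid, n in Counter(e.get("id") for e in entries).items():
        if n > 1:
            problems.append(f"duplicate id: {pid} x{n}")
    for e in entries:
        for field in ("pdf_path", "markdown_path"):
            rel = e.get(field)
            if rel and not (base / rel).exists():  # paths are relative to references/
                problems.append(f"{e.get('id')}: {field} does not exist: {rel}")
    return problems
```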

6. Collect Resources

For tool/method papers, find GitHub repos and public datasets. Store in RESOURCES.md + resources.json.
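resources.json has no documented schema, so the sketch below assumes hypothetical entries with `kind` (`code` or `dataset`), `name`, `url`, and `paper_id` fields, and renders RESOURCES.md from them so the two files cannot drift apart. `render_resources` is likewise hypothetical.

```python
from typing import Dict, List

def render_resources(resources: List[Dict[str, str]]) -> str:
    """Render RESOURCES.md from resources.json entries (field names are assumptions)."""
    sections = {"code": "## Code repositories", "dataset": "## Datasets"}
    lines = ["# Resources", ""]
    for kind, heading in sections.items():
        items = [r for r in resources if r["kind"] == kind]
        if not items:
            continue  # skip empty sections
        lines += [heading, ""]
        for r in sorted(items, key=lambda r: r["name"].lower()):
            lines.append(f"- [{r['name']}]({r['url']}) (paper: {r['paper_id']})")
        lines.append("")
    return "\n".join(lines)
```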

Sub-agent Strategy

For large batches, parallelize:

  • Download: 1 sub-agent per batch of ~5-8 papers
  • Organize: 1 sub-agent to build indexes
  • Verify: 1 independent sub-agent (never the same as organizer)

Always use a separate sub-agent for verification (QC should not self-grade).
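A small helper for sizing download batches. Rather than fixed-size chunks, which can leave a one-paper remainder, it spreads papers evenly so that no batch exceeds `max_size`; the default of 8 follows the ~5-8 guideline, and `make_batches` is a hypothetical name.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def make_batches(items: Sequence[T], max_size: int = 8) -> List[List[T]]:
    """Split items into the fewest batches of at most max_size, with sizes differing by at most 1."""
    if max_size < 1:
        raise ValueError("max_size must be >= 1")
    if not items:
        return []
    n = math.ceil(len(items) / max_size)  # fewest batches that respect max_size
    q, r = divmod(len(items), n)          # the first r batches get one extra item
    batches, start = [], 0
    for i in range(n):
        end = start + q + (1 if i < r else 0)
        batches.append(list(items[start:end]))
        start = end
    return batches
```

For example, 13 papers become batches of 7 and 6 instead of 8 and 5 (or 6, 6, and 1 with fixed chunks of 6).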

⚠️ Sub-agent Rules (Learned from Practice)

  1. One batch at a time — do not spawn multiple note-writing batches simultaneously; LLM rate limits will cause silent failures
  2. Set a cron monitor whenever spawning long-running agents — agents can fail silently without triggering auto-announce; cron catches this
  3. Cron monitor pattern:
    1. Spawn agent(s)
    2. Immediately set a cron job (every 10-15 min, isolated agentTurn)
       → Check if expected output files exist
       → Re-spawn failed agents
       → When all complete: announce + delete cron
    3. After the task finishes, confirm the cron job was removed
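Scheduling the cron job and re-spawning agents happen through OpenClaw and are not shown here. The check each cron turn performs, whether every batch's expected output files exist, can be sketched in Python; treating zero-byte files as missing is an added assumption, since an agent that dies mid-write may leave one behind. `check_batches` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

def check_batches(expected: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {batch: missing_or_empty_files}; an empty dict means every batch finished."""
    pending: Dict[str, List[str]] = {}
    for batch, files in expected.items():
        # A zero-byte file counts as missing: the agent may have died mid-write
        absent = [f for f in files if not (Path(f).is_file() and Path(f).stat().st_size > 0)]
        if absent:
            pending[batch] = absent
    return pending
```

On each cron turn, batches in the result are the ones to re-spawn; once the result is empty, the monitor can announce completion and delete itself.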
    

Adding Papers Incrementally

To add papers to an existing collection:

  1. Download + convert new papers into correct category folder
  2. Append entries to index.json
  3. Update README.md stats
  4. Run audit to verify consistency
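Step 2 is where a batch write can corrupt the index (see the id: null pitfall above). A hedged sketch: `append_entries` (hypothetical) skips entries whose id or DOI is already present and writes through a temporary file that is then renamed over index.json. Deduplicating on DOI is an assumption; drop it if the collection intentionally holds several versions of one paper.

```python
import json
from pathlib import Path
from typing import Dict, List

def append_entries(index_path: str, new_entries: List[Dict]) -> List[str]:
    """Append papers to index.json, skipping ids or DOIs already present; return the ids added."""
    path = Path(index_path)
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    seen_ids = {e.get("id") for e in entries}
    seen_dois = {(e.get("doi") or "").lower() for e in entries} - {""}
    added = []
    for e in new_entries:
        doi = (e.get("doi") or "").lower()  # DOIs are case-insensitive
        if e.get("id") in seen_ids or (doi and doi in seen_dois):
            continue  # already in the collection
        entries.append(e)
        seen_ids.add(e.get("id"))
        if doi:
            seen_dois.add(doi)
        added.append(e.get("id"))
    # Write a temp file and rename it, so a crash mid-write cannot leave a half-written index
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)
    return added
```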
Safety recommendations
This skill appears to do what it says: it downloads, converts, indexes, and audits literature collections using the included shell scripts. Before installing or running it, consider: (1) Legal/ethical: the download workflow explicitly uses Sci‑Hub as a fallback for paywalled papers — that may violate publisher terms or local law; enable/disable that step according to your jurisdiction and policies. (2) Safety: the scripts perform network downloads (curl) and will write files to whatever output path you provide — run them in a sandboxed or isolated directory if you want to limit impact. (3) Review dependencies: ensure pdftotext, file, python3, and any optional uvx markitdown[pdf] are installed from trusted sources. (4) Rate limits and politeness: mass downloads can trigger rate limits or IP blocks; consider throttling and respecting publisher robots/terms. (5) Audit the scripts yourself if you have concerns: they are short, readable shell scripts that do not request secrets or modify system configs. If you want to avoid legal risk, remove or comment out the Sci‑Hub step in scripts/download.sh before use.
Analysis
Type: OpenClaw Skill
Name: literature-manager
Version: 1.1.1

The skill is classified as suspicious primarily due to two factors: 1) The `SKILL.md` explicitly instructs the AI agent to set up and manage `cron` jobs for monitoring and re-spawning other agents. While framed as a reliability feature, this grants the agent the capability to interact with system-level persistence mechanisms, which is a significant security risk and a potential prompt-injection vector if exploited. 2) The `scripts/download.sh` script and `SKILL.md` openly include `sci-hub.box` as a download source. While declared, connecting to a service known for copyright infringement introduces a connection to a potentially untrusted third-party domain, which carries inherent risks. There is no clear evidence of intentional malicious behavior like data exfiltration or unauthorized remote control, but these capabilities represent high-risk behaviors.
Capability assessment
Purpose & Capability
The name/description (literature search, download, convert, organize, audit) align with the provided scripts and instructions. The listed dependencies (pdftotext, curl, python3, file, uvx markitdown[pdf]) are appropriate. The one outlier is the explicit Sci‑Hub fallback for paywalled content — this is consistent with 'download' but raises legal/ethical considerations rather than a capability mismatch.
Instruction Scope
SKILL.md and the scripts confine actions to document-oriented tasks: web downloads (publisher, EuropePMC, arXiv, Sci‑Hub), PDF validation, text extraction, conversion, indexing, and file-system checks within a 'references' tree. There are no instructions to read unrelated system files, environment variables, or to exfiltrate arbitrary data. The SKILL.md does recommend spawning sub-agents for batching; that may cause broader network activity but is within the scope of large-batch literature processing.
Install Mechanism
No install spec; this is instruction-plus-scripts only. All code is provided as shell scripts—no remote downloads or install-time fetches are embedded in an install step. That reduces risk compared with arbitrary remote installers. The scripts themselves invoke curl at runtime to fetch PDFs (expected for the purpose).
Credentials
The skill requires no environment variables, no credentials, and no config-path access. Network access is necessary to fetch papers; scripts use public endpoints (publisher URLs, europepmc/NCBI, arXiv, and Sci‑Hub). There are no requests for unrelated secrets or broad credential access.
Persistence & Privilege
The always flag is false, and the skill does not request persistent system privileges or modify other skills. Scripts write files under user-specified output/reference directories only. Autonomous agent invocation (allowed by default) could trigger network downloads at runtime, but that is expected for this skill and is not elevated by additional privileges.
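How to use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install literature-manager
  3. After installation, invoke the skill by name or trigger it with /literature-manager
  4. Provide the required inputs described in the skill's parameter documentation to get structured output
Version history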
v1.1.1
Fix doc/code mismatch: convert.sh now uses uvx markitdown[pdf]; download.sh now includes Sci-Hub as Strategy 5; add legal disclaimer for Sci-Hub in SKILL.md
v1.1.0
Fix markitdown command (uvx markitdown[pdf]); add Sci-Hub fallback download; add DOI-based filename & paper_md_mapping guidance; add index.json pitfall notes; add sub-agent cron monitor rules
v1.0.0
Initial release: search, download, convert, organize, and audit academic literature collections. Includes multi-source PDF download with fallback, PDF-to-markdown conversion, structured indexing (index.json + README.md), and full library audit scripts.
Metadata
Slug literature-manager
Version 1.1.1
License
Total installs 2
Current installs 2
Versions 3
FAQ

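What is Literature Manager?

Search, download, convert, organize, and audit academic literature collections. Use when asked to find papers, build a literature library, add papers to refe... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 905 times to date.

How do I install Literature Manager?

Run the command "/install literature-manager" in the OpenClaw or Claude Code chat box for one-step installation; no additional configuration is needed.

Is Literature Manager free?

Yes. Literature Manager is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Literature Manager support?

Literature Manager is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Literature Manager?

It is developed and maintained by IsonaEi (@isonaei); the current version is v1.1.1.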
