← 返回 Skills 市场
diamond2nv

Hfpclawer Paper Search

作者 diamond2nv · GitHub ↗ · v0.5.0 · MIT-0
cross-platform ✓ 安全检测通过
38
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install hfpclawer-paper-search
功能描述
Discover, download, and organize academic papers from arXiv, HuggingFace Papers, and OpenReview. Multi-source search → dedup → PDF download → Markdown conver...
使用说明 (SKILL.md)

hfpclawer Paper Search & Download

A multi-source academic paper pipe: search across arXiv / HuggingFace Papers / OpenReview / PapersWithCode, deduplicate by title, download PDFs, convert to Markdown, and optionally sync to a wiki.

Who this is for: Researchers who want a daily "new papers on my topic" feed without manually checking multiple websites.

Overview

Typical workflow in one command:

hfpclawer search           # Discover new papers across sources
   └── ranked by relevance to your keywords
hfpclawer download         # Download PDFs for matched papers  
   └── 8 concurrent streams
hfpclawer convert --to-wiki # PDF → readable Markdown + wiki sync

Or run the full pipeline at once:

hfpclawer full --max-pages 3 --to-wiki

Prerequisites

pip install hfpclawer>=0.5.0
hfpclawer init                      # Creates config.yaml in current directory

Edit config.yaml with your search interests (see Configuration section below).

Quick Start

1. First-time Setup

# Create default config
hfpclawer init

# Edit the config to match your research interests
vim config.yaml
# → Change: search.queries, keywords.include_high, keywords.exclude

2. One-Shot Full Pipeline (daily use)

# Discover → Download → Convert → Wiki sync in one command
hfpclawer full

# Limit pages for a quick check
hfpclawer full --max-pages 3 --to-wiki

3. Step-by-Step (for debugging)

# Step 1: Search across all sources
hfpclawer search --max-pages 5

# Step 2: Download PDFs for matched papers
hfpclawer download

# Step 3: Convert PDFs to Markdown
hfpclawer convert

# Step 4: Sync to wiki directory
hfpclawer convert --to-wiki

4. Monitor New Papers Regularly

# Check what papers have been downloaded
hfpclawer list

# Show paper store statistics
hfpclawer store stats

# Start the real-time download monitor
hfpclawer monitor start

Configuration

The config file config.yaml controls what papers are searched and downloaded:

search:
  max_per_dim: 50           # Papers per search query per source
  queries:
    - query: "neural operator"
      category: neural-operator
    - query: "physics-informed"
      category: physics-informed
    - query: "PDE solver deep learning"
      category: pde-solver

keywords:
  include_high:              # Papers must match these (OR)
    - "neural operator"
    - "pde"
    - "deep learning"
  include_low:               # Optional bonus keywords
    - "fourier"
    - "self-attention"
  exclude:                   # Exclude these topics
    - "quantum"
    - "llm"

classification:
  threshold_pass: 30         # Relevance score threshold (0-100)
  title_similarity_min: 0.40 # Dedup threshold

paths:
  data_dir: "data"           # SQLite DB location
  pdf_dir: "pdfs"            # Downloaded PDFs
  md_dir: "mds"              # Converted Markdown files

Available Commands

Command Purpose Common Flags
hfpclawer search Discover new papers --max-pages, --dry-run
hfpclawer download Download PDFs (runs from search results)
hfpclawer convert Convert PDF → MD --to-wiki syncs to raw/papers/
hfpclawer full All-in-one pipeline --max-pages, --to-wiki
hfpclawer list List downloaded papers
hfpclawer store stats Paper store statistics
hfpclawer store export Export store as JSON/CSV --format json
hfpclawer store verify Cross-verify paper metadata --arxiv-id
hfpclawer config Show current config
hfpclawer mcp Start MCP server (for LLM integration)
hfpclawer monitor Download daemon control start, stop, status
hfpclawer dedup Show dedup statistics

Daily Routine Examples

Morning — Check What's New

# Quick scan (3 pages per query, ~50 papers)
hfpclawer search --max-pages 3

# View results
hfpclawer store stats

Afternoon — Download & Read

# Download all new papers
hfpclawer download

# Convert to readable markdown
hfpclawer convert

# Read the best one
cat mds/2010.08895.md | head -80

Weekly — Full Pipeline

# Full sweep with wiki sync
hfpclawer full --max-pages 10 --to-wiki

# Validate references in newly added papers
hfpclawer audit verify "Key cited paper" --source openalex

Data Storage

hfpclawer uses three tiers:

Storage Location Content Persistence
SQLite data/papers.db Metadata, dedup, cross-ref Persistent
PDFs pdfs/ Raw paper PDFs Download once, keep
Markdown mds/ Converted text Regeneratable from PDFs

The paper store tracks:

  • arXiv ID, title, authors, abstract
  • Source of discovery (HF / arXiv / OpenReview)
  • Download status, conversion status
  • Wikified path (if synced)
  • Cross-verification with Crossref (DOI validation)

Common Pitfalls

  1. pip install needs to be in the right venv. If hfpclawer command is not found, check the active Python environment.
  2. HuggingFace CLI rate limits. Too many queries per minute will trigger 429s. Reduce max_per_dim to 10 if this happens.
  3. Scrapy spiders need scrapy extra installed. If you see ModuleNotFoundError: scrapy, run pip install hfpclawer[scrapy].
  4. PDF conversion needs pymupdf4llm. Run pip install hfpclawer[pdf] if hfpclawer convert complains about missing pymupdf4llm.
  5. Wiki sync defaults to raw/papers/. If you do not have a wiki directory, skip --to-wiki and read from mds/ directly.
  6. First run creates a config.yaml. Edit it before running hfpclawer full, otherwise the default queries may not match your research area.

Verification Checklist

  • hfpclawer init creates a valid config.yaml
  • hfpclawer search --dry-run validates config without network calls
  • hfpclawer search --max-pages 3 returns real papers
  • hfpclawer download downloads PDFs correctly
  • hfpclawer convert produces readable Markdown
  • hfpclawer store stats shows non-zero counts
  • hfpclawer store verify --arxiv-id 2010.08895 cross-checks via Crossref
安全使用建议
This result is incomplete: reinstall or rerun the scan in an environment where metadata.json and artifact/ can be read before making an install decision.
能力评估
Purpose & Capability
Unable to evaluate purpose against capabilities because metadata.json and artifact/ could not be read in this run.
Instruction Scope
Unable to evaluate runtime instructions because artifact contents could not be read.
Install Mechanism
Unable to evaluate install behavior because install specs and manifests could not be read.
Credentials
Unable to evaluate environment access or proportionality because workspace inspection failed.
Persistence & Privilege
Unable to evaluate persistence or privilege behavior because no artifact files were accessible through the command tool.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install hfpclawer-paper-search
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /hfpclawer-paper-search 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v0.5.0
Version 0.5.0 summary: Major update with full pipeline documentation and enhanced usability for daily research paper monitoring. - Comprehensive SKILL.md added with setup, workflow, and troubleshooting guide - Clear step-by-step instructions for searching, downloading, converting, and syncing papers - Multi-source support outlined: arXiv, HuggingFace Papers, OpenReview, PapersWithCode - Detailed configuration and daily use examples for streamlined research workflows - New sections on data storage, available commands, common pitfalls, and verification checklist - Designed for daily academic paper discovery and management across sources
元数据
Slug hfpclawer-paper-search
版本 0.5.0
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 1
常见问题

Hfpclawer Paper Search 是什么?

Discover, download, and organize academic papers from arXiv, HuggingFace Papers, and OpenReview. Multi-source search → dedup → PDF download → Markdown conver... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 38 次。

如何安装 Hfpclawer Paper Search?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install hfpclawer-paper-search」即可一键安装,无需额外配置。

Hfpclawer Paper Search 是免费的吗?

是的,Hfpclawer Paper Search 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Hfpclawer Paper Search 支持哪些平台?

Hfpclawer Paper Search 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Hfpclawer Paper Search?

由 diamond2nv(@diamond2nv)开发并维护,当前版本 v0.5.0。

💬 留言讨论