bigdogaaa

academic-talon (Academic Talon)

Author: TongChaodong · GitHub · v1.1.2 · MIT-0
cross-platform ✓ Security check passed
218 total downloads · 0 favorites · 0 current installs · 9 versions
Install in OpenClaw: /install academic-talon
Description
🎓 Full-stack academic research assistant - Search papers → Extract publication-ready BibTeX (header) → Full TEI XML document structure parsing (via GROBID)...
Usage instructions (SKILL.md)

🎓 Academic Talon Skill

Your AI-powered academic research assistant for paper search → BibTeX extraction → Zotero archiving → local PDF serving.

Save hours of manual work searching papers, copying citations, and organizing your library.


🎯 What it does (when to use this skill)

Trigger this skill when the user wants to:

| Task | Description |
|------|-------------|
| 🔍 Search papers | Find papers across multiple academic search engines (arXiv, Google Scholar, Semantic Scholar, Tavily) |
| 📝 Extract BibTeX (header analysis) | Parse the PDF header and output publication-ready BibTeX that follows AI conference/journal conventions |
| 📄 Full text analysis | Extract the full document structure as TEI XML for further processing |
| 🗄️ Archive to Zotero | Save papers to your Zotero library automatically; the default collection is openclaw, and missing collections are created on the fly |
| 📂 Local PDF library | Maintain a local PDF collection and serve it over HTTP so Zotero can open it directly |

🔧 Architecture & Dependencies

This is a toolbox skill that provides multiple independent academic research tools. You can use just the features you need. A common complete workflow looks like this:

User Query
    ↓
[academic-talon] ← this skill
    ↓
1. Search → Multiple search APIs (arXiv, Google Scholar via SerpAPI, etc.)
    ↓
2. PDF Download → saved to local `pdfs/` directory
    ↓
3. PDF Parsing → **GROBID service** processes PDF
    ↓
   - Header analysis → extracts metadata → skill generates clean BibTeX
   - Full text analysis → returns complete TEI XML with full document structure
    ↓
4. If header analysis: BibTeX Generation → skill formats clean publication-ready output
    ↓
5. Zotero Archiving → via **pyzotero** → your Zotero library → auto-add to collection
    ↓
6. PDF Serving → built-in HTTP server serves PDFs from your intranet
    ↓
Result: Paper in Zotero with working PDF link, clean BibTeX ready for citation

You don't have to use the full workflow; use individual tools as needed.
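Step 3 of the workflow comes down to an HTTP upload to GROBID. The sketch below is minimal and makes a few assumptions. It uses GROBID's standard REST endpoints, `processHeaderDocument` and `processFulltextDocument`, and sends the PDF in the multipart field `input`. The function names are hypothetical, and the skill's own client may differ, for example in timeouts or consolidation options.

```python
from typing import Dict

# GROBID REST endpoints for the skill's two analysis types (GROBID_API_URL ends in /api).
GROBID_ENDPOINTS: Dict[str, str] = {
    "header": "processHeaderDocument",      # header metadata (title, authors, venue) as TEI XML
    "fulltext": "processFulltextDocument",  # whole-document structure as TEI XML
}


def grobid_endpoint(grobid_api_url: str, analysis_type: str = "header") -> str:
    """Build the GROBID URL for an analysis type, e.g. .../api/processHeaderDocument."""
    if analysis_type not in GROBID_ENDPOINTS:
        raise ValueError(f"analysis_type must be one of {sorted(GROBID_ENDPOINTS)}")
    return grobid_api_url.rstrip("/") + "/" + GROBID_ENDPOINTS[analysis_type]


def process_pdf(pdf_path: str, grobid_api_url: str, analysis_type: str = "header",
                timeout: float = 120.0) -> str:
    """Upload one PDF to GROBID and return the TEI XML it produces."""
    import requests  # imported here so grobid_endpoint() works without requests installed

    with open(pdf_path, "rb") as pdf:
        # GROBID expects the PDF in a multipart form field named "input".
        resp = requests.post(grobid_endpoint(grobid_api_url, analysis_type),
                             files={"input": pdf}, timeout=timeout)
    resp.raise_for_status()
    return resp.text
```

For header analysis, GROBID returns TEI XML rather than BibTeX, so a separate formatting step (step 4 above) is still needed to produce the final entry.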

Required External Services

| Service | Purpose | Why do you need it? | Required? |
|---------|---------|---------------------|-----------|
| GROBID | PDF metadata extraction | Parses PDF headers to extract title, authors, and publication info for BibTeX | Required for PDF parsing |
| Zotero API | Paper archiving | Stores papers in your Zotero library with correct metadata | Required for archiving |
| SerpAPI key | Google Scholar search | Enables Google Scholar searches | ⚙️ Optional (more results) |
| Semantic Scholar API key | Semantic Scholar search | Enables Semantic Scholar results | ⚙️ Optional |
| Tavily API key | Tavily search | Enables Tavily results | ⚙️ Optional |

⚙️ Setup Instructions

1. Install Python dependencies

pip install -r skills/academic-talon/requirements.txt

2. Configure environment variables (skills/academic-talon/.env)

# ========== Zotero Configuration (Required for archiving) ==========
ZOTERO_API_KEY=your_zotero_api_key_here
ZOTERO_LIBRARY_ID=your_library_id_here
ZOTERO_LIBRARY_TYPE=user  # or "group" for group libraries

# ========== GROBID Configuration (Required for PDF parsing) ==========
GROBID_API_URL=http://localhost:8070/api
# Or if you use Docker Compose behind nginx:
# GROBID_API_URL=http://localhost:8080/api

# ========== Optional Search API Keys ==========
# Get these from their respective websites
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
SERPAPI_KEY=your_serpapi_key_for_google_scholar
TAVILY_API_KEY=your_tavily_api_key

# ========== Local PDF Serving (Optional) ==========
# After starting the PDF server, set this to your intranet URL:
# Example: PDF_BASE_URL=http://192.168.1.100:8000/
PDF_BASE_URL=http://your-server-ip:port/
| Environment variable | What it does |
|----------------------|--------------|
| ZOTERO_API_KEY | Your Zotero API key, created in your Zotero account settings |
| ZOTERO_LIBRARY_ID | Your Zotero library ID (as it appears in Zotero API URLs) |
| ZOTERO_LIBRARY_TYPE | "user" for your personal library, "group" for group libraries |
| GROBID_API_URL | URL of your GROBID service endpoint |
| PDF_BASE_URL | Base URL of your locally running PDF server (e.g. http://10.26.20.168:18001/) |
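A quick pre-flight check can catch a missing or unedited `.env` before the skill fails mid-workflow. The sketch below is hypothetical and is not part of the skill. It treats values that still start with the template prefix `your_` as unset, and it uses python-dotenv (listed in the skill's requirements) only when that package is importable.

```python
import os
from typing import Dict, List, Mapping, Optional

# Variables each feature needs, taken from the .env template above.
REQUIRED_BY_FEATURE: Dict[str, List[str]] = {
    "archive": ["ZOTERO_API_KEY", "ZOTERO_LIBRARY_ID"],
    "analyze": ["GROBID_API_URL"],
}


def missing_config(feature: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variables `feature` needs that are unset or still template placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_BY_FEATURE.get(feature, []):
        value = env.get(name, "").strip()
        # Template values such as "your_zotero_api_key_here" count as unset.
        if not value or value.startswith("your_"):
            missing.append(name)
    return missing


def library_type_ok(env: Mapping[str, str]) -> bool:
    # ZOTERO_LIBRARY_TYPE must be "user" or "group"; unset means the template default "user".
    return env.get("ZOTERO_LIBRARY_TYPE", "user") in ("user", "group")


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # python-dotenv, listed in the skill's requirements
        load_dotenv("skills/academic-talon/.env")
    except ImportError:
        pass  # fall back to variables already exported in the shell
    for feature in REQUIRED_BY_FEATURE:
        gaps = missing_config(feature)
        print(f"{feature}: " + ("ok" if not gaps else "missing " + ", ".join(gaps)))
    print("ZOTERO_LIBRARY_TYPE valid:", library_type_ok(os.environ))
```

Search needs no entry in `REQUIRED_BY_FEATURE` because every search API key is optional.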

3. Start GROBID (for PDF parsing)

Option A: Docker Compose (Recommended)

Create compose.yml in your GROBID directory:

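version: "3.9"
services:
  grobid:
    # Choose the right image for your hardware:
    # - For non-GPU environments: grobid/grobid:0.8.2-crf (CRF-only model, smaller)
    # - For GPU environments: grobid/grobid:0.8.2-full (includes CRF + deep learning models)
    image: grobid/grobid:0.8.2-crf
    container_name: grobid
    restart: unless-stopped
    # Publish port 8070 on the host's loopback interface only, so that
    # GROBID_API_URL=http://localhost:8070/api works without exposing GROBID to the network.
    # ("expose" alone only makes the port reachable from other containers, e.g. an nginx proxy.)
    ports:
      - "127.0.0.1:8070:8070"
    environment:
      JAVA_OPTS: "-Xms512m -Xmx4g"
    volumes:
      - ./grobid/tmp:/opt/grobid/tmp
      - ./grobid/logs:/opt/grobid/logs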

💡 Image selection: Use grobid/grobid:0.8.2-crf for CPU-only / non-GPU environments (smaller image, faster startup). Use grobid/grobid:0.8.2-full if you have GPU and want maximum accuracy with deep learning models.

Start:

docker-compose up -d

Option B: Direct run

Follow the GROBID documentation to run it directly.
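Before pointing the skill at GROBID, you can confirm the service is up through its service-check endpoint. GROBID documents this endpoint as `GET /api/isalive`, and it answers `true`. The helper names below are hypothetical and not part of the skill. This is a minimal standard-library sketch that assumes GROBID_API_URL ends in `/api`, as in the `.env` template.

```python
import urllib.request
from urllib.error import URLError


def isalive_url(grobid_api_url: str) -> str:
    """Build the liveness URL from GROBID_API_URL (e.g. http://localhost:8070/api)."""
    return grobid_api_url.rstrip("/") + "/isalive"


def grobid_is_alive(grobid_api_url: str, timeout: float = 5.0) -> bool:
    """Return True if GROBID answers its isalive endpoint with 'true'."""
    try:
        with urllib.request.urlopen(isalive_url(grobid_api_url), timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip().lower() == b"true"
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout or an HTTP error status.
        return False
```

For example, `grobid_is_alive("http://localhost:8070/api")` should return `True` once the container from Option A has finished starting.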

4. (Optional) Start the Local PDF Server

If you want to serve downloaded PDFs locally:

# Start on port 8000 with intranet-wide access (the argument 内网 means "intranet")
python skills/academic-talon/scripts/start_pdf_server.py start 8000 内网

# Check status
python skills/academic-talon/scripts/start_pdf_server.py status

# Stop
python skills/academic-talon/scripts/start_pdf_server.py stop

The server:

  • Serves only from the pdfs/ directory (sandboxed, no access outside)
  • Binds to all interfaces (0.0.0.0) by default, so it is reachable from your entire intranet
  • Filenames are citation keys (e.g. zhang2025hallucinationdetection.pdf)
  • When PDF_BASE_URL is configured, archived papers automatically get the correct local URL
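The server behavior listed above can be reproduced with the standard library: it serves only `.pdf` files from one directory, shows no directory listings, and allows no escape via `..`. The sketch below is minimal and is not the skill's `start_pdf_server.py`. The class and function names are hypothetical. It needs Python 3.7+ for `ThreadingHTTPServer` and the handler's `directory` argument, which is newer than the 3.6+ stated under Dependencies.

```python
import functools
import http.server
import os
from http import HTTPStatus


class PdfOnlyHandler(http.server.SimpleHTTPRequestHandler):
    """Serve *.pdf files from one directory; refuse everything else."""

    def send_head(self):
        # translate_path() (inherited) maps the URL onto the served directory
        # and drops ".." segments, so requests cannot climb out of it.
        path = self.translate_path(self.path)
        if os.path.isdir(path) or not path.lower().endswith(".pdf"):
            self.send_error(HTTPStatus.NOT_FOUND, "Only PDF files are served")
            return None
        return super().send_head()

    def list_directory(self, path):
        # Never render directory listings, even if send_head changes later.
        self.send_error(HTTPStatus.FORBIDDEN, "Directory listing disabled")
        return None


def make_server(pdf_dir: str, host: str = "0.0.0.0", port: int = 8000):
    # host="0.0.0.0" mirrors the documented default (all interfaces);
    # pass "127.0.0.1" to keep the server local-only.
    handler = functools.partial(PdfOnlyHandler, directory=pdf_dir)
    return http.server.ThreadingHTTPServer((host, port), handler)
```

For example, `make_server("pdfs", host="127.0.0.1", port=8000).serve_forever()` serves the pdfs/ directory on the loopback interface only. Like the skill's server, this sketch has no authentication.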

📖 Usage (for LLM)

Input Schema

| Parameter | Type | Description | Required | Default |
|-----------|------|-------------|----------|---------|
| action | string | Action to perform: search, download, analyze, archive | Yes | search |
| query | string | Search keywords | Yes (search) | - |
| limit | integer | Maximum number of results to return | No | 10 |
| source | string | Search source: all, arxiv, google_scholar, semantic_scholar, tavily | No | all |
| engine_weights | object | Number of results to take from each engine | No | {"arxiv": 5, "google_scholar": 3, "semantic_scholar": 1, "tavily": 1} |
| url | string | PDF URL to download | Yes (download) | - |
| filename | string | Custom filename for the downloaded PDF | No | derived from the citation key |
| paper_info | object | Paper metadata (title, authors, year) used to generate the citation key | No | - |
| pdf_input | string | Path to a local PDF or URL of a remote PDF | Yes (analyze) | - |
| analysis_type | string | header outputs publication-ready BibTeX; fulltext outputs TEI XML of the full document | No | header |
| collection | string | Zotero collection to add the paper to | No | openclaw |
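To make the schema concrete, here is a hypothetical client-side helper. It fills the documented defaults and rejects requests the table would not accept before they reach `skill.run`. It is a sketch transcribed from the table, not the skill's own validation logic. Because the table marks no parameter as required for `archive`, none is enforced for that action.

```python
from typing import Any, Dict

# Required parameters per action, transcribed from the input schema table.
REQUIRED_PARAMS = {
    "search": ["query"],
    "download": ["url"],
    "analyze": ["pdf_input"],
    "archive": [],  # the table marks no archive-specific parameter as required
}
ALLOWED_VALUES = {
    "action": set(REQUIRED_PARAMS),
    "source": {"all", "arxiv", "google_scholar", "semantic_scholar", "tavily"},
    "analysis_type": {"header", "fulltext"},
}
DEFAULTS = {"action": "search", "limit": 10, "source": "all",
            "analysis_type": "header", "collection": "openclaw"}


def default_engine_weights() -> Dict[str, int]:
    # A fresh dict on every call, so callers never mutate a shared default.
    return {"arxiv": 5, "google_scholar": 3, "semantic_scholar": 1, "tavily": 1}


def build_request(params: Dict[str, Any]) -> Dict[str, Any]:
    """Fill documented defaults and reject requests the schema would not accept."""
    req = dict(DEFAULTS, engine_weights=default_engine_weights())
    req.update(params)
    for name, allowed in ALLOWED_VALUES.items():
        if req[name] not in allowed:
            raise ValueError(f"{name}={req[name]!r} is not one of {sorted(allowed)}")
    missing = [p for p in REQUIRED_PARAMS[req["action"]] if not req.get(p)]
    if missing:
        raise ValueError(f"action {req['action']!r} requires: {', '.join(missing)}")
    if not isinstance(req["limit"], int) or req["limit"] < 1:
        raise ValueError("limit must be a positive integer")
    return req
```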

Output Format

All actions return JSON. A search response looks like this:

{
  "success": true,
  "action": "search",
  "query": "your search query",
  "results": [
    {
      "title": "Paper Title",
      "authors": ["Author One", "Author Two"],
      "year": "2025",
      "abstract": "Paper abstract...",
      "url": "https://...",
      "pdf_url": "https://...",
      "source": "arxiv"
    }
  ]
}

✨ Features (and how they help your research)

1. Fixed arXiv Search

  • Before: the arXiv API query used OR semantics by default, so searching "LLM judge knowledge possession" returned papers containing just one of the keywords, producing many irrelevant results
  • Now: proper AND semantics, matching what you get from arXiv's search in the browser. Every result contains all query terms in its title or abstract.
  • 🎯 Benefit: Get relevant results first try, no scrolling through irrelevant papers
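One way to get the AND behavior described above from the arXiv API is to use its field prefixes (`ti:` for title, `abs:` for abstract), Boolean operators, and parenthesized grouping. The sketch below follows that approach. It is an illustration under that assumption, not necessarily how the skill builds its queries, and it does not escape special characters inside terms.

```python
import urllib.parse

ARXIV_API = "https://export.arxiv.org/api/query"


def arxiv_and_query(text: str) -> str:
    """Require every whitespace-separated term to appear in the title or the abstract."""
    clauses = [f"(ti:{term} OR abs:{term})" for term in text.split()]
    return " AND ".join(clauses)


def arxiv_url(text: str, max_results: int = 5) -> str:
    # urlencode turns spaces into "+" and parentheses into %28/%29.
    params = {"search_query": arxiv_and_query(text), "start": 0, "max_results": max_results}
    return ARXIV_API + "?" + urllib.parse.urlencode(params)
```

For example, `arxiv_and_query("LLM judge")` yields `(ti:LLM OR abs:LLM) AND (ti:judge OR abs:judge)`. Joining the terms with OR instead reproduces the old, noisier behavior.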

2. Publication-Ready BibTeX Generation

  • Follows exactly the format used by top AI conferences (NeurIPS, ICML, ICLR, CVPR, etc.)
  • Correct entry types:
    • Journal article → @article
    • Conference paper → @inproceedings with conference name in booktitle
    • arXiv preprint → @article with journal = {arXiv preprint xxxx.xxxxx}
  • Cleans up junk: removes fields such as date, month, day, and publisher that don't belong in final submissions
  • Correct citation keys: lastnameYearTitle format (e.g. zhang2025hallucinationdetection), matching standard academic practice
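The key format is named above, but the exact rule is not documented. The hypothetical `citation_key` helper below assumes the key is the first author's last name, then the year, then the first N title words after skipping a small stopword list; both the list and N are assumptions. With N=2 it reproduces zhang2025hallucinationdetection. The gal2016dropout key in the example output corresponds to N=1, so the skill's real rule may differ.

```python
import re
import unicodedata

# Assumed stopword list; the skill's actual list (if any) is not documented.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "for", "and", "as", "to", "with", "via"}


def _ascii(text: str) -> str:
    # Strip accents so "Müller" becomes "Muller" instead of losing the letter.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def citation_key(last_name: str, year: str, title: str, n_words: int = 2) -> str:
    """lastname + year + first n significant title words, all lowercase ASCII."""
    last = re.sub(r"[^a-z]", "", _ascii(last_name).lower())
    words = [w for w in re.findall(r"[a-z0-9]+", _ascii(title).lower()) if w not in STOPWORDS]
    return f"{last}{year}{''.join(words[:n_words])}"
```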

Example output (ready to paste into your manuscript):

@article{zhang2025hallucinationdetection,
  author = {Zhang, Chenggong and Wang, Haopeng},
  title = {Hallucination Detection and Evaluation of Large Language Model},
  year = {2025},
  journal = {arXiv preprint 2512.22416},
  abstract = {Hallucinations in Large Language Models...},
}
@inproceedings{gal2016dropout,
  author = {Gal, Yarin and Ghahramani, Zoubin},
  title = {Dropout as a bayesian approximation: Representing model uncertainty in deep learning},
  booktitle = {ICML},
  year = {2016},
}
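The entry-type rules and field cleanup described above can be expressed as a small formatter. The sketch below is hypothetical. The `kind` values and the `arxiv_id` field name are assumptions, fields keep their input order, and it does not escape LaTeX special characters or braces inside values.

```python
from typing import Dict

# Fields stripped from final entries, as listed in the bullets above.
DROP_FIELDS = {"date", "month", "day", "publisher"}
ENTRY_TYPES = {"journal": "article", "conference": "inproceedings", "arxiv": "article"}


def to_bibtex(key: str, kind: str, fields: Dict[str, str]) -> str:
    """Format one BibTeX entry following the entry-type rules above."""
    if kind not in ENTRY_TYPES:
        raise ValueError(f"unknown paper kind: {kind!r}")
    cleaned: Dict[str, str] = {}
    for name, value in fields.items():
        if name in DROP_FIELDS or not value:
            continue
        if kind == "arxiv" and name == "arxiv_id":
            # arXiv preprints cite the ID through the journal field.
            cleaned["journal"] = f"arXiv preprint {value}"
        else:
            cleaned[name] = value
    lines = [f"@{ENTRY_TYPES[kind]}{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in cleaned.items()]
    lines.append("}")
    return "\n".join(lines)
```

For example, passing the Gal and Ghahramani metadata with `kind="conference"` plus stray `publisher` and `month` fields produces the @inproceedings entry shown above, with those two fields removed.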

3. Smart Zotero Archiving

  • 🎯 Default collection: all papers go to openclaw unless you specify otherwise
  • 🪄 Auto-creation: if the collection doesn't exist, the skill creates it automatically
  • 🔄 Smart duplicate handling: if the paper already exists in your library, the skill adds it to the target collection instead of failing
  • 🏷️ Correct Zotero types: preprint → preprint, conference → conferencePaper, journal → journalArticle
  • 📍 Local PDF links: when you run the local PDF server, links point directly to your local copy

Benefit: Build your research library without repetitive manual clicking.
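The collection handling and item-type mapping above can be sketched with pyzotero. The sketch uses real pyzotero methods: `Zotero`, `collections`, `everything`, `create_collections`, `item_template`, and `create_items`. It assumes the Zotero Web API write response has the shape `{"success": {"0": "<key>"}}`. The helper names are hypothetical, and the skill's duplicate detection is not reproduced here.

```python
from typing import Any, Dict

# Zotero item types per paper kind, as listed in the bullets above.
ZOTERO_TYPES = {"preprint": "preprint", "conference": "conferencePaper", "journal": "journalArticle"}


def make_client(library_id: str, api_key: str, library_type: str = "user"):
    """Create a pyzotero client (imported lazily so the helpers can be tested without it)."""
    from pyzotero import zotero
    return zotero.Zotero(library_id, library_type, api_key)


def find_or_create_collection(zot, name: str = "openclaw") -> str:
    """Return the key of the collection called `name`, creating it when it is missing."""
    for coll in zot.everything(zot.collections()):
        if coll["data"]["name"] == name:
            return coll["key"]
    # Assumes the Zotero Web API write response: {"success": {"0": "<new key>"}, ...}
    resp = zot.create_collections([{"name": name}])
    return resp["success"]["0"]


def archive_paper(zot, paper: Dict[str, Any], kind: str = "preprint",
                  collection: str = "openclaw") -> Dict[str, Any]:
    """Create one Zotero item from search-result metadata and file it in `collection`."""
    item = zot.item_template(ZOTERO_TYPES[kind])
    item["title"] = paper["title"]
    # Single-field creator names avoid guessing how to split first and last names.
    item["creators"] = [{"creatorType": "author", "name": n} for n in paper.get("authors", [])]
    item["date"] = str(paper.get("year", ""))
    item["url"] = paper.get("url", "")
    # Listing the collection key on the new item files it there in the same request.
    item["collections"] = [find_or_create_collection(zot, collection)]
    return zot.create_items([item])
```

Typical use would be `archive_paper(make_client(library_id, api_key), paper, kind="conference")`, with the credentials taken from the `.env` variables.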

4. Local PDF Library Serving

  • Maintain all your PDFs locally
  • Built-in HTTP server with start/stop/status management
  • Designed for intranet access → you can access your PDFs from any device on your network
  • Zotero links point directly to local files → no downloading the same PDF multiple times

🔒 Security Considerations

⚠️ Important Security Notes

  1. PDF Processing goes to GROBID:

    • This skill sends PDF content to the configured GROBID_API_URL for metadata extraction
    • Recommendation: Run GROBID locally on your own machine/infrastructure for privacy
    • If you use a third-party GROBID service, be aware that they will see your PDFs
  2. Local PDF Server:

    • This skill runs an HTTP server that serves PDF files from the pdfs/ directory
    • It is designed for intranet/private network use only
    • The server does NOT include authentication
    • Do NOT expose this server directly to the public internet
    • ✅ Only run on trusted private networks, or put it behind a reverse proxy with authentication
  3. File Access Restrictions:

    • All file operations (download, analysis) are sandboxed to the pdfs/ directory within this skill's installation
    • Directory traversal attacks are prevented by path checking
    • The skill cannot access or modify files outside its own directory
  4. API Key Storage:

    • All API keys are stored locally in the .env file
    • Never commit .env to version control
    • Keys are only used for API requests directly from your machine to the service providers
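Item 3 describes sandboxing through path checking. One common way to implement such a check is to resolve the requested path, including symlinks, and confirm it still lies under the sandbox root. The sketch below is minimal and hypothetical, and the skill's own check may be implemented differently.

```python
import os


def resolve_in_sandbox(sandbox: str, user_path: str) -> str:
    """Resolve `user_path` inside `sandbox`, rejecting anything that escapes it."""
    root = os.path.realpath(sandbox)
    # realpath() collapses ".." and follows symlinks; an absolute user_path
    # replaces root entirely in join(), which the check below then rejects.
    candidate = os.path.realpath(os.path.join(root, user_path))
    try:
        inside = os.path.commonpath([root, candidate]) == root
    except ValueError:  # e.g. different drives on Windows
        inside = False
    if not inside:
        raise PermissionError(f"path escapes sandbox: {user_path!r}")
    return candidate
```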

Best Security Practices

  • ✅ Run GROBID locally (don't send sensitive PDFs to third parties)
  • ✅ Keep PDF server on private/intranet network only
  • ✅ Use reverse proxy with authentication if you need public access
  • ✅ Use a dedicated Zotero API key with limited permissions
  • ✅ Don't expose GROBID directly to the internet (use the recommended nginx proxy with IP whitelist)

📋 Complete Workflow Example

from skill import skill  # import path as given in the v1.0.2 release notes

# 1. Search for papers
result = skill.run({
    "action": "search",
    "query": "LLM judge knowledge possession",
    "limit": 5
})
if not result["success"] or not result["results"]:
    raise SystemExit("No search results")

# 2. Download the PDF of the first result
paper = result["results"][0]
download_result = skill.run({
    "action": "download",
    "url": paper["pdf_url"],
    "paper_info": paper
})

# 3. Extract BibTeX from the downloaded PDF (header analysis)
analyze_result = skill.run({
    "action": "analyze",
    "pdf_input": download_result["pdf_path"],
    "analysis_type": "header"
})

# 4. Archive to Zotero (goes to the openclaw collection by default)
paper["bibtex"] = analyze_result["result"]
archive_result = skill.run({
    "action": "archive",
    "paper_info": paper
})

if archive_result["success"]:
    print(f"✅ Paper archived to Zotero: {archive_result['result']['item_id']}")

🐛 Troubleshooting

| Problem | Solution |
|---------|----------|
| GROBID server not accessible | Check that GROBID is running and verify GROBID_API_URL in .env |
| Zotero API error | Check that ZOTERO_API_KEY and ZOTERO_LIBRARY_ID are correct |
| arXiv search returns nothing | Check network connectivity; the arXiv API sometimes blocks unusual IPs |
| PDF analysis returns empty | Check that the PDF isn't corrupted and that GROBID is working |
| Local PDF link doesn't work | Check that the PDF server is running and that PDF_BASE_URL matches the server address |
| Duplicate papers in Zotero | The skill detects duplicates by title/DOI and adds the existing item to the collection, so this is safe to ignore |

📊 Benefits for Academic Research

  • Saves time: Go from keywords → archived paper in minutes instead of manually copying everything
  • Consistent citations: Always get clean BibTeX ready for journal/conference submission
  • Organized library: Automatic collection management keeps your papers organized
  • Local access: Keep all PDFs locally and access them from anywhere on your network
  • Correct search: Get relevant results from arXiv with proper AND semantics

📦 Dependencies Summary

  • Python: 3.6+
  • Python packages: requests, python-dotenv, pyzotero
  • External services: GROBID (PDF parsing), Zotero API (archiving)
  • Optional APIs: SerpAPI (Google Scholar), Semantic Scholar API, Tavily API

📄 License

MIT License - free for academic and commercial use.

Security recommendations
This skill appears to do what it says, but it intentionally downloads arbitrary PDFs and runs a local HTTP server; these are the main risk vectors. Before installing or running it:
  1. Only provide your Zotero API key to skills you trust; the key grants access to your Zotero library.
  2. Run the skill in a network-isolated environment (a VM or private host).
  3. Do not expose the built-in PDF server to the public internet; bind it to localhost or a private IP and place it behind an authenticated reverse proxy or firewall.
  4. Be careful about passing untrusted URLs to the skill: an attacker-controlled URL could make the host fetch internal resources.
  5. Review and remove any secrets from the included .env file; store real credentials in a secure secrets manager.
  6. Keep the GROBID and Zotero API endpoints under your control (run GROBID locally if possible).
If you need higher assurance, ask the publisher for provenance (source/homepage) or perform an independent code review in your environment.
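Recommendation 4 can be partly automated on the caller's side. The hypothetical check below accepts a URL only if it is http(s) and its host resolves exclusively to globally routable addresses, which rules out loopback, private ranges, and link-local metadata endpoints. It is a screen rather than a complete defense. It does not follow redirects, and the download may resolve the hostname again later (DNS rebinding), so network-level isolation is still advisable.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_http_url(url: str) -> bool:
    """True only for http(s) URLs whose host resolves exclusively to public addresses."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        # parts.port raises ValueError for malformed ports, so it stays inside the try.
        infos = socket.getaddrinfo(parts.hostname, parts.port or 0)
    except (socket.gaierror, ValueError):
        return False
    for info in infos:
        # info[4][0] is the resolved address; drop any IPv6 zone suffix ("%eth0").
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if not addr.is_global:
            return False
    return True
```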
Capability assessment
Purpose & Capability
Name/description (search, GROBID parsing, generate BibTeX, Zotero archiving, local PDF serving) match the code, declared env vars (ZOTERO_API_KEY, ZOTERO_LIBRARY_ID, GROBID_API_URL), and required binary (python). Required dependencies (requests, python-dotenv, pyzotero) are proportionate to the stated functionality.
Instruction Scope
SKILL.md and the code instruct the agent to call search APIs, download arbitrary PDF URLs, send PDFs to the user-provided GROBID endpoint, and POST metadata to Zotero. The code constrains file writes to a local pdfs/ directory and the HTTP server only serves .pdf files (path traversal checks and directory listing disabled). However, downloading arbitrary URLs and posting to external services means the skill will make outbound network requests (including user-supplied URLs) and could be used to trigger requests to internal services (SSRF-like behavior) if given untrusted inputs.
Install Mechanism
Instruction-only install spec (pip install -r requirements.txt) — dependencies come from PyPI and are minimal. No remote binary downloads or obscure install URLs. This is an expected installation mechanism for a Python skill.
Credentials
Requested environment variables (ZOTERO_API_KEY, ZOTERO_LIBRARY_ID, GROBID_API_URL) are directly used by the Zotero archiver and Grobid integration. Optional search API keys (Semantic Scholar, SerpAPI, Tavily) are referenced in SKILL.md and code as optional; they are not required. No unrelated credentials or surprising secret access is requested.
Persistence & Privilege
The skill does not request always:true and does not modify other skill configs. It creates local artifacts (pdfs/, .cache/, .pdf_server.pid) and can fork a background HTTP server (defaults to 0.0.0.0). Running a server bound to all interfaces and writing PID/files is normal for its purpose but increases exposure if misconfigured or if you run it on a public-facing host.
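How to use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install academic-talon
  3. After installation, call the skill by name or trigger it with /academic-talon
  4. Provide the required inputs according to the skill's parameter description to get structured output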
Version history
v1.1.2
academic-talon 1.1.2 - No code changes detected in this release. - All setup instructions, descriptions, and metadata remain unchanged. - Functionality and configuration requirements are the same as previous version.
v1.1.1
academic-talon 1.1.1 - Added compiled Python cache files for core scripts to improve performance. - Enhanced full text analysis: PDF parsing now returns full TEI XML document structure (via GROBID) for deeper document understanding. - Security note clarified: the skill now warns to only serve PDFs on private intranet (not public internet). - Updated documentation with explicit environment requirements and clarified hardware choices for GROBID Docker images. - Minor metadata and documentation improvements to reflect expanded features and setup guidance.
v1.1.0
Version 1.1.0 of academic-talon introduces local PDF serving, improved BibTeX generation, and robust search and Zotero archiving. - Added scripts/start_pdf_server.py for secure, intranet-wide local PDF serving. - Enhanced BibTeX extraction for publication-ready output matching major conference/journal formats. - Fixed arXiv search to require all query terms (AND semantics), increasing relevance. - Zotero archiving now auto-creates collections and saves accurate PDF links, supporting local server URLs. - Updated documentation for streamlined setup and new workflows.
v1.0.5
academic-talon v1.0.5 - Added clarification that Zotero API credentials are optional and only required for archive functionality. - Updated Docker Compose and GROBID setup instructions for clarity and accuracy. - Improved metadata with additional notes about API keys and service requirements. - Minor documentation refinements for greater usability and clearer setup guidance.
v1.0.4
academic-talon 1.0.4 - Updated OpenClaw metadata: environment variable requirements (`env`) are now empty. - Added detailed GROBID server setup instructions, including Docker Compose and NGINX proxy configuration. - Provided explicit examples for securing the GROBID API with IP whitelisting and (optional) HTTP basic authentication. - No changes to logic or functionality; documentation improvements and metadata adjustment only.
v1.0.3
No user-facing changes in this version. - No file or documentation changes detected. - Functionality and instructions remain unchanged.
v1.0.2
academic-talon 1.0.2 - Updated code usage examples: Python import statements now use from skill import skill instead of from skills.paper_reader.skill. - No functional or interface changes; documentation and examples clarified.
v1.0.1
- Removes SEMANTIC_SCHOLAR_API_KEY, SERPAPI_KEY, and TAVILY_API_KEY from required environment variables—now optional. - Only ZOTERO_API_KEY and ZOTERO_LIBRARY_ID are required for setup. - No code or functional changes detected; documentation update only.
v1.0.0
Initial release: Provides academic paper search, PDF analysis, metadata extraction, and Zotero archiving. - Search academic papers across multiple engines (Semantic Scholar, arXiv, Google Scholar, Tavily) - Download and analyze PDF files for metadata in BibTeX or full-text XML formats - Archive papers in Zotero, supporting duplicate checking and collection assignment - Clear input schema and documentation for usage and setup
Metadata
Slug: academic-talon
Version: 1.1.2
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 9
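FAQ

What is academic-talon (Academic Talon)?

🎓 Full-stack academic research assistant - Search papers → Extract publication-ready BibTeX (header) → Full TEI XML document structure parsing (via GROBID)... It is an AI agent skill plugin for Claude Code / OpenClaw, with 218 downloads so far.

How do I install academic-talon (Academic Talon)?

Run the command "/install academic-talon" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration, but PDF parsing and Zotero archiving still require the setup steps described above (the .env file and a running GROBID service).

Is academic-talon (Academic Talon) free?

Yes. academic-talon is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does academic-talon (Academic Talon) support?

academic-talon runs cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops academic-talon (Academic Talon)?

It is developed and maintained by TongChaodong (@bigdogaaa); the current version is v1.1.2.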
