Description

Fetches and summarizes recent arXiv and Hugging Face papers with Agentic Paper Digest. Use when the user wants a paper digest, a JSON feed of recent papers, or to run the arXiv/HF pipeline.

Usage instructions (SKILL.md)

Agentic Paper Digest

Name: Agentic Paper Digest Skill
Author: matanle51

When to use

Fetch a recent paper digest from arXiv and Hugging Face.
Produce JSON output for downstream agents.
Run a local API server when a polling workflow is needed.

Prereqs

Python 3 and network access.
LLM access via OPENAI_API_KEY or an OpenAI-compatible provider via LITELLM_API_BASE + LITELLM_API_KEY.
git is optional for bootstrap; otherwise curl/wget (or Python) is used to download the repo.

Get the code and install

Preferred: run the bootstrap helper script. It uses git when available or falls back to a zip download.

bash "{baseDir}/scripts/bootstrap.sh"

Override the clone location by setting PROJECT_DIR.

PROJECT_DIR="$HOME/agentic_paper_digest" bash "{baseDir}/scripts/bootstrap.sh"

Run (CLI preferred)

bash "{baseDir}/scripts/run_cli.sh"

Pass through CLI flags as needed.

bash "{baseDir}/scripts/run_cli.sh" --window-hours 24 --sources arxiv,hf

Run (API optional)

bash "{baseDir}/scripts/run_api.sh"

Trigger runs and read results.

curl -X POST http://127.0.0.1:8000/api/run
curl http://127.0.0.1:8000/api/status
curl http://127.0.0.1:8000/api/papers
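
For agents that prefer a script over curl, the trigger-then-poll sequence above could be sketched in Python roughly as follows. This is a minimal sketch, not the project's client: the helper names `http_json` and `run_and_wait` are hypothetical, it assumes the endpoints return JSON (or an empty body), and because the shape of the /api/status payload is not documented here, the caller has to supply the `is_done` check.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8000"  # address used by the curl examples


def http_json(method: str, path: str, timeout_s: float = 10.0):
    """Send a request to the local API and decode a JSON body (assumed format)."""
    req = urllib.request.Request(BASE_URL + path, method=method)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        body = resp.read()
    return json.loads(body.decode("utf-8")) if body else None


def run_and_wait(
    is_done: Callable[[object], bool],
    fetch: Callable[[str, str], object] = http_json,
    interval_s: float = 2.0,
    max_polls: int = 150,
):
    """Trigger a run, poll /api/status until is_done(status), then return /api/papers.

    The status payload is undocumented here, so is_done is supplied by the caller.
    """
    fetch("POST", "/api/run")
    for _ in range(max_polls):
        if is_done(fetch("GET", "/api/status")):
            return fetch("GET", "/api/papers")
        time.sleep(interval_s)
    raise TimeoutError("run did not finish within the polling budget")
```

Passing `fetch` as a parameter keeps the polling logic testable without a running server; inspect a real /api/status response before writing the `is_done` predicate.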

Stop the API server if needed.

bash "{baseDir}/scripts/stop_api.sh"

Outputs

CLI --json prints run_id, seen, kept, window_start, and window_end.
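
A downstream agent might validate the CLI summary before acting on it. The sketch below assumes the CLI prints a single JSON object on stdout containing the five keys listed above; the function name `parse_run_summary` is hypothetical, and the value types are not documented here, so only key presence is checked.

```python
import json

# Keys the CLI --json summary is described as containing.
EXPECTED_KEYS = ("run_id", "seen", "kept", "window_start", "window_end")


def parse_run_summary(stdout_text: str) -> dict:
    """Parse the CLI JSON summary and fail loudly if a documented key is absent."""
    summary = json.loads(stdout_text)
    missing = [key for key in EXPECTED_KEYS if key not in summary]
    if missing:
        raise ValueError(f"summary is missing keys: {missing}")
    return summary
```

A caller would capture the CLI's stdout (for example with `subprocess.run(..., capture_output=True, text=True)`) and pass it to `parse_run_summary`.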
Data store: data/papers.sqlite3 (under PROJECT_DIR).
API: POST /api/run, GET /api/status, GET /api/papers, GET/POST /api/topics, GET/POST /api/settings.

Configuration

Config files live in PROJECT_DIR/config. Environment variables can be set in the shell or via a .env file. The wrappers here auto-load .env from PROJECT_DIR (override with ENV_FILE=/path/to/.env).

Environment (.env or exported vars)

OPENAI_API_KEY: required for OpenAI models (litellm reads this).
LITELLM_API_BASE, LITELLM_API_KEY: use an OpenAI-compatible proxy/provider.
LITELLM_MODEL_RELEVANCE, LITELLM_MODEL_SUMMARY: models for relevance and summarization (summary defaults to relevance model if unset).
LITELLM_TEMPERATURE_RELEVANCE, LITELLM_TEMPERATURE_SUMMARY: lower for more deterministic output.
LITELLM_MAX_RETRIES: retry count for LLM calls.
LITELLM_DROP_PARAMS=1: drop unsupported params to avoid provider errors.
WINDOW_HOURS, APP_TZ: recency window and timezone.
ARXIV_CATEGORIES: comma-separated categories (default includes cs.CL,cs.AI,cs.LG,stat.ML,cs.CR).
ARXIV_API_BASE, HF_API_BASE: override source endpoints if needed.
ARXIV_MAX_RESULTS, ARXIV_PAGE_SIZE: arXiv paging limits.
MAX_CANDIDATES_PER_SOURCE: cap candidates per source before LLM filtering.
FETCH_TIMEOUT_S, REQUEST_TIMEOUT_S: source fetch and per-request timeouts.
ENABLE_PDF_TEXT=1: include first-page PDF text in summaries; requires PyMuPDF (pip install pymupdf).
DATA_DIR: location for papers.sqlite3.
CORS_ORIGINS: comma-separated origins allowed by the API server (UI use).
Path overrides: TOPICS_PATH, SETTINGS_PATH, AFFILIATION_BOOSTS_PATH.
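
To illustrate how two of these variables might be interpreted, here is a small sketch that reads ARXIV_CATEGORIES and WINDOW_HOURS from an environment mapping. It is not the project's own parser: `read_digest_env` is a hypothetical name, the fallback category list is only the set the text above says the default "includes", and the 24-hour fallback follows the default stated in the workflow section.

```python
import os

# Assumed fallbacks, taken from the descriptions in this document.
ASSUMED_CATEGORIES = "cs.CL,cs.AI,cs.LG,stat.ML,cs.CR"
ASSUMED_WINDOW_HOURS = 24


def read_digest_env(env=None) -> dict:
    """Read the category list and recency window from an environment mapping."""
    env = os.environ if env is None else env
    raw = env.get("ARXIV_CATEGORIES") or ASSUMED_CATEGORIES
    # Split on commas and drop empty entries left by stray commas or spaces.
    categories = [item.strip() for item in raw.split(",") if item.strip()]
    window_hours = float(env.get("WINDOW_HOURS") or ASSUMED_WINDOW_HOURS)
    if window_hours <= 0:
        raise ValueError("WINDOW_HOURS must be positive")
    return {"categories": categories, "window_hours": window_hours}
```

Showing the user the parsed values before a run is a cheap way to catch a malformed category string.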

Config files

config/topics.json: list of topics with id, label, description, max_per_topic, and keywords. The relevance classifier must output topic IDs exactly as defined here. max_per_topic also caps results in GET /api/papers when apply_topic_caps=1.
config/settings.json: overrides fetch limits (arxiv_max_results, arxiv_page_size, fetch_timeout_s, max_candidates_per_source). Updated via POST /api/settings.
config/affiliations.json: list of {pattern, weight} boosts applied by substring match over affiliations. Weights add up and are capped at 1.0. Invalid JSON disables boosts, so keep the file strict JSON (no trailing commas).
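
The affiliation boost rule described above (substring match, additive weights, cap at 1.0, invalid JSON disables boosts) can be illustrated with a short sketch. This is an approximation under stated assumptions rather than the project's code: the function names are hypothetical, matching is assumed to be case-insensitive, each pattern is assumed to count at most once per paper, and the institution names in the usage note are made-up examples.

```python
import json


def load_boosts(text: str) -> list:
    """Parse affiliations.json content; invalid JSON yields no boosts."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []  # strict JSON only: a trailing comma disables all boosts
    return data if isinstance(data, list) else []


def affiliation_boost(affiliations: list, boosts: list) -> float:
    """Sum the weights of all patterns found in any affiliation, capped at 1.0."""
    total = 0.0
    for rule in boosts:
        pattern = rule["pattern"].lower()  # case-insensitivity is an assumption
        if any(pattern in name.lower() for name in affiliations):
            total += float(rule["weight"])
    return min(total, 1.0)
```

With rules `[{"pattern": "MIT", "weight": 0.3}, {"pattern": "Stanford", "weight": 0.9}]`, a paper listing both institutions gets a boost of 1.0 rather than 1.2, which is why modest weights leave more room for relevance to matter.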

Mandatory workflow (follow step-by-step)

You MUST first open and read the configuration in the downloaded GitHub repo (https://github.com/matanle51/agentic_paper_digest):
- Load config/topics.json, config/settings.json, and config/affiliations.json (if present).
- Note current topic IDs, caps, and fetch limits before asking the user to change them.
ASK THE USER TO PROVIDE THEIR PREFERENCES ABOUT THE FOLLOWING (HELP THE USER):
- Topics of interest → update config/topics.json (topics[].id/label/description/keywords, max_per_topic).
  Show current defaults and ask whether to keep or change them.
- Time window (hours) → set WINDOW_HOURS (or pass --window-hours to the CLI) only if the user cares; otherwise keep the default of 24h.
- ASK THE USER TO FILL IN THE FOLLOWING PARAMETERS (explain to the user what each one is for): ARXIV_CATEGORIES, ARXIV_MAX_RESULTS, ARXIV_PAGE_SIZE, MAX_CANDIDATES_PER_SOURCE.
  Ask whether to keep defaults and show the current values.
- Model/provider → set OPENAI_API_KEY or LITELLM_API_KEY (+ LITELLM_API_BASE if proxy), and set LITELLM_MODEL_RELEVANCE/LITELLM_MODEL_SUMMARY.
- Do NOT ask by default: timezone, quality vs cost, timeouts, PDF text, affiliation biasing, sources list. Use defaults unless the user requests changes.
Confirm workspace path: Ask where to clone/run. Default to PROJECT_DIR="$HOME/agentic_paper_digest" if the user doesn’t care. Never hardcode /Users/... paths.
Bootstrap the repo: Run the bootstrap script (unless the repo already exists and the user says to skip).
Create or verify .env:
- If .env is missing, create it from .env.example (in the repo), then ask the user to fill keys and any requested preferences.
- Ensure at least one of OPENAI_API_KEY or LITELLM_API_KEY is set before running.
Apply config changes:
- Edit JSON files directly (or use POST /api/topics and POST /api/settings if running the API).
Run the pipeline:
- Prefer scripts/run_cli.sh for one-off JSON output.
- Use scripts/run_api.sh only if the user explicitly asks for UI/API access or polling.
Report results:
- If results are sparse, suggest increasing WINDOW_HOURS, ARXIV_MAX_RESULTS, or broadening topics.
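
The "at least one key is set" check in the workflow above could be sketched like this. The parser is a deliberately simplified stand-in (plain KEY=VALUE lines, optional `export` prefix, optional surrounding quotes); the wrapper scripts load .env through the shell, so their behavior may differ. Both function names are hypothetical.

```python
import os


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        values[key] = value.strip().strip("'\"")  # drop surrounding quotes
    return values


def has_llm_credentials(env=None) -> bool:
    """True if OPENAI_API_KEY or LITELLM_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    names = ("OPENAI_API_KEY", "LITELLM_API_KEY")
    return any((env.get(name) or "").strip() for name in names)
```

Running this check before the pipeline lets the agent ask the user for a key up front instead of failing partway through a run.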

Getting good results

Help the user define and keep topics focused and mutually exclusive so the classifier can choose the right IDs.
Use a stronger model for summaries than for relevance if quality matters.
If using an OpenAI model, default to gpt-5-mini for a good quality/cost tradeoff.
Increase WINDOW_HOURS or ARXIV_MAX_RESULTS when results are sparse, or lower them if results are too noisy.
Tune ARXIV_CATEGORIES to your research domains.
Enable PDF text (ENABLE_PDF_TEXT=1) when abstracts are too thin.
Use modest affiliation weights to bias ranking without swamping relevance.
BE PROACTIVE AND HELP THE USER TUNE THE SKILL FOR GOOD RESULTS!
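
Because the classifier must emit topic IDs exactly as defined, a quick structural check of the topics config before a run can catch duplicate IDs or missing fields. The sketch below is an assumption-laden illustration: `topic_problems` is a hypothetical helper, it accepts either a bare list of topics or an object with a `topics` key (the exact top-level shape is not confirmed here), and it treats all five documented fields as required.

```python
# Fields that config/topics.json entries are described as carrying.
REQUIRED_FIELDS = ("id", "label", "description", "max_per_topic", "keywords")


def topic_problems(config) -> list:
    """Return human-readable problems found in a parsed topics config."""
    topics = config.get("topics", []) if isinstance(config, dict) else config
    problems = []
    seen_ids = set()
    for index, topic in enumerate(topics):
        missing = [field for field in REQUIRED_FIELDS if field not in topic]
        if missing:
            problems.append(f"topic #{index} is missing {missing}")
        topic_id = topic.get("id")
        if topic_id is not None and topic_id in seen_ids:
            problems.append(f"duplicate topic id: {topic_id!r}")
        seen_ids.add(topic_id)
    return problems
```

An empty result only means the structure looks sane; whether the topics are focused and mutually exclusive still needs a human (or agent) read.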

Troubleshooting

Port 8000 busy: run bash "{baseDir}/scripts/stop_api.sh" or pass --port to the API command.
Empty results: increase WINDOW_HOURS or verify the API key in .env.
Missing API key errors: export OPENAI_API_KEY or LITELLM_API_KEY in the shell before running.
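
For the port conflict case, a small check can tell whether something is already listening before the API server is started. This is a generic socket probe, not part of the skill's scripts; `port_in_use` is a hypothetical helper name, and 8000 is the port used in the examples above.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout_s: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8000)` is true, either stop the existing server or choose another port as described above.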

Security recommendations

Before installing or running:
1) Review the upstream GitHub repository (https://github.com/matanle51/agentic_paper_digest) and inspect requirements.txt and the package code (paper_finder) so you understand what code will be installed and run.
2) Do not paste your real OPENAI_API_KEY (or other secrets) into .env until you trust the repo; consider using a restricted or test key.
3) Run the bootstrap and the service inside an isolated environment (container or dedicated VM) if possible, since pip will install third-party packages from the repo.
4) Set PROJECT_DIR to a non-sensitive, dedicated directory (not your home root) and check the contents of any auto-created .env.
5) If you require higher assurance, manually clone the repo, inspect the files, and run pip install yourself rather than running bootstrap.sh blindly.

Functional analysis

Type: OpenClaw Skill
Name: agentic-paper-digest-skill
Version: 0.3.3

The skill provides a legitimate tool for fetching and summarizing papers. The `SKILL.md` instructions guide the agent through standard setup, configuration, and execution steps, including asking the user for necessary LLM API keys. The `scripts/bootstrap.sh` script downloads code from a hardcoded, legitimate GitHub repository (https://github.com/matanle51/agentic_paper_digest), and other scripts manage the application's lifecycle. There is no evidence of data exfiltration, malicious execution, persistence mechanisms, or prompt injection attempts against the agent to perform harmful actions.

Capability assessment

ℹ Purpose & Capability

The name/description (paper digests from arXiv/Hugging Face) align with the runtime instructions and scripts. The skill legitimately needs Python, network access, and an LLM API key. However, registry metadata does not declare required env vars (OPENAI_API_KEY / LITELLM_*), and the SKILL.md explicitly requires network/git access and LLM credentials — this metadata mismatch is worth noting.

⚠ Instruction Scope

Runtime instructions require you (or the agent) to open and read config files from the downloaded repo and to source a .env file. The provided run scripts will export and source ENV_FILE (.env) automatically, which may expose any secrets in that file to the running process. The SKILL.md also instructs the agent to ask the user for LLM credentials and other configuration; that is expected for operation but increases the sensitive-surface the skill touches (local config + API keys).

⚠ Install Mechanism

There is no registry install spec, but the included bootstrap.sh downloads the GitHub repository (zip or git clone), creates/activates a virtualenv and runs pip install -r requirements.txt from that repo. This is a common pattern but carries moderate risk: arbitrary Python packages and code from the upstream repo will be installed/executed on your system. The download URL is a GitHub repo (not a shortener or unknown host), which reduces but does not eliminate risk.

⚠ Credentials

The registry lists no required env vars, yet SKILL.md and the scripts expect LLM credentials (OPENAI_API_KEY or LITELLM_API_KEY/BASE) and many optional envs. The run scripts auto-source an ENV_FILE (.env) and export its contents, which can include unrelated secrets. Requesting an LLM API key is proportional to the stated purpose, but the lack of that declaration in registry metadata and the automatic sourcing of .env are mismatched and increase exposure.

✓ Persistence & Privilege

always is false and the skill does not demand permanent system-wide presence. The skill's scripts install into a user-controlled PROJECT_DIR and create a virtualenv there; they don't modify other skills or global agent settings. Autonomous invocation is allowed (platform default) but not exceptional here.

Version history

v0.3.3

- Updated workflow instructions to prioritize proactive user interaction and guidance.
- Clarified that the user must first open and read the configuration from the downloaded GitHub repo.
- Emphasized assisting the user in providing preferences for topics, time window, and core parameters.
- Added a note to default to 24 hours for the time window unless the user specifies otherwise.
- Added recommendation to default OpenAI model to "gpt-5-mini" for optimal tradeoff.
- Added reminders for skill developers to help the user tune for good results.
- Minor edits for clarity, directness, and improved step-by-step guidance.

v0.3.2

- Expanded environment variable and config file options, including new settings for retry counts, per-request timeouts, API base URLs, and CORS origins.
- Added a mandatory step-by-step workflow for guided configuration and setup, clarifying what to ask the user and when to use defaults.
- Updated path guidance to default to `$HOME/agentic_paper_digest` and avoid hardcoded user paths.
- Clarified roles of various scripts and improved instructions for both CLI and API usage.
- Enhanced instructions for managing keys, environment variables, and applying configuration changes.

v0.3.1

No changes detected in this version.
- No file or documentation updates; the skill remains unchanged from the previous version.

v0.3.0

No changes detected in this version.
- Version 0.3.0 introduces no file or documentation updates compared to the previous release.

v0.2.0

No changes were detected in this version.
- Released version 0.2.0 with identical content as the previous release.
- No updates to features, documentation, or configuration.

v0.1.0

- Initial release of agentic-paper-digest-skill.
- Fetches and summarizes recent arXiv and Hugging Face papers.
- Provides CLI and API workflows for paper digests and JSON feeds.
- Configurable via environment variables and project config files.
- Supports paper filtering, topic classification, and affiliation-based ranking.

v1.0.0

Initial release of the agentic-paper-digest-skill.
- Fetches and summarizes recent arXiv and Hugging Face papers using Agentic Paper Digest.
- Supports both CLI and local API workflows for retrieving paper digests as JSON.
- Configurable with environment variables and config files for API keys, categories, recency window, and ranking options.
- Outputs include JSON feeds, an SQLite data store, and REST API endpoints for papers, topics, and settings.
- Optional PDF text extraction and affiliation-based ranking boosts are supported.

Metadata

Slug agentic-paper-digest-skill

Version 0.3.3

License (not specified)

Total installs 12

Current installs 12

Number of versions 7

FAQ