Links to PDFs

Author: chrisling-dev · v0.0.1
Tags: cross-platform · ⚠ suspicious
Total downloads: 2122 · Favorites: 2 · Current installs: 1 · Versions: 1
Install in OpenClaw
/install links-to-pdfs
Description
Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.
Usage (SKILL.md)

docs-scraper

CLI tool that scrapes documents from various sources into local PDF files using browser automation.

Installation

npm install -g docs-scraper

Quick start

Scrape any document URL to PDF:

docs-scraper scrape https://example.com/document

Returns local path: ~/.docs-scraper/output/1706123456-abc123.pdf

Basic scraping

Scrape with daemon (recommended, keeps browser warm):

docs-scraper scrape <url>

Scrape with named profile (for authenticated sites):

docs-scraper scrape <url> -p <profile-name>

Scrape with pre-filled data (e.g., email for DocSend):

docs-scraper scrape <url> -D email=<email>

Direct mode (single-shot, no daemon):

docs-scraper scrape <url> --no-daemon
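
The flags above can also be assembled programmatically. The following is a minimal Python sketch, not part of docs-scraper itself: the helper name `build_scrape_args` is hypothetical, the email address is a placeholder, and it assumes the `-p`, `-D`, and `--no-daemon` flags shown above may be combined in one call.

```python
import shlex


def build_scrape_args(url, profile=None, data=None, use_daemon=True):
    """Return the argv list for a `docs-scraper scrape` call (hypothetical helper)."""
    args = ["docs-scraper", "scrape", url]
    if profile:
        # -p selects a named profile for authenticated sites
        args += ["-p", profile]
    for key, value in (data or {}).items():
        # each pre-filled field becomes its own -D key=value pair
        args += ["-D", f"{key}={value}"]
    if not use_daemon:
        # direct mode: single-shot, no daemon
        args.append("--no-daemon")
    return args


args = build_scrape_args(
    "https://docsend.com/view/abc123",
    profile="myprofile",
    data={"email": "user@example.com"},  # placeholder address
    use_daemon=False,
)
# Print the command for review instead of executing it
print(shlex.join(args))
```

Building an argument list rather than a single string avoids shell-quoting problems when a value such as a name contains spaces; the list can be passed to `subprocess.run` unchanged.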

Authentication workflow

When a document requires authentication (login, email verification, passcode):

  1. Initial scrape returns a job ID:

    docs-scraper scrape https://docsend.com/view/xxx
    # Output: Scrape blocked
    #         Job ID: abc123
    
  2. Retry with data:

    docs-scraper update abc123 -D email=<email>
    # or with password
    docs-scraper update abc123 -D email=<email> -D password=1234
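
The two steps above can be scripted. This Python sketch assumes the blocked output contains a line of the form `Job ID: <id>`, as in the example; the helper names are hypothetical and the real output format may differ.

```python
import re

# Assumption: the job ID appears on a line like "Job ID: abc123"
JOB_ID_PATTERN = re.compile(r"Job ID:\s*(\S+)")


def extract_job_id(output):
    """Return the job ID found in the scrape output, or None if there is none."""
    match = JOB_ID_PATTERN.search(output)
    return match.group(1) if match else None


def build_update_args(job_id, data):
    """Return the argv list for retrying a blocked job with extra -D fields."""
    args = ["docs-scraper", "update", job_id]
    for key, value in data.items():
        args += ["-D", f"{key}={value}"]
    return args


blocked_output = "Scrape blocked\nJob ID: abc123\n"
job_id = extract_job_id(blocked_output)
if job_id is not None:
    # Placeholder credentials, for illustration only
    print(build_update_args(job_id, {"email": "user@example.com", "password": "1234"}))
```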
    

Profile management

Profiles store session cookies for authenticated sites.

docs-scraper profiles list     # List saved profiles
docs-scraper profiles clear    # Clear all profiles
docs-scraper scrape <url> -p myprofile  # Use a profile

Daemon management

The daemon keeps browser instances warm for faster scraping.

docs-scraper daemon status     # Check status
docs-scraper daemon start      # Start manually
docs-scraper daemon stop       # Stop daemon

Note: The daemon auto-starts when a scrape command runs.

Cleanup

PDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.

Manual cleanup:

docs-scraper cleanup                    # Delete all PDFs
docs-scraper cleanup --older-than 1h    # Delete PDFs older than 1 hour
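
To illustrate the one-hour retention rule, this Python sketch lists the PDFs in a directory that are older than a cutoff. It assumes age is judged by file modification time, which this document does not confirm; the function name is hypothetical and nothing here calls docs-scraper.

```python
import time
from pathlib import Path


def find_stale_pdfs(directory, max_age_seconds=3600, now=None):
    """Return the PDFs in `directory` last modified more than `max_age_seconds` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return sorted(
        path
        for path in Path(directory).glob("*.pdf")
        if path.stat().st_mtime < cutoff  # assumption: age is measured by mtime
    )
```

Running it against `~/.docs-scraper/output/` before a manual cleanup would show which files fall outside the one-hour window, so that any PDF still needed can be copied elsewhere first.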

Job management

docs-scraper jobs list         # List blocked jobs awaiting auth

Supported sources

  • Direct PDF links - Downloads PDF directly
  • Notion pages - Exports Notion page to PDF
  • DocSend documents - Handles DocSend viewer
  • LLM fallback - Uses Claude API for any other webpage

Scraper Reference

Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type.

DirectPdfScraper

Handles: URLs ending in .pdf

Data fields: None (downloads directly)

Example:

docs-scraper scrape https://example.com/document.pdf

DocsendScraper

Handles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)

URL patterns:

  • Documents: https://docsend.com/view/{id} or https://docsend.com/v/{id}
  • Folders: https://docsend.com/view/s/{id}
  • Subdomains: https://{subdomain}.docsend.com/view/{id}
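
The URL patterns above can be checked with a small classifier. This Python sketch is derived only from the listed patterns; the function name and return labels are hypothetical, and the matching rules inside docs-scraper may be stricter or looser.

```python
import re
from urllib.parse import urlparse


def classify_docsend_url(url):
    """Return "folder", "document", or None for URLs that are not DocSend links."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Accept docsend.com itself and any subdomain such as org-a.docsend.com
    if host != "docsend.com" and not host.endswith(".docsend.com"):
        return None
    # Check folders first, because /view/s/{id} also begins with /view/
    if re.match(r"^/view/s/[^/]+", parsed.path):
        return "folder"
    if re.match(r"^/(view|v)/[^/]+", parsed.path):
        return "document"
    return None
```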

Data fields:

Field    | Type     | Description
email    | email    | Email address for document access
password | password | Passcode/password for protected documents
name     | text     | Your name (required for NDA-gated documents)

Examples:

# Pre-fill email for DocSend
docs-scraper scrape https://docsend.com/view/abc123 -D email=<email>

# With password protection
docs-scraper scrape https://docsend.com/view/abc123 -D email=<email> -D password=secret123

# With NDA name requirement
docs-scraper scrape https://docsend.com/view/abc123 -D email=<email> -D name="John Doe"

# Retry blocked job
docs-scraper update abc123 -D email=<email> -D password=secret123

Notes:

  • DocSend may require any combination of email, password, and name
  • Folders are scraped as a table of contents PDF with document links
  • The scraper auto-checks NDA checkboxes when name is provided

NotionScraper

Handles: notion.so/*, *.notion.site/*

Data fields:

Field    | Type     | Description
email    | email    | Notion account email
password | password | Notion account password

Examples:

# Public page (no auth needed)
docs-scraper scrape https://notion.so/Public-Page-abc123

# Private page with login
docs-scraper scrape https://notion.so/Private-Page-abc123 \
  -D email=<email> -D password=mypassword

# Custom domain
docs-scraper scrape https://docs.company.notion.site/Page-abc123

Notes:

  • Public Notion pages don't require authentication
  • Toggle blocks are automatically expanded before PDF generation
  • Uses session profiles to persist login across scrapes

LlmFallbackScraper

Handles: Any URL not matched by other scrapers (automatic fallback)

Data fields: Dynamic - determined by Claude analyzing the page

The LLM scraper uses Claude to analyze the page HTML and detect:

  • Login forms (extracts field names dynamically)
  • Cookie banners (auto-dismisses)
  • Expandable content (auto-expands)
  • CAPTCHAs (reports as blocked)
  • Paywalls (reports as blocked)

Common dynamic fields:

Field    | Type     | Description
email    | email    | Login email (if detected)
password | password | Login password (if detected)
username | text     | Username (if login uses username)

Examples:

# Generic webpage (no auth)
docs-scraper scrape https://example.com/article

# Webpage requiring login
docs-scraper scrape https://members.example.com/article \
  -D email=<email> -D password=secret

# When blocked, check the job for required fields
docs-scraper jobs list
# Then retry with the fields the scraper detected
docs-scraper update abc123 -D username=myuser -D password=secret

Notes:

  • Requires ANTHROPIC_API_KEY environment variable
  • Field names are extracted from the page's actual form fields
  • Limited to 2 login attempts before failing
  • CAPTCHAs require manual intervention
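
Taken together, the "Handles" rules in this reference suggest a routing order roughly like the Python sketch below. The order of the checks, the treatment of notion.so subdomains, matching `.pdf` on the URL path only, and the function name are all assumptions; the dispatch logic inside docs-scraper is not shown in this document.

```python
from urllib.parse import urlparse


def pick_scraper(url):
    """Guess which scraper would handle `url`, based on the documented patterns."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path

    # DirectPdfScraper: URLs ending in .pdf (assumption: query string is ignored)
    if path.lower().endswith(".pdf"):
        return "DirectPdfScraper"

    # DocsendScraper: docsend.com/view/*, docsend.com/v/*, and subdomains
    is_docsend_host = host == "docsend.com" or host.endswith(".docsend.com")
    if is_docsend_host and (path.startswith("/view/") or path.startswith("/v/")):
        return "DocsendScraper"

    # NotionScraper: notion.so/* and *.notion.site/*
    if host == "notion.so" or host.endswith(".notion.so") or host.endswith(".notion.site"):
        return "NotionScraper"

    # Anything else falls through to the LLM fallback
    return "LlmFallbackScraper"
```

A check like this is useful mainly for knowing in advance whether a URL would reach the LLM fallback, which sends page HTML to Claude and requires ANTHROPIC_API_KEY.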

Data field summary

Scraper      | email | password | name | Other
DirectPdf    | -     | -        | -    | -
DocSend      | ✓     | ✓        | ✓    | -
Notion       | ✓     | ✓        | -    | -
LLM Fallback | ✓*    | ✓*       | -    | Dynamic*

*Fields detected dynamically from page analysis
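
The field lists in the Scraper Reference can be encoded as a lookup to catch mistyped `-D` keys before a scrape runs. The table and function below are a hypothetical Python sketch; the LLM fallback is treated as accepting any field because its fields are detected dynamically.

```python
# Fields each scraper accepts, as listed in the Scraper Reference
KNOWN_FIELDS = {
    "DirectPdf": set(),
    "DocSend": {"email", "password", "name"},
    "Notion": {"email", "password"},
}


def unexpected_fields(scraper, data):
    """Return the keys in `data` that are not documented for `scraper`."""
    allowed = KNOWN_FIELDS.get(scraper)
    if allowed is None:
        # e.g. "LLM Fallback": fields are dynamic, so nothing can be flagged
        return []
    return sorted(set(data) - allowed)


# "name" is documented for DocSend only, so it is flagged for Notion
print(unexpected_fields("Notion", {"email": "user@example.com", "name": "John Doe"}))
```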

Environment setup (optional)

Only needed for LLM fallback scraper:

export ANTHROPIC_API_KEY=your_key

Optional browser settings:

export BROWSER_HEADLESS=true   # Set false for debugging
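
When docs-scraper is launched from a script, the same variables can be prepared in a dictionary and passed to the child process. This is a minimal Python sketch with a hypothetical helper name; it only builds the environment and fails early if the key needed by the LLM fallback is missing.

```python
import os


def build_scraper_env(base_env=None, headless=True, needs_llm=False):
    """Return a copy of the environment with the documented variables set."""
    env = dict(os.environ if base_env is None else base_env)
    if needs_llm and not env.get("ANTHROPIC_API_KEY"):
        # Only the LLM fallback scraper needs this key
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    # "false" shows the browser window, which helps when debugging
    env["BROWSER_HEADLESS"] = "true" if headless else "false"
    return env


debug_env = build_scraper_env({"PATH": "/usr/bin"}, headless=False)
print(debug_env["BROWSER_HEADLESS"])
```

The returned dictionary can be supplied as the `env` argument of `subprocess.run`, which keeps the API key out of the command line itself.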

Common patterns

Archive a Notion page:

docs-scraper scrape https://notion.so/My-Page-abc123

Download protected DocSend:

docs-scraper scrape https://docsend.com/view/xxx
# If blocked:
docs-scraper update <job-id> -D email=<email> -D password=1234

Batch scraping with profiles:

docs-scraper scrape https://site.com/doc1 -p mysite
docs-scraper scrape https://site.com/doc2 -p mysite

Output

Success: Local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf)
Blocked: Job ID + required credential types

Troubleshooting

  • Timeout: restart the daemon with docs-scraper daemon stop && docs-scraper daemon start
  • Auth fails: run docs-scraper jobs list to check pending jobs
  • Disk full: run docs-scraper cleanup to delete all stored PDFs, or add --older-than 1h to remove only older files
Security recommendations
Before installing or using this skill:

  1. Treat the npm package as unverified: find its npm/GitHub page and inspect the source and maintainer.
  2. Do not provide real account passwords or sensitive credentials until you confirm how and where they are stored; profile/session cookies will be written to disk (~/.docs-scraper).
  3. The LLM fallback will upload page HTML to an external service (Claude), which can leak private document contents; verify what API key is required and how data is sent.
  4. Prefer running the scraper in a sandboxed environment, or use a browser/manual export for sensitive documents.
  5. If you need this capability, ask the publisher for a homepage/repo, a signed release, and clear docs on credential handling and where files/processes are persisted; the absence of those is a red flag.
Functional analysis
Type: OpenClaw Skill
Name: links-to-pdfs
Version: 0.0.1

The skill bundle is classified as suspicious because of its inherently high-risk capabilities, even though they align with the stated purpose. The `SKILL.md` instructs the agent to install an external `docs-scraper` CLI tool via `npm`, which introduces a supply chain risk. It explicitly handles sensitive user credentials (email, password) for authentication and requires access to the `ANTHROPIC_API_KEY` environment variable for external API calls; both are significant data access capabilities. It also manages a persistent background daemon process and stores session cookies, which allows continued operation and, if compromised, session hijacking. These actions are presented as necessary for document scraping, and the provided instructions show no clear malicious intent, but they represent a broad attack surface and high privilege requirements.
Capability assessment
Purpose & Capability
The SKILL.md describes a scraper that uses a globally-installed npm package, session profiles, and an LLM fallback (Claude). That functionality aligns with 'download/convert webpages to PDF', but the skill metadata declares no install, no config paths, and no required credentials — which is inconsistent with the described capabilities (daemon, profiles, and LLM API access all imply filesystem and credential usage).
Instruction Scope
The runtime instructions direct the user to install and run an external CLI that performs browser automation, accepts site credentials (email/password), persists session cookies/profiles, auto-checks NDA checkboxes, and sends page HTML to an LLM (Claude) as a fallback. Those behaviors go beyond a simple 'download a PDF' helper and involve collecting and transmitting potentially sensitive content and credentials.
Install Mechanism
Although the skill bundle contains no install spec, the SKILL.md explicitly tells users to run `npm install -g docs-scraper` (global install from the npm registry). That is a moderate-to-high risk action because it fetches and executes third-party code outside the skill bundle; no source URL, homepage, or verified release is provided in the metadata to validate the package.
Credentials
The SKILL.md mentions an LLM fallback using Claude and also describes handling site credentials and session profiles, yet the skill metadata declares no environment variables or config paths. Missing declarations for an external LLM API key (or where profiles are stored/secured) is a proportionality and transparency mismatch — the tool will likely require secrets and filesystem storage that are not declared.
Persistence & Privilege
The scraper runs a daemon that auto-starts, keeps browser instances and session profiles, and stores files under ~/.docs-scraper/output. The skill metadata does not declare these config paths or mention persistent background activity. The lack of disclosure about persistent files/processes is a concern for persistence and privilege scope.
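How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install links-to-pdfs
  3. Once installation finishes, invoke the Skill by name or trigger it with /links-to-pdfs.
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.
Version history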
v0.0.1
Initial public release of the links-to-pdfs skill.

  • Scrapes documents from Notion, DocSend, direct PDFs, and other web sources into local PDF files.
  • Supports authentication workflows and session persistence via profiles for protected documents.
  • Includes a command-line interface with profile and job management, daemon for faster scrapes, and automatic cleanup.
  • Provides fallback to LLM-based scraping for unsupported or dynamic websites.
  • Returns local file paths to downloaded PDFs.
Metadata
Slug: links-to-pdfs
Version: 0.0.1
License: (not listed)
Total installs: 1
Current installs: 1
Versions: 1
FAQ

What is Links to PDFs?

Links to PDFs scrapes documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use it when the user needs to download, archive, or convert web documents to PDF format. It supports authentication flows for protected documents and session persistence via profiles, and returns local file paths to the downloaded PDFs. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 2122 total downloads so far.

How do I install Links to PDFs?

Run the command "/install links-to-pdfs" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is Links to PDFs free?

Yes. Links to PDFs is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Links to PDFs support?

Links to PDFs is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Links to PDFs?

It is developed and maintained by chrisling-dev (@chrisling-dev); the current version is v0.0.1.
