yumiu8103-hue

Scrapling Web Extractor

Author: yumiu8103-hue · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 462
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install web-markdown-scraper
Description
Fetch one or more public webpages with Scrapling, extract the main content, and convert HTML into Markdown using html2text. Supports static HTTP, concurrent...
Security recommendations
This skill appears internally consistent, but check the following before installing:
1) The script dynamically imports and relies on the external 'scrapling' package and Playwright; audit those packages, or make sure you trust them, before installing.
2) Stealth mode and proxies have legitimate uses for reaching anti-bot protected pages, but you must not use the tool to bypass login walls, CAPTCHAs, or paywalls, or to access restricted content (SKILL.md states this).
3) Installing Playwright downloads a Chromium binary; make sure you accept that download.
4) Proxy credentials passed at runtime are used to route requests; keep them secure and avoid supplying credentials to proxies you don't trust.
5) The tool writes Markdown files and an automatch DB to the output directory; review and manage those local files as needed.
Functional analysis
Type: OpenClaw Skill
Name: web-markdown-scraper
Version: 1.0.0
The skill is a legitimate web-to-markdown scraper that uses the 'scrapling' and 'html2text' libraries. The core script (scripts/scrape_to_markdown.py) is well-structured, validates URLs, and sanitizes filenames to prevent path traversal. There is a significant discrepancy between the advanced features described in the documentation (SKILL.md and README.md), such as stealth mode, proxy support, and anti-bot bypass, and the actual implementation in the Python script. This appears to be a functional bug or an incomplete implementation rather than a security threat. There is no evidence of data exfiltration, malicious execution, or prompt injection attacks against the agent.
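The analysis mentions URL validation and filename sanitization, but this page does not show the script's code. The following is a minimal standard-library sketch of how such checks might look. The function names `is_http_url`, `safe_filename`, and `output_path` are hypothetical and are not taken from scripts/scrape_to_markdown.py.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that carry a hostname."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


def safe_filename(url: str, suffix: str = ".md") -> str:
    """Derive a flat filename from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.hostname or 'page'}{parsed.path}"
    # Every run of characters outside [A-Za-z0-9_-] becomes one underscore.
    # This removes '/', '\\' and '.', so the name cannot point at a
    # parent directory or a subdirectory.
    name = re.sub(r"[^A-Za-z0-9_-]+", "_", raw).strip("_")
    return (name or "page")[:100] + suffix


def output_path(out_dir: Path, url: str) -> Path:
    """Build the output path and verify it stays inside out_dir."""
    base = out_dir.resolve()
    target = (base / safe_filename(url)).resolve()
    if base not in target.parents:
        raise ValueError(f"refusing to write outside {base}")
    return target
```

Checking the resolved path against the output directory is a second line of defense: even if the sanitizing regex were loosened later, a write outside the chosen directory would still be rejected.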
Capability assessment
Purpose & Capability
Name, description, README, SKILL.md and the included Python script all align: they implement fetching public web pages (static or JS), extracting main content and converting HTML to Markdown. Features like stealth, proxies, Playwright, and automatch are legitimate for robust scraping and are consistent with the stated purpose.
Instruction Scope
SKILL.md and the script limit network calls to user-supplied URLs and an optional proxy. The skill provides flags to enable stealth, proxying, and Playwright rendering; these are powerful but described and constrained (rules state not to bypass logins/paywalls). The code dynamically imports the 'scrapling' package at runtime, so actual fetching behavior depends on that external dependency.
Install Mechanism
No install spec is included (instruction-only); the README suggests installing third-party Python packages (scrapling, html2text, Playwright). That is a normal, low-risk pattern for an instruction-only Python skill, but it does mean the fetched packages and Playwright binaries will be installed separately by the user.
Credentials
The skill declares no required environment variables or credentials. Proxy credentials can be supplied as runtime flags (appropriate for a scraper). The script's security manifest claims it reads only user-provided URL/file inputs and writes only to the chosen output directory and the Scrapling-managed local DB—no unexpected secrets are requested.
Persistence & Privilege
always is false and the skill is user-invocable. It writes local output files and (per its manifest) a Scrapling automatch SQLite DB; this is reasonable for its functionality but does create persistent local artifacts that a user should be aware of.
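How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install web-markdown-scraper
  3. After installation, call the Skill by name or trigger it with /web-markdown-scraper
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output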
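The release notes on this page describe the output as structured JSON with a per-page title, markdown, and status. The exact schema is not shown here, so the sketch below is only an illustration of what such a record could look like. The `PageResult` class, the `to_json` helper, the `url` field, and the `"ok"` status value are all assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class PageResult:
    """One scraped page (hypothetical shape, not the skill's real schema)."""
    url: str
    title: str
    markdown: str
    status: str  # assumed values such as "ok" or "error"


def to_json(results: List[PageResult]) -> str:
    """Serialize results as a JSON array, keeping non-ASCII text readable."""
    return json.dumps([asdict(r) for r in results], ensure_ascii=False, indent=2)


if __name__ == "__main__":
    demo = [PageResult("https://example.com", "Example", "# Example", "ok")]
    print(to_json(demo))
```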
Version history
v1.0.0
Initial release.
- 4 fetcher modes: http, async, stealth (Camoufox), dynamic (Playwright)
- CSS selector-based content extraction with auto_save / auto_match
- Proxy support with humanize, geoip, block-webrtc options
- --disable-resources and --block-images for faster scraping
- --retry N with exponential backoff
- Structured JSON output with per-page title, markdown, and status
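The release notes list "--retry N with exponential backoff" without further detail. As a rough illustration of the general technique, and not the skill's actual code, a retry helper might look like this. The name `retry_with_backoff`, its parameters, and the doubling schedule are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying up to `retries` extra times on any exception.

    The wait doubles after each failure: base_delay, 2 * base_delay, ...
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the schedule can be checked without waiting. A production version would usually catch only network-related exceptions and add random jitter to the delay.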
Metadata
Slug: web-markdown-scraper
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Number of versions: 1
FAQ

What is Scrapling Web Extractor?

Fetch one or more public webpages with Scrapling, extract the main content, and convert HTML into Markdown using html2text. Supports static HTTP, concurrent... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 462 total downloads to date.

How do I install Scrapling Web Extractor?

Run the command "/install web-markdown-scraper" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is required.

Is Scrapling Web Extractor free?

Yes. Scrapling Web Extractor is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Scrapling Web Extractor support?

Scrapling Web Extractor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Scrapling Web Extractor?

It is developed and maintained by yumiu8103-hue (@yumiu8103-hue); the current version is v1.0.0.
