Smart Scraper
Author: yadanzheng68-cmyk · v1.0.0 · MIT-0
Total downloads: 166
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install smart-scraper
Description
AI-powered web scraper with intelligent structure recognition. Extracts lists, articles, and tables from any website with automatic type detection.
Safe usage recommendations
This skill appears to be a legitimate Playwright-based web scraper. Before installing or running it:
- Expect a large download: Playwright will fetch browser binaries and many npm packages. Install in a machine or container where large downloads are acceptable.
- Run npm install (not just installing 'playwright' alone) so the 'tsx' script runner is available; the SKILL.md install metadata is incomplete compared to package.json.
- Be cautious about what URLs you scrape: the tool captures full-page screenshots and page text (including any sensitive information visible on the page). Do not point it at private/internal sites, pages behind SSO, or pages containing secrets unless you understand and control the environment.
- Review package-lock.json and verify dependencies come from the official npm registry (it appears so here). If you need higher assurance, run the install in an isolated sandbox and/or inspect network activity during install.
- If you plan to run this as an automated agent skill, consider limits and policies so it cannot be pointed at sensitive targets autonomously.
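One way to enforce the last two recommendations is to validate every target URL before it reaches the scraper. The sketch below is not part of smart-scraper; `isAllowedTarget`, `PRIVATE_HOST_PATTERNS`, and the allowlist are hypothetical names, and the pattern list is illustrative rather than exhaustive.

```typescript
// Hypothetical pre-flight guard; none of these names come from the smart-scraper source.
const PRIVATE_HOST_PATTERNS: RegExp[] = [
  /^localhost$/i,
  /^127\./,                     // IPv4 loopback
  /^10\./,                      // RFC 1918 private range
  /^192\.168\./,                // RFC 1918 private range
  /^172\.(1[6-9]|2\d|3[01])\./, // RFC 1918 private range
  /^169\.254\./,                // link-local addresses
  /\.internal$/i,
  /\.local$/i,
];

function isAllowedTarget(rawUrl: string, allowedHosts: readonly string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Only plain web pages; rejects file:, data:, and similar schemes.
  if (url.protocol !== "https:" && url.protocol !== "http:") return false;

  const host = url.hostname.toLowerCase();
  // Defence in depth: refuse private-looking hosts even if someone allowlists them.
  if (PRIVATE_HOST_PATTERNS.some((pattern) => pattern.test(host))) return false;

  // Accept an allowlisted host or any of its subdomains.
  return allowedHosts.some((allowed) => host === allowed || host.endsWith("." + allowed));
}
```

An agent wrapper could call `isAllowedTarget(url, ["example.com"])` and refuse to invoke the skill when it returns `false`. The explicit allowlist is the primary control; the private-host patterns only catch obvious mistakes.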
Functional analysis
Type: OpenClaw Skill
Name: smart-scraper
Version: 1.0.0
The smart-scraper bundle is a legitimate web scraping utility using Playwright to extract structured data (lists, articles, tables) from websites. The code is well-structured, providing modular extractors and a CLI interface as described in the documentation. No evidence of malicious intent, data exfiltration, or prompt injection was found; the tool's network and browser capabilities are strictly aligned with its stated purpose of web scraping. A minor functional bug (incorrect import paths in src/utils/DataFormatter.ts) was identified, but it does not pose a security risk.
Capability assessment
Purpose & Capability
Name/description match the implementation: the code uses Playwright to load pages, auto-detect structure (list/article/table), and format outputs. Required binaries (node, npm) and the declared dependency on the Playwright package align with the scraper purpose. Minor oddity: SKILL.md install metadata lists only the 'playwright' package and marks a binary 'npx' — that doesn't fully reflect how the project is invoked (the CLI script uses 'tsx', which is listed only as a devDependency in package.json). This is an implementation/packaging mismatch but not evidence of malicious intent.
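To make the list/article/table auto-detection concrete, here is a minimal sketch of how such a classifier could work. It is an illustration only: the `PageSignals` fields, the thresholds, and `detectPageType` are assumptions, not the heuristics smart-scraper actually uses. In a Playwright-based tool, signals like these would typically be counted inside `page.evaluate`.

```typescript
// Hypothetical classifier; field names and thresholds are illustrative assumptions.
type PageType = "list" | "article" | "table";

interface PageSignals {
  tableRows: number;      // rows in the largest <table> on the page
  repeatedItems: number;  // sibling elements sharing the same tag and class
  paragraphChars: number; // total characters of text inside <p> elements
}

function detectPageType(signals: PageSignals): PageType {
  // A table with several rows is the strongest structural signal.
  if (signals.tableRows >= 3) return "table";
  // Many repeated siblings and little running prose suggest a listing page.
  if (signals.repeatedItems >= 5 && signals.paragraphChars < 2000) return "list";
  // Otherwise treat the page as long-form content.
  return "article";
}
```

Checking the table signal first is a design choice in this sketch: listing pages often contain long prose blurbs, while a multi-row table is rarely incidental.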
Instruction Scope
SKILL.md and the CLI limit behavior to visiting the provided URL and extracting content; instructions do not reference unrelated local files, other credentials, or remote endpoints. The code takes full-page screenshots (screenshot Buffer produced) and executes page.evaluate in the page context — both expected for headless-browser scraping but they can capture sensitive page content if you point the tool at authenticated or internal URLs. The skill does not itself transmit data to third-party endpoints, but the captured screenshot and extracted data will be available to whoever runs the CLI or the agent invoking the skill.
Install Mechanism
Install spec uses the public npm package 'playwright', which is typical for this functionality. Installing Playwright will also pull browser binaries (Playwright's install actions) and other npm packages from the registry; that is standard but increases install size and network download surface. The install metadata is minimal (only 'playwright') while the project expects to run via 'npm run scrape' using 'tsx' — the install spec does not explicitly install 'tsx' or devDependencies; users should run a full 'npm install' in a safe environment. The npm registry usage is normal and traceable (not a raw URL download).
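The packaging mismatch described above (a script runner that exists only as a devDependency) can be checked mechanically. The following sketch assumes a simplified manifest shape; `devOnlyRunners` is a hypothetical helper, and the sample manifest, including the `src/cli.ts` path and the `*` version ranges, is invented for illustration rather than copied from the project's package.json.

```typescript
// Hypothetical check; the manifest below is a made-up sample, not the real package.json.
interface PackageManifest {
  scripts?: Record<string, string>;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns the commands that npm scripts invoke but that are declared only as devDependencies.
function devOnlyRunners(pkg: PackageManifest): string[] {
  const deps = pkg.dependencies ?? {};
  const devDeps = pkg.devDependencies ?? {};
  const found = new Set<string>();
  for (const command of Object.values(pkg.scripts ?? {})) {
    const runner = command.trim().split(/\s+/)[0]; // first word of the script line
    if (runner && runner in devDeps && !(runner in deps)) {
      found.add(runner);
    }
  }
  return Array.from(found);
}

const sampleManifest: PackageManifest = {
  scripts: { scrape: "tsx src/cli.ts" }, // assumed script line
  dependencies: { playwright: "*" },
  devDependencies: { tsx: "*" },
};

// Reports "tsx" for this sample, mirroring the mismatch noted in the review.
console.log(devOnlyRunners(sampleManifest));
```

An install spec that only installs `playwright` would leave such a runner missing, which is why a full `npm install` is recommended.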
Credentials
The skill requests no environment variables or credentials, which is appropriate for a generic scraper. Note: scraping authenticated or internal sites would require supplying credentials or cookies externally; the skill itself does not request or store any secrets.
Persistence & Privilege
The skill does not request permanent 'always' presence, does not alter other skills' config, and contains no self-enabling behavior. It runs as a CLI-driven tool and uses Playwright only when invoked.
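How to use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install smart-scraper
- Once installation finishes, invoke the Skill by name or trigger it with /smart-scraper.
- Provide the inputs described in the Skill's parameter documentation to get structured output.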
Version history
v1.0.0
AI-powered web scraper with intelligent structure recognition. Auto-detects lists, articles, and tables.
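FAQ
What is Smart Scraper?
AI-powered web scraper with intelligent structure recognition. Extracts lists, articles, and tables from any website with automatic type detection. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 166 downloads to date.
How do I install Smart Scraper?
Run the command /install smart-scraper in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.
Is Smart Scraper free?
Yes. Smart Scraper is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.
Which platforms does Smart Scraper support?
Smart Scraper is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Smart Scraper?
It is developed and maintained by yadanzheng68-cmyk (@yadanzheng68-cmyk); the current version is v1.0.0.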