wenbozhao279-code

Web Scraping Tool Selection Strategy

Author: wenbozhao279-code · GitHub · v1.0.0 · MIT-0
cross-platform · ✓ Security check passed
Total downloads: 220 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install web-scraping-tool-selection-strategy
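Description
How to choose the right web scraping tool for data collection. Use this skill when the user mentions web scraping, data collection, crawlers, automated testing, browser automation, website monitoring, competitor analysis, price monitoring, review scraping, social media data analysis, e-commerce data collection, scraping Xiaohongshu/Zhihu/JD/Taobao/1688, structured data extraction, anti-scraping bypass, browser reuse, API scraping, real-time data monitoring, or similar scenarios. Includes opencli...
Instructions (SKILL.md)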

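Web Scraping Tool Selection Strategy

Build an efficient web data collection strategy: choosing the right tool for each site maximizes scraping success rate and data quality.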

When to use this skill

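  • You need to scrape data from different websites but are unsure which tool to use
  • You face a complex site whose anti-scraping mechanisms have to be bypassed
  • You need structured output or fast, API-level access
  • You want to reuse a logged-in browser session to scrape private data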

Steps

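  1. Prefer opencli for platforms that have an adapter

    • For platforms with an official adapter, such as Xiaohongshu, Zhihu, Weibo, and Bilibili, use opencli <platform> <action> --limit <number> -f json
    • Example: opencli xiaohongshu search "keyword" --limit 3 -f json
    • Why: it returns structured JSON quickly and reliably, with complete fields such as author, title, like count, and publish time
  2. Use playwright-cli as the fallback

    • For complex e-commerce sites such as JD, Taobao, 1688, Douyin, and Pinduoduo, use playwright-cli goto "<URL>"
    • Example: playwright-cli goto "https://item.jd.com/44541018110.html#comment"
    • Why: it can reuse the logged-in Chrome browser state, get past anti-scraping mechanisms, and handle dynamically loaded content and data visible only after login
  3. Choose the tool by platform type

    • Social media platforms (Xiaohongshu/Zhihu/Weibo/Bilibili) → prefer opencli
    • E-commerce platforms (JD/Taobao/1688/Douyin/Pinduoduo) → use playwright-cli
    • Why: opencli has adapters optimized for specific platforms, while playwright-cli offers a general browser-level solution
  4. Verify tool connectivity and status

    • Test that the tools run correctly before the real scraping job
    • Check that the Chrome browser is connected correctly
    • Why: this avoids connection failures during a demo or in production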
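
The selection rule in steps 1 to 3 can be sketched as a small lookup. This is an illustration only, assuming Python is used to drive the two CLIs. The function names are hypothetical, and the platform keys `weibo` and `bilibili` are guesses at adapter names (only `xiaohongshu` and `zhihu` appear in the commands of this document). The command shapes are copied from the steps above; nothing else about either tool's interface is assumed.

```python
from typing import List

# Platforms the steps route to opencli. "xiaohongshu" and "zhihu" appear in
# this document's example commands; "weibo" and "bilibili" are assumed names.
OPENCLI_PLATFORMS = {"xiaohongshu", "zhihu", "weibo", "bilibili"}


def choose_tool(platform: str) -> str:
    """Return the tool the steps recommend for a platform (hypothetical helper)."""
    name = platform.strip().lower()
    if name in OPENCLI_PLATFORMS:
        return "opencli"
    # E-commerce sites and anything unknown fall back to the browser-level tool.
    return "playwright-cli"


def opencli_command(platform: str, action: str, *args: str, limit: int = 3) -> List[str]:
    """Build: opencli <platform> <action> [args...] --limit <number> -f json"""
    return ["opencli", platform, action, *args, "--limit", str(limit), "-f", "json"]


def playwright_command(url: str) -> List[str]:
    """Build: playwright-cli goto <URL>"""
    return ["playwright-cli", "goto", url]
```

Returning the command as a list rather than a single string keeps search terms and URLs as separate arguments, so they need no shell quoting if the list is later passed to `subprocess.run`.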

Pitfalls and solutions

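  • ❌ Blindly using a single tool → cannot adapt to different sites' anti-scraping mechanisms and page structures → ✅ Choose the tool that fits the platform
  • ❌ Ignoring logged-in browser state → misses post-login data and adds login verification steps → ✅ Prefer reusing logged-in Chrome tabs
  • ❌ Not distinguishing API-level from browser-level scraping → low efficiency or inaccurate data → ✅ Use opencli for structured data and playwright-cli for complex pages
  • ❌ No tool status check → unexpected failures during a demo → ✅ Run a minimal check before the demo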

Key code and configuration

# opencli: Xiaohongshu search example (search term: "pet cats")
opencli xiaohongshu search "宠物猫" --limit 3 -f json

# opencli: Zhihu hot list example
opencli zhihu hot --limit 5 -f json

# playwright-cli: JD review scraping example
playwright-cli goto "https://item.jd.com/44541018110.html#comment"

# playwright-cli: 1688 supply-chain scraping example
playwright-cli goto "https://s.1688.com/selloffer/offer_search.htm?keywords=静脉曲张袜"
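
To consume the `-f json` output programmatically, one option is to run the command from Python and parse its standard output. The sketch below rests on two assumptions that this document does not confirm: that opencli prints a single JSON document to stdout, and that it exits with a non-zero code on failure. The helper names `parse_json_output` and `run_opencli` are hypothetical.

```python
import json
import subprocess
from typing import Any, List


def parse_json_output(stdout: str) -> Any:
    """Parse the text printed by a command that was run with -f json."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        # Surface a clearer error than a raw decode failure.
        raise ValueError(f"output is not valid JSON: {exc}") from exc


def run_opencli(args: List[str], timeout: float = 60.0) -> Any:
    """Run opencli with the given arguments and return the parsed JSON.

    Assumes stdout holds exactly one JSON document (not confirmed here).
    """
    result = subprocess.run(
        ["opencli", *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_json_output(result.stdout)


# Example call, mirroring the Zhihu command above (requires opencli installed):
# items = run_opencli(["zhihu", "hot", "--limit", "5", "-f", "json"])
```

The field names inside the returned data (author, title, like count, publish time) are not specified in this document, so inspect one real response before writing code that depends on particular keys.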

Environment and prerequisites

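  • opencli is installed and configured
  • playwright-cli is installed and configured
  • Chrome is installed and accessible to the tools
  • The network connection is stable and can reach the target sites
  • You are logged in to the target sites (so playwright-cli can reuse the login state)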
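
A minimal pre-flight check for the first two prerequisites might look like the sketch below. It only confirms that the two executables can be found on PATH. It does not verify the Chrome connection or login state, and it is not a reconstruction of the companion validator script. The function names are hypothetical.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Executable names as they are invoked in this document.
REQUIRED_TOOLS = ("opencli", "playwright-cli")


def find_tools(names: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of tools that could not be located."""
    return [name for name, path in found.items() if path is None]
```

Calling `missing_tools(find_tools())` before a demo or production run gives a quick list of what still needs installing. An empty list means both commands were found, not that they are correctly configured.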

Companion files

  • scripts/web_scraping_validator: tool connectivity validation script
  • references/platform_mapping_table: reference table mapping platforms to tools
Security recommendations
This skill is a coherent, instruction-only guide for choosing opencli vs playwright-cli. Before using it: 1) Verify you will manually install and review opencli/playwright-cli from official sources (don’t run unknown installers). 2) Be cautious about reusing logged-in browser state — don’t give an agent access to your browser profile, cookies, or passwords unless you explicitly trust the environment; doing so can expose private account data. 3) The SKILL.md mentions companion scripts that aren’t bundled here—inspect any such scripts before running. 4) Ensure your scraping activities comply with target sites’ terms of service and applicable laws. 5) Prefer manual review and least-privilege testing (use throwaway accounts or isolated browser profiles) when validating the recommended commands.
Functional analysis
Type: OpenClaw Skill
Name: web-scraping-tool-selection-strategy
Version: 1.0.0
The skill bundle provides a legitimate strategy for web scraping using 'opencli' and 'playwright-cli' across various Chinese social media and e-commerce platforms (e.g., JD, Xiaohongshu, 1688). The instructions in SKILL.md and the reference guides focus on tool selection, data structure mapping, and browser state reuse to handle anti-scraping mechanisms. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.
Capability assessment
Purpose & Capability
The skill's name and description match the instructions: it is a tool-selection strategy between opencli and playwright-cli. It does not request unrelated credentials or binaries. Minor inconsistency: SKILL.md references companion scripts/files (e.g., scripts/web_scraping_validator, references/platform_mapping_table) that are not present in the file manifest—this is a documentation/packaging omission but does not imply malicious behavior.
Instruction Scope
Instructions stay on-topic (how to choose and invoke opencli/playwright-cli). They explicitly recommend reusing logged-in Chrome browser state to access post-login data and to bypass anti-bot measures; while coherent for the stated purpose, this step can expose private account data if performed automatically or without care. The skill does not instruct the agent to read arbitrary system files or exfiltrate data to external endpoints, but following its guidance requires elevated access to a browser profile/session outside the skill's own control.
Install Mechanism
No install spec and no code files to execute — instruction-only skill. This minimizes surface area: nothing is downloaded or written by the skill itself.
Credentials
The skill declares no required env vars or credentials (proportional). However it implicitly depends on user-managed credentials/sessions (logged-in browser state and site accounts). That dependence is reasonable for the guidance given, but users should not hand over browser profiles, cookies, or credentials to untrusted agents.
Persistence & Privilege
The skill is not always-enabled and makes no requests to modify other skills or system configuration. Autonomous invocation is allowed by platform default but the skill does not request elevated persistent privileges.
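How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the dialog: /install web-scraping-tool-selection-strategy
  3. Once installed, call the Skill by name or trigger it with /web-scraping-tool-selection-strategy
  4. Provide the inputs described in the Skill's parameter notes to get structured output
Version history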
v1.0.0
Initial release of web-scraping-tool-selection-strategy:
  • Provides step-by-step guidance for selecting between opencli and playwright-cli for web scraping based on platform and requirements.
  • Includes recommended usage scenarios, tool selection rules, and example commands for common platforms (e.g., Xiaohongshu, Zhihu, JD, Taobao, 1688).
  • Lists key pitfalls and solutions to improve scraping success and data quality.
  • Details essential environment prerequisites and companion files for tool validation and platform mapping.
Metadata
Slug: web-scraping-tool-selection-strategy
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
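FAQ

What is Web Scraping Tool Selection Strategy?

It is a guide to choosing the right web scraping tool for data collection. It applies when the user mentions web scraping, data collection, crawlers, automated testing, browser automation, website monitoring, competitor analysis, price monitoring, review scraping, social media data analysis, e-commerce data collection, scraping Xiaohongshu/Zhihu/JD/Taobao/1688, structured data extraction, anti-scraping bypass, browser reuse, API scraping, real-time data monitoring, or similar scenarios. Includes opencli... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has 220 total downloads so far.

How do I install Web Scraping Tool Selection Strategy?

Run the command "/install web-scraping-tool-selection-strategy" in the OpenClaw or Claude Code dialog to install it in one step. The skill itself needs no additional configuration; opencli and playwright-cli must be installed separately, as listed under Environment and prerequisites.

Is Web Scraping Tool Selection Strategy free?

Yes. Web Scraping Tool Selection Strategy is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Web Scraping Tool Selection Strategy support?

Web Scraping Tool Selection Strategy is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Web Scraping Tool Selection Strategy?

It is developed and maintained by wenbozhao279-code (@wenbozhao279-code); the current version is v1.0.0.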
