Description

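A scraper for JavaScript-rendered websites. Use this skill when you need to scrape JS-rendered pages (e.g., WeCom docs, Vue/React SPAs), fetch company data from Qichacha, get past anti-scraping measures, or retrieve content from sites that plain curl/wget/web_fetch cannot fetch. Supports automatic switching between two engines, Playwright and scrapling.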

README (SKILL.md)

huo15-js-scraper

Name: Huo15 Js Scraper
Author: zhaobod1

A skill for scraping JavaScript-rendered websites, with two engines: Playwright and scrapling.

Quick start

# Basic usage (engine selected automatically)
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py <URL>

# Specify a CSS selector
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py <URL> --selector ".content"

# Output JSON
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py <URL> --output json

# Force the scrapling engine
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py <URL> --engine scrapling

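Engine selection strategy

Scenario	Recommended engine
WeCom docs / WeChat-related	Playwright
Cloudflare-protected sites	scrapling (stealth)
Vue/React SPA	Playwright
Simple static pages	scrapling (basic)
Unknown sites	Playwright (more stable)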
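The table above amounts to a lookup with Playwright as the default. The following is a minimal sketch of that policy, assuming the caller already knows which scenario applies; the scenario keys, the "engine:mode" labels, and `recommend_engine` are hypothetical and not part of scrape.py, which may choose engines differently.

```python
from typing import Optional

# Hypothetical encoding of the engine-selection table (not the skill's actual code).
RECOMMENDED_ENGINE = {
    "wecom": "playwright",               # WeCom docs / WeChat-related
    "cloudflare": "scrapling:stealth",   # Cloudflare-protected sites
    "spa": "playwright",                 # Vue/React SPA
    "static": "scrapling:basic",         # simple static pages
}


def recommend_engine(scenario: Optional[str]) -> str:
    """Return the recommended engine; unknown or missing scenarios fall back to Playwright."""
    return RECOMMENDED_ENGINE.get(scenario or "", "playwright")
```

Defaulting to Playwright mirrors the "Unknown sites" row: a full browser is slower but renders the most pages correctly.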

Python API

from huo15_js_scraper import scrape

# Option 1: automatic engine selection (recommended)
result = scrape('https://example.com')
print(result['content'])

# Option 2: force Playwright
result = scrape('https://developer.work.weixin.qq.com/document/path/91756', engine='playwright')

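WeCom (Enterprise WeChat) documentation knowledge base

A complete knowledge base of the official WeCom documentation has been built at: ~/workspace/knowledge-base/企业微信文档/

Knowledge base structure

企业微信文档/
├── README.md (index)
├── 01-快速入门/      - Getting started: must-read before development
├── 02-服务端API/     - Server-side API: contacts, messaging, customer contact, enterprise payments...
├── 03-客户端API/     - Client-side API: mini-program API, JS-SDK
├── 04-工具资源/       - Tools & resources: WeUI, error codes, rate limits
└── 99-附录/          - Appendix: FAQ, changelog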

Updating the WeCom docs knowledge base

# List all documents that can be scraped
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/wecom_docs_scraper.py --list

# Scrape a single document
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/wecom_docs_scraper.py --path-id 90556 --category "01-快速入门" --title "快速入门"

# Batch scrape (updates all 52 documents)
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/wecom_docs_scraper.py --all

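Core documents

Document	Path ID	Description
Getting started	90556	Must-read before development
Get access_token	91039	API authentication basics
Send application message	90235	Core of message push
Create member	90195	Contact management
Customer contact overview	92109	Customer management basics
JS-SDK signature algorithm	90506	Essential for front-end development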
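Each document is addressed by its path ID, and the examples in this README use URLs of the form https://developer.work.weixin.qq.com/document/path/<id>. The sketch below maps the core documents to those URLs and to a file in the knowledge base. The `<category>/<title>.md` layout and the helper names are assumptions for illustration; wecom_docs_scraper.py may name its output files differently.

```python
from pathlib import Path

# Core documents from the table above: title -> WeCom path ID.
CORE_DOCS = {
    "Getting started": 90556,
    "Get access_token": 91039,
    "Send application message": 90235,
    "Create member": 90195,
    "Customer contact overview": 92109,
    "JS-SDK signature algorithm": 90506,
}

BASE_URL = "https://developer.work.weixin.qq.com/document/path/"
KB_ROOT = Path("~/workspace/knowledge-base/企业微信文档").expanduser()


def doc_url(path_id: int) -> str:
    """Build the public WeCom document URL for a path ID."""
    return f"{BASE_URL}{path_id}"


def kb_file(category: str, title: str, root: Path = KB_ROOT) -> Path:
    """Hypothetical output location: <root>/<category>/<title>.md."""
    return root / category / f"{title}.md"
```

For example, `doc_url(CORE_DOCS["Get access_token"])` yields the same URL pattern used by the --path-id option.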

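Qichacha company data

Company information lookup on Qichacha (qcc.com) is supported in two ways:

✅ Recommended: MCP (official API, stable and reliable)
Fallback: direct scraping (requires an account login; subject to anti-scraping limits)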

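Recommended: Qichacha MCP (official API)

Qichacha offers an official MCP service that supports OpenClaw and already packages 20+ company-lookup skills.

Data coverage:

365 million+ market entities
250 million+ judicial and litigation records
210 million+ intellectual property records
170 million+ bidding and tender records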

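MCP servers (4):

Server	Alias	Main capabilities
qcc-company	Enterprise Base	Business registration, shareholding structure
qcc-risk	Risk Control Brain	34 risk-scanning tools
qcc-ipr	IP Engine	Patents, trademarks, software copyrights
qcc-operation	Business Compass	Bidding, qualifications, public sentiment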

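Installation steps:

# 1. Register to obtain an API Key
# Visit https://agent.qcc.com to register

# 2. Add it to the OpenClaw configuration
# Add the Qichacha MCP server in the OpenClaw plugin configuration

# MCP endpoint: https://agent.qcc.com/mcp
# API Key authentication must be configured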

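Prebuilt SKILL (load it by sending this message to the AI):

Please load and use this SKILL: https://github.com/duhu2000/financial-services-qcc

Example SKILL commands:

# KYB company verification (~30 seconds)
/kyb-verification-qcc 华为技术有限公司

# IC Memo investment memorandum (~30 seconds)
/ic-memo-qcc 宁德时代 --round Series-B

# Company profile snapshot (~3 minutes)
/strip-profile-qcc 美团平台有限公司

# Intellectual property due diligence
/ip-due-diligence-qcc <company-name> --peer <competitor>

# Supply chain risk assessment
/supply-chain-risk-qcc <company-name> --tier 1

# Related-party look-through
/related-party-qcc <company-name> --depth 5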

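Output formats: .md, .docx, and .pptx are supported

Fallback: direct scraping

If MCP is not available, you can scrape directly instead (requires a Qichacha account).

Install dependencies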

pip3 install playwright --break-system-packages
playwright install chromium

Log in (first use)

# Generates a QR code screenshot; scan it to log in
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/qichacha_scraper.py --login

After login, cookies are saved automatically to ~/.cache/huo15-js-scraper/qichacha_cookies.json

Search for companies

# Search for companies
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/qichacha_scraper.py --search "腾讯" --limit 10

# Output JSON
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/qichacha_scraper.py --search "腾讯" --output json

Company details

# Get detailed company information (some fields require VIP)
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/qichacha_scraper.py --company "https://www.qcc.com/firm/xxxxx.html"

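Sample returned information

Search results (basic information is visible without logging in):

Company name
Company status (in business / active / license revoked)
Industry classification
Registered capital
Legal representative

Detailed information (may require VIP):

Business registration information
Shareholder information
Annual report data
Risk information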
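The basic search fields listed above map naturally onto a flat record. The sketch below shows one way to model a search hit and serialize it the way --output json might; the `CompanySummary` class and its field names are hypothetical, and the JSON actually emitted by qichacha_scraper.py may use different keys.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CompanySummary:
    """Hypothetical record for one search hit (basic fields only)."""
    name: str
    status: str                # e.g. 存续 (active) or 吊销 (license revoked)
    industry: str
    registered_capital: str    # kept as display text, including currency and unit
    legal_representative: str


def to_json(results: List[CompanySummary]) -> str:
    """Serialize search hits; ensure_ascii=False keeps Chinese text readable."""
    return json.dumps([asdict(r) for r in results], ensure_ascii=False, indent=2)
```

Keeping registered capital as text avoids guessing at unit conversions; a consumer can parse it later if numeric comparison is needed.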

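Notes

Qichacha's search function requires login
Detailed information (e.g., annual reports, shareholders) requires a VIP account
Cookies are valid for about 7 days; log in again after they expire
Setting --wait 5 is recommended to give the page time to render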
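Because the saved cookies last roughly 7 days, a caller can check the cookie file before scraping and run --login again when it is stale. A minimal sketch based on the file's modification time, assuming the file is rewritten at each login; `is_stale` and `cookies_need_relogin` are hypothetical helpers, not part of the skill.

```python
import os
import time
from pathlib import Path
from typing import Optional

COOKIE_FILE = Path("~/.cache/huo15-js-scraper/qichacha_cookies.json").expanduser()
MAX_AGE_SECONDS = 7 * 24 * 3600  # README: cookies are valid for about 7 days


def is_stale(mtime: float, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """True if a file last written at `mtime` is older than `max_age` seconds at `now`."""
    return now - mtime > max_age


def cookies_need_relogin(path: Path = COOKIE_FILE, now: Optional[float] = None) -> bool:
    """True if the cookie file is missing or older than the ~7-day validity window."""
    if not path.exists():
        return True
    now = time.time() if now is None else now
    return is_stale(os.path.getmtime(path), now)
```

File age is only a heuristic: the site can invalidate a session earlier, so an empty or redirected result should still be treated as a possible login failure.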

FAQ

Q: How do I scrape WeCom docs?

python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py \
  "https://developer.work.weixin.qq.com/document/path/91756" \
  --wait 5

Q: Getting an error that playwright is not installed?

pip3 install playwright --break-system-packages
playwright install chromium

Q: How do I install scrapling?

pip3 install "scrapling[all]" --break-system-packages
scrapling install

Q: The content is empty, or I get a redirect page instead?

Increase the --wait time to give the JS more time to render:

python3 ...scrape.py <URL> --wait 5

Installing dependencies

# Playwright (primary engine)
pip3 install playwright --break-system-packages
playwright install chromium

# scrapling (fallback engine)
pip3 install "scrapling[all]" --break-system-packages
scrapling install

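How it works

Load the page with Playwright (headless Chromium) first and wait for networkidle
Wait the specified time for JS rendering to finish
Extract content with CSS selectors
If Playwright fails, fall back to scrapling automatically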
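The fallback order described above (Playwright first, scrapling if it fails) is a plain try/except chain. The sketch below takes the engines as callables so it runs without either library installed; `scrape_with_fallback` and the engine callables are hypothetical stand-ins, and the real scrape.py may catch different errors. Treating empty output as a failure is an added design choice here, motivated by the FAQ entry about empty or redirect pages.

```python
from typing import Callable, Dict, List, Tuple

Engine = Callable[[str], str]  # takes a URL, returns extracted text


def scrape_with_fallback(url: str, engines: List[Tuple[str, Engine]]) -> Dict[str, str]:
    """Try each (name, engine) in order and return the first non-empty result."""
    errors: List[str] = []
    for name, fetch in engines:
        try:
            content = fetch(url)
        except Exception as exc:  # e.g. timeout or browser launch failure
            errors.append(f"{name}: {exc!r}")
            continue
        if content.strip():
            # Same "content" key as the Python API's result dict.
            return {"url": url, "engine": name, "content": content}
        errors.append(f"{name}: empty content")
    raise RuntimeError("all engines failed: " + "; ".join(errors))
```

Usage: pass `[("playwright", playwright_fetch), ("scrapling", scrapling_fetch)]`, where both functions are wrappers you would write around the respective libraries.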

Usage Guidance

This skill appears to be what it says: a Playwright/scrapling-based scraper. Before installing or running: 1) Be aware the scripts will save login cookies and screenshots under ~/.cache/huo15-js-scraper and will write scraped docs to ~/workspace/knowledge-base/企业微信文档 — treat those cookie files as sensitive. 2) The setup steps run 'pip3 install' and 'playwright install' (which downloads browser binaries) and the examples use --break-system-packages; prefer running in a virtualenv or isolated VM/container to avoid modifying system Python. 3) The skill source is unknown and the SKILL.md references an external GitHub skill for Qichacha MCP — do not blindly follow external links or provide API keys unless you trust the upstream. 4) If you will log into third-party services (e.g., qcc), use an account you control and consider the legal/ToS implications of scraping. If you want higher assurance, review the full scripts locally or run them in a sandbox before granting them access to your primary environment.

Capability Analysis

Type: OpenClaw Skill
Name: huo15-js-scraper
Version: 1.2.2

The bundle provides a set of tools for scraping JavaScript-rendered websites using Playwright and Scrapling, with specialized scripts for Enterprise WeChat (WeCom) documentation and Qichacha business data. The code logic is transparent and aligns with the stated purpose: it manages session cookies locally (~/.cache/huo15-js-scraper/qichacha_cookies.json) to maintain login states and writes scraped content to the user's workspace knowledge base. While SKILL.md contains a prompt-injection-style instruction directing the agent to load an external skill from GitHub (duhu2000/financial-services-qcc), this appears to be a functional recommendation for related enterprise data services rather than a malicious attempt to exfiltrate data or gain unauthorized access.

Capability Tags

requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name/description (JS-rendered scraping, Qichacha, WeCom docs) align with the included Python scripts that use Playwright and scrapling. The files and commands are consistent with a scraper that needs to run headful/headless browsers and save output to disk.

ℹ Instruction Scope

SKILL.md instructs running the included scripts and to install Playwright/scrapling; the scripts read/write files under the user's home (~/.cache/huo15-js-scraper and ~/workspace/knowledge-base/企业微信文档) and save site cookies for logged-in scraping. Those filesystem operations are expected for this purpose but are sensitive because cookies contain authentication tokens.

ℹ Install Mechanism

No packaged install spec is present (instruction-only), but SKILL.md requires pip installs and 'playwright install' / 'scrapling install' which will download browser binaries. The pip commands shown use --break-system-packages which can change system Python packages — this is expected for Playwright but worth noting as higher-impact than a pure local script.

✓ Credentials

The skill does not request environment variables, API keys, or unrelated secrets. It does store cookies and screenshots in user home paths; those stored cookies are effectively authentication material and should be treated as sensitive by the user.

✓ Persistence & Privilege

The skill's "always" flag is false, and it does not attempt to modify other skills or global agent config. It persists only its own cookies and knowledge-base files under the user's home directories.

Version History

v1.2.2

No visible changes detected for version 1.2.2 (no file modifications): a version bump only, with no code or documentation changes.

v1.2.1

v1.2.1 syncs the local working state to clawhub (the local version number had previously fallen behind clawhub).

v1.2.0

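- Added Qichacha (qcc.com) company data collection, via both official MCP API access and web scraping
- Added scripts/qichacha_scraper.py, which implements QR-code login, company search, and detail collection
- SKILL.md now documents in detail how to obtain Qichacha company information, with usage notes
- Existing WeCom docs scraping is unchanged; overall functionality now extends to more data sources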

v1.1.0

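- Added scripts/wecom_docs_scraper.py for batch scraping of WeCom docs and knowledge base management
- SKILL.md adds a "WeCom docs knowledge base" section with usage examples
- Documents new features: batch updates, scraping by path-id, and document listing
- Now supports building and maintaining the ~/workspace/knowledge-base/企业微信文档/ knowledge base in one step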

v1.0.0

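Initial release of huo15-js-scraper:
- A tool for scraping JavaScript-rendered pages, with automatic switching between the Playwright and scrapling engines
- Suited to complex pages such as WeCom docs, Vue/React SPAs, and Cloudflare-protected sites
- Offers both a command-line interface and a Python API
- Supports extracting content with CSS selectors and outputting JSON
- Includes detailed dependency installation instructions and an FAQ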

Metadata

Slug huo15-js-scraper

Version 1.2.2

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 5

Frequently Asked Questions

What is Huo15 Js Scraper?

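A scraper for JavaScript-rendered websites. Use this skill when you need to scrape JS-rendered pages (e.g., WeCom docs, Vue/React SPAs), fetch company data from Qichacha, get past anti-scraping measures, or retrieve content from sites that plain curl/wget/web_fetch cannot fetch. Supports automatic switching between two engines, Playwright and scrapling. It is an AI Agent Skill for Claude Code / OpenClaw, with 180 downloads so far.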

How do I install Huo15 Js Scraper?

Run "/install huo15-js-scraper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Huo15 Js Scraper free?

Yes, Huo15 Js Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Huo15 Js Scraper support?

Huo15 Js Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Huo15 Js Scraper?

It is built and maintained by Job Zhao (@zhaobod1); the current version is v1.2.2.
