zhaobod1

Huo15 Js Scraper

by Job Zhao · GitHub ↗ · v1.2.2 · MIT-0
cross-platform · ✓ Security Clean · 180 downloads · 0 stars · 1 active install · 5 versions
Install in OpenClaw
/install huo15-js-scraper
Description
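A scraping tool for JavaScript-rendered websites. Use this skill when you need to scrape JS-rendered pages (such as WeCom documentation or Vue/React SPAs), fetch Qichacha company data, get past anti-scraping measures, or reach sites whose content plain curl/wget/web_fetch cannot retrieve. It supports automatic switching between two engines, Playwright and scrapling.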
README (SKILL.md)

huo15-js-scraper

A skill for scraping JavaScript-rendered websites, with two engines: Playwright and scrapling.

Quick start

# Basic usage (engine selected automatically)
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py <URL>

# Specify a CSS selector
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py <URL> --selector ".content"

# Output JSON
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py <URL> --output json

# Force the scrapling engine
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py <URL> --engine scrapling
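
For scripted use, the command lines above can be assembled programmatically. The following is a minimal sketch, not part of the skill: `build_scrape_command` and `run_scrape` are hypothetical helper names, and only the flags that appear in this README (`--selector`, `--output`, `--engine`, `--wait`) are used.

```python
import subprocess
from pathlib import Path

# Script location as given in this README.
SCRAPE_SCRIPT = Path.home() / ".openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py"


def build_scrape_command(url, selector=None, output=None, engine=None, wait=None):
    """Assemble the argument list for scrape.py (hypothetical helper)."""
    cmd = ["python3", str(SCRAPE_SCRIPT), url]
    if selector is not None:
        cmd += ["--selector", selector]
    if output is not None:
        cmd += ["--output", output]
    if engine is not None:
        cmd += ["--engine", engine]
    if wait is not None:
        cmd += ["--wait", str(wait)]
    return cmd


def run_scrape(url, **options):
    """Run scrape.py and return its stdout; raises CalledProcessError on a non-zero exit."""
    completed = subprocess.run(
        build_scrape_command(url, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems with selectors such as ".content > p".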

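Engine selection strategy

| Scenario | Recommended engine |
|---|---|
| WeCom documentation / WeChat-related sites | Playwright |
| Cloudflare-protected sites | scrapling (stealth) |
| Vue/React SPAs | Playwright |
| Simple static pages | scrapling (basic) |
| Unknown sites | Playwright (more stable) |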
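
The selection table above can be expressed as a small lookup function. This is an illustrative sketch only: `recommend_engine` is a hypothetical name, the host check for WeChat-related sites is an assumption, and the skill's real auto-selection logic may differ. The caller has to say whether a site is Cloudflare-protected or static, since the sketch does not detect that.

```python
from urllib.parse import urlparse

# Host suffix treated as WeCom/WeChat-related (assumption for illustration).
WECHAT_HOST_SUFFIX = "weixin.qq.com"


def recommend_engine(url, cloudflare_protected=False, static_page=False):
    """Return (engine, mode) following the selection table; mode is None for Playwright."""
    host = urlparse(url).hostname or ""
    if host == WECHAT_HOST_SUFFIX or host.endswith("." + WECHAT_HOST_SUFFIX):
        return ("playwright", None)
    if cloudflare_protected:
        return ("scrapling", "stealth")
    if static_page:
        return ("scrapling", "basic")
    # SPAs and unknown sites both map to Playwright in the table.
    return ("playwright", None)
```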

Python API

from huo15_js_scraper import scrape

# Option 1: automatic engine selection (recommended)
result = scrape('https://example.com')
print(result['content'])

# Option 2: force Playwright
result = scrape('https://developer.work.weixin.qq.com/document/path/91756', engine='playwright')

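WeCom documentation knowledge base

A complete knowledge base of the official WeCom (企业微信) documentation has been built at: ~/workspace/knowledge-base/企业微信文档/

Knowledge base structure

企业微信文档/
├── README.md (index)
├── 01-快速入门/      - Quick start: required reading before development
├── 02-服务端API/     - Server API: contacts, messaging, customer contact, enterprise payments...
├── 03-客户端API/     - Client API: mini-program API, JS-SDK
├── 04-工具资源/       - Tools and resources: WeUI, error codes, rate limits
└── 99-附录/          - Appendix: FAQ, changelog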

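Updating the WeCom documentation knowledge base

# List all documents that can be scraped
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/wecom_docs_scraper.py --list

# Scrape a single document
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/wecom_docs_scraper.py --path-id 90556 --category "01-快速入门" --title "快速入门"

# Batch scrape (refreshes all 52 documents)
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/wecom_docs_scraper.py --all

Core documents

| Document | Path ID | Notes |
|---|---|---|
| Quick start | 90556 | Required reading before development |
| Get access_token | 91039 | Basis of API authentication |
| Send application message | 90235 | Core of message push |
| Create member | 90195 | Contact management |
| Customer contact overview | 92109 | Basics of customer management |
| JS-SDK signature algorithm | 90506 | Essential for front-end development |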
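
The path IDs in the table above can be turned into document URLs. The sketch below assumes the URL pattern seen in this README's own examples (`https://developer.work.weixin.qq.com/document/path/91756`) holds for every path ID; `wecom_doc_url` and `CORE_DOCS` are hypothetical names, and the English keys are translations of the document titles.

```python
# URL pattern inferred from the examples in this README (an assumption).
WECOM_DOC_URL = "https://developer.work.weixin.qq.com/document/path/{path_id}"

# Core documents and their path IDs, as listed in the table.
CORE_DOCS = {
    "Quick start": 90556,
    "Get access_token": 91039,
    "Send application message": 90235,
    "Create member": 90195,
    "Customer contact overview": 92109,
    "JS-SDK signature algorithm": 90506,
}


def wecom_doc_url(path_id):
    """Build the documentation URL for a numeric path ID."""
    return WECOM_DOC_URL.format(path_id=int(path_id))


if __name__ == "__main__":
    for title, path_id in CORE_DOCS.items():
        print(f"{title}: {wecom_doc_url(path_id)}")
```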

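Qichacha company data

Company information lookup on Qichacha (企查查, qcc.com) is supported in two ways:

  1. ✅ Recommended: MCP (official API, stable and reliable)
  2. Fallback: direct scraping (requires an account login and is subject to anti-scraping limits)

Recommended approach: Qichacha MCP (official API)

Qichacha provides an official MCP service that supports OpenClaw and ships with 20+ packaged company-lookup SKILLs.

Data scale:

  • 365 million+ market entities
  • 250 million+ litigation records
  • 210 million+ intellectual property records
  • 170 million+ tender and bidding records

MCP Servers (4):

| Server | Alias | Main capabilities |
|---|---|---|
| qcc-company | Company Base (企业基座) | Business registration, equity structure |
| qcc-risk | Risk Brain (风控大脑) | 34 risk-scanning tools |
| qcc-ipr | IP Engine (知产引擎) | Patents, trademarks, software copyrights |
| qcc-operation | Operations Compass (经营罗盘) | Tenders and bids, qualifications, public sentiment |

Installation steps:

# 1. Register to obtain an API key
# Visit https://agent.qcc.com to register

# 2. Add it to the OpenClaw configuration
# Add the Qichacha MCP server in the OpenClaw plugin configuration

# MCP endpoint: https://agent.qcc.com/mcp
# API key authentication must be configured

Preset SKILL (send this message to the AI to load it):

Please load and use this SKILL: https://github.com/duhu2000/financial-services-qcc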

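SKILL command examples:

# KYB company verification (~30 seconds)
/kyb-verification-qcc 华为技术有限公司

# IC Memo investment memo (~30 seconds)
/ic-memo-qcc 宁德时代 --round Series-B

# Quick company profile (~3 minutes)
/strip-profile-qcc 美团平台有限公司

# Intellectual property due diligence
/ip-due-diligence-qcc <company name> --peer <competitor>

# Supply-chain risk assessment
/supply-chain-risk-qcc <company name> --tier 1

# Related-party look-through
/related-party-qcc <company name> --depth 5

Output formats: .md / .docx / .pptx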


Fallback approach: direct scraping

If MCP is not available, direct scraping can be used instead (a Qichacha account is required).

Install dependencies

pip3 install playwright --break-system-packages
playwright install chromium

Log in (first use)

# Generates a QR code screenshot; scan it to log in
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/qichacha_scraper.py --login

After login, cookies are saved automatically to ~/.cache/huo15-js-scraper/qichacha_cookies.json

Search for companies

# Search for companies
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/qichacha_scraper.py --search "腾讯" --limit 10

# Output JSON
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/qichacha_scraper.py --search "腾讯" --output json

Company details

# Fetch detailed company information (some fields require VIP)
python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/qichacha_scraper.py --company "https://www.qcc.com/firm/xxxxx.html"
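Example of returned information

Search results (basic information is viewable without login):

  • Company name
  • Company status (in operation / existing / revoked)
  • Industry classification
  • Registered capital
  • Legal representative

Detailed information (may require VIP):

  • Business registration information
  • Shareholder information
  • Annual report data
  • Risk information

Notes

  • Qichacha's search feature requires login
  • Detailed information (such as annual reports and shareholders) requires a VIP account
  • Cookies are valid for about 7 days; log in again once they expire
  • Setting --wait 5 is recommended to give the page time to render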
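
Because the saved cookies expire after roughly 7 days, a script can check the cookie file's age before scraping and prompt for a fresh `--login` when needed. This is a sketch under assumptions: `cookies_look_fresh` is a hypothetical helper, it treats the file's modification time as the login time, and the 7-day figure is the approximate value from the notes above, not a guaranteed lifetime.

```python
import time
from pathlib import Path

# Cookie path as documented in this README.
COOKIE_FILE = Path.home() / ".cache/huo15-js-scraper/qichacha_cookies.json"
# "About 7 days" per the notes; treat this as an approximation.
COOKIE_LIFETIME_SECONDS = 7 * 24 * 3600


def cookies_look_fresh(path=COOKIE_FILE, max_age=COOKIE_LIFETIME_SECONDS, now=None):
    """Return True if the cookie file exists and was written less than max_age seconds ago."""
    path = Path(path)
    if not path.is_file():
        return False
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) < max_age


if __name__ == "__main__":
    if not cookies_look_fresh():
        print("Cookies missing or probably expired; run qichacha_scraper.py --login again.")
```

A fresh-looking file does not prove the session is still valid server-side; it only avoids the obvious case of a stale file.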

Common questions

Q: How do I scrape WeCom documentation?

python3 ~/.openclaw/workspace/skills/huo15-js-scraper/scripts/scrape.py \
  "https://developer.work.weixin.qq.com/document/path/91756" \
  --wait 5

Q: I get a message that playwright is not installed?

pip3 install playwright --break-system-packages
playwright install chromium

Q: How do I install scrapling?

pip3 install "scrapling[all]" --break-system-packages
scrapling install

Q: The content is empty, or a redirect page is returned?

Increase the --wait time so the JS has longer to render:

python3 ...scrape.py <URL> --wait 5
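
The advice to increase `--wait` can be automated by retrying with progressively longer waits. The sketch below is generic and hypothetical: `scrape_with_growing_wait` is not part of the skill, and `fetch` stands for any callable of the form `fetch(url, wait_seconds)` that returns the extracted text. It only detects empty content; recognising a redirect page would need a site-specific check.

```python
def scrape_with_growing_wait(fetch, url, waits=(2, 5, 10)):
    """Call fetch(url, wait) with increasing waits until it returns non-empty text.

    Returns (content, wait_used); content is "" if every attempt came back empty.
    """
    last_wait = None
    for wait in waits:
        last_wait = wait
        content = fetch(url, wait)
        if content and content.strip():
            return content, wait
    return "", last_wait


if __name__ == "__main__":
    # Stand-in fetcher that only "renders" once the wait reaches 5 seconds.
    def fake_fetch(url, wait):
        return "rendered content" if wait >= 5 else ""

    print(scrape_with_growing_wait(fake_fetch, "https://example.com"))
```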

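Installing dependencies

# Playwright (primary engine)
pip3 install playwright --break-system-packages
playwright install chromium

# scrapling (fallback engine)
pip3 install "scrapling[all]" --break-system-packages
scrapling install

How it works

  1. Playwright (headless Chromium) is tried first: it loads the page and waits for networkidle
  2. It then waits for the specified time so that JS rendering can finish
  3. Content is extracted with a CSS selector
  4. If Playwright fails, the scraper automatically falls back to scrapling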
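
The four steps above can be sketched roughly as follows. This is an illustration under assumptions, not the skill's actual code: `fetch_with_playwright` and `scrape_with_fallback` are hypothetical names, the returned dictionary shape is invented for the example, and the fallback is written generically (any list of named fetch callables) rather than calling scrapling directly.

```python
def fetch_with_playwright(url, selector="body", wait_seconds=2.0):
    """Steps 1-3: load with headless Chromium, wait, extract text by CSS selector."""
    # Imported lazily so this module loads even when Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")      # step 1: wait for network idle
            page.wait_for_timeout(wait_seconds * 1000)    # step 2: extra render time (ms)
            return page.inner_text(selector)              # step 3: first match of selector
        finally:
            browser.close()


def scrape_with_fallback(url, engines):
    """Step 4: try each (name, fetch) pair in order and return the first success."""
    errors = []
    for name, fetch in engines:
        try:
            return {"engine": name, "content": fetch(url)}
        except Exception as exc:  # any engine failure triggers the next engine
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))
```

A caller would pass something like `[("playwright", fetch_with_playwright), ("scrapling", some_scrapling_fetch)]`, where `some_scrapling_fetch` is a placeholder for whatever scrapling-based function is available.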
Usage Guidance
This skill appears to be what it says: a Playwright/scrapling-based scraper. Before installing or running:
  1. Be aware the scripts will save login cookies and screenshots under ~/.cache/huo15-js-scraper and will write scraped docs to ~/workspace/knowledge-base/企业微信文档. Treat those cookie files as sensitive.
  2. The setup steps run 'pip3 install' and 'playwright install' (which downloads browser binaries) and the examples use --break-system-packages; prefer running in a virtualenv or isolated VM/container to avoid modifying system Python.
  3. The skill source is unknown and the SKILL.md references an external GitHub skill for Qichacha MCP. Do not blindly follow external links or provide API keys unless you trust the upstream.
  4. If you will log into third-party services (e.g., qcc), use an account you control and consider the legal/ToS implications of scraping.
If you want higher assurance, review the full scripts locally or run them in a sandbox before granting them access to your primary environment.
Capability Analysis
Type: OpenClaw Skill Name: huo15-js-scraper Version: 1.2.2 The bundle provides a set of tools for scraping JavaScript-rendered websites using Playwright and Scrapling, with specialized scripts for Enterprise WeChat (WeCom) documentation and Qichacha business data. The code logic is transparent and aligns with the stated purpose: it manages session cookies locally (~/.cache/huo15-js-scraper/qichacha_cookies.json) to maintain login states and writes scraped content to the user's workspace knowledge base. While SKILL.md contains a prompt-injection-style instruction directing the agent to load an external skill from GitHub (duhu2000/financial-services-qcc), this appears to be a functional recommendation for related enterprise data services rather than a malicious attempt to exfiltrate data or gain unauthorized access.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
Name/description (JS-rendered scraping, Qichacha, WeCom docs) align with the included Python scripts that use Playwright and scrapling. The files and commands are consistent with a scraper that needs to run headful/headless browsers and save output to disk.
Instruction Scope
SKILL.md instructs running the included scripts and to install Playwright/scrapling; the scripts read/write files under the user's home (~/.cache/huo15-js-scraper and ~/workspace/knowledge-base/企业微信文档) and save site cookies for logged-in scraping. Those filesystem operations are expected for this purpose but are sensitive because cookies contain authentication tokens.
Install Mechanism
No packaged install spec is present (instruction-only), but SKILL.md requires pip installs and 'playwright install' / 'scrapling install' which will download browser binaries. The pip commands shown use --break-system-packages which can change system Python packages — this is expected for Playwright but worth noting as higher-impact than a pure local script.
Credentials
The skill does not request environment variables, API keys, or unrelated secrets. It does store cookies and screenshots in user home paths; those stored cookies are effectively authentication material and should be treated as sensitive by the user.
Persistence & Privilege
always is false and the skill does not attempt to modify other skills or global agent config. It persists only its own cookies and knowledge-base files under the user's home directories.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install huo15-js-scraper
  3. After installation, invoke the skill by name or use /huo15-js-scraper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.2.2
No visible changes detected for version 1.2.2 (no file modifications). - Version bump only; no code or documentation changes.
v1.2.1
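v1.2.1 syncs the local working state to clawhub (the local version number had previously lagged behind clawhub).
v1.2.0
  • Added company data collection for Qichacha (qcc.com), via both the official MCP API and web scraping
  • Added the scripts/qichacha_scraper.py script, which implements QR-code login, company search, and company detail collection
  • SKILL.md now documents in detail how to fetch Qichacha company information, with usage notes
  • The existing WeCom documentation scraping is unchanged; overall functionality now extends to more data sources
v1.1.0
  • Added scripts/wecom_docs_scraper.py for batch scraping of WeCom documentation and knowledge base management
  • SKILL.md now includes a "WeCom documentation knowledge base" section with usage examples
  • Added documentation for batch updates, scraping by path-id, and listing documents
  • Now supports building and maintaining the ~/workspace/knowledge-base/企业微信文档/ knowledge base with a single command
v1.0.0
Initial release of huo15-js-scraper:
  • Scraping tool for JavaScript-rendered pages, with automatic switching between the Playwright and scrapling engines
  • Suited to complex pages such as WeCom documentation, Vue/React SPAs, and Cloudflare-protected sites
  • Offers both a command-line interface and a Python API
  • Supports content extraction with CSS selectors and JSON output
  • Includes detailed dependency installation instructions and answers to common questions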
Metadata
Slug huo15-js-scraper
Version 1.2.2
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 5
Frequently Asked Questions

What is Huo15 Js Scraper?

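A scraping tool for JavaScript-rendered websites. Use this skill when you need to scrape JS-rendered pages (such as WeCom documentation or Vue/React SPAs), fetch Qichacha company data, get past anti-scraping measures, or reach sites whose content plain curl/wget/web_fetch cannot retrieve. It supports automatic switching between two engines, Playwright and scrapling. It is an AI Agent Skill for Claude Code / OpenClaw, with 180 downloads so far.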

How do I install Huo15 Js Scraper?

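Run "/install huo15-js-scraper" in the OpenClaw or Claude Code chat to install the skill in one step. The scraping engines themselves (Playwright and scrapling) are installed separately with the pip commands given in the README.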

Is Huo15 Js Scraper free?

Yes, Huo15 Js Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Huo15 Js Scraper support?

Huo15 Js Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Huo15 Js Scraper?

It is built and maintained by Job Zhao (@zhaobod1); the current version is v1.2.2.
