zhangjia-ie

xueqiu-collector

by zhangjia-ie · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
125 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install xueqiu-collector
Description
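Full-history post collection Skill for Xueqiu. Collects every post from any Xueqiu user (including full body text, image downloads, and OCR), automatically runs V4 rule-based analysis (post type / investment relevance / sentiment / trading intent / topic tags / quality score), stores the results in a SQLite database, and exports JSON + Markdown backups. Trigger phrases: 采集雪球 (collect Xueqiu), 雪球帖子采集 (Xueqiu post collection), 爬取雪球 (crawl Xueqiu), 收集雪球 (gather Xueqiu), 雪...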
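The V4 rules themselves are not published on this page. As a rough illustration of what rule-based tagging of a single post can look like, here is a minimal keyword-matching sketch. The keyword tables, labels, `PostTags`, and `tag_post` are all hypothetical and are not the skill's actual rules.

```python
from dataclasses import dataclass, field

# Hypothetical keyword tables (NOT the skill's V4 rules). Order matters:
# the first label whose keywords appear in the text wins.
INTENT_RULES = {
    "buy": ("买入", "加仓", "建仓"),    # buy / add to position / open position
    "sell": ("卖出", "减仓", "清仓"),   # sell / trim position / close out
}
SENTIMENT_RULES = {
    "positive": ("看多", "看好"),       # bullish
    "negative": ("看空", "看衰"),       # bearish
}
TOPIC_RULES = {
    "earnings": ("财报", "业绩"),       # earnings report / results
    "valuation": ("估值", "市盈率"),    # valuation / P/E ratio
}


@dataclass
class PostTags:
    intent: str = "none"
    sentiment: str = "neutral"
    topics: list[str] = field(default_factory=list)


def first_match(text: str, rules: dict[str, tuple[str, ...]], default: str) -> str:
    """Return the first label whose keywords occur in text, else default."""
    for label, keywords in rules.items():
        if any(k in text for k in keywords):
            return label
    return default


def tag_post(text: str) -> PostTags:
    """Apply simple substring rules to one post's text."""
    return PostTags(
        intent=first_match(text, INTENT_RULES, "none"),
        sentiment=first_match(text, SENTIMENT_RULES, "neutral"),
        # Topics are multi-label: keep every topic with at least one keyword hit.
        topics=[t for t, kws in TOPIC_RULES.items() if any(k in text for k in kws)],
    )
```

Plain substring rules are easy to audit and reproduce, which is presumably why a rule-based approach was chosen over a model; the trade-off is that negations and sarcasm are not handled.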
Usage Guidance
Before installing or running this skill:
- Understand that it needs access to a real Edge browser profile (with its login state) to work reliably. That profile contains all of the browser's cookies and sessions, so prefer a dedicated Edge profile created only for scraping over your primary browser profile.
- The skill runs npx/playwright-cli and drives Edge; make sure you trust the machine and review the commands it will run. Playwright may download browser binaries if they are missing.
- The package writes logs, images and a SQLite DB to local disk (data/ and logs/ under the skill directory). Review those files for sensitive content and consider where you store or back them up.
- Confirm that scraping Xueqiu is permitted under the site's terms and that you have the right to collect the targeted users' posts.
- Note that the registry metadata omits the Edge profile/config requirement. This mismatch is likely an oversight, but it is worth verifying with the publisher.
- If you are concerned about exposure, run the skill in a sandboxed VM or create a throwaway Edge profile (logged in only to the specific Xueqiu account), and inspect the code (collect.py, check_env.py, analyze.py) before use. For higher assurance, ask the publisher to declare the required config paths and explain why full profile access is necessary.
Capability Analysis
Type: OpenClaw Skill
Name: xueqiu-collector
Version: 1.0.0
The skill is a functional Xueqiu scraper that requires high-privilege access to the user's Edge browser profile (including session cookies) to get past anti-bot measures. While this behavior is aligned with the stated purpose, the script `collect.py` does not sanitize the `author` parameter, which is used to construct file paths, creating a potential path traversal vulnerability during data export. Additionally, the tool executes shell commands via `subprocess` and `npx`, which increases the risk if the AI agent is manipulated into passing malicious arguments.
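One way to close the path traversal gap is to sanitize `author` before it becomes part of a filename, then check that the resolved path still lies inside the export directory. This is a minimal sketch, assuming exports are written as `<author><suffix>` under a single base directory; `safe_export_path` and its character whitelist are hypothetical and are not code from collect.py. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical whitelist: keep Unicode word characters (so Chinese usernames
# survive) and "-"; every other run of characters, including "/", "\" and ".",
# collapses to a single "_".
_UNSAFE = re.compile(r"[^\w\-]+")


def safe_export_path(base_dir: str | Path, author: str, suffix: str = ".json") -> Path:
    """Build an export path for `author` that cannot escape `base_dir`."""
    base = Path(base_dir).resolve()
    name = _UNSAFE.sub("_", author).strip("_")
    if not name:
        raise ValueError(f"author {author!r} has no usable characters")
    target = (base / f"{name}{suffix}").resolve()
    # Defence in depth: even after sanitizing, refuse anything that
    # resolves outside the base directory (e.g. via a symlink).
    if not target.is_relative_to(base):
        raise ValueError(f"refusing to write outside {base}: {target}")
    return target
```

For example, an `author` of `../../etc/passwd` becomes `etc_passwd.json` inside the base directory instead of escaping it.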
Capability Assessment
Purpose & Capability
The name and description claim the skill scrapes Xueqiu posts and runs local rule-based analysis; the scripts implement exactly that using playwright-cli, an Edge profile, and local SQLite/JSON output, so the capability set is coherent with the stated purpose. Minor mismatch: the registry metadata lists no required config paths or credentials, but the tool clearly expects an Edge profile (login state) and an available npx/playwright-cli.
Instruction Scope
SKILL.md and the scripts instruct running check_env.py, collect.py and analyze.py, which will drive Edge via playwright-cli, save snapshots, download images, run OCR, write logs, and persist data to SQLite/JSON/Markdown. All of this is within the stated scraping/analysis scope. The instructions explicitly require mounting a real Edge profile (to reuse its login state), which gives the tool access to cookies and other profile data beyond the Xueqiu session. That is what lets it avoid captchas, but it also increases privacy risk.
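Because each step launches other programs, a common mitigation against argument injection is to pass commands as argument lists with `shell=False`, a timeout, and exit-code checking. The sketch below assumes nothing about the real scripts' flags; `run_step` and `run_pipeline` are hypothetical helpers, not part of the skill.

```python
import subprocess
from collections.abc import Sequence


def run_step(argv: Sequence[str], timeout: float = 600.0) -> str:
    """Run one pipeline step without a shell and return its stdout."""
    if isinstance(argv, str):
        # A single string invites shell-style parsing; require a list instead.
        raise TypeError("pass the command as a list of arguments, not a string")
    result = subprocess.run(
        list(argv),
        shell=False,          # arguments reach the program verbatim, never shell-parsed
        capture_output=True,
        text=True,
        timeout=timeout,      # don't let a stuck browser session hang forever
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def run_pipeline(steps: Sequence[Sequence[str]]) -> list[str]:
    """Run steps in order; the first failing step raises and stops the rest."""
    return [run_step(step) for step in steps]
```

For example, `run_pipeline([[sys.executable, "check_env.py"]])` would run the environment check; the arguments for collect.py and analyze.py should come from SKILL.md, since they are not documented on this page. Note that `shell=False` only prevents shell interpretation: the called program still receives every argument, so values such as a user-supplied author name should be validated too.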
Install Mechanism
There is no automated install spec; this is an instruction-plus-script bundle. It relies on an existing npx/playwright-cli and a local Edge installation, and no obscure external downloads or URL-based installers appear in the package. Running npx/playwright may trigger a local browser download via Playwright, but that is standard and traceable.
Credentials
The metadata declares no required env vars or config paths, yet the scripts actively probe environment variables and several user directories to locate npx and the Edge profile, and they expect a path to an Edge profile folder (which contains cookies, local storage, etc.). Access to a full browser profile is sensitive and broader than 'just Xueqiu credentials'. The skill also writes logs and a DB under its data/ and logs/ directories. The lack of declared config paths in the registry metadata is a notable omission.
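Probing of this kind usually comes down to a `PATH` lookup for npx plus a check of the default Edge user-data directory for the current OS. The sketch below shows that pattern and lets the caller pass an explicit profile directory (ideally a dedicated scraping profile) instead of falling back to the default directory, which holds every Edge profile's cookies. The locations are the usual stable-channel defaults; `edge_user_data_dir` and `check_env` are hypothetical and do not reproduce check_env.py.

```python
import os
import platform
import shutil
from pathlib import Path
from typing import Dict, Optional


def edge_user_data_dir(system: str, home: Path, local_appdata: Optional[str]) -> Optional[Path]:
    """Return the usual Edge (stable channel) user-data directory for an OS name."""
    if system == "Windows":
        # %LOCALAPPDATA%\Microsoft\Edge\User Data
        return Path(local_appdata, "Microsoft", "Edge", "User Data") if local_appdata else None
    if system == "Darwin":
        return home / "Library" / "Application Support" / "Microsoft Edge"
    if system == "Linux":
        return home / ".config" / "microsoft-edge"
    return None


def check_env(profile_dir: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Report where npx and the Edge user-data directory were found (None if missing)."""
    # Prefer an explicitly supplied directory, e.g. a dedicated scraping-only
    # profile; otherwise fall back to the browser's default directory.
    if profile_dir:
        edge_dir: Optional[Path] = Path(profile_dir)
    else:
        edge_dir = edge_user_data_dir(platform.system(), Path.home(), os.environ.get("LOCALAPPDATA"))
    return {
        "npx": shutil.which("npx"),  # on Windows this also finds npx.cmd via PATHEXT
        "edge_user_data": str(edge_dir) if edge_dir is not None and edge_dir.is_dir() else None,
    }
```

Requiring the profile path as an explicit input, rather than discovering it, is also what would let the registry metadata declare it as a required config path.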
Persistence & Privilege
The skill does not request 'always: true' or other elevated installation privileges. It stores output (DB/JSON/MD/images) and logs under the project/data and project/logs directories, which is expected for a scraper. It does not modify other skills or system-wide agent settings.
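For reference, persisting scraped posts in SQLite and exporting them to JSON needs only the standard library. This is a minimal sketch with a hypothetical `posts` table; the skill's real schema and file layout under data/ are not documented here.

```python
import json
import sqlite3
from contextlib import closing
from pathlib import Path

# Hypothetical schema; the skill's actual tables are not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    post_id    TEXT PRIMARY KEY,
    author     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    body       TEXT NOT NULL
)
"""


def save_posts(db_path: Path, posts: list[dict]) -> None:
    """Insert posts; a row with an existing post_id is replaced, so re-runs are safe."""
    # closing() closes the file handle; the inner `conn` commits on success.
    with closing(sqlite3.connect(db_path)) as conn, conn:
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT OR REPLACE INTO posts (post_id, author, created_at, body) "
            "VALUES (:post_id, :author, :created_at, :body)",
            posts,
        )


def export_json(db_path: Path, out_path: Path) -> int:
    """Dump all posts, oldest first, to a UTF-8 JSON file; return the row count."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = sqlite3.Row
        rows = [dict(r) for r in conn.execute("SELECT * FROM posts ORDER BY created_at")]
    # ensure_ascii=False keeps Chinese text readable in the backup file.
    out_path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(rows)
```

Keying rows on the post ID is what makes incremental collection and re-runs idempotent: fetching the same post twice updates it instead of duplicating it.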
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install xueqiu-collector
  3. After installation, invoke the skill by name or use /xueqiu-collector
  4. Provide the required inputs as described in the skill's parameter spec; the skill returns structured output
Version History
v1.0.0
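xueqiu-collector v1.0.0 initial release:
- Collects all posts from any Xueqiu user, including full body text, image downloads, and OCR of images
- Automatically analyzes post type, investment relevance, sentiment, trading intent, topic tags, and quality score using the V4 rules
- Stores collected results in a SQLite database, with export to JSON and Markdown (full dump and per-category)
- Provides standard workflows for full and incremental collection, backfilling full text, and batch analysis
- Built-in countermeasures against anti-scraping defenses (request delays, retries, resumable collection), plus logging and environment checks
- Supports reusing a real Edge browser user's login state to avoid captchas
- Includes detailed parameter descriptions, path configuration, output structure, and lessons learned from common pitfalls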
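Request delays, retries, and resumable collection typically combine as follows: exponential backoff with jitter around each request, a short randomized pause between pages, and a small checkpoint file so an interrupted run resumes where it stopped. This is a sketch under assumptions: `fetch_page` stands in for whatever actually loads one page of a user's timeline, and the checkpoint format and all function names are hypothetical.

```python
import json
import random
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(fetch: Callable[[], T], attempts: int = 4, base_delay: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); on failure wait base_delay * 2**n plus jitter, then retry."""
    for n in range(attempts):
        try:
            return fetch()
        except Exception:  # a real scraper would catch narrower errors
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** n + random.uniform(0, base_delay))
    raise ValueError("attempts must be at least 1")


def load_checkpoint(path: Path) -> int:
    """Return the last completed page number, or 0 when starting fresh."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["last_page"]
    return 0


def save_checkpoint(path: Path, page: int) -> None:
    """Write to a temp file, then swap it in, so a crash never leaves a half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"last_page": page}), encoding="utf-8")
    tmp.replace(path)


def collect_pages(fetch_page: Callable[[int], list], last_page: int, ckpt: Path,
                  sleep: Callable[[float], None] = time.sleep) -> list:
    """Resume after the checkpointed page and fetch pages up to last_page."""
    items: list = []
    for page in range(load_checkpoint(ckpt) + 1, last_page + 1):
        items.extend(with_retry(lambda: fetch_page(page), sleep=sleep))
        save_checkpoint(ckpt, page)        # record progress only after success
        sleep(random.uniform(1.0, 3.0))    # polite randomized delay between pages
    return items
```

The injectable `sleep` parameter exists so the backoff logic can be exercised without real waits.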
Metadata
Slug: xueqiu-collector
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is xueqiu-collector?

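A full-history post collection Skill for Xueqiu. It collects every post from any Xueqiu user (including full body text, image downloads, and OCR), automatically runs V4 rule-based analysis (post type / investment relevance / sentiment / trading intent / topic tags / quality score), stores the results in a SQLite database, and exports JSON + Markdown backups. Trigger phrases: 采集雪球 (collect Xueqiu), 雪球帖子采集 (Xueqiu post collection), 爬取雪球 (crawl Xueqiu), 收集雪球 (gather Xueqiu), 雪... It is an AI agent skill for Claude Code / OpenClaw with 125 downloads so far.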

How do I install xueqiu-collector?

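Run "/install xueqiu-collector" in the OpenClaw or Claude Code chat to install it in one step. Installation itself needs no extra setup, but running the skill requires a logged-in Edge profile and npx/playwright-cli on the machine (see Usage Guidance).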

Is xueqiu-collector free?

Yes, xueqiu-collector is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does xueqiu-collector support?

xueqiu-collector is listed as cross-platform and runs wherever OpenClaw / Claude Code is available, provided Microsoft Edge and npx/playwright-cli are also installed.

Who created xueqiu-collector?

It is built and maintained by zhangjia-ie (@zhangjia-ie); the current version is v1.0.0.
