xueqiu-collector
Author: zhangjia-ie · v1.0.0 · MIT-0
Total downloads: 125
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install xueqiu-collector
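Description
A Skill for collecting the full post history of a Xueqiu user. It collects all posts from any Xueqiu user (including full text, image downloads, and OCR), automatically runs V4 rule-based analysis (post type / investment relevance / sentiment / trading intent / topic tags / quality score), and stores the results in a SQLite database with JSON + Markdown backups. Trigger phrases: 采集雪球、雪球帖子采集、爬取雪球、收集雪球、雪...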
Security recommendations
Before installing or running this skill:
- Understand it needs access to a real Edge browser profile (login state) to work reliably. That profile contains all browser cookies and sessions—prefer using a dedicated Edge profile created only for scraping rather than your primary browser profile.
- The skill will run npx/playwright-cli and drive Edge; ensure you trust the machine and review the commands you will run. Playwright may download browser binaries if missing.
- The package writes logs, images and a SQLite DB to local disk (data/ and logs/ under the skill). Review those files for sensitive content and consider where you store/back them up.
- Confirm scraping Xueqiu is permitted under the site's terms and that you have the right to collect the targeted users' posts.
- Note the registry metadata omits the Edge profile/config requirement—this mismatch is likely an oversight but worth verifying with the publisher.
- If you are concerned about exposure, run the skill in a sandboxed VM or create a throwaway Edge profile (logged in only to the specific Xueqiu account), and inspect the code (collect.py, check_env.py, analyze.py) before use. If you need higher assurance, ask the publisher to declare the required config paths and explain why full profile access is necessary.
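One way to review what the skill has written to disk before backing it up is sketched below. It is a minimal sketch, not part of the package: the function names `list_outputs` and `summarize_db` are hypothetical, and the database file name is whatever you find under the skill's data/ directory. It makes no assumption about the skill's schema and only reads SQLite's own catalog.

```python
import sqlite3
from pathlib import Path


def list_outputs(root: Path) -> list[tuple[str, int]]:
    """Return (relative_path, size_in_bytes) for every file under root, sorted by path."""
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


def summarize_db(db_path: Path) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Open read-only so that inspection cannot modify the database.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        counts = {}
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
            counts[table] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `list_outputs(Path("data"))` and `summarize_db(...)` on the database file found there gives a quick overview of how much has been stored before you decide where to keep or back up the files.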
Functional analysis
Type: OpenClaw Skill
Name: xueqiu-collector
Version: 1.0.0
The skill is a functional Xueqiu scraper that requires high-privilege access to the user's Edge browser profile (including session cookies) to bypass anti-bot measures. While this behavior is aligned with the stated purpose, the script `collect.py` lacks input sanitization for the `author` parameter, which is used to construct file paths, creating a potential path traversal vulnerability during data export. Additionally, the tool relies on executing shell commands via `subprocess` and `npx`, which increases the risk if the AI agent is manipulated into using malicious arguments.
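The path traversal concern above could be addressed along the following lines. This is a minimal sketch under stated assumptions, not the code in `collect.py`: the helper name `safe_export_path`, the 64-character limit, and the allow-list pattern are all hypothetical choices made for illustration. It requires Python 3.9+ for `Path.is_relative_to`.

```python
import re
from pathlib import Path

# Hypothetical allow-list: Unicode letters and digits, underscore, hyphen.
# In Python 3, \w also matches CJK characters, so Chinese names pass,
# while ".", "/", and "\" are rejected.
_SAFE_AUTHOR = re.compile(r"[\w\-]{1,64}")


def safe_export_path(base_dir: Path, author: str, suffix: str = ".json") -> Path:
    """Return an export path for `author` that stays inside base_dir."""
    if not _SAFE_AUTHOR.fullmatch(author):
        raise ValueError(f"unsafe author value: {author!r}")
    base = base_dir.resolve()
    candidate = (base / f"{author}{suffix}").resolve()
    # Defense in depth: reject anything that resolves outside the export directory.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes export directory: {candidate}")
    return candidate
```

With this check, a value such as `../etc/passwd` raises `ValueError` instead of producing a path outside the export directory. For the `subprocess` concern, passing arguments as a list to `subprocess.run` without `shell=True` keeps them from being interpreted by a shell.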
Capability assessment
Purpose & Capability
Name/description claim to scrape Xueqiu posts and run local rule-based analysis; the scripts implement exactly that using playwright-cli, Edge profile, and local SQLite/JSON output. That capability set is coherent with the stated purpose. Minor mismatch: registry metadata lists no required config paths or credentials, but the tool clearly expects an Edge profile (login state) and npx/playwright available.
Instruction Scope
SKILL.md and scripts instruct running check_env.py, collect.py and analyze.py which will: drive Edge via playwright-cli, save snapshots, download images, run OCR, write logs, and persist data to SQLite/JSON/Markdown. All of this is within the stated scraping/analysis scope. The instructions explicitly require mounting a real Edge profile (to reuse login state), which lets the tool access cookies and other profile data beyond just Xueqiu session—this is functional for bypassing captchas but increases privacy risk.
Install Mechanism
There is no automated install spec — this is an instruction+script bundle. It relies on existing npx/playwright-cli and local Edge; no obscure external downloads or URL-based installers appear in the package. Running npx/playwright may cause local browser installation via Playwright, but that is standard and traceable.
Credentials
Metadata declares no required env vars or config paths, yet scripts actively probe environment variables and multiple user directories to locate npx and Edge profile, and expect a path to an Edge profile folder (which contains cookies, local storage, etc.). Access to a full browser profile is sensitive and broader than 'just Xueqiu credentials'. The skill will also write logs and a DB under the skill's data/logs directories. The lack of declared required config paths in registry metadata is a notable omission.
Persistence & Privilege
The skill does not request 'always: true' or other elevated installation privileges. It stores output (DB/JSON/MD/images) and logs under the project/data and project/logs directories, which is expected for a scraper. It does not modify other skills or system-wide agent settings.
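How to use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install xueqiu-collector
- Once installation finishes, call the Skill by name or trigger it with /xueqiu-collector.
- Provide the inputs described in the Skill's parameter documentation to get structured output.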
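Version history
v1.0.0
xueqiu-collector v1.0.0 initial release
- Collects all posts from any Xueqiu user (including full text, image downloads, and image OCR)
- Automatically analyzes post type, investment relevance, sentiment, trading intent, topic tags, and quality score according to the V4 rules
- Stores results in a SQLite database and supports export to JSON and Markdown (full and per-category)
- Provides standard workflows for full and incremental collection, full-text backfill, and batch analysis
- Built-in measures for coping with anti-scraping defenses (request delays, retries, resumable collection), plus logging and environment checks
- Supports reusing a real logged-in Edge browser session to avoid CAPTCHAs
- Includes detailed parameter documentation, path configuration, output structure, and notes on common pitfalls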
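The request delay, retry, and resumable collection features listed above can be sketched roughly as follows. This is an illustrative sketch only, not the skill's actual implementation: the names `fetch_with_retry` and `collect`, the JSON checkpoint format, and the default delays are assumptions, and `fetch` stands in for whatever function retrieves a single post.

```python
import json
import random
import time
from pathlib import Path
from typing import Callable, Iterable


def fetch_with_retry(fetch: Callable[[str], dict], post_id: str, *,
                     retries: int = 3, base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(post_id), retrying with exponential backoff plus jitter."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fetch(post_id)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise AssertionError("unreachable")


def collect(post_ids: Iterable[str], fetch: Callable[[str], dict],
            checkpoint: Path, *, delay: float = 1.0,
            sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    """Fetch posts not yet recorded in the checkpoint file, pausing between requests."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    results = []
    for post_id in post_ids:
        if post_id in done:
            continue  # already collected in an earlier run
        results.append(fetch_with_retry(fetch, post_id, sleep=sleep))
        done.add(post_id)
        checkpoint.write_text(json.dumps(sorted(done)))  # persist progress after each post
        sleep(delay)  # fixed pause between requests
    return results
```

Because the checkpoint is rewritten after every post, an interrupted run can be restarted with the same ID list and will skip what was already collected.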
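FAQ
What is xueqiu-collector?
A Skill for collecting the full post history of a Xueqiu user. It collects all posts from any Xueqiu user (including full text, image downloads, and OCR), automatically runs V4 rule-based analysis (post type / investment relevance / sentiment / trading intent / topic tags / quality score), and stores the results in a SQLite database with JSON + Markdown backups. Trigger phrases: 采集雪球、雪球帖子采集、爬取雪球、收集雪球、雪... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 125 total downloads to date.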
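How do I install xueqiu-collector?
Run the command "/install xueqiu-collector" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration, but running the skill requires npx/playwright-cli and a logged-in Edge profile, as the security review on this page notes.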
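Is xueqiu-collector free?
Yes. xueqiu-collector is completely free and released under the MIT-0 license; you may download, install, and use it freely.
Which platforms does xueqiu-collector support?
xueqiu-collector is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed xueqiu-collector?
It is developed and maintained by zhangjia-ie (@zhangjia-ie); the current version is v1.0.0.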