
Crawl From X

Author: flyingtimes · GitHub · v2.7.0
cross-platform ✓ Security check passed
Total downloads: 684 · Favorites: 1 · Current installs: 1 · Versions: 9
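Install in OpenClaw
/install crawl-from-x
Description
An X/Twitter post crawler. It manages a list of followed users, automatically fetches each user's latest posts from the current day, and exports them as Markdown.
Usage (SKILL.md)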

Crawl From X

An X/Twitter post crawler.

⚠️ Prerequisite: requires the OpenClaw Browser Relay and its browser extension.


Installation

npx clawhub@latest install crawl-from-x

Install locations:

  • $CLAWD/skills/crawl-from-x/scripts/craw_hot.py - main script
  • $CLAWD/skills/crawl-from-x/users.txt - user list
  • $CLAWD/skills/crawl-from-x/results/ - crawl results
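
The install locations above can be resolved from Python. This is a minimal sketch, not part of the skill. It assumes `CLAWD` is an environment variable pointing at the OpenClaw workspace root, and the helper name `skill_paths` is hypothetical.

```python
import os
from pathlib import Path


def skill_paths(clawd_root=None):
    """Return the documented install locations under the $CLAWD root.

    Assumption: CLAWD is an environment variable holding the workspace root.
    Pass clawd_root explicitly to override it.
    """
    root = clawd_root or os.environ.get("CLAWD")
    if not root:
        raise RuntimeError("CLAWD is not set; pass the root directory explicitly")
    base = Path(root) / "skills" / "crawl-from-x"
    return {
        "script": base / "scripts" / "craw_hot.py",  # main script
        "users": base / "users.txt",                 # user list
        "results": base / "results",                 # crawl results
    }
```

Calling `skill_paths()` with no argument reads `CLAWD` from the environment and raises a clear error when it is missing, which is an easy way to check the setup before running the crawler.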

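Preparation

1. Install OpenClaw

Download and install it from https://github.com/openclaw/openclaw.

2. Install the browser extension

In the OpenClaw settings, open "Browser Relay" and install the extension. Once installed, the extension shows a green icon.

3. Start Browser Relay

openclaw browser start
openclaw browser status  # confirm the output shows "browser: enabled"

4. Log in to your X account

Log in to X (Twitter) in the browser where the extension is installed.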
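
The status check in step 3 can also be scripted. The sketch below is an illustration only. It assumes `openclaw browser status` prints the documented `browser: enabled` marker to stdout or stderr, and the function names are hypothetical.

```python
import subprocess


def relay_enabled(status_output):
    """Return True if the status text contains the documented marker."""
    return "browser: enabled" in status_output.lower()


def check_relay(timeout=15):
    """Run `openclaw browser status` and report whether the relay looks enabled.

    Assumption: the marker appears on stdout or stderr. The exact output
    format of the CLI is not specified here.
    """
    try:
        proc = subprocess.run(
            ["openclaw", "browser", "status"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # CLI not installed or not responding
    return relay_enabled(proc.stdout + proc.stderr)
```

Keeping the text check in `relay_enabled` separate from the subprocess call makes it possible to test the parsing without the CLI installed.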


Quick start

cd $CLAWD/skills/crawl-from-x/scripts

# Add a user
python3 craw_hot.py add username

# List users
python3 craw_hot.py list

# Remove a user
python3 craw_hot.py remove username

# Crawl all users
python3 craw_hot.py crawl

# Crawl a single user
python3 craw_hot.py crawl username
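
To illustrate what the add, list, and remove commands do conceptually, here is a minimal sketch of user-list management. It is not the code in craw_hot.py. It assumes users.txt holds one username per line, and all function names are hypothetical.

```python
from pathlib import Path


def load_users(path):
    """Read usernames from the file, skipping blank lines."""
    p = Path(path)
    if not p.exists():
        return []
    lines = p.read_text(encoding="utf-8").splitlines()
    return [line.strip().lstrip("@") for line in lines if line.strip()]


def save_users(path, users):
    """Write one username per line."""
    Path(path).write_text("".join(f"{u}\n" for u in users), encoding="utf-8")


def add_user(path, username):
    """Append a username unless it is already listed (case-insensitive)."""
    name = username.strip().lstrip("@")
    users = load_users(path)
    if not name or name.lower() in (u.lower() for u in users):
        return False
    save_users(path, users + [name])
    return True


def remove_user(path, username):
    """Remove a username; return True if the list changed."""
    name = username.strip().lstrip("@").lower()
    users = load_users(path)
    kept = [u for u in users if u.lower() != name]
    if len(kept) == len(users):
        return False
    save_users(path, kept)
    return True
```

The comparison is case-insensitive because X handles are not case-sensitive, so `Alice` and `alice` refer to the same account.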

Result files:

  • posts_YYYYMMDD_HHMMSS.md - full content (Markdown), with media URLs replaced by local paths
  • posts_YYYYMMDD_HHMMSS.txt - URL list (all-users crawl only)
  • images/ - downloaded images and videos
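
The timestamped names of the result files listed above can be generated as follows. This is a sketch under assumptions: it supposes that images/ sits next to the Markdown file inside the results directory, and `result_paths` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def result_paths(results_dir, now=None, all_users=True):
    """Build result file paths following the posts_YYYYMMDD_HHMMSS pattern."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    base = Path(results_dir)
    paths = {
        "markdown": base / f"posts_{stamp}.md",
        "images": base / "images",  # assumption: media folder lives beside the .md file
    }
    if all_users:
        # The URL list is only produced for an all-users crawl.
        paths["urls"] = base / f"posts_{stamp}.txt"
    return paths
```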

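Notes:

  • Single-user and all-users crawls use the same strategy
  • All media files (images, GIFs, videos) are downloaded to the images/ directory
  • Media URLs in the Markdown file are automatically replaced with local relative paths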
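
The URL replacement described in the notes above can be sketched as a single regular-expression pass. This is one possible approach, not the logic in media_downloader.py. It only handles Markdown image syntax, it does not download anything, and the names `localize_media` and `local_name` are hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches Markdown image syntax ![alt](url) where url is http(s).
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def local_name(url):
    """Derive a local file name from the last path segment of the URL."""
    name = PurePosixPath(urlparse(url).path).name
    return name or "media"


def localize_media(markdown, media_dir="images"):
    """Rewrite remote image links to relative paths.

    Returns the rewritten text and a URL -> relative path mapping that a
    separate download step could consume.
    """
    mapping = {}

    def swap(match):
        alt, url = match.group(1), match.group(2)
        rel = f"{media_dir}/{local_name(url)}"
        mapping[url] = rel
        return f"![{alt}]({rel})"

    return IMAGE_LINK.sub(swap, markdown), mapping
```

Ordinary links such as `[text](https://...)` are left untouched, since only embedded media needs a local copy.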

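Important notes

  1. Browser requirement: the OpenClaw browser extension must be installed
  2. Login state: the browser must be logged in to an X account
  3. Rate limiting: the script has built-in random delays
  4. Private accounts: content from private accounts cannot be crawled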
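
A random delay of the kind mentioned in item 3 might look like the sketch below. The bounds of 2 to 6 seconds are placeholder values, not the script's actual settings, and `polite_delay` is a hypothetical name.

```python
import random
import time


def polite_delay(min_s=2.0, max_s=6.0, sleep=time.sleep, rng=random.uniform):
    """Sleep for a random duration in [min_s, max_s] seconds and return it.

    The sleep and rng parameters are injectable so the function can be
    exercised without actually waiting.
    """
    if min_s < 0 or max_s < min_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    delay = rng(min_s, max_s)
    sleep(delay)
    return delay
```

Calling `polite_delay()` between requests spreads them out unevenly, which is gentler on the site than a fixed interval.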
Safe usage recommendations
This skill is coherent with its stated purpose, but before installing or running it:
  1. Inspect the two included Python scripts (scripts/craw_hot.py and scripts/media_downloader.py) for any unexpected network calls, hard-coded external endpoints, or data-exfiltration logic.
  2. Understand that the skill uses the OpenClaw Browser Relay and your browser's logged-in X session. The Browser Relay extension will have access to browsing and session data, so only proceed if you trust that extension.
  3. Run it first against a test account or in a sandbox/container to verify its behavior.
  4. Check users.txt to ensure only the desired accounts are tracked.
  5. Be aware of the X/Twitter terms of service and rate limits when scraping, and avoid scraping private accounts.
If you are unable to review the scripts, treat the package with caution or ask the publisher for a source code review or provenance (repository URL and maintainer identity).
Functional analysis
Type: OpenClaw Skill · Name: crawl-from-x · Version: 2.7.0
The OpenClaw AgentSkills skill bundle 'crawl-from-x' is classified as benign. Its stated purpose is to scrape X/Twitter posts and media, which is consistently reflected in the code. The Python scripts (`craw_hot.py`, `media_downloader.py`) use `subprocess.run` to interact with the `openclaw browser` CLI for automation and `requests` for fetching data from Twitter APIs and media URLs. All file system operations (reading user lists, writing logs, results, and downloaded media) are confined to the skill's local directory. There is no evidence of data exfiltration to unauthorized external endpoints, installation of persistence mechanisms, or prompt injection attempts in the documentation that would manipulate the AI agent for malicious purposes. The JavaScript executed in the browser is internally defined for content extraction, not for arbitrary code execution from untrusted input.
Capability assessment
Purpose & Capability
Name/description (X/Twitter crawler exporting Markdown) match the provided files and instructions: scripts for crawling and media download, a users.txt, and many example result files. The skill requires the OpenClaw Browser Relay and a logged-in browser session, which is reasonable for a browser-driven crawler.
Instruction Scope
SKILL.md instructions are focused on installing/starting OpenClaw Browser Relay and running the included Python scripts (add/list/remove/crawl). The runtime guidance uses the user's logged-in browser session and directs output to local results/images directories. This is expected, but the runtime does require the Browser Relay extension and access to the browser session (cookies/auth) — a privacy/security consideration that is outside the skill itself.
Install Mechanism
There is no formal install spec in the registry; the README and SKILL.md recommend installing via `npx clawhub@latest install crawl-from-x` or cloning a GitHub repo. No external download URLs with unknown hosts are used in the manifest. However, the package includes executable Python scripts (craw_hot.py, media_downloader.py) that will run on the host once invoked — review these before execution.
Credentials
The skill declares no required environment variables, credentials, or config paths. Its need for a logged-in browser session and the Browser Relay extension is consistent with its crawling purpose. There are no extra or unrelated credentials requested.
Persistence & Privilege
No 'always: true' privilege is requested. The skill does not declare modifications to other skills or system-wide configs. It writes results/media into its own results/ and images/ directories, which is normal for this functionality.
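How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install crawl-from-x
  3. Once installed, invoke the Skill by name or trigger it with /crawl-from-x
  4. Provide the required inputs according to the Skill's parameter description to get structured output
Version history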
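v2.7.0
Uses an environment variable instead of absolute paths to ensure cross-environment compatibility. Improved the documentation to clarify install locations and file structure.
v2.6.0
Single-user and all-users crawls use the same strategy; media files are downloaded automatically and their URLs replaced with local paths.
v2.5.1
  • Added a metadata file, _meta.json, describing the skill's metadata.
  • No other functional changes.
v2.5.0
Important update: added the Browser Relay prerequisite and installation instructions.
v2.4.3
Slimmed the release package: it includes only necessary files and excludes runtime files such as results, docs, and users.txt.
v2.4.2
Release cleanup: excluded the results and docs directories and completed the .skillignore configuration.
v2.4.1
Fixed the skill name and paths (now crawl-from-x), removed debug files, and excluded the results directory.
v2.4.0
Fixed the skill name and paths (now crawl-from-x).
v1.0.0
  • Initial release of crawl-from-x (craw-hot) for automated X/Twitter post tracking and management.
  • Manage custom user lists: add, remove, list, or update users to monitor.
  • Batch-crawl daily posts from all or specific users; fetches full post contents, including media, via fxtwitter and syndication APIs.
  • Exports results as Markdown files with post content, media links, and engagement metrics.
  • Robust error handling: auto-retries on network/browser failure, incremental writing for data safety, progress tracking, and detailed logs.
  • Includes process locking to avoid concurrent runs, auto-cleans lock files, and supports auto-recovery from browser errors.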
Metadata
Slug crawl-from-x
Version 2.7.0
License
Total installs 1
Current installs 1
Versions 9
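FAQ

What is Crawl From X?

An X/Twitter post crawler. It manages a list of followed users, automatically fetches each user's latest posts from the current day, and exports them as Markdown. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 684 total downloads so far.

How do I install Crawl From X?

Run the command "/install crawl-from-x" in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, but crawling still requires the Browser Relay prerequisites described in the usage notes.

Is Crawl From X free?

Yes. Crawl From X is completely free (free and open source) and can be downloaded, installed, and used freely.

Which platforms does Crawl From X support?

Crawl From X is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Crawl From X?

It is developed and maintained by flyingtimes (@flyingtimes). The current version is v2.7.0.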
