Crawl From X
by flyingtimes · GitHub · v2.7.0
684 Downloads · 1 Star · 1 Active Install · 9 Versions
Install in OpenClaw
/install crawl-from-x
Description
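X/Twitter post crawler. Manages a list of followed users, automatically crawls each user's latest posts from the current day, and exports them as Markdown.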
README (SKILL.md)
Crawl From X
X/Twitter post crawler.
⚠️ Prerequisite: requires the OpenClaw Browser Relay and its browser extension.
Installation
npx clawhub@latest install crawl-from-x
Install location:
- $CLAWD/skills/crawl-from-x/scripts/craw_hot.py - main script
- $CLAWD/skills/crawl-from-x/users.txt - user list
- $CLAWD/skills/crawl-from-x/results/ - crawl results
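Since every path above hangs off `$CLAWD`, a helper script that wants to locate these files can resolve them from the environment instead of hard-coding an absolute path. This is a minimal sketch; `skill_paths` is a hypothetical helper, not part of the published scripts, and it assumes `CLAWD` is exported in the current environment.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def skill_paths(env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Hypothetical helper: resolve the skill's files from the CLAWD variable."""
    env = os.environ if env is None else env
    clawd = env.get("CLAWD")
    if not clawd:
        # Without CLAWD there is no reliable way to locate the install directory.
        raise RuntimeError("CLAWD is not set; cannot locate crawl-from-x")
    root = Path(clawd) / "skills" / "crawl-from-x"
    return {
        "script": root / "scripts" / "craw_hot.py",  # main script
        "users": root / "users.txt",                 # tracked user list
        "results": root / "results",                 # crawl output
    }
```

For example, `skill_paths()["users"]` points at the tracked user list in any shell where `CLAWD` is set.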
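Setup
1. Install OpenClaw
Download and install it from https://github.com/openclaw/openclaw.
2. Install the browser extension
In the OpenClaw settings, open "Browser Relay" and install the extension. Once installed, the extension shows a green icon.
3. Start the Browser Relay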
openclaw browser start
openclaw browser status # confirm the output shows "browser: enabled"
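Because the scripts drive the browser through the `openclaw browser` CLI, a pre-flight check can fail fast when the relay is not running. The sketch below is hypothetical (`relay_enabled` and `check_relay` are illustrative names, not functions in craw_hot.py); it only assumes that the status output contains the literal text `browser: enabled`, as the command above describes.

```python
import subprocess


def relay_enabled(status_output: str) -> bool:
    """Return True if the status text reports the browser as enabled."""
    return "browser: enabled" in status_output


def check_relay() -> bool:
    """Hypothetical pre-flight check: run `openclaw browser status`."""
    try:
        result = subprocess.run(
            ["openclaw", "browser", "status"],
            capture_output=True, text=True, timeout=30, check=False,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # CLI not installed or not responding
    return relay_enabled(result.stdout + result.stderr)
```

Parsing is kept separate from the subprocess call so the string check can be tested without OpenClaw installed.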
4. Log in to X
Log in to X (Twitter) in the browser where the extension is installed.
Quick Start
cd $CLAWD/skills/crawl-from-x/scripts
# Add a user
python3 craw_hot.py add username
# List users
python3 craw_hot.py list
# Remove a user
python3 craw_hot.py remove username
# Crawl all users
python3 craw_hot.py crawl
# Crawl a single user
python3 craw_hot.py crawl username
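The `add`, `list` and `remove` commands maintain users.txt. The sketch below shows one way such a list could be managed. It assumes a plain-text file with one handle per line and an optional leading `@`; the actual file format used by craw_hot.py is not documented here, and all function names are hypothetical.

```python
from pathlib import Path
from typing import List


def _normalize(username: str) -> str:
    # Accept both "name" and "@name"; store the bare handle.
    return username.strip().lstrip("@")


def load_users(path: Path) -> List[str]:
    """Read one handle per line, skipping blank lines and duplicates."""
    if not path.exists():
        return []
    users: List[str] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        name = _normalize(line)
        if name and name not in users:
            users.append(name)
    return users


def save_users(path: Path, users: List[str]) -> None:
    path.write_text("".join(f"{u}\n" for u in users), encoding="utf-8")


def add_user(path: Path, username: str) -> bool:
    """Append a handle; return False if it is empty or already tracked."""
    users = load_users(path)
    name = _normalize(username)
    if not name or name in users:
        return False
    users.append(name)
    save_users(path, users)
    return True


def remove_user(path: Path, username: str) -> bool:
    """Drop a handle; return False if it was not tracked."""
    users = load_users(path)
    name = _normalize(username)
    if name not in users:
        return False
    users.remove(name)
    save_users(path, users)
    return True
```

Returning a boolean lets a CLI wrapper print "already tracked" or "not found" without raising.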
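Result files:
- posts_YYYYMMDD_HHMMSS.md - full content (Markdown), with media URLs replaced by local paths
- posts_YYYYMMDD_HHMMSS.txt - URL list (only produced when crawling all users)
- images/ - downloaded images and videos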
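The `YYYYMMDD_HHMMSS` part of the file names maps directly onto a `strftime` pattern. This is a sketch, not the script's code: `result_paths` is a hypothetical helper, and placing `images/` inside the results directory is an assumption.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional


def result_paths(results_dir: Path, when: Optional[datetime] = None,
                 all_users: bool = True) -> Dict[str, Path]:
    """Hypothetical helper: output paths named posts_YYYYMMDD_HHMMSS.* for one run."""
    if when is None:
        when = datetime.now()
    stamp = when.strftime("%Y%m%d_%H%M%S")
    paths = {
        "markdown": results_dir / f"posts_{stamp}.md",
        "images": results_dir / "images",  # assumed location of media files
    }
    if all_users:
        # The URL list is only produced when crawling every tracked user.
        paths["urls"] = results_dir / f"posts_{stamp}.txt"
    return paths
```

For a run started at 2025-01-02 03:04:05, the Markdown file would be `posts_20250102_030405.md`.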
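Notes:
- Single-user and all-user crawls use the same strategy.
- All media files (images, GIFs, videos) are downloaded to the images/ directory.
- Media URLs in the Markdown file are automatically replaced with local relative paths.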
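Rewriting media URLs to local paths amounts to a string substitution once each file has been downloaded. The sketch below assumes the Markdown file sits next to an `images/` directory and uses a hypothetical naming scheme (a short SHA-1 of the URL plus an extension); the real `media_downloader.py` may name files differently.

```python
import hashlib
import posixpath
from typing import Dict
from urllib.parse import parse_qs, urlparse


def local_name(url: str) -> str:
    """Hypothetical scheme: short SHA-1 of the URL plus a file extension."""
    parsed = urlparse(url)
    ext = posixpath.splitext(parsed.path)[1]
    if not ext:
        # Some media URLs carry the type in a query parameter such as ?format=jpg.
        ext = "." + parse_qs(parsed.query).get("format", ["bin"])[0]
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    return digest + ext


def localize_media(markdown: str, downloaded: Dict[str, str]) -> str:
    """Swap each downloaded remote URL for its relative images/ path."""
    # Longest URLs first, so a URL that is a prefix of another one
    # cannot corrupt the longer URL before it is replaced.
    for url in sorted(downloaded, key=len, reverse=True):
        markdown = markdown.replace(url, "images/" + downloaded[url])
    return markdown
```

Relative paths keep the Markdown readable after the results directory is moved or copied elsewhere.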
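Caveats
- Browser: the OpenClaw browser extension must be installed.
- Login: the browser must be logged in to an X account.
- Rate limiting: the script already inserts random delays between requests.
- Private accounts: content from private accounts cannot be crawled.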
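A random delay between requests is typically a uniformly distributed sleep. This is a minimal sketch: `polite_pause` is a hypothetical name, and the 2 to 5 second default bounds are illustrative, not the values craw_hot.py uses. The `sleep` and `rng` parameters are injectable so the delay can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional


def polite_pause(min_s: float = 2.0, max_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None) -> float:
    """Sleep for a random interval in [min_s, max_s] and return the delay used."""
    if min_s < 0 or max_s < min_s:
        raise ValueError("require 0 <= min_s <= max_s")
    if rng is None:
        rng = random.Random()
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Randomizing the interval avoids the fixed request cadence that a constant sleep would produce.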
Usage Guidance
This skill is coherent with its stated purpose, but before installing or running it: 1) Inspect the two included Python scripts (scripts/craw_hot.py and scripts/media_downloader.py) for any unexpected network calls, hard-coded external endpoints, or data-exfil logic. 2) Understand that the skill uses the OpenClaw Browser Relay and your browser's logged-in X session — the Browser Relay extension will have access to browsing/session data, so only proceed if you trust that extension. 3) Run first against a test account or in a sandbox/container to verify behavior. 4) Check users.txt to ensure only desired accounts are tracked. 5) Be aware of X/Twitter terms-of-service and rate limits when scraping, and avoid scraping private accounts. If you are unable to review the scripts, treat the package with caution or ask the publisher for source code review or provenance (repository URL and maintainer identity).
Capability Analysis
Type: OpenClaw Skill
Name: crawl-from-x
Version: 2.7.0
The OpenClaw AgentSkills skill bundle 'crawl-from-x' is classified as benign. Its stated purpose is to scrape X/Twitter posts and media, which is consistently reflected in the code. The Python scripts (`craw_hot.py`, `media_downloader.py`) utilize `subprocess.run` to interact with the `openclaw browser` CLI for automation and `requests` for fetching data from Twitter APIs and media URLs. All file system operations (reading user lists, writing logs, results, and downloaded media) are confined to the skill's local directory. There is no evidence of data exfiltration to unauthorized external endpoints, installation of persistence mechanisms, or prompt injection attempts in the documentation that would manipulate the AI agent for malicious purposes. The JavaScript executed in the browser is internally defined for content extraction, not for arbitrary code execution from untrusted input.
Capability Assessment
Purpose & Capability
Name/description (X/Twitter crawler exporting Markdown) match the provided files and instructions: scripts for crawling and media download, a users.txt, and many example result files. The skill requires the OpenClaw Browser Relay and a logged-in browser session, which is reasonable for a browser-driven crawler.
Instruction Scope
SKILL.md instructions are focused on installing/starting OpenClaw Browser Relay and running the included Python scripts (add/list/remove/crawl). The runtime guidance uses the user's logged-in browser session and directs output to local results/images directories. This is expected, but the runtime does require the Browser Relay extension and access to the browser session (cookies/auth) — a privacy/security consideration that is outside the skill itself.
Install Mechanism
There is no formal install spec in the registry; the README and SKILL.md recommend installing via `npx clawhub@latest install crawl-from-x` or cloning a GitHub repo. No external download URLs with unknown hosts are used in the manifest. However, the package includes executable Python scripts (craw_hot.py, media_downloader.py) that will run on the host once invoked — review these before execution.
Credentials
The skill declares no required environment variables, credentials, or config paths. Its need for a logged-in browser session and the Browser Relay extension is consistent with its crawling purpose. There are no extra or unrelated credentials requested.
Persistence & Privilege
No 'always: true' privilege is requested. The skill does not declare modifications to other skills or system-wide configs. It writes results/media into its own results/ and images/ directories, which is normal for this functionality.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install crawl-from-x
- After installation, invoke the skill by name or use /crawl-from-x
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.7.0
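v2.7.0 - Uses environment variables instead of absolute paths to ensure cross-environment compatibility. Improved the documentation, clarifying the install location and file structure.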
v2.6.0
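Single-user and all-user crawls use the same strategy; media files are downloaded automatically and their URLs replaced with local paths.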
v2.5.1
- Added the metadata file _meta.json, which describes the skill's meta-information.
- No other functional changes.
v2.5.0
Major update: added the Browser Relay prerequisite and setup instructions.
v2.4.3
Slimmed the release package: only necessary files are included, excluding runtime files such as results, docs, and users.txt.
v2.4.2
Release cleanup: excluded the results and docs directories and refined the .skillignore configuration.
v2.4.1
Fixed the skill name and paths (renamed to crawl-from-x), removed debug files, and excluded the results directory.
v2.4.0
Fixed the skill name and paths (renamed to crawl-from-x).
v1.0.0
- Initial release of crawl-from-x (craw-hot) for automated X/Twitter post tracking and management.
- Manage custom user lists: add, remove, list, or update users to monitor.
- Batch-crawl daily posts from all or specific users; fetches full post contents, including media, via fxtwitter and syndication APIs.
- Exports results as Markdown files with post content, media links, and engagement metrics.
- Robust error handling: auto-retries on network/browser failure, incremental writing for data safety, progress tracking, and detailed logs.
- Includes process locking to avoid concurrent runs, auto-cleans lock files, and supports auto-recovery from browser errors.
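Process locking of this kind is commonly done with an exclusively created lock file. The sketch below is one such approach, not the script's implementation: `single_run_lock` and the lock path are hypothetical. It removes its own lock on exit (including after an exception) but does not detect stale locks left behind by a crashed run.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def single_run_lock(lock_path: Path) -> Iterator[None]:
    """Hypothetical lock: refuse to start if another crawl holds the lock file."""
    try:
        # O_EXCL makes creation fail atomically if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another crawl appears to be running ({lock_path})") from None
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for debugging
        yield
    finally:
        # Clean up even if the crawl raised an exception.
        lock_path.unlink(missing_ok=True)
```

A crawl would run inside `with single_run_lock(Path("crawl.lock")): ...`, so a second concurrent invocation fails immediately instead of interleaving writes.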
Frequently Asked Questions
What is Crawl From X?
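An X/Twitter post crawler that manages a list of followed users, automatically crawls each user's latest posts from the current day, and exports them as Markdown. It is an AI Agent Skill for Claude Code / OpenClaw, with 684 downloads so far.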
How do I install Crawl From X?
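Run "/install crawl-from-x" in the OpenClaw or Claude Code chat to install it in one step. Before crawling, you still need the OpenClaw Browser Relay running and a browser logged in to X, as described in the prerequisites above.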
Is Crawl From X free?
Yes, Crawl From X is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Crawl From X support?
Crawl From X is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Crawl From X?
It is built and maintained by flyingtimes (@flyingtimes); the current version is v2.7.0.