
Media Crawler

by Excalibur9527 · GitHub ↗ · v0.1.4 · MIT-0
cross-platform ✓ Security Clean
910 Downloads · 1 Star · 1 Active Install · 4 Versions
Install in OpenClaw
/install mediacrawler-skill
Description
A multi-platform public-data collection tool based on MediaCrawler, with support for installation, command-line runs, a WebUI, result-file discovery, and templates for common tasks.
README (SKILL.md)

MediaCrawler

A multi-platform public-information collection tool based on MediaCrawler.

Supported Platforms

  • Xiaohongshu (xhs)
  • Douyin (dy)
  • Kuaishou (ks)
  • Bilibili (bili)
  • Weibo (wb)
  • Tieba (tieba)
  • Zhihu (zhihu)

Features

  • Automatic dependency installation
  • Keyword-search collection
  • Collection by specified post/content ID
  • Creator-homepage collection
  • Comment and sub-comment scraping
  • Login-state caching
  • WebUI for visual operation
  • Multiple storage backends (CSV, JSON, JSONL, Excel, SQLite, MySQL, MongoDB)
  • Quick result-file discovery

Usage

Set up the environment

bash scripts/setup.sh

View help

cd "$PROJECT_PATH"
uv run main.py --help

Run a crawl

Xiaohongshu - keyword search

uv run main.py --platform xhs --lt qrcode --type search --keywords "skincare" --headless false

Douyin - keyword search

uv run main.py --platform dy --lt qrcode --type search --keywords "skincare" --headless false

Fetch details for specified posts

uv run main.py --platform xhs --lt qrcode --type detail --specified_id "post_id_1,post_id_2"

Fetch a creator's homepage

uv run main.py --platform xhs --lt qrcode --type creator --creator_id "creator_id_1"

Start the WebUI

uv run uvicorn api.main:app --port 8080 --reload

Once started, open:

http://127.0.0.1:8080

Data Storage

With these settings in config/base_config.py:

SAVE_DATA_OPTION = "jsonl"
SAVE_DATA_PATH = ""

results are saved by default to:

data/{platform}/{format}/

For example, Douyin JSONL:

data/douyin/jsonl/search_contents_YYYY-MM-DD.jsonl
data/douyin/jsonl/search_comments_YYYY-MM-DD.jsonl
data/douyin/jsonl/search_creators_YYYY-MM-DD.jsonl

For example, Xiaohongshu JSONL:

data/xiaohongshu/jsonl/search_contents_YYYY-MM-DD.jsonl
data/xiaohongshu/jsonl/search_comments_YYYY-MM-DD.jsonl
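Because these result files are JSONL (one JSON object per line), standard shell tools can summarize them. A minimal, self-contained sketch; the "title" and "liked_count" field names are assumptions for illustration, not MediaCrawler's actual schema:

```shell
# Create a throwaway sample shaped like a results file; the field
# names are assumed, not taken from MediaCrawler's real output.
tmp=$(mktemp -d)
f="$tmp/search_contents_2024-01-01.jsonl"
printf '%s\n' '{"title":"a","liked_count":3}' '{"title":"b","liked_count":5}' > "$f"
# One JSON object per line, so the line count is the number of items.
wc -l < "$f"
```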

If you set:

--save_data_path "/your/custom/path"

the results are written to the directory you specify instead.
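Before launching a long crawl with a custom path, it can be worth confirming the directory exists and is writable. A sketch; the directory name is arbitrary, and the commented uv command is the one documented above:

```shell
# Pick a custom output directory and make sure it is usable.
out="$(mktemp -d)/crawler-out"
mkdir -p "$out"
[ -w "$out" ] && echo "writable: $out"
# Then pass it to the crawler, e.g.:
# uv run main.py --platform xhs --lt qrcode --type search \
#   --keywords "skincare" --save_data_path "$out"
```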

Quickly Locate Results

bash scripts/show_results.sh

The script lists the result files under the project's data/ directory.
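If scripts/show_results.sh is unavailable, a plain find gives a rough equivalent of what the script is described as doing (an approximation based on its description, not the actual script):

```shell
# Build a tiny fake data/ tree so the listing below has something to show.
demo=$(mktemp -d)
mkdir -p "$demo/data/douyin/jsonl"
: > "$demo/data/douyin/jsonl/search_contents_2024-01-01.jsonl"
# List result files under data/, the same tree the skill writes to.
find "$demo/data" -type f | sort
```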

Prerequisites

  • Git
  • uv (the setup script can install it automatically)
  • Playwright browser driver (the setup script installs Chromium automatically)

Notes

  • For learning and research purposes only
  • Must not be used for illegal purposes or to infringe on others' legitimate rights
  • Must not be used for non-compliant commercial scraping
  • Confirm that the intended activity is legal and compliant before running
Usage Guidance
This skill clones and runs the upstream MediaCrawler project and may execute remote install scripts. Before installing: (1) review the GitHub repo (https://github.com/NanmiCoder/MediaCrawler) and its main.py / dependencies for any unexpected behavior; (2) avoid running setup.sh on a machine with sensitive data or credentials—use an isolated VM/container if possible; (3) note that setup.sh may run curl | sh to install 'uv' and will download Playwright/Chromium; (4) confirm your intended use is legal and complies with site terms of service. If you cannot review the upstream code, consider skipping installation or running only in a sandbox.
Capability Analysis
Type: OpenClaw Skill · Name: mediacrawler-skill · Version: 0.1.4
The skill is a functional wrapper for the MediaCrawler open-source project, providing automated setup and execution commands. It uses standard installation procedures, such as fetching the official 'uv' package manager and cloning the legitimate GitHub repository (NanmiCoder/MediaCrawler). No malicious behavior, data exfiltration, or harmful prompt injections were identified in the scripts or documentation.
Capability Assessment
Purpose & Capability
Name and description match the files and commands: the skill clones NanmiCoder/MediaCrawler, syncs dependencies, installs Playwright, and exposes run/web/result commands. Required artifacts (git, uv, Playwright) align with the stated purpose.
Instruction Scope
Runtime instructions and scripts operate within the expected scope (clone repo, uv sync, playwright install, run main.py, list data files). They do execute code from the cloned repository (uv run main.py), which is necessary for health checks and operation; this means upstream code will run on the host, so inspect the repo before running in sensitive environments.
Install Mechanism
No packaged install spec is included, but setup.sh may run a network-installed bootstrap: it pipes https://astral.sh/uv/install.sh into sh if uv is missing. Running a remote install script piped to sh is higher-risk than a vetted package; cloning from GitHub and running uv sync / uv run will also execute external code. These behaviors are coherent with installing/running a crawler but increase the attack surface.
Credentials
The skill does not request secrets or unrelated environment variables. It accepts PROJECT_PATH (default $HOME/MediaCrawler) which is appropriate and documented. No credentials or config paths outside the project are touched.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. It does not modify other skills or system-wide agent settings. It writes files under the project directory (data/), which is expected for a crawler.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install mediacrawler-skill
  3. After installation, invoke the skill by name or use /mediacrawler-skill
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.4
  • Improved documentation with clearer usage instructions, more examples, and template commands for common tasks.
  • Added scripts/show_results.sh for quickly locating result files in the data directory.
  • Updated supported platforms section with standardized platform codes.
  • Added details on various data storage formats and how to configure the save path.
  • README and setup instructions refined for easier installation and operation.
  • Manifest and metadata updated to reflect new capabilities and usage patterns.
v0.1.2
  • Updated README.md for improved documentation and clarity.
  • Refined manifest.json configuration.
  • No functional or logic changes to the skill code.
v0.1.1
  • Updated manifest.json to bump version to 0.1.1.
  • No functional code changes; documentation and metadata only.
v0.1.0
Initial release of the media-crawler skill.
  • Supports multi-platform open data crawling (Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, Zhihu)
  • Features keyword search, post/comment retrieval, login cache, proxy, and WebUI
  • Allows output in CSV, JSON, JSONL, Excel, SQLite, MySQL
  • Includes command-line usage and configuration instructions
  • Emphasizes legal/compliance requirements for usage
Metadata
Slug mediacrawler-skill
Version 0.1.4
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 4
