Social Media Data Collector
/install social-media-data-collector
Overview
Collect engagement metrics from 13+ platforms and aggregate them into a structured target (Feishu Bitable or CSV). The pipeline works in three tiers: API first, then browser scraping as a fallback, then a manual-check flag.
Execution Flow
- Classify platforms by data access method (see references/platform-guide.md)
- API tier — call APIs for platforms with programmatic access
- Browser tier — Playwright render + text extraction for remaining
- Aggregate — normalize data, write to target (bitable/CSV)
- Cleanup — remove screenshots, temp files, browser cache
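The tier order in the steps above can be sketched as a small fallback function. This is a minimal illustration, not the skill's actual code: `collect_with_fallback` and the `Collector` signature are hypothetical names, and the real scripts may structure this differently.

```python
from typing import Callable, Optional

# Hypothetical collector signature: takes a platform name, returns metrics or None.
Collector = Callable[[str], Optional[dict]]


def collect_with_fallback(
    platform: str,
    api: Optional[Collector],
    browser: Optional[Collector],
) -> dict:
    """Try the API tier, then the browser tier, then flag for manual review."""
    for tier, collector in (("api", api), ("browser", browser)):
        if collector is None:
            continue  # this tier is not available for the platform
        try:
            metrics = collector(platform)
        except Exception:
            metrics = None  # a failed tier falls through to the next one
        if metrics:  # None or an empty dict counts as "no data"
            return {"platform": platform, "tier": tier, "metrics": metrics}
    # Neither tier produced data: flag it instead of guessing.
    return {"platform": platform, "tier": "manual", "metrics": None}
```

A platform with no API access would be called with `api=None`, so it goes straight to the browser tier.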
Platform Tiers
| Tier | Platforms | Method |
|---|---|---|
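| API-first | 抖音 (Douyin), 微博 (Weibo), 快手 (Kuaishou), B站 (Bilibili), 今日头条 (Toutiao), 小红书 (Xiaohongshu) | TikHub API / BlueAI Crawler |
| Browser-scrape | 百家号 (Baijiahao), 汽车之家 (Autohome), 易车 (Yiche), 视频号 (WeChat Channels), 斗鱼 (Douyu), 皮皮虾 (Pipixia) | Playwright headless |
| API+scrape | 懂车帝 (Dongchedi) | TikHub (limited) + scrape |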
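For routing in code, the table above could be held as a plain lookup. The sketch below copies the platform names from the table; the `PLATFORM_TIERS` and `tier_of` names, and the choice to treat unknown platforms as manual, are assumptions made for illustration.

```python
# Platform names copied from the tier table; the dict layout is an assumption.
PLATFORM_TIERS = {
    "api": ["抖音", "微博", "快手", "B站", "今日头条", "小红书"],
    "browser": ["百家号", "汽车之家", "易车", "视频号", "斗鱼", "皮皮虾"],
    "api+scrape": ["懂车帝"],
}


def tier_of(platform: str) -> str:
    """Return the collection tier for a platform name."""
    for tier, names in PLATFORM_TIERS.items():
        if platform in names:
            return tier
    # Assumption: a platform missing from the table is flagged for manual handling.
    return "manual"
```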
Model Strategy (Token Optimization)
Problem
Using opus/sonnet for the entire pipeline wastes tokens on mechanical tasks.
Recommended Model Split
| Phase | Model | Why |
|---|---|---|
| Planning & classification | opus/sonnet | Needs reasoning |
| API calls & JSON parsing | haiku/flash | Mechanical, no reasoning needed |
| Browser text extraction | Code (no LLM) | Pure Python, no model call |
| Data normalization | haiku/flash | Simple mapping |
| Report/summary | sonnet | Needs synthesis |
Implementation
- Use scripts/collect_api.py for the API tier: zero LLM tokens (pure code)
- Use scripts/collect_browser.py for the browser tier: zero LLM tokens (pure code)
- Only invoke an LLM for planning which platforms to hit, handling errors, and writing summaries
Token Budget Estimate (per 13-platform run)
- With current approach (all-opus): ~80k tokens
- With optimized approach (code scripts + haiku routing): ~5k tokens
- Savings: 94%
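The savings figure follows directly from the two estimates above. As a quick check, using only those two numbers:

```python
ALL_OPUS_TOKENS = 80_000   # estimate quoted above for the all-opus run
OPTIMIZED_TOKENS = 5_000   # estimate quoted above for the optimized run

savings = 1 - OPTIMIZED_TOKENS / ALL_OPUS_TOKENS
print(f"{savings:.2%}")  # prints 93.75%, which rounds to the 94% quoted above
```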
Key Commands
# Full collection run
python3 scripts/collect_api.py --config /tmp/sm-collect/config.json
# Browser scrape specific platforms
python3 scripts/collect_browser.py --platforms "百家号,汽车之家,视频号"
# Write to bitable
python3 scripts/write_bitable.py --app-token XXX --table-id YYY --data /tmp/sm-collect/results.json
# Cleanup
rm -rf /tmp/sm-collect/ /tmp/screenshots/
Bitable Field Mapping
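| Bitable field | Type | Notes |
|---|---|---|
| 播放量 (views) | text | Text with a "万" (10,000) suffix |
| 点赞 (likes) | number | Plain number |
| 评论 (comments) | number | Plain number |
| 分享 (shares) | number | Plain number |
| 收藏 (favorites) | number | Plain number |
| 互动量合计 (total interactions) | text | Text with a "万" (10,000) suffix |
| 数据统计日期 (statistics date) | text | Format "2026.5.15" |
⚠️ Note: 播放量 and 互动量合计 are text fields, not number fields. Passing a number fails with TextFieldConvFail.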
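A record that respects these field types might be built as in the sketch below. Several details are assumptions not stated in the source: the one-decimal precision of the "万" text, the handling of values under 10,000, and computing 互动量合计 as the sum of likes, comments, shares and favorites. `to_wan_text` and `build_record` are hypothetical helper names.

```python
from datetime import date


def to_wan_text(n: int) -> str:
    """Render a count as text, using a "万" suffix from 10,000 upward.

    Assumptions: one decimal place, and plain digits below 10,000.
    """
    if n < 10_000:
        return str(n)
    return f"{n / 10_000:.1f}万"


def build_record(views: int, likes: int, comments: int,
                 shares: int, favorites: int, day: date) -> dict:
    """Build one row keyed by the Bitable field names from the mapping table."""
    # Assumption: the total is the sum of the four interaction counts.
    interactions = likes + comments + shares + favorites
    return {
        "播放量": to_wan_text(views),            # text field, never a number
        "点赞": likes,
        "评论": comments,
        "分享": shares,
        "收藏": favorites,
        "互动量合计": to_wan_text(interactions),  # text field, never a number
        # No zero padding, matching the "2026.5.15" format.
        "数据统计日期": f"{day.year}.{day.month}.{day.day}",
    }
```

Keeping the text conversion in one helper makes it harder to pass a raw number to a text field by accident, which is the case that triggers TextFieldConvFail.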
Cleanup Protocol
After each collection run, delete:
- /tmp/sm-collect/ (intermediate JSON)
- /tmp/screenshots/ (browser screenshots)
- /tmp/subagent-out/ (if sub-agents were spawned)
- Any .json temp files in the workspace
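The directory part of this protocol can be scripted with the standard library. The sketch below is one possible shape, with `cleanup` as a hypothetical helper. It deliberately leaves workspace .json files alone, since the source does not say how to tell temp files from files worth keeping.

```python
import shutil
from pathlib import Path

# Directories named in the cleanup protocol.
TEMP_DIRS = ["/tmp/sm-collect", "/tmp/screenshots", "/tmp/subagent-out"]


def cleanup(dirs=TEMP_DIRS) -> list:
    """Remove each existing temp directory and return the paths that were removed."""
    removed = []
    for d in dirs:
        path = Path(d)
        if path.is_dir():  # skip directories that were never created
            shutil.rmtree(path)
            removed.append(str(path))
    return removed
```

Returning the removed paths gives the caller something to log, so a run can confirm that cleanup actually happened.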
Error Handling
- API 403/401 → token expired, refresh and retry once
- Browser timeout → increase to 25s, retry with wait_until="domcontentloaded"
- Platform redirects → check that the URL is correct (易车: hao vs sv domain!)
- Empty data → flag for manual check, don't guess
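The auth and empty-data rules above can be sketched as one wrapper. This is an illustration under assumptions: `AuthError`, `fetch_with_refresh`, and the `fetch`/`refresh` callables are hypothetical stand-ins for whatever the real API client provides, and the browser timeout and redirect rules are not covered here.

```python
from typing import Callable


class AuthError(Exception):
    """Hypothetical exception a fetcher raises on HTTP 401/403."""


def fetch_with_refresh(
    fetch: Callable[[str], dict],
    refresh: Callable[[], str],
    token: str,
) -> dict:
    """Fetch once; on an auth failure, refresh the token and retry exactly once."""
    try:
        data = fetch(token)
    except AuthError:
        # Retry once with a fresh token; a second AuthError propagates to the caller.
        data = fetch(refresh())
    if not data:
        # Empty data is flagged for manual checking rather than guessed.
        return {"status": "manual_check", "data": None}
    return {"status": "ok", "data": data}
```

Limiting the retry to a single attempt keeps a permanently revoked token from looping.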
Platform-Specific Notes
See references/platform-guide.md for detailed per-platform experience including:
- Authentication requirements
- URL patterns and gotchas
- Data extraction selectors
- Known limitations
Installation and Usage
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install social-media-data-collector
- After installation, invoke the skill by name or use /social-media-data-collector
- Provide the required inputs per the skill's parameter spec and get structured output
What is Social Media Data Collector?
Multi-platform social media data collection and aggregation for content performance tracking. Use when: (1) collecting engagement metrics (views/likes/commen... It is an AI Agent Skill for Claude Code / OpenClaw, with 68 downloads so far.
How do I install Social Media Data Collector?
Run "/install social-media-data-collector" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Social Media Data Collector free?
Yes, Social Media Data Collector is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Social Media Data Collector support?
Social Media Data Collector is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Social Media Data Collector?
It is built and maintained by Dr-xiaoming (@dr-xiaoming); the current version is v1.0.0.