# TikTok Hotspot Monitor: Agent Skill

## 1. Task Boundary (Scope)

### Responsible For
- Crawling TikTok video public metadata (keyword/hashtag/creator/music sources) via Apify cloud Actor (`clockworks/tiktok-scraper`)
- Fallback crawling via Playwright browser automation with saved session
- Offline deduplication, heat scoring, and trend analysis
- Term extraction: content keywords and TikTok hashtags, with multi-bucket aging
- Long-term term status based on current-snapshot age distribution, not only previous-snapshot overlap
- Coverage scoring to surface "broadly appearing" signals vs "single viral" signals
- Static HTML report generation with dark theme

### NOT Responsible For
- Downloading video/audio files
- Real-time streaming or WebSocket data
- TikTok login or session management (must be pre-configured)
- Sentiment analysis of comments
- Cross-platform trend comparison
- Automated social media posting
- User authentication or authorization
- Data persistence beyond local JSONL/JSON files

### Agent Addition Scope
The agent MAY add new keyword/hashtag sources to the config. The agent MUST NOT modify crawl window weights or add new window types without user approval, as those affect Apify billing.

---

## 2. Input Schema

### 2.1 Main Config (`config/tiktok_hotspot_sources.json`)

```typescript
interface CrawlerConfig {
  market: string;                  // default: "US"
  output: {
    base_dir: string;              // default: "data/tiktok_hotspots"
    snapshots_dir: string;         // default: "snapshots"
    logs_dir: string;              // default: "logs"
  };
  provider: {
    type: "apify" | "tiktok_mcp";  // default: "apify"
    actor_id?: string;             // required if type=apify
  };
  defaults: {
    limit: number;                 // default: 10, per-source limit
  };
  sources: Array<{
    type: "keyword" | "hashtag" | "creator" | "music";
    value: string;
    limit?: number;                // override defaults.limit
    enabled?: boolean;             // default: true
  }>;
  apify?: {
    token_env?: string;            // default: "APIFY_TOKEN"
    actor_id?: string;
    input: {
      defaults: Record<string, any>;
      per_source?: Record<string, any>;
      crawl_windows?: Record<string, CrawlWindow[]>;
    };
  };
  tiktok_mcp?: {
    command?: string;
    args?: string[];
    timeout_seconds?: number;
    reject_simulated?: boolean;
  };
}

interface CrawlWindow {
  name: string;
  label: string;
  weight: number;                  // allocation weight
  input: Record<string, any>;      // searchSorting, searchDatePosted, etc.
}
```
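
The `limit` and `enabled` defaults noted in the schema comments can be resolved in a few lines. The following is a minimal sketch rather than the skill's actual loader: the helper name `resolve_sources` and the sample source values are hypothetical, and the fallbacks (limit 10, enabled true) are taken from the comments in `CrawlerConfig`.

```python
from typing import Any


def resolve_sources(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Return enabled sources with their effective per-source limit.

    Hypothetical helper; defaults follow the comments in CrawlerConfig.
    """
    default_limit = config.get("defaults", {}).get("limit", 10)
    resolved = []
    for source in config.get("sources", []):
        if not source.get("enabled", True):
            continue  # disabled sources are skipped entirely
        resolved.append(
            {
                "type": source["type"],
                "value": source["value"],
                # a per-source limit overrides defaults.limit
                "limit": source.get("limit", default_limit),
            }
        )
    return resolved


# Sample config fragment (values are made up for illustration).
example = {
    "defaults": {"limit": 10},
    "sources": [
        {"type": "keyword", "value": "summer dress"},
        {"type": "hashtag", "value": "ootd", "limit": 25},
        {"type": "creator", "value": "example_creator", "enabled": False},
    ],
}
sources = resolve_sources(example)
```

Here the keyword source inherits the default limit, the hashtag source keeps its own override, and the disabled creator source is dropped.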

### 2.2 CLI Arguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--config` | Path | `config/tiktok_hotspot_sources.json` | Config file |
| `--once` | Flag | - | Run single crawl |
| `--schedule` | Flag | - | Run continuously |
| `--max-sources` | int | None | Limit enabled sources |
| `--snapshot` | Path | latest | JSONL snapshot for analysis |
| `--previous-snapshot` | Path | auto | Previous snapshot for comparison |
| `--top` | int | 10 | Items per ranked section |
| `--report` | Path | latest | Analysis JSON for rendering |

### 2.3 Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `APIFY_TOKEN` | For Apify mode | Apify API token |
| `TIKTOK_PROXY` | For Playwright mode | Proxy URL |

---

## 3. Output Schema

### 3.1 Crawl Snapshot (JSONL, one record per line)

```typescript
interface CrawlRecord {
  crawl_timestamp: string;       // UTC ISO
  source_type: "keyword" | "hashtag" | "creator" | "music";
  source_value: string;
  crawl_window: string;
  crawl_window_label: string;
  crawl_window_limit: number;
  video_id: string | null;
  webpage_url: string | null;
  title: string | null;
  description: string | null;
  uploader: string | null;
  uploader_id: string | null;
  view_count: number | null;
  like_count: number | null;
  comment_count: number | null;
  share_count: number | null;
  collect_count: number | null;
  hashtags: string[] | null;
  music: {
    id: string | null;
    track: string | null;
    artist: string | null;
  };
  upload_date: string | null;    // ISO date
  duration: number | null;
  is_ad: boolean | null;
}
```

### 3.2 Crawl Log (JSONL)

```typescript
interface LogEntry {
  crawl_timestamp: string;
  source_type: string;
  source_value: string;
  crawl_window: string;
  crawl_window_limit: number;
  status: "success" | "failed";
  record_count: number;
  error: string | null;
}
```

Last entry is a `CrawlRoundSummary`:

```typescript
interface CrawlRoundSummary {
  event: "crawl_round_summary";
  crawl_timestamp: string;
  provider: string;
  enabled_source_count: number;
  crawl_window_count: number;
  planned_run_count: number;
  requested_total_limit: number;
  completed_run_count: number;
  failed_run_count: number;
  raw_record_count: number;
  unique_video_count: number;
  duplicate_rate: number;          // 0.0 - 1.0
  effective_unique_yield: number;  // unique / requested
  windows: Record<string, WindowMetrics>;
  cost_model_note: string;
}
```
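
To make the relationship between the count fields and the two ratio fields concrete, here is a small sketch. It is illustrative only: `round_metrics` is a hypothetical helper, and the formula `duplicate_rate = 1 - unique / raw` is an assumption, since the schema only states the 0.0 to 1.0 range. `effective_unique_yield = unique / requested` comes from the schema comment.

```python
def round_metrics(video_ids: list[str], requested_total_limit: int) -> dict:
    """Compute dedup metrics for one crawl round (hypothetical helper).

    Assumption: duplicate_rate = 1 - unique / raw. Only
    effective_unique_yield (unique / requested) is stated in the schema.
    """
    raw = len(video_ids)
    unique = len(set(video_ids))
    return {
        "raw_record_count": raw,
        "unique_video_count": unique,
        # guard against an empty round to avoid dividing by zero
        "duplicate_rate": 1 - unique / raw if raw else 0.0,
        "effective_unique_yield": unique / max(requested_total_limit, 1),
    }


# Four raw records, one of them a repeat, against a requested total of 10.
metrics = round_metrics(["a", "b", "a", "c"], requested_total_limit=10)
```

With these inputs, three of the four records are unique, so a quarter of the raw records are duplicates and the round delivered 30% of the requested total as unique videos.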

### 3.3 Analysis Report (JSON)

```typescript
interface AnalysisReport {
  generated_at: string;
  snapshot_path: string;
  previous_snapshot_path: string | null;
  analysis_window: {
    current_snapshot_time: string;
    previous_snapshot_time: string | null;
    interval_hours: number | null;
    matched_previous_video_count: number;
  };
  record_count: number;
  unique_video_count: number;
  source_counts: Record<string, number>;
  top_videos: VideoItem[];
  top_rising_videos: VideoItem[];
  recent_videos_by_age: AgeBucket<VideoItem>[];
  recent_signals_by_age: SignalBucket[];
  established_terms: TermItem[];
  established_hashtags: TermItem[];
  top_music: RankedItem[];
  top_creators: RankedItem[];
  crawl_metrics: CrawlRoundSummary | null;
}
```

### 3.4 HTML Report

Self-contained static HTML file at `data/tiktok_hotspot_analysis/tiktok_hotspot_report_<timestamp>.html`.
No external dependencies. Dark themed. Machine-readable data embedded as JSON in comments.

---

## 4. Tools

### 4.1 `crawl_tiktok_hotspots.py`: Metadata Crawler

**When to call:**
- User requests data collection
- Need fresh snapshot for analysis
- Smoke test / validation run

**When NOT to call:**
- User wants to view existing data only (use analyze instead)
- Config is invalid and has not been fixed yet
- Apify mode: `APIFY_TOKEN` not set (check env first)
- MCP mode: session file missing (run `tiktok_login_save_session.py` first)

**Provider switching:**
Edit `config/tiktok_hotspot_sources.json` to switch between providers:

```json
// Apify mode (default, full features)
{ "provider": { "type": "apify", "actor_id": "clockworks/tiktok-scraper" } }

// Local MCP mode (limited, testing only)
{ "provider": { "type": "tiktok_mcp" } }
```

MCP mode requires:
1. `pip install playwright && playwright install chromium`
2. `python scripts/tiktok_login_save_session.py` (manual TikTok login)
3. Config `tiktok_mcp.args` pointing to `scripts/tiktok_search_mcp_adapter.py`

**Implementation:**
```python
# Provider dispatch (outline; `...` marks where each provider's crawl runs)
if config.provider_type == "apify":
    # Requires APIFY_TOKEN in env
    # Each source × window → one Actor run
    # Supports all 4 source types
    ...
elif config.provider_type == "tiktok_mcp":
    # Requires saved session file
    # Keyword/hashtag only, ~12 items per source
    ...
```

**Error states:**
| Error | Recovery |
|-------|----------|
| Apify token missing | Check env, prompt user to set `APIFY_TOKEN` |
| Actor run timeout | Retry with same config |
| No videos found | Log as failed window, continue |
| MCP session expired | Prompt re-login via `tiktok_login_save_session.py` |
| Proxy unreachable | Skip proxy or switch to Apify |
| Snapshot empty | Check sources config, ensure keywords are valid |

**Retry policy:**
- Network errors: retry up to 2 times with 5s backoff
- Actor failures: no retry (Apify handles internally), log and continue
- MCP browser crash: retry once
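
The network-error rule in the retry policy (up to 2 retries, 5s backoff) could look roughly like the sketch below. This is not the crawler's own code: `with_retry` is a hypothetical helper, it assumes network failures surface as `ConnectionError` or `TimeoutError`, and it uses a fixed 5-second delay because the policy does not say whether the backoff grows.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    call: Callable[[], T],
    retries: int = 2,
    backoff_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying network-style errors a fixed number of times."""
    attempt = 0
    while True:
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt >= retries:
                raise  # retries exhausted: the caller logs and continues
            attempt += 1
            sleep(backoff_seconds)  # fixed delay (assumption)


# Usage: a simulated call that fails twice, then succeeds on the third attempt.
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated network error")
    return "ok"


# The sleep function is injected so the demo does not actually wait.
result = with_retry(flaky, sleep=lambda _seconds: None)
```

Injecting `sleep` keeps the helper testable; errors other than the two network types propagate immediately, which matches the "no retry" rule for Actor failures.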

### 4.2 `analyze_tiktok_hotspots.py`: Offline Analyzer

**When to call:**
- After crawl completes
- User has existing snapshot to analyze
- Need updated report

**Implementation steps:**
1. Load snapshot JSONL → validate each record has `video_id`
2. Deduplicate by `video_id` (keep highest heat score)
3. Compute per-video heat score
4. Bucket videos by upload age (1d/3d/7d/14d)
5. Extract content terms and hashtags
6. Compute cross-bucket novelty (new vs existing terms)
7. Compute coverage scores
8. Compare with previous snapshot for growth metrics
9. Output structured JSON
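
Steps 1, 2, and 4 above can be sketched as follows. Several parts are assumptions made for illustration: the weights in `heat_score` are placeholders (the real scoring formula is not given here), the function names are hypothetical, and each video is assigned to the smallest age bucket that contains it.

```python
from datetime import date
from typing import Optional

# Bucket names from step 4, with their upper bound in days.
AGE_BUCKETS = (("1d", 1), ("3d", 3), ("7d", 7), ("14d", 14))


def heat_score(record: dict) -> float:
    """Placeholder score; the weights are invented for this example."""
    return (
        (record.get("view_count") or 0)
        + 10 * (record.get("like_count") or 0)
        + 20 * (record.get("share_count") or 0)
    )


def dedupe_by_video_id(records: list[dict]) -> list[dict]:
    """Keep one record per video_id, preferring the highest heat score."""
    best: dict[str, dict] = {}
    for record in records:
        video_id = record.get("video_id")
        if not video_id:
            continue  # step 1: records without video_id are dropped
        kept = best.get(video_id)
        if kept is None or heat_score(record) > heat_score(kept):
            best[video_id] = record
    return list(best.values())


def age_bucket(upload_date: str, today: date) -> Optional[str]:
    """Return the smallest bucket covering the video's age, or None if older."""
    age_days = (today - date.fromisoformat(upload_date)).days
    for name, max_days in AGE_BUCKETS:
        if age_days <= max_days:
            return name
    return None  # older than 14 days


records = [
    {"video_id": "v1", "view_count": 100, "like_count": 5, "share_count": 0, "upload_date": "2024-06-09"},
    {"video_id": "v1", "view_count": 300, "like_count": 9, "share_count": 1, "upload_date": "2024-06-09"},
    {"video_id": "v2", "view_count": 50, "like_count": 1, "share_count": 0, "upload_date": "2024-06-04"},
    {"video_id": None, "view_count": 999},
]
unique = dedupe_by_video_id(records)
buckets = {r["video_id"]: age_bucket(r["upload_date"], date(2024, 6, 10)) for r in unique}
```

The two `v1` records collapse into the one with the higher score, the record without a `video_id` is discarded, and the remaining videos fall into the `1d` and `7d` buckets relative to 2024-06-10.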

### 4.2.1 Long-term Term Status

Long-term content terms and hashtags are **not** dropped when they are missing from the previous snapshot. A term enters the long-term section when its oldest matched video is older than 30 days. Its status is then computed from the current snapshot's video-age distribution:

| Status | Condition | Meaning |
|--------|-----------|---------|
| `spreading` | newest video <= 7 days AND recent_7d_count / video_count >= 10% | Still actively spreading |
| `mature_or_flat` | newest video <= 30 days but 7d ratio is too low | Existing signal, activity weakening |
| `cooling` | newest video > 30 days | No recent new videos; cooling down |

This avoids losing a long-term term simply because the previous crawl did not hit it, while also preventing one recent video among many old videos from falsely marking a term as spreading.
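
The status table translates directly into a small classifier. The sketch below is one possible reading, with a hypothetical function name and input shape (a list of video ages in days for one term); it returns `None` for terms that do not qualify as long-term.

```python
from typing import Optional


def long_term_status(video_ages_days: list[int]) -> Optional[str]:
    """Classify a term from the ages (in days) of its matched videos.

    Returns None when the term is not long-term, i.e. its oldest
    matched video is not older than 30 days.
    """
    if not video_ages_days or max(video_ages_days) <= 30:
        return None
    newest = min(video_ages_days)
    recent_7d_count = sum(1 for age in video_ages_days if age <= 7)
    recent_7d_ratio = recent_7d_count / len(video_ages_days)
    if newest <= 7 and recent_7d_ratio >= 0.10:
        return "spreading"
    if newest <= 30:
        return "mature_or_flat"
    return "cooling"


# Half of the videos are from the last week: still spreading.
status_active = long_term_status([2, 5, 40, 90])
# One recent video among twenty (5%): not enough to count as spreading.
status_flat = long_term_status([3] + [60] * 19)
# Nothing newer than 30 days: cooling.
status_cold = long_term_status([45, 80])
```

The second case is the one the paragraph above warns about: a single recent video among many old ones stays `mature_or_flat` because the 7-day ratio is below 10%.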

### 4.3 `render_tiktok_hotspot_report.py`: HTML Report Generator

**When to call:**
- After analysis completes
- User requests visual output

**Output:** Valid HTML5, self-contained, no external CSS/JS.

### 4.4 `tiktok_login_save_session.py`: Session Setup (optional)

**When to call:**
- User wants to use local Playwright mode
- Session file missing or expired

---

## 5. State Machine

```
IDLE
  │
  ▼
CONFIG_LOAD ──invalid──▶ ERROR (report config issue)
  │
  ▼
CRAWL_PLAN
  ├─ Build requests: enabled_sources × crawl_windows
  ├─ Compute: planned_run_count, requested_total_limit
  └─ Validate: at least 1 enabled source
  │
  ▼
CRAWL_EXECUTE ──fail──▶ PARTIAL_COMPLETE (log failures, continue)
  │                        │
  ▼                        ▼
SNAPSHOT_WRITTEN        PARTIAL_SNAPSHOT
  │                        │
  └──────────both──────────┘
               │
               ▼
ANALYZE ──empty_snapshot──▶ ERROR (no records to analyze)
  │
  ▼
REPORT_GENERATE ──fail──▶ ERROR (corrupted analysis JSON)
  │
  ▼
COMPLETE
```

State management is handled by the Python scripts via:
- Exit codes: 0 (success), 1 (partial failure), 2 (config/input error)
- Logs: per-run JSONL entries with status
- Summary: `CrawlRoundSummary` as last log entry

---

## 6. Error Recovery

### 6.1 Crawl Phase

| Failure Mode | Detection | Recovery |
|-------------|-----------|----------|
| Invalid config | `load_config()` raises `ValueError` | Report exact field, suggest fix |
| No enabled sources | Config load check | Add at least one source |
| Apify token missing | `os.environ.get()` returns empty | Message: "Set APIFY_TOKEN in .env" |
| All sources fail | All log entries show `failed` | Check token, network, actor_id |
| Some sources fail | Log shows mixed success/fail | Continue, report failed count |
| Snapshot empty | 0 records written | Check source keywords/limits |
| Disk full | `write()` raises `OSError` | Free disk space, retry |
| MCP browser timeout | `asyncio.wait_for` raises | Fallback to fewer sources |
| MCP session expired | Actor raises `RuntimeError` | Run `tiktok_login_save_session.py` |

### 6.2 Analyze Phase

| Failure Mode | Detection | Recovery |
|-------------|-----------|----------|
| Snapshot missing | `FileNotFoundError` | Run crawl first |
| Corrupted JSONL | `json.JSONDecodeError` | Check snapshot, re-crawl |
| No video records | All lines lack `video_id` | Report empty snapshot |
| Previous snapshot missing | `valid_snapshots()` empty | Run without comparison |
| Division by zero | `video_count = 0` | Guard with `max(vc, 1)` |

### 6.3 Report Phase

| Failure Mode | Detection | Recovery |
|-------------|-----------|----------|
| Analysis JSON missing | `FileNotFoundError` | Run analyze first |
| Corrupted JSON | `json.JSONDecodeError` | Re-run analyze |
| KeyError in template | `report.get(key)` missing | Graceful fallback to empty |
| Encoding error | `UnicodeEncodeError` | Force UTF-8 output |

---

## 7. Planning Logic

### 7.1 Task Decomposition

For a typical hotspot monitoring request, decompose as:

```
Step 1: Check existing data
  ├─ Is there a recent snapshot? (< 24h old)
  │   └─ Yes → skip crawl, go to Step 3
  │   └─ No → continue to Step 2
  │
Step 2: Crawl
  ├─ Validate APIFY_TOKEN exists
  ├─ Load config
  ├─ Run crawl (with timeout guard)
  └─ Verify snapshot has records
  │
Step 3: Analyze
  ├─ Auto-select latest snapshot
  ├─ Auto-select previous snapshot (if exists)
  ├─ Run analysis
  └─ Verify output JSON has all required fields
  │
Step 4: Generate report
  ├─ Render HTML from analysis JSON
  └─ Verify output is valid HTML
```

### 7.2 Decision Tree

```
User: "check TikTok trends for summer dresses"

Check: Does latest snapshot exist and have records?
  ├─ YES: Is it < 24h old?
  │   ├─ YES: Skip crawl, go to analyze
  │   └─ NO: Is user OK waiting 5-30 min for crawl?
  │       ├─ YES: Run crawl, then analyze
  │       └─ NO: Use existing snapshot, warn about staleness
  └─ NO: Must crawl first
      ├─ Is APIFY_TOKEN configured?
      │   ├─ YES: Use Apify provider
      │   └─ NO: Check MCP session
      │       ├─ EXISTS: Use MCP provider (limited data)
      │       └─ MISSING: Ask user to configure one
      └─ Run crawl
```
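
The decision tree can also be expressed as a single function, which makes the branches easy to test. This is a sketch under stated assumptions: the function name, its parameters, and the returned action labels are all hypothetical, and it applies the same provider check to the stale-snapshot branch, which the tree does not spell out.

```python
from typing import Optional


def plan_next_step(
    snapshot_age_hours: Optional[float],  # None when no usable snapshot exists
    user_can_wait: bool,
    apify_token_set: bool,
    mcp_session_exists: bool,
) -> str:
    """Map the decision tree onto an action label (labels are illustrative)."""
    if snapshot_age_hours is not None:
        if snapshot_age_hours < 24:
            return "analyze"  # fresh snapshot: skip the crawl
        if not user_can_wait:
            return "analyze_with_staleness_warning"
    # A crawl is needed: choose a provider.
    if apify_token_set:
        return "crawl_with_apify"
    if mcp_session_exists:
        return "crawl_with_mcp"  # limited data
    return "ask_user_to_configure_provider"


fresh = plan_next_step(3.0, user_can_wait=True, apify_token_set=True, mcp_session_exists=False)
stale_no_wait = plan_next_step(48.0, user_can_wait=False, apify_token_set=True, mcp_session_exists=False)
no_snapshot_mcp = plan_next_step(None, user_can_wait=True, apify_token_set=False, mcp_session_exists=True)
```

Keeping the decision pure (no file or environment access inside the function) means the agent can gather the four facts first and log them alongside the chosen action.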

---

## 8. Guardrails

### 8.1 Cost Limits

| Guardrail | Value | Enforcement |
|-----------|-------|-------------|
| Max sources per crawl | 50 | Config validation |
| Max limit per source | 500 | Config validation (`positive_int`) |
| Max requested total | 5000 | Config validation (project-level) |
| Max planned runs | 250 | 50 sources × 5 windows |
| Apify mode | Required for > 200 records | MCP limited to ~12/source |
| Report HTML size | < 5MB | Self-limiting (trim if exceeded) |
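
A pre-flight check for the first four limits might look like the sketch below. It is illustrative rather than the project's validator: `check_cost_guardrails` is a hypothetical name, and it assumes the requested total is the sum of per-source limits and that planned runs equal sources × windows, as in the "50 sources × 5 windows" row.

```python
MAX_SOURCES = 50
MAX_LIMIT_PER_SOURCE = 500
MAX_REQUESTED_TOTAL = 5000
MAX_PLANNED_RUNS = 250


def check_cost_guardrails(source_limits: list[int], window_count: int) -> list[str]:
    """Return human-readable violations; an empty list means the plan passes."""
    violations = []
    if len(source_limits) > MAX_SOURCES:
        violations.append(f"too many sources: {len(source_limits)} > {MAX_SOURCES}")
    if any(limit < 1 or limit > MAX_LIMIT_PER_SOURCE for limit in source_limits):
        violations.append(f"per-source limit must be between 1 and {MAX_LIMIT_PER_SOURCE}")
    requested_total = sum(source_limits)  # assumption: total = sum of limits
    if requested_total > MAX_REQUESTED_TOTAL:
        violations.append(f"requested total {requested_total} > {MAX_REQUESTED_TOTAL}")
    planned_runs = len(source_limits) * window_count  # one run per source × window
    if planned_runs > MAX_PLANNED_RUNS:
        violations.append(f"planned runs {planned_runs} > {MAX_PLANNED_RUNS}")
    return violations


ok_plan = check_cost_guardrails([100] * 10, window_count=5)
bad_plan = check_cost_guardrails([600] * 60, window_count=5)
```

Collecting all violations instead of stopping at the first one lets the agent report every field that needs fixing in a single message.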

### 8.2 Time Limits

| Operation | Timeout | Enforcement |
|-----------|---------|-------------|
| Single crawl run | 60 min | Bash timeout parameter |
| Per-Apify Actor | No limit | Apify handles internally |
| Per-MCP search | 120s | `tiktok_mcp.timeout_seconds` |
| Analysis | 30s | Python processing (fast) |
| Report render | 10s | Python processing (fast) |

### 8.3 Rate Limits

- No concurrent Apify runs (sequentially dispatched)
- MCP browser: one at a time (sequential per source)
- Web fetching: 60s minimum between full re-crawls

### 8.4 Token / Credit Safety

- Never commit `.env` to git
- Never print API tokens in logs or console
- `APIFY_TOKEN` read from environment only
- MCP session file is local only

---

## 9. Evaluation Criteria

### 9.1 Crawl Success

| Criterion | Passing | Warning | Failing |
|-----------|---------|---------|---------|
| Run completion | ≥ 90% runs succeed | 70-90% | < 70% |
| Record count | ≥ 80% requested | 50-80% | < 50% |
| Duplicate rate | < 25% | 25-40% | > 40% |
| Failed windows | 0 | 1-3 | > 3 |
| Unique videos | ≥ 50 | 20-50 | < 20 |

### 9.2 Analysis Success

| Criterion | Passing | Failing |
|-----------|---------|---------|
| Snapshot has records | ≥ 10 unique videos | < 10 |
| Dedup processed | All records checked | Missing video_id |
| Term extraction | ≥ 1 content term found | 0 terms |
| JSON output | All required fields present | Missing required fields |
| Processing time | < 30s | > 60s |

### 9.3 Report Success

| Criterion | Passing | Failing |
|-----------|---------|---------|
| Valid HTML | Closes `</html>` tag | Missing closing tag |
| Metrics visible | ≥ 4 grid metrics shown | Empty grid |
| Videos rendered | Top list non-empty | Empty list |
| All sections present | 6+ sections | < 4 sections |

### 9.4 Decision: Proceed to Next Stage

After a validation crawl (target ~500 records):

```
unique_yield = unique_videos / requested_total_limit

if unique_yield >= 0.6 and duplicate_rate < 0.25:
    ✅ Proceed to pilot (2000 target)
elif unique_yield >= 0.4:
    ⚠️ Proceed with caution, review source quality
else:
    ❌ Block scaling, fix sources/windows first
```

---

## 10. Composability

### 10.1 Output Consumption

Other skills/agents consume the analysis JSON via a standard path:

```python
# Example: Another agent reads analysis for downstream processing
import json

# Open with a context manager so the file handle is closed after loading
with open("data/tiktok_hotspot_analysis/latest_analysis.json", encoding="utf-8") as f:
    report = json.load(f)

top_signals = [t["name"] for t in report.get("top_videos", [])[:5]]
hot_terms = [t["name"] for t in report.get("established_terms", [])[:10]]
```

### 10.2 Pipeline Integration

```
Data Source Agent
  └─► TikTok Hotspot Monitor Skill
        ├─► crawl → snapshot.jsonl
        │     └─► [External] Apify usage dashboard (cost tracking)
        ├─► analyze → analysis.json
        │     └─► [Downstream] Trend prediction / alerting
        └─► render → report.html
              └─► [Downstream] Static hosting / dashboard
```

### 10.3 File-Based Contract

All inter-skill communication is file-based:

| Artifact | Format | Schema | Consumer |
|----------|--------|--------|----------|
| Snapshot | JSONL | CrawlRecord | Analysis, ML pipeline |
| Analysis | JSON | AnalysisReport | Report, dashboards |
| Log | JSONL | LogEntry / Summary | Monitoring, cost tracking |
| Report | HTML | Self-contained | Human viewing |

### 10.4 Exit Codes

```python
# Standard exit codes for script chaining
# 0: Success (all operations completed)
# 1: Partial success (some failures, usable results)
# 2: Configuration error (fix config before retry)
```
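
A caller that chains the three scripts could act on these exit codes as sketched below. The script paths come from the quick reference in the appendix; `should_continue` and `run_pipeline` are hypothetical helpers, and treating exit code 1 as "keep going" follows the state machine, where a partial snapshot still proceeds to analysis.

```python
import subprocess
import sys

# Stages in pipeline order; paths as listed in the quick reference.
PIPELINE = [
    ["scripts/crawl_tiktok_hotspots.py", "--once"],
    ["scripts/analyze_tiktok_hotspots.py"],
    ["scripts/render_tiktok_hotspot_report.py"],
]


def should_continue(exit_code: int) -> bool:
    """0 and 1 leave usable results; 2 (or anything else) stops the chain."""
    return exit_code in (0, 1)


def run_pipeline() -> int:
    """Run each stage in order; return the first blocking code, else the worst seen."""
    worst = 0
    for script_args in PIPELINE:
        code = subprocess.run([sys.executable, *script_args]).returncode
        if not should_continue(code):
            return code  # config error or crash: fix before retrying
        worst = max(worst, code)
    return worst
```

`run_pipeline()` is defined but not called here, since it launches the real scripts; a wrapper would typically end with `sys.exit(run_pipeline())` so its own exit code follows the same convention.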

---

## Appendix: Quick Reference

```bash
# Full pipeline (one command each)
python scripts/crawl_tiktok_hotspots.py --config config/tiktok_hotspot_sources.json --once
python scripts/analyze_tiktok_hotspots.py
python scripts/render_tiktok_hotspot_report.py

# Smoke test (2 sources)
python scripts/crawl_tiktok_hotspots.py --once --max-sources 2

# Validation run (500 records)
python scripts/crawl_tiktok_hotspots.py --config config/_tiktok_hotspot_apify_500_config.json --once
```

**Apify Cost Note:** Verify actual charges at console.apify.com → Usage.
Cost depends on Actor pricing, run count, compute duration, memory, proxy usage,
retries, add-ons, and account plan, not only on the requested result count.

TikTok Hotspot Monitor is licensed under MIT-0. It is built and maintained by Tan Dongtao (@tandongtaotao); the current version is v2.0.0.