Description

Cross-platform content collection, web search, trending topics, confidence scoring, and watch/triage workflows for assistant and agent usage.

README (SKILL.md)

DataPulse Skill (v0.8.1)

Name: DataPulse
Author: sunyifei83

Use this skill when the user needs one or more of the following:

Read or batch-read URLs across X, Reddit, YouTube, Bilibili, Telegram, WeChat, Xiaohongshu, RSS, arXiv, Hacker News, GitHub, and generic web pages
Search the web, inspect trending topics, or collect cross-platform signals
Create watch missions, alert routes, triage queues, or story evidence packs
Run assistant-ready URL intake through datapulse_skill.run()

Python Entry Point

from datapulse_skill import run

# The prompt below reads "Please process these links: ..."
run("请处理这些链接: https://x.com/... https://www.reddit.com/...")

Core Capabilities

URL ingestion with normalized DataPulseItem output
Confidence scoring and ranking
Web search and trending discovery
Watch missions and alert routing
Triage queue and story workspace workflows
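The v0.8.0 release notes list four scoring dimensions: confidence, authority, corroboration, and recency. The sketch below shows one way such sub-scores could be blended into a single ranking key. It assumes each sub-score is normalized to [0, 1]. The equal weights, the 24-hour half-life, and the names `ScoredItem`, `recency_score`, and `rank_items` are hypothetical, and this is not DataPulse's actual formula.

```python
from dataclasses import dataclass

# Hypothetical equal weights; DataPulse's real weighting is not documented here.
WEIGHTS = {"confidence": 0.25, "authority": 0.25, "corroboration": 0.25, "recency": 0.25}


def recency_score(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for brand-new content, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life_hours)


@dataclass
class ScoredItem:
    url: str
    confidence: float      # every sub-score is assumed to lie in [0, 1]
    authority: float
    corroboration: float
    recency: float

    def combined(self) -> float:
        """Weighted sum of the four sub-scores."""
        return sum(getattr(self, name) * weight for name, weight in WEIGHTS.items())


def rank_items(items: list[ScoredItem]) -> list[ScoredItem]:
    """Return items sorted from highest to lowest combined score."""
    return sorted(items, key=lambda item: item.combined(), reverse=True)
```

A weighted sum keeps each dimension's contribution easy to inspect. A real ranker might instead learn the weights or treat corroboration as a multiplier.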

Behavior Disclosure

Browser Automation (optional)

DataPulse uses Playwright for platforms that require authenticated browser sessions (WeChat, Xiaohongshu). Browser automation is opt-in only — it activates when the user explicitly runs a login command and a valid session file exists. The playwright dependency is optional (pip install datapulse[browser]). No browser launches occur during normal URL reading.
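The opt-in rule above can be sketched as a gate that is checked before any browser work begins. Under this sketch, a browser session is used only when the URL belongs to a platform that needs one and a saved session file is present. Several details are assumptions: the host-to-platform mapping, the `<platform>.json` file naming, and the helper name `browser_session_for` are hypothetical. The sketch also treats the documented 12h session TTL as a file-age limit, which may differ from how DataPulse actually caches sessions.

```python
import time
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

# Hypothetical layout: one saved Playwright session file per platform.
SESSION_DIR = Path.home() / ".datapulse" / "sessions"
SESSION_TTL_SECONDS = 12 * 60 * 60  # 12h, mirroring the documented session TTL

# Hosts assumed to need an authenticated browser session.
BROWSER_PLATFORMS = {
    "mp.weixin.qq.com": "wechat",
    "www.xiaohongshu.com": "xhs",
    "xiaohongshu.com": "xhs",
}


def browser_session_for(url: str, session_dir: Path = SESSION_DIR) -> Optional[Path]:
    """Return a usable session file for url, or None to stay on plain HTTP."""
    platform = BROWSER_PLATFORMS.get(urlparse(url).hostname or "")
    if platform is None:
        return None  # normal URL reading never launches a browser
    session_file = session_dir / f"{platform}.json"
    if not session_file.is_file():
        return None  # the user has not run the login command
    age_seconds = time.time() - session_file.stat().st_mtime
    return session_file if age_seconds < SESSION_TTL_SECONDS else None
```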

Subprocess Calls

MCP transport: Story and triage modules invoke subprocess.run() to communicate with MCP tool servers via subprocess_json transport (stdin/stdout JSON-RPC). All calls have explicit timeouts (30s default).
YouTube fallback: The YouTube collector may call yt-dlp as a subprocess for audio transcript extraction when the native API is unavailable.
CLI update check: The CLI invokes pip install --upgrade only when the user explicitly passes the --upgrade flag.

No subprocess call runs silently or without user-initiated action.
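The subprocess_json transport described above can be approximated as follows. The sketch sends a single JSON-RPC 2.0 request on stdin, reads one JSON response from stdout, and enforces the 30s default timeout. The one-request-per-process model and the helper name `call_tool_server` are assumptions, not DataPulse's actual transport code. Passing an argv list means no shell is involved.

```python
import json
import subprocess
import sys

DEFAULT_TIMEOUT_S = 30.0  # the documented 30s default


def call_tool_server(command: list[str], method: str, params: dict,
                     timeout: float = DEFAULT_TIMEOUT_S) -> dict:
    """Send one JSON-RPC 2.0 request on stdin; parse one JSON response from stdout."""
    request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    completed = subprocess.run(
        command,                    # argv list, so no shell is involved
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout,            # raises subprocess.TimeoutExpired when exceeded
        check=True,                 # raises CalledProcessError on a non-zero exit
    )
    response = json.loads(completed.stdout)
    if "error" in response:
        raise RuntimeError(f"tool server error: {response['error']}")
    return response.get("result", {})


if __name__ == "__main__":
    # Stand-in tool server that echoes params back as the result.
    echo_server = [sys.executable, "-c",
                   "import json, sys; req = json.load(sys.stdin); "
                   "print(json.dumps({'jsonrpc': '2.0', 'id': req['id'], 'result': req['params']}))"]
    print(call_tool_server(echo_server, "ping", {"x": 1}))
```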

Local Persistence

Session files: Playwright login sessions are saved to ~/.datapulse/sessions/ for reuse. Sessions are TTL-cached (12h) and can be invalidated via invalidate_session_cache().
Data files: Watch missions, alert routes, triage queues, story workspaces, and entity stores persist as JSON files under the working directory (data/ folder). All writes use atomic save patterns.

No data is written outside the working directory or ~/.datapulse/ without explicit user action.
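One common atomic save pattern writes the new content to a temporary file in the same directory and then renames it over the target with `os.replace`. With this pattern, a crash never leaves a half-written JSON file behind. This is a minimal sketch of that general technique, not DataPulse's own helper, and the function name `atomic_save_json` is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_save_json(path: Path, data) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)     # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_name)            # leave no stray temp file behind
        raise
```

For example, `atomic_save_json(Path("data") / "missions.json", {"missions": []})` would persist a hypothetical mission store under the working directory's data/ folder.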

Outbound HTTP (alert delivery)

When the user configures alert routes, DataPulse sends POST requests to user-specified endpoints:

Webhook: arbitrary URL provided by the user
Feishu: Feishu bot webhook URL provided by the user
Telegram: Telegram Bot API (api.telegram.org) using a user-provided bot token

Alert delivery only fires when: (1) a watch mission matches new content, AND (2) the user has explicitly configured a route with a destination URL or token. No outbound POST occurs without user-configured routes.
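The two delivery conditions above can be expressed as a filter that builds POST requests without sending them. The Telegram URL follows the Bot API `sendMessage` method, and the Feishu body uses the custom-bot text message shape. The generic webhook payload, the `AlertRoute` fields, and the helper names are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class AlertRoute:
    kind: str          # "webhook", "feishu", or "telegram"
    target: str        # webhook URL, Feishu bot webhook URL, or Telegram bot token
    chat_id: str = ""  # Telegram only


def build_alert_request(route: AlertRoute, text: str) -> urllib.request.Request:
    """Build (but do not send) the POST for one user-configured route."""
    if route.kind == "telegram":
        url = f"https://api.telegram.org/bot{route.target}/sendMessage"
        body = {"chat_id": route.chat_id, "text": text}
    elif route.kind == "feishu":
        url = route.target
        body = {"msg_type": "text", "content": {"text": text}}
    elif route.kind == "webhook":
        url = route.target
        body = {"text": text}  # hypothetical generic payload shape
    else:
        raise ValueError(f"unknown route kind: {route.kind}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pending_alerts(matched: bool, routes: list[AlertRoute],
                   text: str) -> list[urllib.request.Request]:
    """Return requests only if a mission matched AND a route has a destination."""
    if not matched:
        return []
    return [build_alert_request(route, text) for route in routes if route.target]
```

Each returned request could then be sent with `urllib.request.urlopen(req, timeout=10)`. Keeping construction separate from sending makes the no-route, no-POST guarantee easy to test.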

Local Server (optional)

datapulse-console starts a local FastAPI/Uvicorn HTTP server for the browser-based console UI. It binds to localhost by default and is never started automatically — only when the user explicitly runs datapulse-console or python -m datapulse.console_server.

External API Calls (read-only)

Normal operation makes outbound GET/POST requests to:

Jina AI (r.jina.ai, s.jina.ai): URL reading and web search (requires JINA_API_KEY)
Tavily (api.tavily.com): web search (requires TAVILY_API_KEY)
Groq (api.groq.com): YouTube audio transcription fallback (requires GROQ_API_KEY)
Target URLs: the URLs the user asks to read

All API keys are read from environment variables; none are bundled or hard-coded.
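The release notes describe search routing across Jina, Tavily, or "auto". The sketch below resolves a provider using only API keys taken from the environment. The preference order (Jina before Tavily) and the function name `pick_search_provider` are assumptions, not DataPulse's documented behavior.

```python
import os
from typing import Mapping, Optional

# Provider name -> environment variable holding its API key.
PROVIDER_KEYS = {"jina": "JINA_API_KEY", "tavily": "TAVILY_API_KEY"}


def pick_search_provider(requested: str = "auto",
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve "jina", "tavily", or "auto" to a provider whose key is set."""
    env = os.environ if env is None else env
    if requested != "auto" and requested not in PROVIDER_KEYS:
        raise ValueError(f"unknown provider: {requested}")
    # Keys are only ever read from the environment, never hard-coded.
    available = [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
    if requested == "auto":
        if not available:
            raise RuntimeError("no search provider: set JINA_API_KEY or TAVILY_API_KEY")
        return available[0]  # assumed preference: Jina before Tavily
    if requested not in available:
        raise RuntimeError(f"{PROVIDER_KEYS[requested]} is not set")
    return requested
```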

Environment Notes

Python 3.10+
Optional search enhancement: JINA_API_KEY, TAVILY_API_KEY
Optional platform enhancement: TG_API_ID, TG_API_HASH, GROQ_API_KEY
Optional browser sessions: pip install datapulse[browser] (Playwright)
Optional console UI: pip install datapulse[console] (FastAPI + Uvicorn)
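Before first use, a quick preflight can confirm which of these optional pieces are present. The sketch reports whether each key is set but never prints its value. It assumes `playwright`, `fastapi`, and `uvicorn` are the import names behind the two extras, and that yt-dlp is on PATH when installed. The helper is hypothetical and is not DataPulse's built-in doctor command.

```python
import importlib.util
import os
import shutil
import sys

# Assumed import names for the optional extras listed above.
OPTIONAL_MODULES = {
    "playwright": "datapulse[browser]",
    "fastapi": "datapulse[console]",
    "uvicorn": "datapulse[console]",
}
OPTIONAL_ENV = ["JINA_API_KEY", "TAVILY_API_KEY", "TG_API_ID", "TG_API_HASH", "GROQ_API_KEY"]
OPTIONAL_BINARIES = ["yt-dlp"]


def environment_report() -> dict:
    """Summarize what this interpreter can use; reports key presence, never key values."""
    return {
        "python_ok": sys.version_info >= (3, 10),
        "modules": {name: importlib.util.find_spec(name) is not None for name in OPTIONAL_MODULES},
        "binaries": {name: shutil.which(name) is not None for name in OPTIONAL_BINARIES},
        "env": {name: bool(os.environ.get(name)) for name in OPTIONAL_ENV},
    }


if __name__ == "__main__":
    report = environment_report()
    for module, extra in OPTIONAL_MODULES.items():
        if not report["modules"][module]:
            print(f"missing {module}: pip install '{extra}'")
```

The quotes around the extra keep shells such as zsh from expanding the square brackets.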

Usage Guidance

This package appears to do what it says: scrape and normalize content across many platforms, optionally use browser automation, generate transcripts, and persist watch/triage data. Before installing or running it:

(1) Run it in an isolated environment (virtualenv/container), because it needs many Python packages and may call external binaries (yt-dlp, Playwright browsers).
(2) Only set the optional API keys (JINA/TAVILY/GROQ/Telegram) if you trust the code and want those backends enabled.
(3) Be aware that it writes session files to ~/.datapulse and data files to data/ in the working directory; inspect those directories if you want to avoid leaving sensitive session state on disk.
(4) It makes outbound HTTP requests to many public services and to the target URLs you provide; if you need to prevent network egress, run it without network access.
(5) Because there is no install spec, install the documented optional dependencies manually (or use the project's extras) before use.
(6) For higher assurance, review the remaining omitted source files (core/security.py, core/ops.py, etc.) to verify secret handling and subprocess usage.

If anything in the omitted files looks like it reads unrelated credentials or posts data to unknown endpoints, treat that as a red flag.
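To inspect the session and data files mentioned in the guidance above, a short listing helps before you delete anything. The sketch walks ~/.datapulse and the working directory's data/ folder and prints every file it finds. The helper name `list_state_files` is hypothetical.

```python
from pathlib import Path


def list_state_files(*roots: Path) -> list[Path]:
    """List every file under the given roots, skipping roots that do not exist."""
    found: list[Path] = []
    for root in roots:
        if root.is_dir():
            found.extend(sorted(p for p in root.rglob("*") if p.is_file()))
    return found


if __name__ == "__main__":
    for path in list_state_files(Path.home() / ".datapulse", Path.cwd() / "data"):
        print(path)
```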

Capability Analysis

Type: OpenClaw Skill
Name: datapulse
Version: 0.8.1

The DataPulse skill is a feature-rich intelligence hub that includes several high-risk functional patterns. Most notably, it executes arbitrary shell commands defined in environment variables to facilitate 'native bridges' and 'factuality backends' (found in native_bridge.py, story.py, and triage.py). It also features a self-update mechanism in cli.py that runs subprocess commands to install code directly from a GitHub repository. Additionally, security.py contains logic to search for API keys in a specific hardcoded local path (~/Library/Mobile Documents/iCloud~md~obsidian/...) which is highly specific to the author's environment. While these behaviors are disclosed in the documentation and aligned with the tool's purpose, the reliance on environment-driven command execution and remote updates creates a significant attack surface for RCE.

Capability Assessment

✓ Purpose & Capability

Name/description (cross-platform content collection, trending, scoring, watch/triage) match the included collectors, triage/workflow modules, and console/server pieces. The only required binary is python3, which is appropriate for a Python skill; optional capabilities (Playwright, yt-dlp, transcript/backends) are documented and align with the collectors implemented.

ℹ Instruction Scope

SKILL.md spells out runtime behavior: read target URLs, optional browser login (opt-in), local server only when explicitly started, alert POSTs only when user-configured routes exist. The code implements network calls to many third-party endpoints (websites, Jina/Tavily/Groq, social platforms) and writes session files to ~/.datapulse and JSON data under the working directory. This is consistent with the stated scope — but it does read environment variables and local test env vars (DATAPULSE_SMOKE_*, DATAPULSE_LOCAL_TEST_CONTEXT, JINA/TAVILY/GROQ, Telegram keys) which are not declared as required in the registry metadata (they are documented as optional in SKILL.md).

⚠ Install Mechanism

The skill has no install spec (instruction-only in registry), yet the code contains many third-party imports (requests, bs4, feedparser, youtube_transcript_api, Playwright) and calls out optional extras (datapulse[browser], datapulse[console]). No automated dependency installation is declared in the registry metadata — the runtime will fail or behave differently unless the environment already has the needed packages and external tools (yt-dlp, Playwright browsers). This is a usability and deployment risk rather than an outright malicious signal, but you should expect to install dependencies manually or run in an environment with those packages available.

ℹ Credentials

The skill requests no required env vars in registry metadata, and SKILL.md lists a set of optional API keys (JINA_API_KEY, TAVILY_API_KEY, GROQ_API_KEY, TG_API_ID/HASH, various DATAPULSE_SMOKE_* test URLs). All of these map to documented features (search backends, transcription, Telegram integration, smoke tests). There is no request for unrelated credentials (AWS, SSH keys, etc.). However, the code references secret accessors (get_secret/has_secret) and uses several env keys at runtime when they are present. The registry not listing any env requirements is acceptable, but be aware that these optional keys will be used if set.

✓ Persistence & Privilege

always:false and disable-model-invocation:false (normal). The skill writes session state to ~/.datapulse/sessions/ and stores mission/triage/story data as JSON in the working directory (data/). SKILL.md documents these locations and states writes are opt-in (e.g., login to capture Playwright sessions, enabling alert routes, running the console). The skill does not claim to modify other skills or system config.

Version History

v0.8.1

v0.8.1: Security transparency

Added full behavior disclosure to SKILL.md to address OpenClaw security scan findings:
- Documented opt-in Playwright browser automation (WeChat/XHS login sessions)
- Disclosed subprocess usage (MCP JSON-RPC transport, yt-dlp fallback, CLI upgrade) with explicit timeouts
- Clarified local persistence scope (~/.datapulse/sessions/ and data/ only)
- Declared outbound webhook/Feishu/Telegram alert delivery as user-configured-only
- Noted local FastAPI console server is never auto-started
- Listed all external API endpoints (Jina, Tavily, Groq) as read-only, key-from-env
- Extended manifest capabilities with 6 new behavior declarations

No functional changes. Code is identical to v0.8.0.

v0.8.0

v0.8.0: Initial ClawHub release

Cross-platform content intake skill for Claude Code with 15 collectors, confidence scoring, and agentic workflows. Highlights:
- 15 source collectors (X/Twitter, Reddit, YouTube, Bilibili, Telegram, WeChat, XHS, RSS, arXiv, HN, GitHub, Weibo, trending, generic, Jina)
- 4D confidence scoring (confidence + authority + corroboration + recency)
- Web search gateway with multi-provider routing (Jina / Tavily / auto)
- Watch missions with alert routing (Telegram, Feishu, webhook)
- Triage queue and story workspace for analyst workflows
- Entity extraction and relationship persistence
- Trending topic discovery (30+ locations)
- Collector health self-check (doctor)
- CLI, MCP server, and Python skill entry point
- 656 tests across 41 modules

Metadata

Slug datapulse

Version 0.8.1

License MIT-0

All-time Installs 2

Active Installs 2

Total Versions 2

Frequently Asked Questions

What is DataPulse?

Cross-platform content collection, web search, trending topics, confidence scoring, and watch/triage workflows for assistant and agent usage. It is an AI Agent Skill for Claude Code / OpenClaw, with 345 downloads so far.

How do I install DataPulse?

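Run "/install datapulse" in the OpenClaw or Claude Code chat to install it. Because the registry entry has no install spec, you may still need to install the Python dependencies and optional extras (datapulse[browser], datapulse[console]) manually.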

Is DataPulse free?

Yes, DataPulse is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does DataPulse support?

DataPulse runs anywhere OpenClaw or Claude Code is available and requires Python 3.10+.

Who created DataPulse?

It is built and maintained by vincent.sun (@sunyifei83); the current version is v0.8.1.
