astonysh

DeepReader

By Tony Li · GitHub ↗ · v0.1.0
cross-platform ⚠ suspicious
823 total downloads · 2 favorites · 6 current installs · 1 version
Install in OpenClaw
/install deepreader-skill
Description
The default web content reader for OpenClaw. Reads X (Twitter), Reddit, YouTube, and any webpage into clean Markdown — zero API keys required. Use when you n...
Usage instructions (SKILL.md)

DeepReader

The default web content reader for OpenClaw agents. Automatically detects URLs in messages, fetches content using specialized parsers, and saves clean Markdown with YAML frontmatter to agent memory.

Use when

  1. A user shares a tweet, thread, or X article and you need to read its content
  2. A user shares a Reddit post and you need the discussion + top comments
  3. A user shares a YouTube video and you need the transcript
  4. A user shares any blog, article, or documentation URL and you need the text
  5. You need to batch-read multiple URLs from a single message

Supported sources

| Source | Method | API key? |
| --- | --- | --- |
| Twitter / X | FxTwitter API + Nitter fallback | None |
| Reddit | .json suffix API | None |
| YouTube | youtube-transcript-api | None |
| Any URL | Trafilatura + BeautifulSoup | None |
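
The ".json suffix" method for Reddit refers to Reddit's public JSON view of a post, reached by appending `.json` to the post path. A minimal sketch of that URL rewrite follows; the helper name `reddit_json_url` is hypothetical and is not taken from the skill's code.

```python
from urllib.parse import urlsplit, urlunsplit


def reddit_json_url(post_url: str) -> str:
    """Return the JSON endpoint for a Reddit post URL (hypothetical helper)."""
    parts = urlsplit(post_url)
    # Drop the trailing slash so the suffix attaches to the last path segment.
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    # Keep scheme, host and query; drop any fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


print(reddit_json_url("https://www.reddit.com/r/python/comments/abc123/my_post/"))
```

Fetching that URL still requires an HTTP client and a descriptive User-Agent header; the sketch only covers the rewrite step.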

Usage

from deepreader_skill import run

# Automatic — triggered when message contains URLs
result = run("Check this out: https://x.com/user/status/123456")

# Reddit post with comments
result = run("https://www.reddit.com/r/python/comments/abc123/my_post/")

# YouTube transcript
result = run("https://youtube.com/watch?v=dQw4w9WgXcQ")

# Any webpage
result = run("https://example.com/blog/interesting-article")

# Multiple URLs at once
result = run("""
  https://x.com/user/status/123456
  https://www.reddit.com/r/MachineLearning/comments/xyz789/
  https://example.com/article
""")

Output

Content is saved as .md files with structured YAML frontmatter:

---
title: "Tweet by @user"
source_url: "https://x.com/user/status/123456"
domain: "x.com"
parser: "twitter"
ingested_at: "2026-02-16T12:00:00Z"
content_hash: "sha256:..."
word_count: 350
---
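
As an illustration of how a file with this frontmatter could be assembled, here is a minimal sketch using only the standard library. The function name `build_document` is hypothetical, and two details are assumptions rather than documented behaviour: that `content_hash` is the SHA-256 of the Markdown body, and that `word_count` is a whitespace-split count.

```python
import hashlib
import json
from datetime import datetime, timezone
from urllib.parse import urlparse


def build_document(title, source_url, parser, body, now=None):
    """Prepend YAML frontmatter to a Markdown body (hypothetical helper)."""
    # Normalise to UTC so the trailing "Z" is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    fields = {
        "title": title,
        "source_url": source_url,
        "domain": urlparse(source_url).hostname or "",
        "parser": parser,
        "ingested_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumption: the hash covers the UTF-8 encoded body only.
        "content_hash": "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest(),
        # Assumption: words are whitespace-separated tokens.
        "word_count": len(body.split()),
    }
    lines = ["---"]
    for key, value in fields.items():
        # json.dumps gives a double-quoted, escaped scalar that YAML also accepts.
        rendered = json.dumps(value, ensure_ascii=False) if isinstance(value, str) else str(value)
        lines.append(f"{key}: {rendered}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body


print(build_document("Tweet by @user", "https://x.com/user/status/123456", "twitter", "hello world"))
```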

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| DEEPREEDER_MEMORY_PATH | ../../memory/inbox/ | Where to save ingested content |
| DEEPREEDER_LOG_LEVEL | INFO | Logging verbosity |
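
A minimal sketch of how these two variables could be read with their documented defaults is shown below. The function `load_config` is hypothetical; it is not a claim about what the skill's code does, and the fallback to INFO for unknown level names is an assumption.

```python
import logging
import os
from pathlib import Path


def load_config(env=None):
    """Read the documented variables with their defaults (hypothetical helper)."""
    env = os.environ if env is None else env
    # The variable names are spelled exactly as documented, "DEEPREEDER" included.
    memory_path = Path(env.get("DEEPREEDER_MEMORY_PATH", "../../memory/inbox/"))
    level_name = env.get("DEEPREEDER_LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        # Assumption: an unrecognised level name falls back to INFO.
        level = logging.INFO
    return memory_path, level


memory_path, level = load_config()
logging.basicConfig(level=level)
print(memory_path, logging.getLevelName(level))
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.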

How it works

URL detected → is Twitter/X?  → FxTwitter API → Nitter fallback
             → is Reddit?     → .json suffix API
             → is YouTube?    → youtube-transcript-api
             → otherwise      → Trafilatura (generic)

Triggers automatically when any message contains https:// or http://.
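
The routing above can be sketched in a few lines of standard-library Python. The names `extract_urls` and `pick_parser`, the URL regex, and the exact host lists are illustrative assumptions; the skill's own detection logic may differ.

```python
import re
from urllib.parse import urlparse

# Assumption: a URL is any run of non-whitespace starting with http:// or https://.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(message: str) -> list:
    """Return every http(s) URL in the message, in order of appearance."""
    # Strip punctuation that commonly trails a URL in running text.
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(message)]


def _matches(host: str, domains) -> bool:
    # True for the domain itself or any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in domains)


def pick_parser(url: str) -> str:
    """Map a URL to a parser name following the routing diagram."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, ("x.com", "twitter.com")):
        return "twitter"
    if _matches(host, ("reddit.com",)):
        return "reddit"
    if _matches(host, ("youtube.com", "youtu.be")):
        return "youtube"
    return "generic"


message = "Check this out: https://x.com/user/status/123456 and https://example.com/article."
for url in extract_urls(message):
    print(pick_parser(url), url)
```

Matching on the parsed hostname rather than on a substring of the URL avoids misrouting look-alike hosts such as `notx.com`.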

Security recommendations
This skill appears to implement a real web content reader, but exercise caution before enabling it broadly. Key points to consider:

  - SSRF / unrestricted fetches: The skill will attempt to download any URL it detects (the generic fallback fetches arbitrary hosts). If the agent runs in a networked environment with access to internal resources (localhost, internal metadata endpoints, cloud IMDS, private services), maliciously crafted messages or links could cause the agent to connect to those endpoints. Restricting the skill to isolated execution environments or adding a URL allowlist/blocklist is recommended.
  - Automatic triggering: The manifest triggers on any message containing "http(s)://". If you want manual control, disable the automatic trigger or require explicit user invocation.
  - Storage: Fetched content is written to the agent's memory directory (default ../../memory/inbox/). Confirm that storing external content there is acceptable and that sensitive data won't be leaked to downstream components that read agent memory.
  - Dependencies & deployment: The package imports non-stdlib libraries (trafilatura, bs4, youtube_transcript_api, requests). There is no install spec, so ensure the required dependencies are installed in a controlled way before use.
  - Minor red flags: Several typos/inconsistencies ("DeepReeder"/"DEEPREEDER") and mismatches between SKILL.md and the code suggest the package may be lightly maintained. Review the code before trusting it in production.

If you plan to use it: run the skill in a sandboxed environment with constrained network egress, review/limit which domains are fetchable, audit requirements.txt and install dependencies from trusted sources, and consider disabling automatic URL-triggering until you add domain/host protections.
Functional analysis
Package: DeepReader (xpi)
Version: 1.0.0
Description: The default web content reader for OpenClaw. Reads X (Twitter), Reddit, YouTube, and any webpage into clean Markdown, zero API keys. Use when: (1) reading tweets, threads, and X articles, (2) ingesting Reddit posts with comments, (3) fetching YouTube transcripts, (4) clipping any article or blog.

The DeepReader skill is a web content ingestion tool designed for OpenClaw agents. It extracts URLs from user messages, uses specialized parsers (FxTwitter, Nitter, YouTube, Reddit, Generic) to fetch content, and saves it as clean Markdown files with YAML frontmatter to a designated local memory path (`../../memory/inbox/`). The package uses standard, well-known libraries for web scraping, content extraction, and data processing (e.g., `requests`, `trafilatura`, `beautifulsoup4`, `youtube-transcript-api`).

Web scraping inherently involves making network requests to user-provided URLs, which could pose risks such as SSRF or DoS. The implementation includes safeguards such as URL validation, filename sanitization, request timeouts, and specific User-Agent headers. There is no evidence of malicious activity, data exfiltration to unauthorized destinations, or arbitrary code execution. File writing operations are confined to an expected local memory directory, and filename generation includes sanitization to prevent path traversal. The code logic is transparent and aligns with the stated purpose of the skill.
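
To make the filename-sanitization point concrete, here is a minimal sketch of one common approach: reduce the title to a safe slug, then verify that the resolved path stays inside the target directory. The helper names, the allowed character set, and the 80-character limit are assumptions for illustration, not the package's actual code. `Path.is_relative_to` requires Python 3.9 or later.

```python
import re
from pathlib import Path


def safe_filename(title: str, max_length: int = 80) -> str:
    """Turn an arbitrary title into a flat .md filename (hypothetical helper)."""
    # Replace every run of characters outside a small safe set with one hyphen.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", title)
    # Leading dots and hyphens are stripped, so "../" sequences cannot survive.
    slug = slug.strip("-.")[:max_length]
    return (slug or "untitled") + ".md"


def resolve_in_inbox(inbox: Path, title: str) -> Path:
    """Return the target path, refusing anything outside the inbox directory."""
    inbox = inbox.resolve()
    target = (inbox / safe_filename(title)).resolve()
    if not target.is_relative_to(inbox):
        raise ValueError(f"refusing to write outside {inbox}")
    return target


print(safe_filename("Tweet by @user"))
print(resolve_in_inbox(Path("memory/inbox"), "../../etc/passwd"))
```

The containment check is redundant when the slug is already flat, but it keeps the guarantee in place if the sanitizer is later loosened.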
Capability assessment
Purpose & Capability
The code and manifest match the described purpose: parsers for X/Twitter (FxTwitter + Nitter), Reddit (.json), YouTube transcripts, and generic webpages using trafilatura/BeautifulSoup. However, SKILL.md and other text contain typos/inconsistent names (e.g., "DEEPREEDER" / "DeepReeder") and the repo includes Python modules despite an earlier statement that the skill is instruction-only. The presence of a requirements.txt but no install spec is an implementation mismatch.
Instruction Scope
The skill triggers on any message containing 'http(s)://' and will attempt to fetch every detected URL (GenericParser will fetch arbitrary domains). There is no domain allowlist, no internal-host blocking, and no explicit SSRF protections. It writes the fetched content into agent memory. This broad, automatic URL-fetching behavior is the primary security concern (SSRF/data exposure, untrusted fetches).
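
One way to add the missing internal-host blocking is a pre-fetch check like the sketch below. It is an assumption-laden illustration, not part of the skill: `is_public_url` and `BLOCKED_HOSTNAMES` are hypothetical names, and the check inspects only the literal host in the URL. It does not resolve DNS or follow redirects, so a hostname that resolves to a private address would still pass; a production guard would also validate the resolved address at connection time.

```python
import ipaddress
from urllib.parse import urlparse

# Assumption: a small, illustrative blocklist; extend it for your environment.
BLOCKED_HOSTNAMES = {"localhost", "metadata.google.internal"}


def is_public_url(url: str) -> bool:
    """Reject non-http(s) schemes and hosts that are obviously internal."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower().rstrip(".")
    if not host or host in BLOCKED_HOSTNAMES or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. The name is NOT resolved here (see the caveat above).
        return True
    # is_global is False for loopback, private, link-local and reserved ranges.
    return address.is_global


for candidate in (
    "https://example.com/article",
    "http://169.254.169.254/latest/meta-data/",
    "http://localhost:8080/",
):
    print(candidate, is_public_url(candidate))
```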
Install Mechanism
There is no install spec (instruction-only in metadata), yet the package contains Python code that imports external libraries (trafilatura, bs4, requests, youtube_transcript_api). Without an install step the runtime may lack required dependencies, causing failures; the lack of an installation mechanism is an operational inconsistency but not itself malicious.
Credentials
The skill does not request credentials or secrets (requires.env empty), which is appropriate. SKILL.md documents two environment variables (DEEPREEDER_MEMORY_PATH, DEEPREEDER_LOG_LEVEL) but the code does not read these explicitly and the variable name is misspelled relative to the skill name — an inconsistent configuration story that could confuse administrators.
Persistence & Privilege
The skill saves fetched content to a memory directory (default ../../memory/inbox/). It is not forced-always, but it is user-invocable and the manifest declares a message trigger that causes automatic invocation when messages contain URLs. Autonomous invocation combined with unrestricted fetching and writing to agent memory increases blast radius (SSRF, local data accumulation).
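How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install deepreader-skill
  3. Once installed, invoke the skill by name or trigger it with /deepreader-skill
  4. Provide the required inputs according to the skill's parameter documentation to get structured output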
Version history
v0.1.0
Initial release of DeepReader, the default web content reader for OpenClaw agents:

  - Automatically detects and reads X (Twitter), Reddit, YouTube, and general web URLs in messages.
  - Fetches content using specialized parsers for each source, requiring no API keys.
  - Outputs clean Markdown with YAML frontmatter, ready for agent memory ingestion.
  - Supports batch-reading of multiple URLs in a single message.
  - Includes configurable options for memory path and logging level.
Metadata
Slug deepreader-skill
Version 0.1.0
License
Total installs 6
Current installs 6
Versions 1
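FAQ

What is DeepReader?

The default web content reader for OpenClaw. Reads X (Twitter), Reddit, YouTube, and any webpage into clean Markdown, zero API keys required. Use when you n... It is an AI agent skill plugin for Claude Code / OpenClaw, with 823 total downloads to date.

How do I install DeepReader?

Run the command "/install deepreader-skill" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is DeepReader free?

Yes. DeepReader is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does DeepReader support?

DeepReader is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops DeepReader?

DeepReader is developed and maintained by Tony Li (@astonysh). The current version is v0.1.0.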
