Url Reader

Name: Url Reader
Author: justao

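Description

Intelligently reads the content of any URL. Supports WeChat Official Accounts, Xiaohongshu, Toutiao, Douyin, Taobao, Tmall, JD.com, Baidu, and other major Chinese platforms, automatically identifying the platform type and extracting the core content. Content is saved automatically as Markdown, and images are downloaded locally.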

Security recommendations

What to consider before installing:

- The skill does what it claims (scrape many Chinese platforms and save Markdown/images) but is sloppy about metadata: it uses an optional FIRECRAWL_API_KEY (documented in SKILL.md) and requires Playwright, even though the registry lists no env vars. Treat the Firecrawl API key as sensitive.
- The skill will write files to disk: it uses a hard-coded default save directory (/Users/ys/laoyang知识库/nickys/素材). Edit the DEFAULT_OUTPUT_DIR in scripts/save_content.py and url_reader.py before use to point to a directory you control, or run the scripts from a confined/sandboxed environment.
- For sites that require login (WeChat, Taobao, etc.), the skill uses Playwright to launch a browser and will save browser storage_state (cookies/session tokens) to data/wechat_auth.json inside the skill directory. Those files contain authentication data: review and store them securely, or avoid using the Playwright login flow if you don't want to persist credentials.
- Installing Playwright will download Chromium binaries (playwright install chromium). Only proceed if you are comfortable with that and run the install in a controlled environment (virtualenv, container, or VM).
- The skill contacts external services: Firecrawl (requires API key, paid tiers) and r.jina.ai (free). If you do not trust Firecrawl, do not set its API key; the skill will fall back to Jina/Playwright but with degraded behavior.
- Because the repository owner and homepage are unknown, exercise extra caution.

Recommended steps before installing: (1) edit DEFAULT_OUTPUT_DIR to a safe location, (2) confirm or remove automatic saving of wechat_auth.json if you do not want local credential persistence, (3) only provide FIRECRAWL_API_KEY if you trust the service and understand billing, and (4) run the tool inside a sandbox (container or VM) until you are comfortable with its behavior.

If the author updated the package metadata to declare FIRECRAWL_API_KEY as an optional env var, and replaced the hard-coded output path with a configurable default or documented prompt, my assessment would move toward 'benign'.
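
As an illustration of the "configurable default" suggested above, here is a minimal sketch of how the save directory could be resolved without a hard-coded user path. It is not the skill's actual code: the environment variable name URL_READER_OUTPUT_DIR, the function resolve_output_dir, and the fallback folder name are all hypothetical.

```python
import os
from pathlib import Path

# Hypothetical variable name; the skill itself does not define it.
OUTPUT_DIR_ENV = "URL_READER_OUTPUT_DIR"


def resolve_output_dir(cli_value=None, env=None):
    """Pick the save directory: explicit argument, then environment, then a neutral default."""
    env = os.environ if env is None else env
    # Assumed fallback: a folder under the current user's home, not the author's.
    default = str(Path.home() / "url-reader-output")
    raw = cli_value or env.get(OUTPUT_DIR_ENV) or default
    # The directory is only resolved here, not created; the caller decides when to write.
    return Path(raw).expanduser().resolve()
```

Resolving the path in one place means DEFAULT_OUTPUT_DIR would not need to be edited in several scripts, and a sandboxed run can redirect all writes by setting a single variable.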

Functional analysis

Type: OpenClaw Skill
Name: url-reader
Version: 0.1.1

The skill is designed to read and save content from arbitrary URLs, which inherently involves network requests and local file system writes. It is classified as 'suspicious' due to several vulnerabilities rather than clear malicious intent. Key indicators include a hardcoded output directory (`/Users/ys/laoyang知识库/nickys/素材`) in `skill.md`, `scripts/save_content.py`, and `scripts/url_reader.py`, which could lead to unintended file writes. Furthermore, `scripts/save_content.py` and `scripts/url_reader.py` download images from arbitrary URLs and save them to disk without robust content validation, posing a risk of downloading malicious files. The use of Playwright's `page.evaluate()` in `scripts/url_reader.py`, `scripts/wechat_reader.py`, and `scripts/wechat_reader_v2.py` to execute JavaScript in a browser context could also be vulnerable to client-side injection if a malicious URL is processed. While these are significant risks, there is no evidence of intentional data exfiltration, backdoor installation, or harmful prompt injection.
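
To make "robust content validation" concrete, the sketch below checks downloaded bytes against well-known image file signatures and a size cap before anything is written to disk. It is an assumption about what such a check could look like, not code from the skill; sniff_image_type, IMAGE_SIGNATURES, and the 10 MiB limit are hypothetical choices.

```python
# File signatures (magic bytes) for common image formats.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

# Assumed upper bound for a single image; tune to taste.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def sniff_image_type(data: bytes):
    """Return a file extension if the bytes look like a known image, else None."""
    if not data or len(data) > MAX_IMAGE_BYTES:
        return None
    for signature, extension in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return extension
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

A downloader could call sniff_image_type on the response body, skip the file when it returns None, and use the returned extension rather than trusting the one in the URL.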

Capability assessment

ℹ Purpose & Capability

The skill's declared purpose (read arbitrary URLs, extract core content, save Markdown and images) matches the included scripts: URL identification, Firecrawl/Jina/Playwright readers, and save_content. However, the registry metadata claims no required environment variables or credentials, while the code and SKILL.md document an optional FIRECRAWL_API_KEY and Playwright login-state use; that is an inconsistency. The code also requires installing Playwright/Chromium and optionally a Firecrawl client library, which is reasonable for the stated features but not reflected in the registry 'requires' section.
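
For readers unfamiliar with what "URL identification" involves, the following is a minimal sketch of hostname-based platform detection. The mapping and the function identify_platform are hypothetical and illustrative only; the skill's real detection rules are not shown in this listing and may differ.

```python
from urllib.parse import urlparse

# Hypothetical hostname-suffix table; the real skill may use other hosts or rules.
PLATFORM_HOSTS = {
    "mp.weixin.qq.com": "wechat",
    "xiaohongshu.com": "xiaohongshu",
    "toutiao.com": "toutiao",
    "douyin.com": "douyin",
    "taobao.com": "taobao",
    "tmall.com": "tmall",
    "jd.com": "jd",
    "baidu.com": "baidu",
}


def identify_platform(url: str) -> str:
    """Map a URL to a platform label by matching its hostname against known suffixes."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, name in PLATFORM_HOSTS.items():
        # Match the domain itself or any subdomain, but not look-alikes such as "notjd.com".
        if host == suffix or host.endswith("." + suffix):
            return name
    return "generic"
```

Matching on a dot-delimited suffix rather than a plain substring avoids treating unrelated domains as a supported platform.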

⚠ Instruction Scope

Runtime instructions and scripts direct the agent to: (1) call external reader services (Firecrawl, r.jina.ai), (2) launch Playwright/Chromium, (3) prompt the user to log in via a browser and save storage_state to data/wechat_auth.json, and (4) automatically save Markdown and download images to disk. These actions go beyond simple read-only queries: they persist authentication tokens (login state) and write files to disk using a hard-coded default path (/Users/ys/laoyang知识库/nickys/素材) shown in multiple places. The SKILL.md also instructs setting FIRECRAWL_API_KEY, which is not declared as a required env var in metadata.

ℹ Install Mechanism

There is no formal install spec, but SKILL.md instructs users to create a Python venv and pip install packages including 'firecrawl-py', 'requests' and 'playwright' and to run 'playwright install chromium' (which downloads browser binaries). This will write binaries and files to disk. The install sources are public package installs (pip) and Playwright's download — moderate risk and expected for this functionality; there are no unknown URL shorteners or arbitrary archive downloads in the install instructions.

⚠ Credentials

Registry metadata lists no required env vars, yet code and docs use FIRECRAWL_API_KEY for the preferred Firecrawl strategy. The skill also creates and stores Playwright 'storage_state' (WeChat login tokens) under the skill's data directory; these are sensitive credentials. The number and type of secrets (API key + browser auth state) are proportionate to the feature set, but the omission from the declared requirements and the automatic local persistence of login state are concerning and should be made explicit to the user.
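
If login state must be persisted, one mitigation is to write it with owner-only permissions. The sketch below assumes a POSIX system and a state dictionary such as the one Playwright's `context.storage_state()` returns when called without a path; the helper save_auth_state is hypothetical and not part of the skill.

```python
import json
import os
from pathlib import Path


def save_auth_state(state: dict, path) -> Path:
    """Write browser auth state as JSON, readable and writable by the owner only."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 so the tokens are never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    # Tighten permissions as well in case the file already existed with a looser mode.
    os.chmod(path, 0o600)
    return path
```

On Windows the permission bits have limited effect, so the file location itself should be protected there instead.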

ℹ Persistence & Privilege

The skill does not request 'always: true' and does not change other skills' configs. However, it persists data to disk in two places: a hard-coded default output directory in the author's home path and a local data/wechat_auth.json for saved browser auth. Persisting auth tokens is normal for a reader that needs logged-in sessions, but the hard-coded user-specific path and the lack of an opt-out or configurable default are problematic. Autonomous invocation is allowed (default), which, combined with network access and file writes, increases blast radius but is expected for this kind of skill.

Version history

v0.1.1

Initial release of core scripts and structure.
- Added main program files for reading and extracting URL content, including platform identification and multi-strategy reading logic.
- Implemented scripts for WeChat content extraction and multiple platform support.
- Enabled automatic content and image saving as Markdown and local files.
- Included documentation (README.md) and metadata for basic usage and setup instructions.

v0.1.0

Initial release of url-reader, a smart URL content extractor for Chinese platforms:
- Supports WeChat Official Accounts, Xiaohongshu, Toutiao, Douyin, Taobao, Tmall, JD.com, Baidu, and more.
- Automatically detects platform and applies a three-layer extraction strategy: Firecrawl API (preferred), Jina Reader (free fallback), and Playwright automation (for login-required or complex sites).
- Core content (title, body, author, date, interaction data) is extracted and saved as Markdown; images are downloaded locally.
- Clearly defined output and folder structure for easy organization.
- Supports command line and conversational usage; guides for API key and login session configuration included.
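
The three-layer strategy in the v0.1.0 notes can be pictured as an ordered fallback chain. The sketch below is a simplified assumption of that control flow, not the skill's implementation: read_with_fallback is a hypothetical helper, and the readers are passed in as plain callables so the example stays runnable without network access, Firecrawl, or Playwright.

```python
def read_with_fallback(url, strategies):
    """Try each (name, reader) pair in order and return the first non-empty result.

    `strategies` is an ordered list such as
    [("firecrawl", f), ("jina", g), ("playwright", h)], where each reader
    takes a URL and returns text. The reader functions are supplied by the caller.
    """
    errors = []
    for name, reader in strategies:
        try:
            content = reader(url)
        except Exception as exc:  # any failure moves on to the next layer
            errors.append(f"{name}: {exc}")
            continue
        if content:
            return name, content
        errors.append(f"{name}: empty result")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))


if __name__ == "__main__":
    def no_api_key(url):
        # Stand-in for a preferred reader that is unavailable.
        raise RuntimeError("FIRECRAWL_API_KEY not set")

    def stub_reader(url):
        # Stand-in for a fallback reader that succeeds.
        return f"# Content of {url}"

    print(read_with_fallback("https://example.com", [("firecrawl", no_api_key), ("jina", stub_reader)]))
```

Collecting the per-layer errors makes it clear to the user why a degraded strategy was used, for example when no Firecrawl key is configured.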

Metadata

Slug url-reader

Version 0.1.1

License (none listed)

Total installs 18

Current installs 16

Number of versions 2

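FAQ

What is Url Reader?

Url Reader intelligently reads the content of any URL. It supports WeChat Official Accounts, Xiaohongshu, Toutiao, Douyin, Taobao, Tmall, JD.com, Baidu, and other major Chinese platforms, automatically identifying the platform type and extracting the core content. Content is saved automatically as Markdown, and images are downloaded locally. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1358 total downloads to date.

How do I install Url Reader?

Run the command "/install url-reader" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Url Reader free?

Yes. Url Reader is completely free (free and open source) and can be downloaded, installed, and used freely.

Which platforms does Url Reader support?

Url Reader is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed Url Reader?

It is developed and maintained by justao (@justao). The current version is v0.1.1.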
