Content Catcher

Name: Content Catcher
Author: luis1213899

Description

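虾抓抓 (xia-zhua-zhua) v4.0 - a powerful content-scraping skill. Supports: Markdown / PDF / multimodal extraction / structured extraction / translation / video download. Trigger words: 抓取网页 (scrape a web page), 网页转Markdown (web page to Markdown), 内容抓取 (content scraping), 虾抓抓, 视频下载 (video download)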

Usage Guidance

This package bundles a capable scraper/downloader and also browser-extension code that can modify request headers, read page DOM, and write files on your machine. It has several inconsistencies:

- SKILL.md mentions dependencies (Playwright, yt-dlp, weasyprint) but the registry lists none.
- Instructions reference files/scripts and paths that don't line up with the manifest.
- Extension code expects Chrome extension permissions (declarativeNetRequest, downloads, runtime) but there's no install guide or permission disclosure.

Before installing or running:

1) Don't run scripts as your primary account; use an isolated VM or container.
2) Inspect any places that set remote URLs or mitm endpoints (e.g., streamSaver/ffmpeg endpoints) to ensure they point to trusted services.
3) If you plan to use the extension pieces, review and audit required browser permissions and the extension manifest (not included). Granting declarativeNetRequest and downloads allows header/cookie injection and arbitrary file downloads.
4) Verify missing files and paths referenced in SKILL.md (e.g., content-watcher.js) and ensure you have trustworthy provenance (homepage/author).
5) If you lack the ability to audit, avoid installing or run only in a sandboxed environment.

Additional information that would raise confidence: a clear install script, an extension manifest showing requested permissions, a homepage/repo with commits, and explicit justification for header-modification behavior.
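As a rough aid for the audit steps above, the sketch below scans an unpacked copy of the bundle for hard-coded URLs and for a few API names mentioned on this page. The pattern list, the file suffixes, and the function names are illustrative assumptions, not part of the skill; a text search like this only narrows down where to look and does not replace a manual review.

```python
import re
from pathlib import Path

# Illustrative, non-exhaustive patterns (assumptions made for this sketch).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")
SENSITIVE_APIS = ("declarativeNetRequest", "chrome.downloads", "execSync")


def audit_text(text: str) -> dict:
    """Return the URLs and sensitive API names found in one file's text."""
    return {
        "urls": sorted(set(URL_PATTERN.findall(text))),
        "apis": [name for name in SENSITIVE_APIS if name in text],
    }


def audit_bundle(root: str, suffixes=(".js", ".py", ".json", ".md")) -> dict:
    """Walk an unpacked bundle and collect findings per file."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            result = audit_text(text)
            if result["urls"] or result["apis"]:
                findings[str(path)] = result
    return findings
```

Calling audit_bundle() on the directory where the skill was unpacked (the location is up to you) returns a mapping from file path to findings, which can then be reviewed by hand.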

Capability Analysis

Type: OpenClaw Skill
Name: xia-zhua-zhua
Version: 4.0.1

The bundle is a comprehensive web scraping and video downloading tool that includes a ported version of the 'Cat Catch' browser extension core. It contains high-risk capabilities, such as the 'send2local' function in 'cat-catch-core/function.js', which allows sending data to arbitrary URLs, and the use of 'execSync' to run Python scripts from Node.js. Furthermore, 'video_catcher_pro.py' contains hardcoded Windows file paths (e.g., 'C:\Users\26240\workspace\video-downloads') from the developer's environment, indicating a lack of sanitization. While these features support the stated functionality, the combination of broad network access, shell execution, and hardcoded environment leaks warrants a suspicious classification.
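Hardcoded developer paths like the one quoted above can be found mechanically. The following is a minimal sketch, assuming a simple regular expression is good enough for a first pass; the pattern and the function name are hypothetical and will produce false positives (for example on URLs that contain /home/).

```python
import re

# Matches Windows profile paths (C:\Users\<name>, with single or escaped
# backslashes) and POSIX home directories (/home/<name>, /Users/<name>).
HARDCODED_HOME = re.compile(
    r"[A-Za-z]:\\{1,2}Users\\{1,2}[^\\/\s'\"]+"
    r"|/(?:home|Users)/[^/\s'\"]+"
)


def find_hardcoded_homes(source: str) -> list[str]:
    """Return the distinct user-profile prefixes found in a source string."""
    return sorted(set(HARDCODED_HOME.findall(source)))
```

A script that resolves its output directory at runtime (for instance via os.path.expanduser) would produce no matches here, which is the behavior a sanitized release would be expected to show.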

Capability Assessment

⚠ Purpose & Capability

The SKILL.md describes a Node/Python-based web scraper + video downloader that expects Playwright, yt-dlp, weasyprint, etc., but the registry metadata lists no required binaries or environment. The bundle includes browser-extension style files (chrome.* APIs, declarativeNetRequest) alongside CLI scripts — it's unclear which parts are expected to run where. The skill reads/writes local paths (e.g., ~/.clips, Desktop) and uses request-header modification APIs; these capabilities are coherent with a scraping/downloader tool but are broader than the declared requirements and lack clear installation instructions or permission disclosures.
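Because the registry declares no required binaries, it can help to check the host yourself before running anything. The sketch below is one way to do that, assuming the tools named in this assessment; whether the skill uses the Python or the Node distribution of Playwright is not stated here, so the module list is a guess that should be adjusted after reading SKILL.md.

```python
import importlib.util
import shutil

# Assumed names, taken from the dependencies this assessment mentions.
CLI_TOOLS = ("node", "yt-dlp")
PY_MODULES = ("playwright", "weasyprint")


def check_dependencies(cli_tools=CLI_TOOLS, py_modules=PY_MODULES) -> dict:
    """Map each tool or module name to True if it is available locally."""
    status = {}
    for tool in cli_tools:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    for module in py_modules:
        # find_spec returns None when a top-level module cannot be imported.
        status[module] = importlib.util.find_spec(module) is not None
    return status
```

Printing check_dependencies() gives a quick view of which of the undeclared requirements are actually present on the machine.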

⚠ Instruction Scope

The SKILL.md runtime instructions call for running Node and Python scripts to fetch pages, extract media, export PDFs, and download video. They reference additional scripts (e.g., content-watcher.js, content-watcher folder paths, video_catcher/ path) that are not present or whose locations do not match the provided manifest, which is inconsistent. The code will access local files (e.g., ~/.clips/clips.json), write output to the user's Desktop, and the extension code can read page DOM, enumerate media, and post messages to background pages. Those behaviors go beyond simply converting a webpage to Markdown: they include network request header modification, streaming to ffmpeg, and download management.
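The mismatch between referenced scripts and bundled files can be checked with a few lines of code. This is a sketch under stated assumptions: it takes the SKILL.md text and a set of bundle-relative file paths as input, and it treats any token ending in .js or .py as a script reference. The function name and the report keys are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Any path-like token ending in .js or .py counts as a script reference.
SCRIPT_REF = re.compile(r"[\w./-]+\.(?:js|py)\b")


def check_references(skill_md: str, bundle_files: set[str]) -> dict:
    """Classify referenced scripts as missing or present at another path."""
    names = {PurePosixPath(f).name for f in bundle_files}
    report = {"missing": [], "misplaced": []}
    for ref in sorted(set(SCRIPT_REF.findall(skill_md))):
        if ref in bundle_files:
            continue  # referenced path exists exactly as written
        # Same file name elsewhere in the bundle means a path mismatch.
        key = "misplaced" if PurePosixPath(ref).name in names else "missing"
        report[key].append(ref)
    return report
```

An empty report would mean every script named in the instructions exists at the path given; anything under "missing" or "misplaced" is worth resolving before following the instructions.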

ℹ Install Mechanism

There is no install spec in the registry (no package manager install/download instruction), but a full set of scripts and browser-extension-like files are included in the skill bundle. That mismatch is notable: the skill supplies code that will be present in the agent environment, but there is no documented install step, dependency installation, or manifest for the extension pieces. This is not direct remote code download, but running the included code will execute substantial functionality on the host.

⚠ Credentials

The skill declares no required environment variables or credentials, which superficially looks safe. However, the code accesses and writes local files (e.g., ~/.clips/clips.json, Desktop), manipulates browser network rules via chrome.declarativeNetRequest.updateSessionRules (ability to inject/modify request headers and cookies), and can stream data to ffmpeg or remote stream-saver mitm endpoints configured at runtime (G.streamSaverConfig.url). Those actions require sensitive permissions in a browser context and access to local filesystem/network resources; the SKILL.md does not call out or justify these privileges explicitly.

⚠ Persistence & Privilege

The skill is not marked always:true, and it requests no declared persistent credentials, but the included extension-like code is written to interact with chrome.runtime, declarativeNetRequest session rules, chrome.downloads, and localStorage. If installed into a browser extension context, it would require elevated permissions to modify headers and downloads and could persist state (clip logs, keys). The package provides no clear boundary on whether these components are meant to be installed as a browser extension or just run as scripts — that ambiguity increases risk.
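If an extension manifest is ever supplied (none is included in this bundle), its requested permissions can be screened before installation. The sketch below assumes a Manifest V3 style JSON with "permissions" and "host_permissions" arrays; the list of sensitive permissions and the wording of each note are assumptions drawn from the concerns raised on this page.

```python
import json

# Permissions this assessment calls out, with a short note on each.
SENSITIVE = {
    "declarativeNetRequest": "can modify request headers",
    "downloads": "can start file downloads",
}

# Host patterns that grant access to every site.
BROAD_HOSTS = ("<all_urls>", "*://*/*")


def flag_permissions(manifest_json: str) -> dict:
    """Report sensitive permissions and all-site host patterns in a manifest."""
    manifest = json.loads(manifest_json)
    requested = manifest.get("permissions", [])
    flagged = {p: SENSITIVE[p] for p in requested if p in SENSITIVE}
    hosts = manifest.get("host_permissions", [])
    broad = [h for h in hosts if h in BROAD_HOSTS]
    return {"flagged": flagged, "broad_hosts": broad}
```

A result with entries in both "flagged" and "broad_hosts" would indicate an extension able to alter headers and download files on any site, which is the combination this section warns about.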

Version History

v4.0.1

- Skill renamed to "xia-zhua-zhua" and branding updated throughout documentation.
- File structure expanded with 12 new files added, including key modules like analyzer.py and smart-extract.py.
- Core features remain the same; primarily documentation, naming, and structure adjustments.
- Removed "content-catcher" references to clarify and focus on the "虾抓抓 (xia-zhua-zhua)" identity.
- All usage examples, descriptions, and trigger words updated for consistency with the new name.

v4.0.0

Major upgrade: xia-zhua-zhua is now Content Catcher v4.0 with expanded features beyond web-to-Markdown, including media and video extraction.

- Adds multi-modal content extraction (images, audio, video)
- Adds PDF export for grabbed content
- Adds structured data extraction (tables, lists)
- Adds incremental monitoring and update notifications for web pages
- Integrates translation functionality for extracted content
- Expands file structure and modernizes commands; removes legacy scripts and dependencies

v2.1.3

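- Added automatic analysis: after scraping a page, the skill can generate a summary, keywords, key insights, and other content analysis (--analyze flag).
- Added the analyzer.py script, which supports machine-learning text analysis of Markdown content.
- Updated the usage instructions to detail how to use standard mode, Smart mode, and analysis mode.
- Added scikit-learn to the dependency notes (for the analysis feature; optional).
- Further simplified and updated the documentation on anti-scraping measures and on configuration and logs.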

v2.1.2

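xia-zhua-zhua v2.1.2

- Updated SKILL.md and program files; version bumped to 2.1.2
- Documented usage is unchanged; only version information was corrected and unified
- package.json and other files updated to 2.1.2 in step
- No new features or breaking changes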

v2.1.1

xia-zhua-zhua v2.1.1

- Internal code updates made in batch-clip.js, clip-lib.js, and markdown-clip.js for improved stability.
- No changes to user-facing features, documentation, or usage instructions.
- Version bump from 2.1.0 to 2.1.1 to reflect minor technical adjustments.

v2.1.0

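xia-zhua-zhua v2.1.0 introduces Smart mode for more accurate content extraction:

- Added a "Smart mode" option that automatically identifies the main body of a page without relying on fixed CSS selectors.
- Integrated the Readability algorithm, suited to unfamiliar sites that have no preset scraping rules.
- Added smart-extract.py (requires Python + readability-lxml), with support for the "--smart" command-line flag.
- Users can choose between standard mode and Smart mode; both remain compatible with the existing workflow and batch scraping.
- Updated the dependency notes and usage instructions; Smart mode needs only one additional Python library.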

v2.0.0

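xia-zhua-zhua v2.0.0 introduces clip log, duplicate URL checking, WeChat article extraction, and configuration management.

- Added Clip Log, which automatically records all scraping history to prevent duplicate scraping
- Added URL deduplication: a URL that has already been scraped is not saved again; --force overrides this
- Improved dedicated content/author extraction for WeChat Official Account articles
- Added a config file (~/.clips/config.json) for customizing the output directory and other options
- Batch mode automatically skips duplicate URLs
- Added commands to view the config, view history, and change the output directory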
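The clip log and --force behavior described above can be illustrated with a small sketch. The real format of ~/.clips/clips.json is not documented on this page, so this example assumes a JSON object mapping each URL to its output file; all function names are hypothetical and not taken from the skill's code.

```python
import json
from pathlib import Path


def load_log(path: str) -> dict:
    """Load the clip log, or start empty if the file does not exist yet."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}


def should_clip(url: str, log: dict, force: bool = False) -> bool:
    """Skip URLs already in the log unless force is set."""
    return force or url not in log


def record_clip(url: str, output_file: str, log: dict, path: str) -> None:
    """Add the URL to the log and persist the log as JSON."""
    log[url] = output_file
    text = json.dumps(log, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
```

In a batch run, each URL would pass through should_clip() first, so duplicates are skipped without fetching the page again.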

v1.0.1

- Internal updates to batch-clip.js and markdown-clip.js.
- No user-facing changes or new features.

v1.0.0

Initial release: web page to Markdown, with support for concurrent batch scraping

Metadata

Slug xia-zhua-zhua

Version 4.0.1

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 9

Frequently Asked Questions

What is Content Catcher?

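虾抓抓 (xia-zhua-zhua) v4.0 is a content-scraping skill that supports Markdown, PDF, multimodal extraction, structured extraction, translation, and video download. Its trigger words are 抓取网页 (scrape a web page), 网页转Markdown (web page to Markdown), 内容抓取 (content scraping), 虾抓抓, and 视频下载 (video download). It is an AI Agent Skill for Claude Code / OpenClaw, with 176 downloads so far.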

How do I install Content Catcher?

Run "/install xia-zhua-zhua" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Content Catcher free?

Yes, Content Catcher is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Content Catcher support?

Content Catcher is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Content Catcher?

It is built and maintained by luis1213899 (@luis1213899); the current version is v4.0.1.
