kirkraman

deep-scraper

Author: KirkRaman · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 67
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install kirk-deep-scraper
Description
A Docker-based tool using Crawlee and Playwright to deeply scrape complex sites like YouTube, extracting verified raw transcripts or descriptions with ads re...
Usage (SKILL.md)

Skill: deep-scraper

Overview

A high-performance engineering tool for deep web scraping. It uses a containerized Docker + Crawlee (Playwright) environment to penetrate protections on complex websites like YouTube and X/Twitter, providing "interception-level" raw data.

Requirements

  1. Docker: Must be installed and running on the host machine.
  2. Image: Build the environment with the tag skillboss-crawlee.
    • Build command: docker build -t skillboss-crawlee skills/deep-scraper/

Integration Guide

Simply copy the skills/deep-scraper directory into your skills/ folder. Ensure the Dockerfile remains within the skill directory for self-contained deployment.

Standard Interface (CLI)

docker run -t --rm -v $(pwd)/skills/deep-scraper/assets:/usr/src/app/assets skillboss-crawlee node assets/main_handler.js [TARGET_URL]
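
An agent or script that calls this interface might wrap it roughly as follows. This is a minimal sketch, not part of the skill: `buildDockerArgs`, `runDeepScraper`, and the `skillRoot` parameter are hypothetical names, and the flags are copied from the command above. It passes arguments as an array rather than a shell string so the target URL is never interpreted by a shell.

```javascript
const { spawnSync } = require("node:child_process");
const path = require("node:path");

// Build the argument list for `docker run`, mirroring the documented CLI line.
// `skillRoot` (hypothetical) is the directory that contains skills/deep-scraper.
function buildDockerArgs(targetUrl, skillRoot = process.cwd()) {
  // Accept only absolute http(s) URLs; `new URL` throws on malformed input.
  const parsed = new URL(targetUrl);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  const assets = path.resolve(skillRoot, "skills", "deep-scraper", "assets");
  return [
    "run", "-t", "--rm",
    "-v", `${assets}:/usr/src/app/assets`,
    "skillboss-crawlee",
    "node", "assets/main_handler.js", parsed.href,
  ];
}

// Run the container and return whatever it printed to stdout.
function runDeepScraper(targetUrl, skillRoot) {
  const result = spawnSync("docker", buildDockerArgs(targetUrl, skillRoot), {
    encoding: "utf8",
  });
  if (result.error) throw result.error; // e.g. the docker binary was not found
  if (result.status !== 0) {
    throw new Error(`docker exited with code ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

Keeping argument construction separate from process spawning lets the mount path and URL check be tested without Docker installed.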

Output Specification (JSON)

The scraping results are printed to stdout as a JSON string:

  • status: SUCCESS | PARTIAL | ERROR
  • type: TRANSCRIPT | DESCRIPTION | GENERIC
  • videoId: (For YouTube) The validated Video ID.
  • data: The core text content or transcript.
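
A consumer could check the output against these fields before trusting it. The sketch below assumes stdout carries only the JSON string, that `type` may be absent on ERROR results, and that `data` is a string on non-ERROR results; the specification above does not state any of these. `parseScraperOutput` is a hypothetical helper name.

```javascript
const STATUSES = new Set(["SUCCESS", "PARTIAL", "ERROR"]);
const TYPES = new Set(["TRANSCRIPT", "DESCRIPTION", "GENERIC"]);

// Parse the scraper's stdout and check it against the documented fields.
function parseScraperOutput(stdout) {
  // JSON.parse throws SyntaxError if stdout is not valid JSON.
  const result = JSON.parse(stdout.trim());
  if (result === null || typeof result !== "object") {
    throw new Error("Output is not a JSON object");
  }
  if (!STATUSES.has(result.status)) {
    throw new Error(`Unexpected status: ${result.status}`);
  }
  // Assumption: `type` may be missing (for example on ERROR results).
  if (result.type !== undefined && !TYPES.has(result.type)) {
    throw new Error(`Unexpected type: ${result.type}`);
  }
  // Assumption: any non-ERROR result carries `data` as a string.
  if (result.status !== "ERROR" && typeof result.data !== "string") {
    throw new Error("Missing or non-string data field");
  }
  return result;
}
```

Failing loudly on an unknown `status` or `type` keeps a malformed or log-polluted response from being passed to an LLM as if it were transcript text.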

Core Rules

  1. ID Validation: All YouTube tasks MUST verify the Video ID to prevent cache contamination.
  2. Privacy: Strictly forbidden from scraping password-protected or non-public personal information.
  3. Alpha-Focused: Automatically strips ads and noise, delivering pure data optimized for LLM processing.
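
The SKILL.md does not say how the skill performs ID validation internally. On the caller's side, rule 1 could be enforced along these lines: extract the ID from the requested URL and reject any result whose videoId differs. This is a sketch under assumptions: the 11-character `[A-Za-z0-9_-]` pattern reflects the usual shape of YouTube IDs rather than anything stated here, and `extractVideoId` and `videoIdMatches` are hypothetical names.

```javascript
// Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// Extract the video ID from common YouTube URL shapes; returns null if none is found.
function extractVideoId(url) {
  const u = new URL(url);
  const host = u.hostname.replace(/^www\./, "").replace(/^m\./, "");
  let candidate = null;
  if (host === "youtu.be") {
    // Short links: https://youtu.be/<id>
    candidate = u.pathname.split("/")[1];
  } else if (host === "youtube.com") {
    if (u.pathname === "/watch") {
      candidate = u.searchParams.get("v");
    } else {
      const m = u.pathname.match(/^\/(?:shorts|embed)\/([^/]+)/);
      if (m) candidate = m[1];
    }
  }
  return candidate && VIDEO_ID.test(candidate) ? candidate : null;
}

// True only when the result's videoId equals the ID found in the requested URL.
function videoIdMatches(requestedUrl, result) {
  const expected = extractVideoId(requestedUrl);
  return expected !== null && result.videoId === expected;
}
```

A caller would discard any result for which `videoIdMatches` returns false, so a stale or cached transcript for a different video is never attributed to the requested one.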
Security Recommendations
Do not run this skill as-is. Resolve these issues before installing:

  1. The SKILL.md requires you to build a Docker image, but no Dockerfile is included. Ask the publisher for the Dockerfile or a verified image source.
  2. The registry metadata omits Docker as a required binary even though the skill depends on it. Confirm the system requirements.
  3. Running the scraper requires building and running a container. Avoid mounting sensitive host directories into the container, and inspect any Dockerfile or image build steps for unexpected commands or external downloads.
  4. The code intercepts network traffic in-browser to fetch API endpoints. This is normal for this use case, but it could capture tokens or private content if used against authenticated pages; only run it against public pages you control or trust.

If the publisher cannot provide a Dockerfile or a trusted release image, treat the package as untrusted and do not run it on sensitive hosts.
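
Two of these checks can be automated before any `docker build` or `docker run`: confirming that the files the SKILL.md relies on exist, and confirming that the host path to be mounted stays inside the skill directory. The following is an illustrative sketch only; `missingFiles` and `isInsideSkillDir` are hypothetical helpers, and the expected file list is taken from the paths named on this page.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Files this page refers to, relative to the skill directory.
const EXPECTED_FILES = ["Dockerfile", path.join("assets", "main_handler.js")];

// List the expected files that are missing, so the caller can refuse to build.
function missingFiles(skillDir) {
  return EXPECTED_FILES.filter((rel) => !fs.existsSync(path.join(skillDir, rel)));
}

// True when `hostPath` resolves to the skill directory itself or a path below it.
function isInsideSkillDir(hostPath, skillDir) {
  const rel = path.relative(path.resolve(skillDir), path.resolve(hostPath));
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(rel);
}
```

A wrapper might abort when `missingFiles` returns a non-empty list, and refuse any `-v` source path for which `isInsideSkillDir` is false. Neither check replaces reading the Dockerfile itself.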
Functional Analysis
Type: OpenClaw Skill
Name: kirk-deep-scraper
Version: 1.0.0

The kirk-deep-scraper skill is a legitimate web scraping tool designed to extract transcripts and content from YouTube and dynamic websites using Playwright and Crawlee. The implementation in assets/main_handler.js and assets/youtube_handler.js uses standard network interception and UI automation to retrieve public data, with results written to stdout. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.
Capability Assessment
Purpose & Capability
The skill claims a Dockerized Crawlee+Playwright scraper for sites like YouTube and the code (main_handler.js / youtube_handler.js) implements that behavior. However, the registry metadata at the top lists no required binaries while SKILL.md and package.json explicitly require Docker. SKILL.md also instructs building an image tagged skillboss-crawlee, but no Dockerfile is present in the file manifest — this mismatch is a strong coherence issue.
Instruction Scope
Runtime instructions explicitly require building and running a Docker image and mounting local skill assets, and they describe network interception of requests to capture YouTube timedtext APIs. The code performs network-level interception and fetches intercepted API URLs from the page. Those actions are consistent with the stated scraping purpose, but the instructions promise a Dockerfile to remain in the directory while the manifest does not include one. The SKILL.md also instructs copying the skill directory into a host skills/ folder and mounting assets — this grants the container read access to whatever is mounted and could expose unintended host data if users mount different paths.
Install Mechanism
There is no formal install spec. SKILL.md expects you to docker build a local Dockerfile, but the repository snapshot lacks a Dockerfile. Because no image source is provided, the user would have to create their own Dockerfile or run unknown build steps — a risky manual step. package.json lists dependencies (crawlee, playwright) but without a Dockerfile or explicit install instructions, it's unclear how the runtime environment will be created. This gap increases the chance a user will follow unsafe ad-hoc build/run steps.
Credentials
The skill declares no required environment variables or credentials, which is consistent with its scraping-only purpose. The code clears cookies and interacts with page context; that is expected. However, running arbitrary scraping containers can still expose sensitive host data if users mount inappropriate paths, and intercepted network traffic could include private tokens if the page is authenticated — the SKILL.md forbids scraping protected data but cannot enforce it.
Persistence & Privilege
The skill is not set always:true and does not request elevated platform privileges in the manifest. It appears to be user-invocable only, which is proportionate for a scraper tool.
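How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install kirk-deep-scraper
  3. Once installed, invoke the skill by name or trigger it with /kirk-deep-scraper
  4. Provide the inputs described in the skill's parameter documentation to get structured output.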
Version History
v1.0.0
Initial release of the deep-scraper skill, a containerized deep web scraper for high-resilience data extraction.
  • Leverages Docker and Crawlee (Playwright) for robust scraping on complex sites (YouTube, X/Twitter)
  • Command-line interface for easy integration and usage
  • Outputs standardized JSON with validation and content-type indicators
  • Built-in rules for ID validation, privacy, and ad/noise filtering
Metadata
Slug: kirk-deep-scraper
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 1
FAQ

What is deep-scraper?

A Docker-based tool using Crawlee and Playwright to deeply scrape complex sites like YouTube, extracting verified raw transcripts or descriptions with ads re... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 67 total downloads to date.

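How do I install deep-scraper?

Run the command "/install kirk-deep-scraper" in the OpenClaw or Claude Code chat box for a one-step install. The install command itself needs no extra configuration, but per the SKILL.md, running the skill requires Docker and a locally built skillboss-crawlee image.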

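Is deep-scraper free?

Yes. deep-scraper is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does deep-scraper support?

deep-scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops deep-scraper?

It is developed and maintained by KirkRaman (@kirkraman); the current version is v1.0.0.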
