marjoriebroad

deep-scraper

Author: MarjorieBroad · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 72
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install abe-deep-scraper
Description
Performs deep web scraping using a Docker-based Crawlee environment to extract validated, ad-free raw data from complex sites like YouTube and X/Twitter.
Usage (SKILL.md)

Skill: deep-scraper

Overview

A high-performance engineering tool for deep web scraping. It uses a containerized Docker + Crawlee (Playwright) environment to penetrate protections on complex websites like YouTube and X/Twitter, providing "interception-level" raw data.

Requirements

  1. Docker: Must be installed and running on the host machine.
  2. Image: Build the environment with the tag skillboss-crawlee.
    • Build command: docker build -t skillboss-crawlee skills/deep-scraper/

Integration Guide

Simply copy the skills/deep-scraper directory into your skills/ folder. Ensure the Dockerfile remains within the skill directory for self-contained deployment.

Standard Interface (CLI)

docker run -t --rm -v $(pwd)/skills/deep-scraper/assets:/usr/src/app/assets skillboss-crawlee node assets/main_handler.js [TARGET_URL]
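For callers that want to drive this interface from a script, a minimal sketch might look like the following. It assumes the handler prints only the JSON result to stdout; the function names, the `skills_root` parameter, and the 300-second timeout are illustrative choices, not part of the skill.

```python
import json
import subprocess
from pathlib import Path

IMAGE = "skillboss-crawlee"  # image tag named in the Requirements section


def build_command(target_url: str, skills_root: Path) -> list[str]:
    """Return the docker run argument list for one scrape of target_url."""
    assets = (skills_root / "deep-scraper" / "assets").resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{assets}:/usr/src/app/assets",
        IMAGE, "node", "assets/main_handler.js", target_url,
    ]


def scrape(target_url: str, skills_root: Path = Path("skills")) -> dict:
    """Run the container and parse the JSON the handler prints to stdout.

    Assumption: stdout contains nothing except the JSON result.
    """
    proc = subprocess.run(
        build_command(target_url, skills_root),
        capture_output=True,
        text=True,
        check=True,    # raise CalledProcessError on a non-zero exit code
        timeout=300,   # hypothetical upper bound for a single scrape
    )
    return json.loads(proc.stdout)
```

The `-t` flag from the command above is left out on purpose: with a pseudo-TTY allocated, Docker merges stderr into stdout, which makes the JSON harder to isolate. Passing the command as a list also avoids shell quoting problems when the working directory contains spaces.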

Output Specification (JSON)

The scraping results are printed to stdout as a JSON string:

  • status: SUCCESS | PARTIAL | ERROR
  • type: TRANSCRIPT | DESCRIPTION | GENERIC
  • videoId: (For YouTube) The validated Video ID.
  • data: The core text content or transcript.
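A consumer could check the result against the fields listed above before passing it on. The sketch below is one way to do that; it assumes that `type` may be missing when `status` is ERROR, which the specification does not state, and `parse_result` is a hypothetical helper name.

```python
import json

STATUSES = {"SUCCESS", "PARTIAL", "ERROR"}
TYPES = {"TRANSCRIPT", "DESCRIPTION", "GENERIC"}


def parse_result(stdout_text: str) -> dict:
    """Parse the handler output and check the documented fields."""
    result = json.loads(stdout_text)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    status = result.get("status")
    if status not in STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    # Assumption: an ERROR result is allowed to omit the type field.
    if status != "ERROR" and result.get("type") not in TYPES:
        raise ValueError(f"unexpected type: {result.get('type')!r}")
    return result
```

Rejecting unknown `status` or `type` values early keeps malformed output from reaching downstream LLM processing.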

Core Rules

  1. ID Validation: All YouTube tasks MUST verify the Video ID to prevent cache contamination.
  2. Privacy: Scraping password-protected or non-public personal information is strictly forbidden.
  3. Alpha-Focused: Automatically strips ads and noise, delivering pure data optimized for LLM processing.
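SKILL.md does not say how the skill performs ID validation. As a caller-side illustration of rule 1, the sketch below extracts the Video ID from the requested URL and compares it with the `videoId` in the result. The 11-character pattern reflects the commonly observed shape of YouTube IDs and is an assumption, not a documented guarantee; both function names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: IDs are 11 characters drawn from letters, digits, "_" and "-".
ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{11}$")


def video_id_from_url(url: str) -> Optional[str]:
    """Return the Video ID from a watch or youtu.be URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return candidate if ID_PATTERN.match(candidate) else None


def ids_match(requested_url: str, result: dict) -> bool:
    """True only if the result's videoId equals the ID in the requested URL."""
    expected = video_id_from_url(requested_url)
    return expected is not None and result.get("videoId") == expected
```

A mismatch would suggest the result belongs to a different video, for example one served from a stale cache, and can be discarded rather than stored.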
Security recommendations
Do not run this on production hosts or with privileged access yet. Key concerns: (1) The README and package.json expect a Docker image and a Dockerfile, but no Dockerfile is included — ask the publisher for the Dockerfile and confirm its contents before building. (2) The description claims X/Twitter support, but the shipped code only implements YouTube/generic scraping; ask for clarification or updated code if you need X/Twitter. (3) Building and running Docker images from unknown sources can execute arbitrary code on your host — inspect the Dockerfile and image contents (or run it in an isolated sandbox/VM) before use. (4) The tool intentionally clears cookies and intercepts network requests to fetch transcripts; this behavior can bypass site protections and may violate website terms of service. If you proceed, run in an isolated environment, review the missing Dockerfile, and verify that the image only contains the expected Node dependencies and scripts.
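If you do run the image after reviewing it, one way to narrow what the container can do is to add standard Docker restrictions to the run command. The sketch below is an assumption-laden starting point: the resource limits are arbitrary, the read-only mount assumes the handler never writes into `assets`, and Chromium under Playwright may need some of these restrictions relaxed. It does not replace running inside an isolated VM or sandbox.

```python
from pathlib import Path


def hardened_command(target_url: str, assets_dir: Path) -> list[str]:
    """Build a docker run argument list with reduced container privileges."""
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                       # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--memory", "2g",                       # hypothetical memory cap
        "--pids-limit", "512",                  # hypothetical process cap
        # Assumption: the handler only reads from assets, so mount it read-only.
        "-v", f"{assets_dir.resolve()}:/usr/src/app/assets:ro",
        "skillboss-crawlee", "node", "assets/main_handler.js", target_url,
    ]
```

These flags limit the container itself. They do nothing about code executed during `docker build`, so inspecting the Dockerfile first remains necessary.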
Capability assessment
Purpose & Capability
The description promises 'deep' scraping for YouTube and X/Twitter and a Dockerized Crawlee environment. The actual code implements YouTube-focused scraping only (two handlers both target YouTube or generic pages) — there is no X/Twitter-specific logic. The SKILL.md and package.json state Docker is required, but the skill manifest earlier lists no required binaries; additionally the SKILL.md instructs keeping a Dockerfile in the skill directory, yet no Dockerfile is present in the provided file manifest. These mismatches suggest the published metadata and the shipped files are out of sync.
Instruction Scope
SKILL.md instructs building and running a Docker image, copying the skill directory into a host 'skills/' folder, and running the node handlers inside the container. The runtime steps and the code stay within scraping behavior (clearing cookies, simulating UI actions, intercepting network requests, and printing JSON to stdout). The instructions do not ask for unrelated system credentials or to exfiltrate data to third-party endpoints. Still, the guidance to 'penetrate protections' and the UI/network-interception behavior can be used to bypass site protections — that's consistent with the stated scraping purpose but has legal/TOS implications the user should consider.
Install Mechanism
This is instruction-only with included Node files and a package.json (no install spec). SKILL.md requires building a Docker image from the skill directory (docker build -t skillboss-crawlee skills/deep-scraper/), but no Dockerfile is present in the listed files, so the image cannot be built as-is. If the publisher adds a Dockerfile later, building and running a Docker image from an unknown source carries higher risk. The dependencies (crawlee, playwright) are expected for the described functionality but are heavy; the absence of an explicit, included Dockerfile is the primary install risk.
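A small pre-flight check can surface the missing Dockerfile before any build is attempted. The sketch below looks only for the two paths that SKILL.md itself implies, the Dockerfile in the skill directory and `assets/main_handler.js`; the helper name is hypothetical.

```python
from pathlib import Path

# Paths implied by SKILL.md: the Dockerfile lives in the skill directory and
# the CLI runs assets/main_handler.js.
REQUIRED_FILES = ["Dockerfile", "assets/main_handler.js"]


def missing_build_files(skill_dir: Path) -> list[str]:
    """Return the required files that are absent from skill_dir."""
    return [name for name in REQUIRED_FILES if not (skill_dir / name).is_file()]
```

An empty list means only that the files exist. Their contents still need to be reviewed before running `docker build`.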
Credentials
The skill requests no environment variables, no credentials, and no config paths. The code does not read env vars or secret files. Output is written to stdout only. From a credential-scope viewpoint, the skill is proportionate to its scraping purpose.
Persistence & Privilege
The skill does not request persistent 'always' inclusion and does not modify other skills or system settings. It runs as a containerized task per the instructions; autonomous invocation is allowed by default but not combined with other high-risk privileges.
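How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install abe-deep-scraper
  3. Once installed, invoke the skill by name or trigger it with /abe-deep-scraper.
  4. Provide the inputs described in the skill's parameter documentation to receive structured output.
Version history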
v1.0.0
Initial release of the deep-scraper skill:
  • Introduces a Docker + Crawlee (Playwright) environment for robust deep web scraping on complex sites (e.g., YouTube, X/Twitter).
  • Provides a standard CLI interface for running scraping tasks and outputs structured JSON.
  • Enforces YouTube Video ID validation to ensure data integrity.
  • Designed for privacy: avoids scraping any non-public or password-protected data.
  • Automatically cleans output for clarity and LLM readiness.
Metadata
Slug abe-deep-scraper
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
FAQ

What is deep-scraper?

Performs deep web scraping using a Docker-based Crawlee environment to extract validated, ad-free raw data from complex sites like YouTube and X/Twitter. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has 72 total downloads to date.

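How do I install deep-scraper?

Run the command "/install abe-deep-scraper" in the OpenClaw or Claude Code chat box to install it in one step. The install step itself needs no additional configuration; running the skill still requires Docker and the skillboss-crawlee image described under Requirements.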

Is deep-scraper free?

Yes. deep-scraper is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does deep-scraper support?

deep-scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops deep-scraper?

It is developed and maintained by MarjorieBroad (@marjoriebroad); the current version is v1.0.0.
