
deep-scraper

Name: deep-scraper
Author: marjoriebroad

by MarjorieBroad · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install abe-deep-scraper

Description

Performs deep web scraping using a Docker-based Crawlee environment to extract validated, ad-free raw data from complex sites like YouTube and X/Twitter.

README (SKILL.md)

Skill: deep-scraper

Overview

A high-performance engineering tool for deep web scraping. It uses a containerized Docker + Crawlee (Playwright) environment to penetrate protections on complex websites like YouTube and X/Twitter, providing "interception-level" raw data.

Requirements

Docker: Must be installed and running on the host machine.
Image: Build the environment with the tag skillboss-crawlee.
- Build command: docker build -t skillboss-crawlee skills/deep-scraper/

Integration Guide

Simply copy the skills/deep-scraper directory into your skills/ folder. Ensure the Dockerfile remains within the skill directory for self-contained deployment.

Standard Interface (CLI)

docker run -t --rm -v "$(pwd)/skills/deep-scraper/assets:/usr/src/app/assets" skillboss-crawlee node assets/main_handler.js [TARGET_URL]
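
For callers that launch the container from a script rather than a shell, the command above can be assembled as an argument list. This is a minimal sketch: the function name `build_scrape_command` and the `project_root` parameter are hypothetical, and the flags simply mirror the documented CLI line.

```python
from pathlib import Path
from typing import List

IMAGE_TAG = "skillboss-crawlee"  # image tag named in the Requirements section


def build_scrape_command(target_url: str, project_root: str = ".") -> List[str]:
    """Return an argv list equivalent to the documented docker run call.

    project_root is a hypothetical parameter: the directory that contains
    the skills/ folder (the shell version uses $(pwd) for this).
    """
    assets = Path(project_root).resolve() / "skills" / "deep-scraper" / "assets"
    return [
        "docker", "run", "-t", "--rm",
        # Mount the skill's assets directory at the path the handler expects.
        "-v", f"{assets}:/usr/src/app/assets",
        IMAGE_TAG,
        "node", "assets/main_handler.js", target_url,
    ]
```

Passing the list to `subprocess.run` without `shell=True` avoids shell quoting problems when the project path or the target URL contains spaces or special characters.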

Output Specification (JSON)

The scraping results are printed to stdout as a JSON string:

status: SUCCESS | PARTIAL | ERROR
type: TRANSCRIPT | DESCRIPTION | GENERIC
videoId: (For YouTube) The validated Video ID.
data: The core text content or transcript.
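
A consumer of this output might parse and check it as follows. This is a sketch under assumptions: it assumes stdout contains only the JSON string, and because the specification does not say which fields accompany an ERROR status, every field except status is treated as optional. The names `ScrapeResult` and `parse_result` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_STATUS = {"SUCCESS", "PARTIAL", "ERROR"}
VALID_TYPE = {"TRANSCRIPT", "DESCRIPTION", "GENERIC"}


@dataclass
class ScrapeResult:
    status: str
    type: Optional[str]
    video_id: Optional[str]  # maps to the "videoId" key (YouTube only)
    data: Optional[str]


def parse_result(stdout_text: str) -> ScrapeResult:
    """Parse the JSON string printed by the handler and check its enums."""
    payload = json.loads(stdout_text.strip())
    status = payload.get("status")
    if status not in VALID_STATUS:
        raise ValueError(f"unexpected status: {status!r}")
    result_type = payload.get("type")
    # Assumption: "type" may be absent (for example on ERROR), so only
    # reject values that are present but not in the documented set.
    if result_type is not None and result_type not in VALID_TYPE:
        raise ValueError(f"unexpected type: {result_type!r}")
    return ScrapeResult(
        status=status,
        type=result_type,
        video_id=payload.get("videoId"),
        data=payload.get("data"),
    )
```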

Core Rules

ID Validation: All YouTube tasks MUST verify the Video ID to prevent cache contamination.
Privacy: Strictly forbidden from scraping password-protected or non-public personal information.
Alpha-Focused: Automatically strips ads and noise, delivering pure data optimized for LLM processing.
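
The ID Validation rule can also be checked on the caller's side by comparing the videoId in the output with the ID in the requested URL. The sketch below is not the skill's own validation logic, which is not shown here. It assumes the common `youtube.com/watch?v=` and `youtu.be/` URL forms and the 11-character ID shape that YouTube IDs currently have; both function names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from A-Z, a-z, 0-9, "_" and "-".
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or short URL, or None if absent."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return candidate if _VIDEO_ID.fullmatch(candidate) else None


def id_matches(requested_url: str, returned_video_id: Optional[str]) -> bool:
    """True only when the returned videoId equals the ID in the request URL."""
    expected = extract_video_id(requested_url)
    return expected is not None and expected == returned_video_id
```

Rejecting a result whose videoId differs from the requested one guards against a stale or cached transcript being attributed to the wrong video.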

Usage Guidance

Do not run this on production hosts or with privileged access yet. Key concerns:

(1) The README and package.json expect a Docker image and a Dockerfile, but no Dockerfile is included. Ask the publisher for the Dockerfile and confirm its contents before building.
(2) The description claims X/Twitter support, but the shipped code only implements YouTube/generic scraping. Ask for clarification or updated code if you need X/Twitter.
(3) Building and running Docker images from unknown sources can execute arbitrary code on your host. Inspect the Dockerfile and image contents (or run it in an isolated sandbox/VM) before use.
(4) The tool intentionally clears cookies and intercepts network requests to fetch transcripts. This behavior can bypass site protections and may violate website terms of service.

If you proceed, run in an isolated environment, obtain and review the Dockerfile, and verify that the image only contains the expected Node dependencies and scripts.
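
Because the Dockerfile is reported missing, a small pre-flight check before any docker build can make that gap explicit. This is a sketch only: the function name is hypothetical, and the expected layout (Dockerfile and package.json at the top of the skill directory, the handler under assets/) is an assumption drawn from the paths mentioned in the README.

```python
from pathlib import Path
from typing import List

# Assumed layout, inferred from the README's build and run commands.
EXPECTED_FILES = ("Dockerfile", "package.json", "assets/main_handler.js")


def preflight_problems(skill_dir: str) -> List[str]:
    """Return a list of human-readable problems; empty means all files exist."""
    root = Path(skill_dir)
    return [
        f"missing {relative} in {root}"
        for relative in EXPECTED_FILES
        if not (root / relative).is_file()
    ]
```

A passing check only shows that the files exist. It does not replace reading the Dockerfile and the scripts before building.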

Capability Assessment

⚠ Purpose & Capability

The description promises 'deep' scraping for YouTube and X/Twitter and a Dockerized Crawlee environment. The actual code implements YouTube-focused scraping only (two handlers both target YouTube or generic pages) — there is no X/Twitter-specific logic. The SKILL.md and package.json state Docker is required, but the skill manifest earlier lists no required binaries; additionally the SKILL.md instructs keeping a Dockerfile in the skill directory, yet no Dockerfile is present in the provided file manifest. These mismatches suggest the published metadata and the shipped files are out of sync.

ℹ Instruction Scope

SKILL.md instructs building and running a Docker image, copying the skill directory into a host 'skills/' folder, and running the node handlers inside the container. The runtime steps and the code stay within scraping behavior (clearing cookies, simulating UI actions, intercepting network requests, and printing JSON to stdout). The instructions do not ask for unrelated system credentials or to exfiltrate data to third-party endpoints. Still, the guidance to 'penetrate protections' and the UI/network-interception behavior can be used to bypass site protections — that's consistent with the stated scraping purpose but has legal/TOS implications the user should consider.

⚠ Install Mechanism

This is instruction-only with included Node files and a package.json (no install spec). SKILL.md requires building a Docker image from the skill directory (docker build -t skillboss-crawlee skills/deep-scraper/), but no Dockerfile is present in the listed files, so the image cannot be built as-is. If the publisher adds a Dockerfile later, building and running a Docker image from an unknown source carries higher risk. Dependencies (crawlee, playwright) are expected for the described functionality but are heavy; the absence of an explicit, included Dockerfile is the primary install risk.

✓ Credentials

The skill requests no environment variables, no credentials, and no config paths. The code does not read env vars or secret files. Output is written to stdout only. From a credential-scope viewpoint, the skill is proportionate to its scraping purpose.

✓ Persistence & Privilege

The skill does not request persistent 'always' inclusion and does not modify other skills or system settings. It runs as a containerized task per the instructions; autonomous invocation is allowed by default but not combined with other high-risk privileges.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install abe-deep-scraper
After installation, invoke the skill by name or use /abe-deep-scraper
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of deep-scraper skill
- Introduces a Docker + Crawlee (Playwright) environment for robust deep web scraping on complex sites (e.g., YouTube, X/Twitter).
- Provides a standard CLI interface for running scraping tasks and outputs structured JSON.
- Enforces YouTube Video ID validation to ensure data integrity.
- Designed for privacy: avoids scraping any non-public or password-protected data.
- Automatically cleans output for clarity and LLM readiness.

Metadata

Slug abe-deep-scraper

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is deep-scraper?

Performs deep web scraping using a Docker-based Crawlee environment to extract validated, ad-free raw data from complex sites like YouTube and X/Twitter. It is an AI Agent Skill for Claude Code / OpenClaw, with 72 downloads so far.

How do I install deep-scraper?

Run "/install abe-deep-scraper" in the OpenClaw or Claude Code chat to install it. Running the skill additionally requires Docker on the host and a built skillboss-crawlee image, as described under Requirements.

Is deep-scraper free?

Yes, deep-scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does deep-scraper support?

deep-scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created deep-scraper?

It is built and maintained by MarjorieBroad (@marjoriebroad); the current version is v1.0.0.
