deep-scraper

Name: deep-scraper
Author: kirkraman

by KirkRaman · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install kirk-deep-scraper

Description

A Docker-based tool using Crawlee and Playwright to deeply scrape complex sites like YouTube, extracting verified raw transcripts or descriptions with ads re...

README (SKILL.md)

Skill: deep-scraper

Overview

A high-performance engineering tool for deep web scraping. It uses a containerized Docker + Crawlee (Playwright) environment to penetrate protections on complex websites like YouTube and X/Twitter, providing "interception-level" raw data.

Requirements

Docker: Must be installed and running on the host machine.
Image: Build the environment with the tag skillboss-crawlee.
- Build command: docker build -t skillboss-crawlee skills/deep-scraper/

Integration Guide

Simply copy the skills/deep-scraper directory into your skills/ folder. Ensure the Dockerfile remains within the skill directory for self-contained deployment.

Standard Interface (CLI)

docker run -t --rm -v "$(pwd)/skills/deep-scraper/assets:/usr/src/app/assets" skillboss-crawlee node assets/main_handler.js [TARGET_URL]
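When the command is launched from a Node.js process rather than typed into a shell, it can help to pass the arguments as an array so the target URL is never interpreted by a shell. The following is a minimal sketch under that assumption. The names `buildDockerArgs` and `runScraper` are hypothetical and not part of the skill; the flags, image tag, and paths are copied from the documented command.

```javascript
const path = require("node:path");
const { execFile } = require("node:child_process");

// Build the argument list for `docker run`, mirroring the documented CLI.
// `skillsRoot` is a hypothetical parameter: the directory that contains skills/.
function buildDockerArgs(targetUrl, skillsRoot = process.cwd()) {
  const parsed = new URL(targetUrl); // throws on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  const assetsDir = path.resolve(skillsRoot, "skills", "deep-scraper", "assets");
  return [
    "run", "-t", "--rm",
    "-v", `${assetsDir}:/usr/src/app/assets`,
    "skillboss-crawlee",
    "node", "assets/main_handler.js",
    parsed.href,
  ];
}

// Run the container and hand raw stdout to the callback.
// execFile does not spawn a shell, so the URL is passed as a single argument.
function runScraper(targetUrl, callback) {
  const options = { maxBuffer: 16 * 1024 * 1024 }; // transcripts can be large
  execFile("docker", buildDockerArgs(targetUrl), options, (err, stdout) => {
    callback(err, stdout);
  });
}
```

Calling `runScraper("https://example.com/page", (err, out) => console.log(err || out))` would start the container, provided the skillboss-crawlee image has already been built.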

Output Specification (JSON)

The scraping results are printed to stdout as a JSON string:

status: SUCCESS | PARTIAL | ERROR
type: TRANSCRIPT | DESCRIPTION | GENERIC
videoId: (For YouTube) The validated Video ID.
data: The core text content or transcript.
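A consumer of this output would typically parse stdout and check the enumerated fields before using `data`. The sketch below shows one way to do that. `parseScraperOutput` is a hypothetical helper, and the assumption that `type` may be absent (for example on an ERROR result) is mine; the specification above does not say which fields are always present.

```javascript
// Allowed values, taken from the output specification.
const STATUSES = new Set(["SUCCESS", "PARTIAL", "ERROR"]);
const TYPES = new Set(["TRANSCRIPT", "DESCRIPTION", "GENERIC"]);

// Parse the scraper's stdout and validate the enumerated fields.
// Throws a SyntaxError on invalid JSON and an Error on unexpected values.
function parseScraperOutput(stdout) {
  const result = JSON.parse(stdout.trim());
  if (result === null || typeof result !== "object") {
    throw new Error("Expected a JSON object");
  }
  if (!STATUSES.has(result.status)) {
    throw new Error(`Unexpected status: ${result.status}`);
  }
  // Assumption: `type` is optional; when present it must be a known value.
  if (result.type !== undefined && !TYPES.has(result.type)) {
    throw new Error(`Unexpected type: ${result.type}`);
  }
  return result;
}
```

For example, `parseScraperOutput('{"status":"SUCCESS","type":"TRANSCRIPT","videoId":"abc","data":"..."}')` returns the parsed object, while a payload with `"status":"OK"` throws.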

Core Rules

ID Validation: All YouTube tasks MUST verify the Video ID to prevent cache contamination.
Privacy: Strictly forbidden from scraping password-protected or non-public personal information.
Alpha-Focused: Automatically strips ads and noise, delivering pure data optimized for LLM processing.
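The ID Validation rule can be illustrated with a small check that compares the `videoId` in a result against the ID in the requested URL. This is a sketch, not the skill's actual implementation: `extractVideoId` and `matchesRequestedVideo` are hypothetical names, and the 11-character pattern reflects the commonly observed shape of YouTube video IDs rather than a documented guarantee.

```javascript
// Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// Extract the video ID from a watch URL or a youtu.be short link.
// Returns null when the URL is not a recognized YouTube form.
function extractVideoId(url) {
  const parsed = new URL(url);
  const host = parsed.hostname.replace(/^www\./, "");
  let id = null;
  if (host === "youtu.be") {
    id = parsed.pathname.slice(1); // drop the leading "/"
  } else if (host === "youtube.com" || host === "m.youtube.com") {
    id = parsed.searchParams.get("v");
  }
  return id && VIDEO_ID_PATTERN.test(id) ? id : null;
}

// True only when the result's videoId equals the ID that was requested,
// which guards against a stale or cached response for a different video.
function matchesRequestedVideo(requestedUrl, result) {
  const expected = extractVideoId(requestedUrl);
  return expected !== null && result.videoId === expected;
}
```

A caller could discard any result for which `matchesRequestedVideo` returns false instead of passing it on.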

Usage Guidance

Do not run this skill as-is. Key issues to resolve before installing:

(1) The SKILL.md requires you to build a Docker image but no Dockerfile is included. Ask the publisher for the Dockerfile or a verified image source.
(2) The registry metadata omits Docker as a required binary even though the skill depends on it. Confirm system requirements.
(3) Running the scraper requires building and running a container. Avoid mounting sensitive host directories into the container, and inspect any Dockerfile or image build steps for unexpected commands or external downloads.
(4) The code intercepts network traffic in-browser to fetch API endpoints. This is normal for this use case but could capture tokens or private content if used against authenticated pages. Only run against public pages you control or trust.

If the publisher cannot provide a Dockerfile or a trusted release image, treat the package as untrusted and do not run it on sensitive hosts.

Capability Analysis

Type: OpenClaw Skill
Name: kirk-deep-scraper
Version: 1.0.0

The kirk-deep-scraper skill is a legitimate web scraping tool designed to extract transcripts and content from YouTube and dynamic websites using Playwright and Crawlee. The implementation in assets/main_handler.js and assets/youtube_handler.js uses standard network interception and UI automation to retrieve public data, with results written to stdout. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.

Capability Assessment

⚠ Purpose & Capability

The skill claims a Dockerized Crawlee+Playwright scraper for sites like YouTube and the code (main_handler.js / youtube_handler.js) implements that behavior. However, the registry metadata at the top lists no required binaries while SKILL.md and package.json explicitly require Docker. SKILL.md also instructs building an image tagged skillboss-crawlee, but no Dockerfile is present in the file manifest — this mismatch is a strong coherence issue.

⚠ Instruction Scope

Runtime instructions explicitly require building and running a Docker image and mounting local skill assets, and they describe network interception of requests to capture YouTube timedtext APIs. The code performs network-level interception and fetches intercepted API URLs from the page. Those actions are consistent with the stated scraping purpose, but the instructions promise a Dockerfile to remain in the directory while the manifest does not include one. The SKILL.md also instructs copying the skill directory into a host skills/ folder and mounting assets — this grants the container read access to whatever is mounted and could expose unintended host data if users mount different paths.

⚠ Install Mechanism

There is no formal install spec. SKILL.md expects you to docker build a local Dockerfile, but the repository snapshot lacks a Dockerfile. Because no image source is provided, the user would have to create their own Dockerfile or run unknown build steps — a risky manual step. package.json lists dependencies (crawlee, playwright) but without a Dockerfile or explicit install instructions, it's unclear how the runtime environment will be created. This gap increases the chance a user will follow unsafe ad-hoc build/run steps.

ℹ Credentials

The skill declares no required environment variables or credentials, which is consistent with its scraping-only purpose. The code clears cookies and interacts with page context; that is expected. However, running arbitrary scraping containers can still expose sensitive host data if users mount inappropriate paths, and intercepted network traffic could include private tokens if the page is authenticated — the SKILL.md forbids scraping protected data but cannot enforce it.

✓ Persistence & Privilege

The skill is not set to always:true and does not request elevated platform privileges in the manifest. It appears to be user-invocable only, which is proportionate for a scraper tool.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install kirk-deep-scraper
After installation, invoke the skill by name or use /kirk-deep-scraper
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of deep-scraper skill: containerized deep web scraper for high-resilience data extraction.
- Leverages Docker and Crawlee (Playwright) for robust scraping on complex sites (YouTube, X/Twitter)
- Command-line interface for easy integration and usage
- Outputs standardized JSON with validation and content-type indicators
- Built-in rules for ID validation, privacy, and ad/noise filtering

Metadata

Slug kirk-deep-scraper

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is deep-scraper?

A Docker-based tool using Crawlee and Playwright to deeply scrape complex sites like YouTube, extracting verified raw transcripts or descriptions with ads re... It is an AI Agent Skill for Claude Code / OpenClaw, with 67 downloads so far.

How do I install deep-scraper?

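Run "/install kirk-deep-scraper" in the OpenClaw or Claude Code chat to install it. Before the skill can run, Docker must be installed and running on the host, and the skillboss-crawlee image must be built as described in the README.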

Is deep-scraper free?

Yes, deep-scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does deep-scraper support?

deep-scraper is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available, provided Docker is installed and running on the host.

Who created deep-scraper?

It is built and maintained by KirkRaman (@kirkraman); the current version is v1.0.0.
