
FlowCrawl — Stealth Web Scraper That Bypasses Everything

Author: windseeker1111

By windseeker1111 · GitHub · v1.1.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 419

Install in OpenClaw

/install flowcrawl

Description

Stealth web scraper. Give it any URL and it punches through Cloudflare, bot detection, and WAFs automatically using a 3-tier cascade (plain HTTP → TLS spoof...

Usage (SKILL.md)

FlowCrawl

Scrape any website. Bypass any bot protection. Free.

Install Scrapling First

pip install scrapling

Scrapling is the only dependency. It installs Playwright automatically on first run, which also downloads browser binaries to disk.
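Because the script is launched with `python3`, Scrapling must be installed for that same interpreter. A minimal sketch of a pre-flight check using only the standard library; the helper name `is_installed` is hypothetical and not part of FlowCrawl:

```python
import importlib.util


def is_installed(module: str = "scrapling") -> bool:
    """Return True if `module` can be found by the current interpreter.

    For a top-level module, find_spec locates it without importing it,
    so the package's own code is not executed by this check.
    """
    return importlib.util.find_spec(module) is not None
```

For example, `is_installed()` returns `False` until `pip install scrapling` has been run for the interpreter that executes the check.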

Quick Usage

# Single URL — prints clean markdown to stdout
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com

# Spider the whole site
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com --deep

# Deep crawl with limits, save and combine
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com --deep --limit 30 --combine

# JSON output — pipe into anything
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com --json
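The `--deep` mode spiders a site by following internal links, bounded by a hop depth and a page count. A minimal sketch of how such depth- and page-limited spidering can work, assuming a breadth-first traversal restricted to the start URL's host. The `crawl` function, its `fetch` callback, and the regex-based link extraction are illustrative assumptions, not FlowCrawl's actual code; the default values mirror the `--depth` (3) and `--limit` (50) defaults listed under All Options.

```python
import re
from collections import deque
from typing import Callable, Dict
from urllib.parse import urldefrag, urljoin, urlparse

# Naive href extraction for illustration; a real crawler would use an HTML parser.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def crawl(
    start_url: str,
    fetch: Callable[[str], str],
    max_depth: int = 3,
    limit: int = 50,
) -> Dict[str, str]:
    """Breadth-first crawl of same-host links (hypothetical helper).

    fetch(url) returns the page HTML. Returns {url: html} for at most
    `limit` pages, expanding links at most `max_depth` hops from start_url.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages: Dict[str, str] = {}

    while queue and len(pages) < limit:
        url, depth = queue.popleft()
        html = fetch(url)
        pages[url] = html
        if depth >= max_depth:
            continue  # page is kept, but its links are not followed
        for href in HREF_RE.findall(html):
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            # Follow only internal links that have not been queued yet.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Passing an in-memory dictionary's `.get` method as `fetch` is a convenient way to exercise the traversal without any network access.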

Add Alias (Recommended)

echo 'alias flowcrawl="python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py"' >> ~/.zshrc
source ~/.zshrc

Then just: flowcrawl https://example.com

How It Works

FlowCrawl uses a 3-tier fetcher cascade. It starts with the fastest method and escalates only when blocked:

Tier	Method	Handles
1	Plain HTTP	Most sites, instant
2	Stealth + TLS spoof	Cloudflare, Imperva, basic WAFs
3	Full JS execution	SPAs, heavy JS, aggressive bot detection

Blocking is detected automatically (HTTP 403 or 503 responses, or a "Just a moment..." challenge page), and FlowCrawl escalates to the next tier silently.

All Options

Flag	Description	Default
`--deep`	Spider whole site following internal links	off
`--depth N`	Max hop depth from start URL	3
`--limit N`	Max pages to crawl	50
`--combine`	Merge all pages into one file	off
`--format md|txt`	Output format	md
`--output DIR`	Output directory	./flowcrawl-output
`--json`	Structured JSON output	off
`--quiet`	Suppress progress logs	off
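For anyone wrapping or extending the CLI, the table above maps directly onto an argument-parser declaration. A sketch using Python's standard `argparse`, assuming a single positional `url` argument as in the Quick Usage examples; it is not taken from `scripts/flowcrawl.py`, whose actual parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags with their documented defaults (sketch)."""
    p = argparse.ArgumentParser(prog="flowcrawl")
    p.add_argument("url", help="Start URL")
    p.add_argument("--deep", action="store_true",
                   help="Spider whole site following internal links")
    p.add_argument("--depth", type=int, default=3,
                   help="Max hop depth from start URL")
    p.add_argument("--limit", type=int, default=50,
                   help="Max pages to crawl")
    p.add_argument("--combine", action="store_true",
                   help="Merge all pages into one file")
    p.add_argument("--format", choices=["md", "txt"], default="md",
                   help="Output format")
    p.add_argument("--output", default="./flowcrawl-output",
                   help="Output directory")
    p.add_argument("--json", action="store_true",
                   help="Structured JSON output")
    p.add_argument("--quiet", action="store_true",
                   help="Suppress progress logs")
    return p
```

Parsing the "deep crawl with limits" example from Quick Usage with this sketch yields `deep=True`, `limit=30`, and `combine=True`, while `depth` stays at its default of 3.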

Safe Usage Recommendations

This skill is consistent with its stated aim of bypassing bot protections, but that purpose is inherently risky and may violate site terms or laws. Before installing:

1) Decide whether evading WAFs/Cloudflare is appropriate and legal for your use case. Do not use it on sites you do not own or without permission.
2) Review the Scrapling project's source and trustworthiness (pip package and GitHub repo), because installing it brings in Playwright and downloads browser binaries.
3) Be aware that SKILL.md suggests modifying ~/.zshrc (it adds an alias); only do this if you want that persistent change.
4) Run in an isolated environment (VM/container) to reduce the risk of surprising downloads or side effects.
5) If you plan to use this in production or in an automated agent, consider a legal/ethical review, plus logging and limits, to avoid abusive scraping.

If you want a lower-risk option, prefer tools that respect robots.txt and avoid active fingerprint spoofing.
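The advice to prefer tools that respect robots.txt can be checked with Python's standard `urllib.robotparser`. A minimal sketch; the helper `allowed_by_robots` and the sample rules are illustrative, and in practice the rules would be downloaded from the site's `/robots.txt` (for example with `RobotFileParser.set_url()` followed by `read()`).

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    rp = RobotFileParser()
    # parse() accepts the file's lines and marks the rules as loaded.
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)
```

With the rules `User-agent: *` / `Disallow: /private/`, a URL under `/private/` is refused while other paths are allowed. `RobotFileParser.crawl_delay()` can similarly supply a per-agent delay when a site declares one.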

Behavior Analysis

Type: OpenClaw Skill
Name: flowcrawl
Version: 1.1.0

FlowCrawl is a web scraping utility that implements a three-tier escalation strategy (plain HTTP, TLS spoofing, and full JS execution) using the 'scrapling' library to bypass bot protections. The Python script in `scripts/flowcrawl.py` contains standard crawling logic, markdown extraction, and local file management without any evidence of data exfiltration, unauthorized network calls, or malicious execution. While `SKILL.md` suggests adding a shell alias to `~/.zshrc`, this is presented as a documented convenience for CLI usage rather than a hidden persistence mechanism.

Capability Assessment

ℹ Purpose & Capability

The name and description (a stealth scraper that 'punches through Cloudflare/WAFs') align with the included code and SKILL.md: the CLI uses a three-tier escalation (plain HTTP → stealth/TLS spoof → full JS via Playwright). No unrelated credentials or config are requested. The claim 'No CDP Chrome' is potentially misleading because Playwright and stealth tooling are used. Functionally this is a browser-automation-based bypass stack, which matches the stated purpose, but the marketing is aggressive and possibly inaccurate.

⚠ Instruction Scope

SKILL.md instructs the user to pip install scrapling (which will pull Playwright and stealth plugins) and to add an alias to the user's shell rc (~/.zshrc). The runtime instructions and code explicitly escalate to evasion techniques (TLS fingerprint spoofing, stealth plugins, full JS execution) to bypass protections — behavior that intentionally evades server-side defenses and could violate terms of service or laws. The skill does not attempt to read unrelated local files, nor does it exfiltrate data to external endpoints, but it does modify user shell config via the recommended alias and triggers external downloads when installed or run.

⚠ Install Mechanism

There is no registry install spec, but SKILL.md requires 'pip install scrapling'. Scrapling will install Playwright and (on first run) download browser binaries — a network-driven install that writes binaries to disk. The lack of a formal install spec in the registry plus the implicit heavy runtime dependency (Playwright/browser downloads) is a practical installation risk and should be made explicit to users. The pip/Playwright download is from public registries, not an unknown URL, but can be large and perform additional network activity.

✓ Credentials

The skill requests no environment variables, no credentials, and no special config paths. That is proportionate to a local scraper tool. There are no declared requirements for unrelated secrets or remote service keys.

ℹ Persistence & Privilege

The skill is user-invocable and not 'always: true' (no elevated persistent privilege). However, SKILL.md recommends adding an alias to ~/.zshrc, which writes to the user's shell config: a mild, user-visible persistence action. Playwright will also place browser artifacts on disk. The skill does not modify other skills or system-wide OpenClaw settings.

How to Use

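Make sure OpenClaw is installed (local or Docker deployment).
In the chat box, enter the install command: /install flowcrawl
After installation, invoke the skill by name or trigger it with /flowcrawl.
Provide the inputs described in the skill's parameter documentation to get structured output.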

Version History

v1.1.0

Stealth web scraper. Punches through Cloudflare, bot detection, and WAFs using a 3-tier cascade (plain HTTP, TLS spoof, full JS). No API keys, no proxies, no CDP Chrome. Free from the Flow team.

v1.0.2

Version 1.0.2 of FlowCrawl.
- No file changes were detected in this version.
- Functionality, documentation, and options remain unchanged.

v1.0.1

- Updated SKILL.md with improved description and branding.
- Clarified usage and description to emphasize FlowCrawl's ability to bypass bot protection.
- Adjusted skill name casing and authorship notes.
- No code changes; documentation only.

v1.0.0

Initial release of FlowCrawl, a stealth web scraper that bypasses Cloudflare and bot protections.
- Introduces a 3-tier cascade for web scraping: plain HTTP → TLS fingerprint spoofing → full JS execution.
- Requires Scrapling (installs Playwright on first use) as the only dependency.
- Offers CLI usage for scraping single URLs, deep site crawling, output in markdown or JSON, and output combining.
- Includes flags for crawl depth, page limits, output format, and quiet mode.
- Automatically detects and escalates around site blocks, supporting most modern anti-bot protections.

Metadata

Slug: flowcrawl

Version: 1.1.0

License: MIT-0

Total installs: 0

Current installs: 0

Version count: 4

FAQ