felixopt17

test_skill

by felixopt17 · GitHub ↗ · v1.0.9 · MIT-0
cross-platform ✓ Security Clean
408 Downloads · 0 Stars · 0 Active Installs · 10 Versions
Install in OpenClaw
/install bbccrawlermaxclaw
Description
Web crawler using BFS and anti-scraping to extract and save structured BBC and general news content in Markdown with multi-site and dedup support.
README (SKILL.md)

BBC Crawler MaxClaw

Description

A general-purpose web crawler optimized for BBC News that can also crawl other sites. It integrates Crawl4AI and Playwright to handle dynamic content and anti-bot protections.
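The listing describes the crawler as breadth-first (BFS) with deduplication. The sketch below shows one way such a traversal can work; it is not taken from universal_crawler_v2.py. The function name `bfs_crawl`, the `fetch_links` callable, and the default limits are hypothetical, and an in-memory link graph stands in for real pages.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag


def bfs_crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],  # hypothetical: returns outgoing links of a page
    max_pages: int = 50,
    max_depth: int = 3,
) -> List[str]:
    """Visit pages breadth-first, skipping URLs that were already queued."""
    start = urldefrag(start_url).url
    seen = {start}
    queue = deque([(start, 0)])
    visited: List[str] = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for link in fetch_links(url):
            link = urldefrag(link).url  # drop "#fragment" so duplicates collapse
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Usage with an in-memory link graph standing in for real pages.
graph = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c#top"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}
order = bfs_crawl("https://example.com/", lambda u: graph.get(u, []))
```

Marking a URL as seen when it is queued, not when it is visited, keeps the same page from entering the queue twice.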

Features

  • Multi-Method Extraction:
    • crawl4ai: Primary method using AsyncWebCrawler for high performance and accuracy.
    • playwright: Full browser rendering fallback for complex dynamic pages.
    • requests: Fast fallback for static content.
    • auto: Automatically selects the best method (prioritizes Crawl4AI).
  • Hierarchical Storage: Saves content in a structured format: YYYY-MM-DD/Category/Title.md.
  • Local Image Archiving: Downloads images locally, names them by MD5 hash, and updates Markdown references.
  • Content Filtering: Intelligently extracts main article content and relevant images using CSS selectors.
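The storage layout and image naming listed above can be sketched as follows. This is an illustration only. All helper names (`safe_name`, `article_path`, `image_filename`, `localize_images`) are hypothetical, and hashing the image bytes is an assumption: the skill may hash the image URL instead.

```python
import hashlib
import re
from datetime import date
from pathlib import Path


def safe_name(text: str, max_len: int = 80) -> str:
    """Strip characters that are awkward in file names and join words with underscores."""
    cleaned = re.sub(r"[^\w\- ]+", "", text).strip()
    cleaned = re.sub(r"\s+", "_", cleaned)
    return cleaned[:max_len] or "untitled"


def article_path(root: Path, published: date, category: str, title: str) -> Path:
    """Build root/YYYY-MM-DD/Category/Title.md."""
    return root / published.isoformat() / safe_name(category) / f"{safe_name(title)}.md"


def image_filename(image_bytes: bytes, extension: str = ".jpg") -> str:
    """Name an image by the MD5 hex digest of its bytes (assumption: bytes, not URL)."""
    return hashlib.md5(image_bytes).hexdigest() + extension


def localize_images(markdown: str, url_to_local: dict) -> str:
    """Rewrite ![alt](remote-url) references to local paths where a mapping exists."""
    def swap(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        return f"![{alt}]({url_to_local.get(url, url)})"

    return re.sub(r"!\[([^\]]*)\]\(([^)\s]+)\)", swap, markdown)


# Usage
path = article_path(Path("out"), date(2024, 1, 5), "World News", "Hello, World!")
name = image_filename(b"")
text = localize_images("![x](http://a/b.jpg) text", {"http://a/b.jpg": "images/abc.jpg"})
```

Naming files by content hash means the same image downloaded from two articles maps to one local file.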

Requirements

  • Python 3.9+
  • See requirements.txt for Python packages.

Installation

# 1. Install dependencies
# Note: install.py supports passing arguments to pip, e.g., --break-system-packages
python install.py

# Example for environments requiring --break-system-packages:
python install.py --break-system-packages

Usage

Basic Usage

python universal_crawler_v2.py --url https://www.bbc.co.uk/news --max-pages 50

Advanced Usage

# Force Crawl4AI
python universal_crawler_v2.py --url https://www.bbc.co.uk/news --method crawl4ai

# Force Playwright
python universal_crawler_v2.py --url https://www.bbc.co.uk/news --method playwright

# Control depth and delay
python universal_crawler_v2.py --url https://www.bbc.co.uk/news --depth 3 --delay 2.5

# Specify output directory
python universal_crawler_v2.py --url https://www.bbc.co.uk/news --output ./my_data
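For readers wiring these flags into their own scripts, the sketch below shows how the options used in the commands above could be declared with argparse. The flag names come from the examples; the defaults, help text, and the `build_parser` name are assumptions, and the parser in universal_crawler_v2.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags used in the examples (defaults here are assumed)."""
    parser = argparse.ArgumentParser(description="Crawl a site and save articles as Markdown.")
    parser.add_argument("--url", required=True, help="Start URL")
    parser.add_argument(
        "--method",
        choices=["auto", "crawl4ai", "playwright", "requests"],
        default="auto",
        help="Extraction method",
    )
    parser.add_argument("--max-pages", type=int, default=50, help="Stop after this many pages")
    parser.add_argument("--depth", type=int, default=3, help="Maximum link depth")
    parser.add_argument("--delay", type=float, default=1.0, help="Seconds between requests")
    parser.add_argument("--output", default="./output", help="Output directory")
    return parser


# Usage: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--url", "https://www.bbc.co.uk/news", "--depth", "3", "--delay", "2.5"]
)
```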

Troubleshooting

  • Import Errors: If you see "No module named 'crawl4ai'" or similar, run python install.py again.
  • Empty Responses: Ensure you have the latest version of the crawler. Some sites may block specific IPs or user agents; try increasing delay or switching methods.
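Switching methods after an empty response can also be automated. The sketch below tries each method in turn and returns the first non-empty result. It assumes the order crawl4ai, playwright, requests and uses stub fetchers in place of the real libraries; `fetch_with_fallback` and the stubs are hypothetical, not the skill's actual code.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher returns page text, or None / "" when nothing was extracted.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    url: str,
    fetchers: Dict[str, Fetcher],
    order: List[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Try each method in order; return (method_name, content) for the first success."""
    for name in order:
        fetcher = fetchers.get(name)
        if fetcher is None:
            continue
        try:
            content = fetcher(url)
        except Exception as exc:  # a failed method should not stop the chain
            print(f"{name} failed for {url}: {exc}")
            continue
        if content and content.strip():
            return name, content
    return None, None


# Stub fetchers standing in for the real extraction methods.
def blocked(url: str) -> Optional[str]:
    raise RuntimeError("blocked")


def empty(url: str) -> Optional[str]:
    return ""


def static_ok(url: str) -> Optional[str]:
    return "<html>article</html>"


result = fetch_with_fallback(
    "https://example.com",
    {"crawl4ai": blocked, "playwright": empty, "requests": static_ok},
    ["crawl4ai", "playwright", "requests"],
)
```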
Usage Guidance
This package appears to be a coherent web crawler. Before installing or running it:
  1. Run pip installs and Playwright browser installs in a virtualenv or sandbox (not as root) to avoid system package conflicts.
  2. Review the requirements (especially 'crawl4ai') and verify their provenance and any credentials they might require.
  3. Be mindful of legal and ethical rules: respect robots.txt and site terms, and avoid aggressive crawling by using delays and domain restrictions.
  4. If you need higher assurance, inspect the full universal_crawler_v2.py (the provided file was truncated) and run the code in an isolated network environment to observe outbound connections made by dependencies.
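The robots.txt and domain-restriction advice can be checked in code before a URL is queued. The sketch below uses Python's standard urllib.robotparser. The helper names, the user agent string, and the sample rules are illustrative assumptions, and whether the skill performs such a check itself is not stated in the listing.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def make_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def may_crawl(
    url: str,
    rules: RobotFileParser,
    allowed_domain: str,
    user_agent: str = "example-crawler",  # hypothetical user agent
) -> bool:
    """Allow a URL only if it is on the permitted domain and robots.txt permits it."""
    host = urlparse(url).hostname or ""
    same_site = host == allowed_domain or host.endswith("." + allowed_domain)
    return same_site and rules.can_fetch(user_agent, url)


# Usage with sample rules; no network access is needed.
sample_rules = make_robot_rules("User-agent: *\nDisallow: /private/\n")
allowed = may_crawl("https://example.com/news", sample_rules, "example.com")
disallowed = may_crawl("https://example.com/private/x", sample_rules, "example.com")
off_site = may_crawl("https://other.org/news", sample_rules, "example.com")
```

In a live crawl, `RobotFileParser.set_url()` followed by `read()` fetches the rules from the site, and a `time.sleep()` between requests provides the delay.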
Capability Analysis
Type: OpenClaw Skill
Name: bbccrawlermaxclaw
Version: 1.0.9
The skill bundle provides a functional web crawler designed for BBC News and general websites, utilizing Crawl4AI, Playwright, and Requests for content extraction. The primary logic in `universal_crawler_v2.py` focuses on hierarchical content storage, image localization, and metadata extraction, while `install.py` handles dependency management and Playwright browser installation. No evidence of data exfiltration, unauthorized remote execution, persistence mechanisms, or malicious prompt injection was found; the code and documentation (including AIGC metadata) are consistent with the stated purpose.
Capability Assessment
Purpose & Capability
Name/description (BBC-focused universal crawler with anti-scraping fallbacks) match the included code and scripts: a multi-method crawler (crawl4ai, playwright, requests), deduping, image download, and Markdown output. Minor inconsistencies (README mentions Python 3.8+, SKILL.md says 3.9+) do not change purpose.
Instruction Scope
SKILL.md instructs only to install Python deps and run the crawler with CLI flags. It does not instruct reading unrelated local files or environment secrets, nor does it send collected data to unexpected endpoints (the code crawls target sites and writes local files). The crawler will perform network requests to target websites as expected.
Install Mechanism
No platform install spec declared in registry, but repository includes install.py / install_dependencies.sh that run pip install -r requirements.txt and run 'python -m playwright install chromium'. Dependencies are fetched via pip and Playwright's browser install (standard mechanisms). Note: crawl4ai is a third‑party package (no pinned source) and Playwright will download browser binaries from the web—recommend verifying packages and running installs in an isolated environment.
Credentials
The skill declares no required environment variables, credentials, or config paths. Code does not read secrets or request unrelated credentials. Dependencies may later require credentials (e.g., if some optional third-party services are used), so check upstream package docs.
Persistence & Privilege
Skill is not always-enabled and does not request elevated platform privileges. It writes lock files and output data under its working directory only. No modifications to other skills or global agent settings are present.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install bbccrawlermaxclaw
  3. After installation, invoke the skill by name or use /bbccrawlermaxclaw
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.9
Version 1.0.9 of bbccrawlermaxclaw
  • No file or documentation changes detected in this release.
  • Functionality, usage, and instructions remain unchanged from the previous version.
v1.0.8
Version 1.0.8 of bbccrawlermaxclaw
  • No file changes detected in this release.
  • Documentation and usage instructions remain unchanged from the previous version.
v1.0.7
Version 1.0.7 of bbccrawlermaxclaw
  • No changes detected in the code or documentation.
  • The SKILL.md file remains unchanged in content.
v1.0.6
  • Added installation and run helper scripts: install_dependencies.sh and run_bbc_crawler.sh.
  • Added manifest.json for clearer skill metadata and integration support.
  • Updated documentation to reference Crawl4AI as the main async crawling method.
  • Adjusted troubleshooting instructions to refer to "crawl4ai" instead of "browserforge".
v1.0.5
  • Replaced the Scrapling (StealthyFetcher) extraction method with Crawl4AI, prioritizing async crawling for improved speed and efficiency.
  • Updated documentation to reflect the new crawl4ai method, replacing references to scrapling.
  • Adjusted usage instructions to include crawl4ai as a command-line option.
  • Removed two shell scripts related to installation and running; added a crawl state JSON file for tracking or persistence.
v1.0.4
No file or documentation changes were made in this version.
v1.0.3
  • Removed support and documentation for the Crawl4AI extraction method; only Scrapling and Playwright methods are now described.
  • Updated installation instructions to clarify use of install.py with pip arguments such as --break-system-packages.
  • Removed notes about lxml dependency conflicts and simplified requirements and install sections.
  • All other usage and troubleshooting steps remain unchanged.
v1.0.2
  • Added install.py script to handle installation and resolve dependency conflicts between Scrapling and Crawl4ai.
  • Updated documentation to reflect install.py usage and clarified dependency instructions.
  • Expanded feature list and technical details in SKILL.md.
  • Improved troubleshooting steps and clarified advanced options.
v1.0.1
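bbc_crawler_maxclaw v1.0.1
  • Marked the advanced crawler dependencies (crawl4ai, scrapling, playwright) as "required" and added a Playwright install command.
  • Added a `--method` parameter for choosing the crawler engine on the command line (auto, scrapling, crawl4ai, playwright, requests).
  • Updated the command-line examples to show how to specify different crawler engines.
  • Changed the dependency notes from "basic/optional" to "core/required" for clarity.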
v1.0.0
111
Metadata
Slug bbccrawlermaxclaw
Version 1.0.9
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 10
Frequently Asked Questions

What is test_skill?

Web crawler using BFS and anti-scraping to extract and save structured BBC and general news content in Markdown with multi-site and dedup support. It is an AI Agent Skill for Claude Code / OpenClaw, with 408 downloads so far.

How do I install test_skill?

Run "/install bbccrawlermaxclaw" in the OpenClaw or Claude Code chat to install it. The README additionally asks you to run python install.py to install the Python dependencies.

Is test_skill free?

Yes, test_skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does test_skill support?

test_skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created test_skill?

It is built and maintained by felixopt17 (@felixopt17); the current version is v1.0.9.
