felixopt17

test_skill

Author: felixopt17 · GitHub · v1.0.9 · MIT-0
cross-platform · ✓ Security check passed
Total downloads: 408
Favorites: 0
Current installs: 0
Versions: 10
Install in OpenClaw
/install bbccrawlermaxclaw
Overview
Web crawler using BFS and anti-scraping to extract and save structured BBC and general news content in Markdown with multi-site and dedup support.
Usage instructions (SKILL.md)

BBC Crawler MaxClaw

Description

A powerful, universal web crawler optimized for BBC News but capable of crawling other sites. It integrates advanced scraping technologies including Crawl4AI and Playwright to handle dynamic content and anti-bot protections.

Features

  • Multi-Method Extraction:
    • crawl4ai: Primary method using AsyncWebCrawler for high performance and accuracy.
    • playwright: Full browser rendering fallback for complex dynamic pages.
    • requests: Fast fallback for static content.
    • auto: Automatically detects the best method (Prioritizes Crawl4AI).
  • Hierarchical Storage: Saves content in a structured format: YYYY-MM-DD/Category/Title.md.
  • Local Image Archiving: Downloads images locally, names them by MD5 hash, and updates Markdown references.
  • Content Filtering: Intelligently extracts main article content and relevant images using CSS selectors.
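
The storage layout and image naming described above can be sketched as follows. This is a minimal illustration, not the code in universal_crawler_v2.py: the helper names (safe_name, article_path, image_name) are hypothetical, and hashing the image URL rather than the image bytes is an assumption.

```python
import hashlib
import re
from datetime import date
from pathlib import Path


def safe_name(text: str, max_len: int = 80) -> str:
    """Replace characters that are unsafe in file names and trim the result."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", text).strip("_")
    return cleaned[:max_len] or "untitled"


def article_path(root: Path, published: date, category: str, title: str) -> Path:
    """Build root/YYYY-MM-DD/Category/Title.md (hypothetical helper)."""
    return root / published.isoformat() / safe_name(category) / f"{safe_name(title)}.md"


def image_name(image_url: str) -> str:
    """Name an image by the MD5 hex digest of its URL.

    Assumption: the real skill may hash the downloaded bytes instead.
    """
    suffix = Path(image_url.split("?", 1)[0]).suffix or ".jpg"
    return hashlib.md5(image_url.encode("utf-8")).hexdigest() + suffix
```

Hashing gives each image a stable name, so the same image referenced from several articles maps to one local file.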

Requirements

  • Python 3.9+
  • See requirements.txt for Python packages.

Installation

# 1. Install dependencies
# Note: install.py supports passing arguments to pip, e.g., --break-system-packages
python install.py

# Example for environments requiring --break-system-packages:
python install.py --break-system-packages

Usage

Basic Usage

python universal_crawler_v2.py --url https://www.bbc.co.uk/news --max-pages 50

Advanced Usage

# Force Crawl4AI
python universal_crawler_v2.py --url https://www.bbc.co.uk/news --method crawl4ai

# Force Playwright
python universal_crawler_v2.py --url https://www.bbc.co.uk/news --method playwright

# Control depth and delay
python universal_crawler_v2.py --url https://www.bbc.co.uk/news --depth 3 --delay 2.5

# Specify output directory
python universal_crawler_v2.py --url https://www.bbc.co.uk/news --output ./my_data
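
How --depth, --max-pages, and --delay might interact in a BFS crawl with deduplication can be sketched as below. This is an illustration under assumptions, not the skill's actual loop: bfs_crawl and the fetch_links callable are hypothetical names, and the exact meaning of each flag in universal_crawler_v2.py may differ.

```python
import time
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urlparse


def bfs_crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_depth: int = 3,
    max_pages: int = 50,
    delay: float = 2.5,
) -> list[str]:
    """Visit pages breadth-first, staying on the start URL's host."""
    start = urldefrag(start_url).url  # drop any #fragment
    host = urlparse(start).netloc
    seen = {start}                    # dedup: each URL is queued at most once
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue                  # do not expand links past the depth limit
        for link in fetch_links(url):
            link = urldefrag(link).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
        time.sleep(delay)             # polite pause between page fetches
    return visited
```

Passing the link extractor in as a callable keeps the traversal independent of whichever extraction method fetches the page.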

Troubleshooting

  • Import Errors: If you see "No module named 'crawl4ai'" or similar, run python install.py again.
  • Empty Responses: Ensure you have the latest version of the crawler. Some sites may block specific IPs or user agents; try increasing delay or switching methods.
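
Switching methods by hand when a response comes back empty amounts to a fallback chain, which can be sketched as follows. The function fetch_with_fallback and the fetcher callables are hypothetical; this shows the general pattern, not how the skill's auto mode is implemented.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a URL and returns HTML, or None/empty on failure.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    url: str, fetchers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each named fetcher in order; return (method_name, html)
    for the first one that yields a non-empty response."""
    errors = []
    for name, fetch in fetchers:
        try:
            html = fetch(url)
        except Exception as exc:  # record the failure and try the next method
            errors.append(f"{name}: {exc}")
            continue
        if html and html.strip():
            return name, html
        errors.append(f"{name}: empty response")
    raise RuntimeError(f"all methods failed for {url}: " + "; ".join(errors))
```

In practice the sequence would list wrappers around crawl4ai, playwright, and requests in the preferred order.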
Security recommendations
This package appears to be a coherent web crawler. Before installing or running it:
  1. Run the pip installs and Playwright browser installs in a virtualenv or sandbox (not as root) to avoid system package conflicts.
  2. Review the requirements (especially 'crawl4ai') and verify their provenance and any credentials they might require.
  3. Be mindful of legal and ethical rules: respect robots.txt and site terms, and avoid aggressive crawling by using delays and domain restrictions.
  4. If you need higher assurance, inspect the full universal_crawler_v2.py (the provided file was truncated) and run the code in an isolated network environment to observe outbound connections made by dependencies.
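
One way to honor robots.txt before fetching a URL is Python's standard urllib.robotparser. The sketch below is an illustration only; build_robots_checker is a hypothetical helper, and whether the skill performs such a check is not stated on this page.

```python
from typing import Callable
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str = "*") -> Callable[[str], bool]:
    """Return a function that reports whether a URL may be fetched
    under the given robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules already held in memory

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed
```

For a live site, RobotFileParser.set_url() followed by read() downloads the rules instead of parsing a string.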
Functional analysis
Type: OpenClaw Skill
Name: bbccrawlermaxclaw
Version: 1.0.9
The skill bundle provides a functional web crawler designed for BBC News and general websites, utilizing Crawl4AI, Playwright, and Requests for content extraction. The primary logic in `universal_crawler_v2.py` focuses on hierarchical content storage, image localization, and metadata extraction, while `install.py` handles dependency management and Playwright browser installation. No evidence of data exfiltration, unauthorized remote execution, persistence mechanisms, or malicious prompt injection was found; the code and documentation (including AIGC metadata) are consistent with the stated purpose.
Capability assessment
Purpose & Capability
Name/description (BBC-focused universal crawler with anti-scraping fallbacks) match the included code and scripts: a multi-method crawler (crawl4ai, playwright, requests), deduping, image download, and Markdown output. Minor inconsistencies (README mentions Python 3.8+, SKILL.md says 3.9+) do not change purpose.
Instruction Scope
SKILL.md instructs only to install Python deps and run the crawler with CLI flags. It does not instruct reading unrelated local files or environment secrets, nor does it send collected data to unexpected endpoints (the code crawls target sites and writes local files). The crawler will perform network requests to target websites as expected.
Install Mechanism
No platform install spec declared in registry, but repository includes install.py / install_dependencies.sh that run pip install -r requirements.txt and run 'python -m playwright install chromium'. Dependencies are fetched via pip and Playwright's browser install (standard mechanisms). Note: crawl4ai is a third‑party package (no pinned source) and Playwright will download browser binaries from the web—recommend verifying packages and running installs in an isolated environment.
Credentials
The skill declares no required environment variables, credentials, or config paths. Code does not read secrets or request unrelated credentials. Dependencies may later require credentials (e.g., if some optional third-party services are used), so check upstream package docs.
Persistence & Privilege
Skill is not always-enabled and does not request elevated platform privileges. It writes lock files and output data under its working directory only. No modifications to other skills or global agent settings are present.
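How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install bbccrawlermaxclaw
  3. Once installation finishes, invoke the Skill by name or trigger it with /bbccrawlermaxclaw
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.
Version history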
v1.0.9
Version 1.0.9 of bbccrawlermaxclaw - No file or documentation changes detected in this release. - Functionality, usage, and instructions remain unchanged from the previous version.
v1.0.8
Version 1.0.8 of bbccrawlermaxclaw - No file changes detected in this release. - Documentation and usage instructions remain unchanged from the previous version.
v1.0.7
Version 1.0.7 of bbccrawlermaxclaw - No changes detected in the code or documentation. - The SKILL.md file remains unchanged in content.
v1.0.6
- Added installation and run helper scripts: install_dependencies.sh and run_bbc_crawler.sh. - Added manifest.json for clearer skill metadata and integration support. - Updated documentation to reference Crawl4AI as the main async crawling method. - Adjusted troubleshooting instructions to refer to "crawl4ai" instead of "browserforge".
v1.0.5
- Replaced the Scrapling (StealthyFetcher) extraction method with Crawl4AI, prioritizing async crawling for improved speed and efficiency. - Updated documentation to reflect the new crawl4ai method, replacing references to scrapling. - Adjusted usage instructions to include crawl4ai as a command-line option. - Removed two shell scripts related to installation and running; added a crawl state JSON file for tracking or persistence.
v1.0.4
No changes detected in this version. - No file or documentation changes were made in version 1.0.4.
v1.0.3
- Removed support and documentation for the Crawl4AI extraction method; only Scrapling and Playwright methods are now described. - Updated installation instructions to clarify use of install.py with pip arguments such as --break-system-packages. - Removed notes about lxml dependency conflicts and simplified requirements and install sections. - All other usage and troubleshooting steps remain unchanged.
v1.0.2
- Added install.py script to handle installation and resolve dependency conflicts between Scrapling and Crawl4ai. - Updated documentation to reflect install.py usage and clarified dependency instructions. - Expanded feature list and technical details in SKILL.md. - Improved troubleshooting steps and clarified advanced options.
v1.0.1
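bbc_crawler_maxclaw v1.0.1 - Marked the advanced crawler dependencies (crawl4ai, scrapling, playwright) as "required" and added the playwright install command. - Added the `--method` parameter for selecting the crawler engine on the command line (auto, scrapling, crawl4ai, playwright, requests). - Updated the command-line examples to show how to select different crawler engines. - Changed the dependency notes from "basic/optional" to "core/required" for clarity.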
v1.0.0
111
Metadata
Slug: bbccrawlermaxclaw
Version: 1.0.9
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 10
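FAQ

What is test_skill?

Web crawler using BFS and anti-scraping to extract and save structured BBC and general news content in Markdown with multi-site and dedup support. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 408 total downloads to date.

How do I install test_skill?

Run the command "/install bbccrawlermaxclaw" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is test_skill free?

Yes. test_skill is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does test_skill support?

test_skill is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops test_skill?

It is developed and maintained by felixopt17 (@felixopt17); the current version is v1.0.9.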
