← 返回 Skills 市场
tsingliuwin

Siteone Crawler

作者 Tsingliu · GitHub ↗ · v0.0.1 · MIT-0
cross-platform ⚠ suspicious
12
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install siteone-crawler
功能描述
Website crawling, auditing, offline cloning, and markdown export using SiteOne Crawler (Rust). Trigger when the user asks to: crawl a website, audit/analyze...
使用说明 (SKILL.md)

SiteOne Crawler

Cross-platform website crawler/analyzer written in Rust.

Setup (run once)

Before first use, ensure the binary exists. If not found, install it automatically:

  1. Check if binary exists at the paths below (in order of priority):
    • $HOME/.siteone-crawler/siteone-crawler
    • Any siteone-crawler found in $PATH (via which siteone-crawler)
  2. If neither exists, download the latest release from GitHub:
    INSTALL_DIR="$HOME/.siteone-crawler"
    mkdir -p "$INSTALL_DIR"
    # Detect OS/arch
    OS=$(uname -s | tr '[:upper:]' '[:lower:]')
    ARCH=$(uname -m)
    case "$ARCH" in x86_64) ARCH="x64" ;; aarch64|arm64) ARCH="arm64" ;; esac
    # Get latest release URL from GitHub API
    RELEASE_URL=$(curl -sL https://api.github.com/repos/janreges/siteone-crawler/releases/latest \
      | grep -oP "browser_download_url.*?${OS}-${ARCH}\.zip" | head -1 | sed 's/browser_download_url": "//')
    curl -sL "$RELEASE_URL" -o /tmp/siteone-crawler.zip \
      && unzip -o /tmp/siteone-crawler.zip -d /tmp/siteone-crawler \
      && mv /tmp/siteone-crawler/siteone-crawler "$INSTALL_DIR/" \
      && chmod +x "$INSTALL_DIR/siteone-crawler" \
      && rm -rf /tmp/siteone-crawler /tmp/siteone-crawler.zip
    
  3. After installation, set CRAWLER to the resolved path and verify with $CRAWLER --version.

Binary

CRAWLER="$HOME/.siteone-crawler/siteone-crawler"

If the above path doesn't exist, fall back to $(which siteone-crawler) after running Setup.

Always use the resolved path. The binary outputs colored text to terminal; use --no-color for script/pipeline usage and --output json for programmatic consumption.

Common Workflows

1. Quick Audit (HTML report)

$CRAWLER --url="https://example.com" --output-html-report="/path/to/report.html"

Generates a self-contained interactive HTML audit report with quality scores (0.0-10.0) across Performance, SEO, Security, Accessibility, Best Practices.

2. Full Audit + JSON + Upload

$CRAWLER --url="https://example.com" \
  --output-html-report="/path/to/report.html" \
  --output-json-file="/path/to/result.json" \
  --upload --upload-retention="7d"

3. Offline Clone

$CRAWLER --url="https://example.com" --offline-export-dir="/path/to/offline-site" --disable-javascript

Use --disable-javascript for SPA/React sites to get a browsable static version. Use --allowed-domain-for-external-files="*" to include CDN assets.

4. Markdown Export

Multi-file (browsable):

$CRAWLER --url="https://example.com" --markdown-export-dir="/path/to/md-export"

Single-file (ideal for AI tools):

$CRAWLER --url="https://example.com" --markdown-export-dir="/tmp/md" --markdown-export-single-file="/path/to/site.md" \
  --markdown-disable-images --markdown-disable-files

5. Sitemap Generation

$CRAWLER --url="https://example.com" --sitemap-xml-file="/path/to/sitemap" --sitemap-txt-file="/path/to/sitemap"

6. CI/CD Quality Gate

$CRAWLER --url="https://example.com" --ci --ci-min-score="7.0" --ci-max-404="0" --ci-max-5xx="0"

Exit code 10 if thresholds not met. See references/cli-params.md for all --ci-* options.

7. Stress/Load Test

$CRAWLER --url="https://example.com" --workers="20" --max-reqs-per-sec="100" --max-depth="1"

Warning: high worker counts can cause DoS. Use with caution.

8. Single Page Crawl

$CRAWLER --url="https://example.com/about" --single-page --output-json-file="/path/to/result.json"

9. HTML-to-Markdown (local file)

$CRAWLER --html-to-markdown="/path/to/page.html" --html-to-markdown-output="/path/to/page.md"

10. Browse Exported Content

$CRAWLER --serve-markdown="/path/to/md-export" --serve-port="8321"
$CRAWLER --serve-offline="/path/to/offline-site" --serve-port="8321"

Key Parameters Reference

See references/cli-params.md for the complete parameter reference organized by category.

Most-used flags

Flag Purpose Default
--url Target URL (required) -
--output text or json text
--workers Concurrent threads 3
--max-reqs-per-sec Requests per second limit 10
--max-depth Crawl depth (0 = unlimited) 0
--timeout Request timeout in seconds 5
--no-color Disable colors off
--ignore-robots-txt Ignore robots.txt off

Resource filtering

Flag Effect
--disable-all-assets Only crawl pages
--disable-javascript No JS (recommended for offline/SPA)
--disable-images No images
--disable-styles No CSS
--disable-files No downloadable docs

URL filtering

Flag Effect
--include-regex PCRE regex to include URLs
--ignore-regex PCRE regex to skip URLs
--allowed-domain-for-crawling Allow cross-domain crawling
--allowed-domain-for-external-files Allow external asset domains

Script Helpers

scripts/audit.sh — Quick audit wrapper

Runs a full crawl with HTML report and optional JSON output. See script for usage.

scripts/export-markdown.sh — Markdown export wrapper

Exports a website to markdown (single or multi-file). See script for usage.

Tips

  • For modern JS frameworks (Next.js, React, Vue), add --disable-javascript when doing offline exports
  • Use --output json for programmatic processing; JSON goes to STDOUT, progress to STDERR
  • Use --extra-columns="Title,Keywords,Description" to add SEO columns
  • Use --timezone="Asia/Shanghai" for local timestamps
  • For large sites, increase --memory-limit, --max-visited-urls, and --max-queue-length
  • Use --resolve to test local/dev servers (like curl --resolve)
  • HTML reports are self-contained — open in any browser, no server needed
安全使用建议
Before installing, verify the SiteOne Crawler release source and prefer a pinned, checksum-verified binary. Run crawls and load tests only on sites you own or have permission to test. Keep reports local for private sites unless you intentionally enable upload, and avoid passing cookies, passwords, or tokens unless necessary.
功能分析
Type: OpenClaw Skill Name: siteone-crawler Version: 0.0.1 The skill bundle includes a setup script in SKILL.md that downloads and executes a binary from a remote GitHub repository (janreges/siteone-crawler), a high-risk installation pattern. It also features an optional report upload function to an external endpoint (crawler.siteone.io) and supports high-concurrency crawling capable of being used for DoS attacks. While these features are documented and align with the tool's purpose as a website auditor, the combination of automated remote binary execution and external data transmission warrants a suspicious classification.
能力评估
Purpose & Capability
Website crawling, auditing, offline cloning, markdown export, sitemap generation, and local report generation are coherent with the stated purpose; stress/load testing and report upload are more sensitive but disclosed.
Instruction Scope
The skill is user-invocable and its examples are task-oriented. It does not show prompt hijacking, but users should ensure upload and load-test workflows are only run with explicit intent.
Install Mechanism
There is no formal install spec, while SKILL.md instructs downloading the latest GitHub release, unzipping it, moving it into the home directory, and making it executable without a pinned version or checksum.
Credentials
Network crawling and writing reports/exports to user-specified directories are expected for this tool. Optional third-party upload and high-rate crawling can affect privacy or third-party systems.
Persistence & Privilege
The setup persists a crawler binary under ~/.siteone-crawler and helper scripts write reports/exports, but the artifacts do not show root installation, background daemons, or hidden persistence.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install siteone-crawler
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /siteone-crawler 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v0.0.1
- Initial release of SiteOne Crawler skill for website crawling, auditing, cloning, and markdown export. - Supports actions such as SEO/security/performance/accessibility audit (HTML or JSON report), offline site cloning, markdown export (single/multi-file), sitemap generation, and CI/CD quality gate checks. - Automatically installs or resolves the crawler Rust binary if not found. - Includes common usage workflows and command examples for quick setup. - Key CLI flags, advanced filtering, and tips for different site scenarios provided in documentation.
元数据
Slug siteone-crawler
版本 0.0.1
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 1
常见问题

Siteone Crawler 是什么?

Website crawling, auditing, offline cloning, and markdown export using SiteOne Crawler (Rust). Trigger when the user asks to: crawl a website, audit/analyze... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 12 次。

如何安装 Siteone Crawler?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install siteone-crawler」即可一键安装,无需额外配置。

Siteone Crawler 是免费的吗?

是的,Siteone Crawler 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Siteone Crawler 支持哪些平台?

Siteone Crawler 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Siteone Crawler?

由 Tsingliu(@tsingliuwin)开发并维护,当前版本 v0.0.1。

💬 留言讨论