← 返回 Skills 市场

Siteone Crawler

Name: Siteone Crawler
Author: tsingliuwin

作者 Tsingliu · GitHub ↗ · v0.0.1 · MIT-0

cross-platform ⚠ suspicious

总下载

当前安装

版本数

在 OpenClaw 中安装

/install siteone-crawler

功能描述

Website crawling, auditing, offline cloning, and markdown export using SiteOne Crawler (Rust). Trigger when the user asks to: crawl a website, audit/analyze...

使用说明 (SKILL.md)

SiteOne Crawler

Cross-platform website crawler/analyzer written in Rust.

Setup (run once)

Before first use, ensure the binary exists. If not found, install it automatically:

Check if binary exists at the paths below (in order of priority):
- $HOME/.siteone-crawler/siteone-crawler
- Any siteone-crawler found in $PATH (via which siteone-crawler)

If neither exists, download the latest release from GitHub:

INSTALL_DIR="$HOME/.siteone-crawler"
mkdir -p "$INSTALL_DIR"
# Detect OS/arch
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
ARCH=$(uname -m)
case "$ARCH" in x86_64) ARCH="x64" ;; aarch64|arm64) ARCH="arm64" ;; esac
# Get latest release URL from GitHub API
RELEASE_URL=$(curl -sL https://api.github.com/repos/janreges/siteone-crawler/releases/latest \
  | grep -oP "browser_download_url.*?${OS}-${ARCH}\.zip" | head -1 | sed 's/browser_download_url": "//')
curl -sL "$RELEASE_URL" -o /tmp/siteone-crawler.zip \
  && unzip -o /tmp/siteone-crawler.zip -d /tmp/siteone-crawler \
  && mv /tmp/siteone-crawler/siteone-crawler "$INSTALL_DIR/" \
  && chmod +x "$INSTALL_DIR/siteone-crawler" \
  && rm -rf /tmp/siteone-crawler /tmp/siteone-crawler.zip

After installation, set CRAWLER to the resolved path and verify with $CRAWLER --version.

Binary

CRAWLER="$HOME/.siteone-crawler/siteone-crawler"

If the above path doesn't exist, fall back to $(which siteone-crawler) after running Setup.

Always use the resolved path. The binary outputs colored text to terminal; use --no-color for script/pipeline usage and --output json for programmatic consumption.

Common Workflows

1. Quick Audit (HTML report)

$CRAWLER --url="https://example.com" --output-html-report="/path/to/report.html"

Generates a self-contained interactive HTML audit report with quality scores (0.0-10.0) across Performance, SEO, Security, Accessibility, Best Practices.

2. Full Audit + JSON + Upload

$CRAWLER --url="https://example.com" \
  --output-html-report="/path/to/report.html" \
  --output-json-file="/path/to/result.json" \
  --upload --upload-retention="7d"

3. Offline Clone

$CRAWLER --url="https://example.com" --offline-export-dir="/path/to/offline-site" --disable-javascript

Use --disable-javascript for SPA/React sites to get a browsable static version. Use --allowed-domain-for-external-files="*" to include CDN assets.

4. Markdown Export

Multi-file (browsable):

$CRAWLER --url="https://example.com" --markdown-export-dir="/path/to/md-export"

Single-file (ideal for AI tools):

$CRAWLER --url="https://example.com" --markdown-export-dir="/tmp/md" --markdown-export-single-file="/path/to/site.md" \
  --markdown-disable-images --markdown-disable-files

5. Sitemap Generation

$CRAWLER --url="https://example.com" --sitemap-xml-file="/path/to/sitemap" --sitemap-txt-file="/path/to/sitemap"

6. CI/CD Quality Gate

$CRAWLER --url="https://example.com" --ci --ci-min-score="7.0" --ci-max-404="0" --ci-max-5xx="0"

Exit code 10 if thresholds not met. See references/cli-params.md for all --ci-* options.

7. Stress/Load Test

$CRAWLER --url="https://example.com" --workers="20" --max-reqs-per-sec="100" --max-depth="1"

Warning: high worker counts can cause DoS. Use with caution.

8. Single Page Crawl

$CRAWLER --url="https://example.com/about" --single-page --output-json-file="/path/to/result.json"

9. HTML-to-Markdown (local file)

$CRAWLER --html-to-markdown="/path/to/page.html" --html-to-markdown-output="/path/to/page.md"

10. Browse Exported Content

$CRAWLER --serve-markdown="/path/to/md-export" --serve-port="8321"
$CRAWLER --serve-offline="/path/to/offline-site" --serve-port="8321"

Key Parameters Reference

See references/cli-params.md for the complete parameter reference organized by category.

Most-used flags

Flag	Purpose	Default
`--url`	Target URL (required)	-
`--output`	`text` or `json`	text
`--workers`	Concurrent threads	3
`--max-reqs-per-sec`	Requests per second limit	10
`--max-depth`	Crawl depth (0 = unlimited)	0
`--timeout`	Request timeout in seconds	5
`--no-color`	Disable colors	off
`--ignore-robots-txt`	Ignore robots.txt	off

Resource filtering

Flag	Effect
`--disable-all-assets`	Only crawl pages
`--disable-javascript`	No JS (recommended for offline/SPA)
`--disable-images`	No images
`--disable-styles`	No CSS
`--disable-files`	No downloadable docs

URL filtering

Flag	Effect
`--include-regex`	PCRE regex to include URLs
`--ignore-regex`	PCRE regex to skip URLs
`--allowed-domain-for-crawling`	Allow cross-domain crawling
`--allowed-domain-for-external-files`	Allow external asset domains

Script Helpers

`scripts/audit.sh` — Quick audit wrapper

Runs a full crawl with HTML report and optional JSON output. See script for usage.

`scripts/export-markdown.sh` — Markdown export wrapper

Exports a website to markdown (single or multi-file). See script for usage.

Tips

For modern JS frameworks (Next.js, React, Vue), add --disable-javascript when doing offline exports
Use --output json for programmatic processing; JSON goes to STDOUT, progress to STDERR
Use --extra-columns="Title,Keywords,Description" to add SEO columns
Use --timezone="Asia/Shanghai" for local timestamps
For large sites, increase --memory-limit, --max-visited-urls, and --max-queue-length
Use --resolve to test local/dev servers (like curl --resolve)
HTML reports are self-contained — open in any browser, no server needed

安全使用建议

Before installing, verify the SiteOne Crawler release source and prefer a pinned, checksum-verified binary. Run crawls and load tests only on sites you own or have permission to test. Keep reports local for private sites unless you intentionally enable upload, and avoid passing cookies, passwords, or tokens unless necessary.

功能分析

Type: OpenClaw Skill Name: siteone-crawler Version: 0.0.1 The skill bundle includes a setup script in SKILL.md that downloads and executes a binary from a remote GitHub repository (janreges/siteone-crawler), a high-risk installation pattern. It also features an optional report upload function to an external endpoint (crawler.siteone.io) and supports high-concurrency crawling capable of being used for DoS attacks. While these features are documented and align with the tool's purpose as a website auditor, the combination of automated remote binary execution and external data transmission warrants a suspicious classification.

能力评估

ℹ Purpose & Capability

Website crawling, auditing, offline cloning, markdown export, sitemap generation, and local report generation are coherent with the stated purpose; stress/load testing and report upload are more sensitive but disclosed.

ℹ Instruction Scope

The skill is user-invocable and its examples are task-oriented. It does not show prompt hijacking, but users should ensure upload and load-test workflows are only run with explicit intent.

⚠ Install Mechanism

There is no formal install spec, while SKILL.md instructs downloading the latest GitHub release, unzipping it, moving it into the home directory, and making it executable without a pinned version or checksum.

ℹ Credentials

Network crawling and writing reports/exports to user-specified directories are expected for this tool. Optional third-party upload and high-rate crawling can affect privacy or third-party systems.

ℹ Persistence & Privilege

The setup persists a crawler binary under ~/.siteone-crawler and helper scripts write reports/exports, but the artifacts do not show root installation, background daemons, or hidden persistence.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install siteone-crawler
安装完成后，直接呼叫该 Skill 的名称或使用 /siteone-crawler 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v0.0.1

- Initial release of SiteOne Crawler skill for website crawling, auditing, cloning, and markdown export. - Supports actions such as SEO/security/performance/accessibility audit (HTML or JSON report), offline site cloning, markdown export (single/multi-file), sitemap generation, and CI/CD quality gate checks. - Automatically installs or resolves the crawler Rust binary if not found. - Includes common usage workflows and command examples for quick setup. - Key CLI flags, advanced filtering, and tips for different site scenarios provided in documentation.

元数据

Slug siteone-crawler

版本 0.0.1

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 1

常见问题

Siteone Crawler 是什么？

Website crawling, auditing, offline cloning, and markdown export using SiteOne Crawler (Rust). Trigger when the user asks to: crawl a website, audit/analyze... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 12 次。

如何安装 Siteone Crawler？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install siteone-crawler」即可一键安装，无需额外配置。

Siteone Crawler 是免费的吗？

是的，Siteone Crawler 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

Siteone Crawler 支持哪些平台？

Siteone Crawler 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Siteone Crawler？

由 Tsingliu（@tsingliuwin）开发并维护，当前版本 v0.0.1。