← Back to Skills Marketplace

Siteone Crawler

Name: Siteone Crawler
Author: tsingliuwin

by Tsingliu · GitHub ↗ · v0.0.1 · MIT-0

cross-platform ⚠ suspicious

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install siteone-crawler

Description

Website crawling, auditing, offline cloning, and markdown export using SiteOne Crawler (Rust). Trigger when the user asks to: crawl a website, audit/analyze...

README (SKILL.md)

SiteOne Crawler

Cross-platform website crawler/analyzer written in Rust.

Setup (run once)

Before first use, ensure the binary exists. If not found, install it automatically:

Check if binary exists at the paths below (in order of priority):
- $HOME/.siteone-crawler/siteone-crawler
- Any siteone-crawler found in $PATH (via which siteone-crawler)

If neither exists, download the latest release from GitHub:

INSTALL_DIR="$HOME/.siteone-crawler"
mkdir -p "$INSTALL_DIR"
# Detect OS/arch
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
ARCH=$(uname -m)
case "$ARCH" in x86_64) ARCH="x64" ;; aarch64|arm64) ARCH="arm64" ;; esac
# Get latest release URL from GitHub API
RELEASE_URL=$(curl -sL https://api.github.com/repos/janreges/siteone-crawler/releases/latest \
  | grep -oP "browser_download_url.*?${OS}-${ARCH}\.zip" | head -1 | sed 's/browser_download_url": "//')
curl -sL "$RELEASE_URL" -o /tmp/siteone-crawler.zip \
  && unzip -o /tmp/siteone-crawler.zip -d /tmp/siteone-crawler \
  && mv /tmp/siteone-crawler/siteone-crawler "$INSTALL_DIR/" \
  && chmod +x "$INSTALL_DIR/siteone-crawler" \
  && rm -rf /tmp/siteone-crawler /tmp/siteone-crawler.zip

After installation, set CRAWLER to the resolved path and verify with $CRAWLER --version.

Binary

CRAWLER="$HOME/.siteone-crawler/siteone-crawler"

If the above path doesn't exist, fall back to $(which siteone-crawler) after running Setup.

Always use the resolved path. The binary outputs colored text to terminal; use --no-color for script/pipeline usage and --output json for programmatic consumption.

Common Workflows

1. Quick Audit (HTML report)

$CRAWLER --url="https://example.com" --output-html-report="/path/to/report.html"

Generates a self-contained interactive HTML audit report with quality scores (0.0-10.0) across Performance, SEO, Security, Accessibility, Best Practices.

2. Full Audit + JSON + Upload

$CRAWLER --url="https://example.com" \
  --output-html-report="/path/to/report.html" \
  --output-json-file="/path/to/result.json" \
  --upload --upload-retention="7d"

3. Offline Clone

$CRAWLER --url="https://example.com" --offline-export-dir="/path/to/offline-site" --disable-javascript

Use --disable-javascript for SPA/React sites to get a browsable static version. Use --allowed-domain-for-external-files="*" to include CDN assets.

4. Markdown Export

Multi-file (browsable):

$CRAWLER --url="https://example.com" --markdown-export-dir="/path/to/md-export"

Single-file (ideal for AI tools):

$CRAWLER --url="https://example.com" --markdown-export-dir="/tmp/md" --markdown-export-single-file="/path/to/site.md" \
  --markdown-disable-images --markdown-disable-files

5. Sitemap Generation

$CRAWLER --url="https://example.com" --sitemap-xml-file="/path/to/sitemap" --sitemap-txt-file="/path/to/sitemap"

6. CI/CD Quality Gate

$CRAWLER --url="https://example.com" --ci --ci-min-score="7.0" --ci-max-404="0" --ci-max-5xx="0"

Exit code 10 if thresholds not met. See references/cli-params.md for all --ci-* options.

7. Stress/Load Test

$CRAWLER --url="https://example.com" --workers="20" --max-reqs-per-sec="100" --max-depth="1"

Warning: high worker counts can cause DoS. Use with caution.

8. Single Page Crawl

$CRAWLER --url="https://example.com/about" --single-page --output-json-file="/path/to/result.json"

9. HTML-to-Markdown (local file)

$CRAWLER --html-to-markdown="/path/to/page.html" --html-to-markdown-output="/path/to/page.md"

10. Browse Exported Content

$CRAWLER --serve-markdown="/path/to/md-export" --serve-port="8321"
$CRAWLER --serve-offline="/path/to/offline-site" --serve-port="8321"

Key Parameters Reference

See references/cli-params.md for the complete parameter reference organized by category.

Most-used flags

Flag	Purpose	Default
`--url`	Target URL (required)	-
`--output`	`text` or `json`	text
`--workers`	Concurrent threads	3
`--max-reqs-per-sec`	Requests per second limit	10
`--max-depth`	Crawl depth (0 = unlimited)	0
`--timeout`	Request timeout in seconds	5
`--no-color`	Disable colors	off
`--ignore-robots-txt`	Ignore robots.txt	off

Resource filtering

Flag	Effect
`--disable-all-assets`	Only crawl pages
`--disable-javascript`	No JS (recommended for offline/SPA)
`--disable-images`	No images
`--disable-styles`	No CSS
`--disable-files`	No downloadable docs

URL filtering

Flag	Effect
`--include-regex`	PCRE regex to include URLs
`--ignore-regex`	PCRE regex to skip URLs
`--allowed-domain-for-crawling`	Allow cross-domain crawling
`--allowed-domain-for-external-files`	Allow external asset domains

Script Helpers

`scripts/audit.sh` — Quick audit wrapper

Runs a full crawl with HTML report and optional JSON output. See script for usage.

`scripts/export-markdown.sh` — Markdown export wrapper

Exports a website to markdown (single or multi-file). See script for usage.

Tips

For modern JS frameworks (Next.js, React, Vue), add --disable-javascript when doing offline exports
Use --output json for programmatic processing; JSON goes to STDOUT, progress to STDERR
Use --extra-columns="Title,Keywords,Description" to add SEO columns
Use --timezone="Asia/Shanghai" for local timestamps
For large sites, increase --memory-limit, --max-visited-urls, and --max-queue-length
Use --resolve to test local/dev servers (like curl --resolve)
HTML reports are self-contained — open in any browser, no server needed

Usage Guidance

Before installing, verify the SiteOne Crawler release source and prefer a pinned, checksum-verified binary. Run crawls and load tests only on sites you own or have permission to test. Keep reports local for private sites unless you intentionally enable upload, and avoid passing cookies, passwords, or tokens unless necessary.

Capability Analysis

Type: OpenClaw Skill Name: siteone-crawler Version: 0.0.1 The skill bundle includes a setup script in SKILL.md that downloads and executes a binary from a remote GitHub repository (janreges/siteone-crawler), a high-risk installation pattern. It also features an optional report upload function to an external endpoint (crawler.siteone.io) and supports high-concurrency crawling capable of being used for DoS attacks. While these features are documented and align with the tool's purpose as a website auditor, the combination of automated remote binary execution and external data transmission warrants a suspicious classification.

Capability Assessment

ℹ Purpose & Capability

Website crawling, auditing, offline cloning, markdown export, sitemap generation, and local report generation are coherent with the stated purpose; stress/load testing and report upload are more sensitive but disclosed.

ℹ Instruction Scope

The skill is user-invocable and its examples are task-oriented. It does not show prompt hijacking, but users should ensure upload and load-test workflows are only run with explicit intent.

⚠ Install Mechanism

There is no formal install spec, while SKILL.md instructs downloading the latest GitHub release, unzipping it, moving it into the home directory, and making it executable without a pinned version or checksum.

ℹ Credentials

Network crawling and writing reports/exports to user-specified directories are expected for this tool. Optional third-party upload and high-rate crawling can affect privacy or third-party systems.

ℹ Persistence & Privilege

The setup persists a crawler binary under ~/.siteone-crawler and helper scripts write reports/exports, but the artifacts do not show root installation, background daemons, or hidden persistence.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install siteone-crawler
After installation, invoke the skill by name or use /siteone-crawler
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.0.1

- Initial release of SiteOne Crawler skill for website crawling, auditing, cloning, and markdown export. - Supports actions such as SEO/security/performance/accessibility audit (HTML or JSON report), offline site cloning, markdown export (single/multi-file), sitemap generation, and CI/CD quality gate checks. - Automatically installs or resolves the crawler Rust binary if not found. - Includes common usage workflows and command examples for quick setup. - Key CLI flags, advanced filtering, and tips for different site scenarios provided in documentation.

Metadata

Slug siteone-crawler

Version 0.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Siteone Crawler?

Website crawling, auditing, offline cloning, and markdown export using SiteOne Crawler (Rust). Trigger when the user asks to: crawl a website, audit/analyze... It is an AI Agent Skill for Claude Code / OpenClaw, with 12 downloads so far.

How do I install Siteone Crawler?

Run "/install siteone-crawler" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Siteone Crawler free?

Yes, Siteone Crawler is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Siteone Crawler support?

Siteone Crawler is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Siteone Crawler?

It is built and maintained by Tsingliu (@tsingliuwin); the current version is v0.0.1.

More Skills