/install siteone-crawler
SiteOne Crawler
Cross-platform website crawler/analyzer written in Rust.
Setup (run once)
Before first use, ensure the binary exists. If not found, install it automatically:
- Check if binary exists at the paths below (in order of priority):
$HOME/.siteone-crawler/siteone-crawler- Any
siteone-crawlerfound in$PATH(viawhich siteone-crawler)
- If neither exists, download the latest release from GitHub:
INSTALL_DIR="$HOME/.siteone-crawler" mkdir -p "$INSTALL_DIR" # Detect OS/arch OS=$(uname -s | tr '[:upper:]' '[:lower:]') ARCH=$(uname -m) case "$ARCH" in x86_64) ARCH="x64" ;; aarch64|arm64) ARCH="arm64" ;; esac # Get latest release URL from GitHub API RELEASE_URL=$(curl -sL https://api.github.com/repos/janreges/siteone-crawler/releases/latest \ | grep -oP "browser_download_url.*?${OS}-${ARCH}\.zip" | head -1 | sed 's/browser_download_url": "//') curl -sL "$RELEASE_URL" -o /tmp/siteone-crawler.zip \ && unzip -o /tmp/siteone-crawler.zip -d /tmp/siteone-crawler \ && mv /tmp/siteone-crawler/siteone-crawler "$INSTALL_DIR/" \ && chmod +x "$INSTALL_DIR/siteone-crawler" \ && rm -rf /tmp/siteone-crawler /tmp/siteone-crawler.zip - After installation, set
CRAWLERto the resolved path and verify with$CRAWLER --version.
Binary
CRAWLER="$HOME/.siteone-crawler/siteone-crawler"
If the above path doesn't exist, fall back to $(which siteone-crawler) after running Setup.
Always use the resolved path. The binary outputs colored text to terminal; use --no-color for script/pipeline usage and --output json for programmatic consumption.
Common Workflows
1. Quick Audit (HTML report)
$CRAWLER --url="https://example.com" --output-html-report="/path/to/report.html"
Generates a self-contained interactive HTML audit report with quality scores (0.0-10.0) across Performance, SEO, Security, Accessibility, Best Practices.
2. Full Audit + JSON + Upload
$CRAWLER --url="https://example.com" \
--output-html-report="/path/to/report.html" \
--output-json-file="/path/to/result.json" \
--upload --upload-retention="7d"
3. Offline Clone
$CRAWLER --url="https://example.com" --offline-export-dir="/path/to/offline-site" --disable-javascript
Use --disable-javascript for SPA/React sites to get a browsable static version. Use --allowed-domain-for-external-files="*" to include CDN assets.
4. Markdown Export
Multi-file (browsable):
$CRAWLER --url="https://example.com" --markdown-export-dir="/path/to/md-export"
Single-file (ideal for AI tools):
$CRAWLER --url="https://example.com" --markdown-export-dir="/tmp/md" --markdown-export-single-file="/path/to/site.md" \
--markdown-disable-images --markdown-disable-files
5. Sitemap Generation
$CRAWLER --url="https://example.com" --sitemap-xml-file="/path/to/sitemap" --sitemap-txt-file="/path/to/sitemap"
6. CI/CD Quality Gate
$CRAWLER --url="https://example.com" --ci --ci-min-score="7.0" --ci-max-404="0" --ci-max-5xx="0"
Exit code 10 if thresholds not met. See references/cli-params.md for all --ci-* options.
7. Stress/Load Test
$CRAWLER --url="https://example.com" --workers="20" --max-reqs-per-sec="100" --max-depth="1"
Warning: high worker counts can cause DoS. Use with caution.
8. Single Page Crawl
$CRAWLER --url="https://example.com/about" --single-page --output-json-file="/path/to/result.json"
9. HTML-to-Markdown (local file)
$CRAWLER --html-to-markdown="/path/to/page.html" --html-to-markdown-output="/path/to/page.md"
10. Browse Exported Content
$CRAWLER --serve-markdown="/path/to/md-export" --serve-port="8321"
$CRAWLER --serve-offline="/path/to/offline-site" --serve-port="8321"
Key Parameters Reference
See references/cli-params.md for the complete parameter reference organized by category.
Most-used flags
| Flag | Purpose | Default |
|---|---|---|
--url |
Target URL (required) | - |
--output |
text or json |
text |
--workers |
Concurrent threads | 3 |
--max-reqs-per-sec |
Requests per second limit | 10 |
--max-depth |
Crawl depth (0 = unlimited) | 0 |
--timeout |
Request timeout in seconds | 5 |
--no-color |
Disable colors | off |
--ignore-robots-txt |
Ignore robots.txt | off |
Resource filtering
| Flag | Effect |
|---|---|
--disable-all-assets |
Only crawl pages |
--disable-javascript |
No JS (recommended for offline/SPA) |
--disable-images |
No images |
--disable-styles |
No CSS |
--disable-files |
No downloadable docs |
URL filtering
| Flag | Effect |
|---|---|
--include-regex |
PCRE regex to include URLs |
--ignore-regex |
PCRE regex to skip URLs |
--allowed-domain-for-crawling |
Allow cross-domain crawling |
--allowed-domain-for-external-files |
Allow external asset domains |
Script Helpers
scripts/audit.sh — Quick audit wrapper
Runs a full crawl with HTML report and optional JSON output. See script for usage.
scripts/export-markdown.sh — Markdown export wrapper
Exports a website to markdown (single or multi-file). See script for usage.
Tips
- For modern JS frameworks (Next.js, React, Vue), add
--disable-javascriptwhen doing offline exports - Use
--output jsonfor programmatic processing; JSON goes to STDOUT, progress to STDERR - Use
--extra-columns="Title,Keywords,Description"to add SEO columns - Use
--timezone="Asia/Shanghai"for local timestamps - For large sites, increase
--memory-limit,--max-visited-urls, and--max-queue-length - Use
--resolveto test local/dev servers (like curl --resolve) - HTML reports are self-contained — open in any browser, no server needed
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install siteone-crawler - After installation, invoke the skill by name or use
/siteone-crawler - Provide required inputs per the skill's parameter spec and get structured output
What is Siteone Crawler?
Website crawling, auditing, offline cloning, and markdown export using SiteOne Crawler (Rust). Trigger when the user asks to: crawl a website, audit/analyze... It is an AI Agent Skill for Claude Code / OpenClaw, with 12 downloads so far.
How do I install Siteone Crawler?
Run "/install siteone-crawler" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Siteone Crawler free?
Yes, Siteone Crawler is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Siteone Crawler support?
Siteone Crawler is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Siteone Crawler?
It is built and maintained by Tsingliu (@tsingliuwin); the current version is v0.0.1.