← Back to Skills Marketplace
tsingliuwin

Siteone Crawler

by Tsingliu · GitHub ↗ · v0.0.1 · MIT-0
cross-platform ⚠ suspicious
12
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install siteone-crawler
Description
Website crawling, auditing, offline cloning, and markdown export using SiteOne Crawler (Rust). Trigger when the user asks to: crawl a website, audit/analyze...
README (SKILL.md)

SiteOne Crawler

Cross-platform website crawler/analyzer written in Rust.

Setup (run once)

Before first use, ensure the binary exists. If not found, install it automatically:

  1. Check if binary exists at the paths below (in order of priority):
    • $HOME/.siteone-crawler/siteone-crawler
    • Any siteone-crawler found in $PATH (via which siteone-crawler)
  2. If neither exists, download the latest release from GitHub:
    INSTALL_DIR="$HOME/.siteone-crawler"
    mkdir -p "$INSTALL_DIR"
    # Detect OS/arch
    OS=$(uname -s | tr '[:upper:]' '[:lower:]')
    ARCH=$(uname -m)
    case "$ARCH" in x86_64) ARCH="x64" ;; aarch64|arm64) ARCH="arm64" ;; esac
    # Get latest release URL from GitHub API
    RELEASE_URL=$(curl -sL https://api.github.com/repos/janreges/siteone-crawler/releases/latest \
      | grep -oP "browser_download_url.*?${OS}-${ARCH}\.zip" | head -1 | sed 's/browser_download_url": "//')
    curl -sL "$RELEASE_URL" -o /tmp/siteone-crawler.zip \
      && unzip -o /tmp/siteone-crawler.zip -d /tmp/siteone-crawler \
      && mv /tmp/siteone-crawler/siteone-crawler "$INSTALL_DIR/" \
      && chmod +x "$INSTALL_DIR/siteone-crawler" \
      && rm -rf /tmp/siteone-crawler /tmp/siteone-crawler.zip
    
  3. After installation, set CRAWLER to the resolved path and verify with $CRAWLER --version.

Binary

CRAWLER="$HOME/.siteone-crawler/siteone-crawler"

If the above path doesn't exist, fall back to $(which siteone-crawler) after running Setup.

Always use the resolved path. The binary outputs colored text to terminal; use --no-color for script/pipeline usage and --output json for programmatic consumption.

Common Workflows

1. Quick Audit (HTML report)

$CRAWLER --url="https://example.com" --output-html-report="/path/to/report.html"

Generates a self-contained interactive HTML audit report with quality scores (0.0-10.0) across Performance, SEO, Security, Accessibility, Best Practices.

2. Full Audit + JSON + Upload

$CRAWLER --url="https://example.com" \
  --output-html-report="/path/to/report.html" \
  --output-json-file="/path/to/result.json" \
  --upload --upload-retention="7d"

3. Offline Clone

$CRAWLER --url="https://example.com" --offline-export-dir="/path/to/offline-site" --disable-javascript

Use --disable-javascript for SPA/React sites to get a browsable static version. Use --allowed-domain-for-external-files="*" to include CDN assets.

4. Markdown Export

Multi-file (browsable):

$CRAWLER --url="https://example.com" --markdown-export-dir="/path/to/md-export"

Single-file (ideal for AI tools):

$CRAWLER --url="https://example.com" --markdown-export-dir="/tmp/md" --markdown-export-single-file="/path/to/site.md" \
  --markdown-disable-images --markdown-disable-files

5. Sitemap Generation

$CRAWLER --url="https://example.com" --sitemap-xml-file="/path/to/sitemap" --sitemap-txt-file="/path/to/sitemap"

6. CI/CD Quality Gate

$CRAWLER --url="https://example.com" --ci --ci-min-score="7.0" --ci-max-404="0" --ci-max-5xx="0"

Exit code 10 if thresholds not met. See references/cli-params.md for all --ci-* options.

7. Stress/Load Test

$CRAWLER --url="https://example.com" --workers="20" --max-reqs-per-sec="100" --max-depth="1"

Warning: high worker counts can cause DoS. Use with caution.

8. Single Page Crawl

$CRAWLER --url="https://example.com/about" --single-page --output-json-file="/path/to/result.json"

9. HTML-to-Markdown (local file)

$CRAWLER --html-to-markdown="/path/to/page.html" --html-to-markdown-output="/path/to/page.md"

10. Browse Exported Content

$CRAWLER --serve-markdown="/path/to/md-export" --serve-port="8321"
$CRAWLER --serve-offline="/path/to/offline-site" --serve-port="8321"

Key Parameters Reference

See references/cli-params.md for the complete parameter reference organized by category.

Most-used flags

Flag Purpose Default
--url Target URL (required) -
--output text or json text
--workers Concurrent threads 3
--max-reqs-per-sec Requests per second limit 10
--max-depth Crawl depth (0 = unlimited) 0
--timeout Request timeout in seconds 5
--no-color Disable colors off
--ignore-robots-txt Ignore robots.txt off

Resource filtering

Flag Effect
--disable-all-assets Only crawl pages
--disable-javascript No JS (recommended for offline/SPA)
--disable-images No images
--disable-styles No CSS
--disable-files No downloadable docs

URL filtering

Flag Effect
--include-regex PCRE regex to include URLs
--ignore-regex PCRE regex to skip URLs
--allowed-domain-for-crawling Allow cross-domain crawling
--allowed-domain-for-external-files Allow external asset domains

Script Helpers

scripts/audit.sh — Quick audit wrapper

Runs a full crawl with HTML report and optional JSON output. See script for usage.

scripts/export-markdown.sh — Markdown export wrapper

Exports a website to markdown (single or multi-file). See script for usage.

Tips

  • For modern JS frameworks (Next.js, React, Vue), add --disable-javascript when doing offline exports
  • Use --output json for programmatic processing; JSON goes to STDOUT, progress to STDERR
  • Use --extra-columns="Title,Keywords,Description" to add SEO columns
  • Use --timezone="Asia/Shanghai" for local timestamps
  • For large sites, increase --memory-limit, --max-visited-urls, and --max-queue-length
  • Use --resolve to test local/dev servers (like curl --resolve)
  • HTML reports are self-contained — open in any browser, no server needed
Usage Guidance
Before installing, verify the SiteOne Crawler release source and prefer a pinned, checksum-verified binary. Run crawls and load tests only on sites you own or have permission to test. Keep reports local for private sites unless you intentionally enable upload, and avoid passing cookies, passwords, or tokens unless necessary.
Capability Analysis
Type: OpenClaw Skill Name: siteone-crawler Version: 0.0.1 The skill bundle includes a setup script in SKILL.md that downloads and executes a binary from a remote GitHub repository (janreges/siteone-crawler), a high-risk installation pattern. It also features an optional report upload function to an external endpoint (crawler.siteone.io) and supports high-concurrency crawling capable of being used for DoS attacks. While these features are documented and align with the tool's purpose as a website auditor, the combination of automated remote binary execution and external data transmission warrants a suspicious classification.
Capability Assessment
Purpose & Capability
Website crawling, auditing, offline cloning, markdown export, sitemap generation, and local report generation are coherent with the stated purpose; stress/load testing and report upload are more sensitive but disclosed.
Instruction Scope
The skill is user-invocable and its examples are task-oriented. It does not show prompt hijacking, but users should ensure upload and load-test workflows are only run with explicit intent.
Install Mechanism
There is no formal install spec, while SKILL.md instructs downloading the latest GitHub release, unzipping it, moving it into the home directory, and making it executable without a pinned version or checksum.
Credentials
Network crawling and writing reports/exports to user-specified directories are expected for this tool. Optional third-party upload and high-rate crawling can affect privacy or third-party systems.
Persistence & Privilege
The setup persists a crawler binary under ~/.siteone-crawler and helper scripts write reports/exports, but the artifacts do not show root installation, background daemons, or hidden persistence.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install siteone-crawler
  3. After installation, invoke the skill by name or use /siteone-crawler
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.0.1
- Initial release of SiteOne Crawler skill for website crawling, auditing, cloning, and markdown export. - Supports actions such as SEO/security/performance/accessibility audit (HTML or JSON report), offline site cloning, markdown export (single/multi-file), sitemap generation, and CI/CD quality gate checks. - Automatically installs or resolves the crawler Rust binary if not found. - Includes common usage workflows and command examples for quick setup. - Key CLI flags, advanced filtering, and tips for different site scenarios provided in documentation.
Metadata
Slug siteone-crawler
Version 0.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Siteone Crawler?

Website crawling, auditing, offline cloning, and markdown export using SiteOne Crawler (Rust). Trigger when the user asks to: crawl a website, audit/analyze... It is an AI Agent Skill for Claude Code / OpenClaw, with 12 downloads so far.

How do I install Siteone Crawler?

Run "/install siteone-crawler" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Siteone Crawler free?

Yes, Siteone Crawler is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Siteone Crawler support?

Siteone Crawler is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Siteone Crawler?

It is built and maintained by Tsingliu (@tsingliuwin); the current version is v0.0.1.

💬 Comments