
Amazon Scraper

Author: jiafar · GitHub · v3.4.1 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 2225
Favorites: 6
Current installs: 10
Versions: 26
Install in OpenClaw
/install amazon-scraper
Description
High-performance containerized Amazon scraper (Docker + playwright-extra + Stealth plugin). Bypasses Amazon headless detection. Supports Amazon BSR, search r...
Usage (SKILL.md)

Amazon Scraper

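A Dockerized scraper built on playwright-extra and the Stealth plugin. It is tuned to get past Amazon's anti-scraping detection and also supports scraping general dynamic web pages.

⚙️ System requirements

  • Docker Engine 20.10+ (must be installed and running)
  • Disk space: ~2 GB (image + Playwright browser binaries)
  • Memory: 2 GB+ recommended (needed by the Playwright runtime)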

Quick start

First-time use: run the one-step build script from the skill directory:

bash scripts/setup.sh

The script builds the amazon-scraper image and creates the ~/scrapes output directory.

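Mode selection rules

1. Amazon mode (amazon_handler.js)

Triggered automatically when: the URL contains amazon.com, or the user mentions keywords such as Amazon, ASIN, BSR, product selection, competitor products, best sellers, or category analysis

The page type is detected automatically from the URL: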

| URL pattern | Page type | Available fields |
| --- | --- | --- |
| /gp/bestsellers/ | Best Sellers | rank, title, asin, price, rating, reviews, image, url |
| /zg/new-releases/ | New Releases | same as above |
| /zg/movers-and-shakers/ | Movers & Shakers | same as above |
| /s?k= or /s/ | Search results | title, asin, price, rating, reviews, image, url, boughtPastMonth, sponsored |
| /dp/ or /gp/product/ | Product detail | title, asin, price, rating, reviews, brand, bsr, boughtPastMonth, dateFirstAvailable, category, bullets, details, image |
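
The URL-to-page-type mapping in the table can be sketched as a small function. This is only an illustration: `classifyAmazonUrl` is a hypothetical name, the labels for the two `/zg/` lists are assumptions, and the detection logic actually used in amazon_handler.js may differ.

```javascript
// Hypothetical helper: maps an Amazon URL to one of the page types in the table.
// It is not taken from amazon_handler.js, whose real logic may differ.
function classifyAmazonUrl(rawUrl) {
  const { hostname, pathname, searchParams } = new URL(rawUrl);
  const isAmazon = hostname === "amazon.com" || hostname.endsWith(".amazon.com");
  if (!isAmazon) return "not-amazon";

  if (pathname.includes("/gp/bestsellers")) return "bestsellers";
  // The next two labels are assumed names; the source only documents the URL patterns.
  if (pathname.includes("/zg/new-releases")) return "new-releases";
  if (pathname.includes("/zg/movers-and-shakers")) return "movers-and-shakers";
  if (pathname.includes("/dp/") || pathname.includes("/gp/product/")) return "product-detail";
  // Search pages: /s?k=keyword or a path starting with /s/
  if ((pathname === "/s" && searchParams.has("k")) || pathname.startsWith("/s/")) return "search";
  return "unknown";
}

console.log(classifyAmazonUrl("https://www.amazon.com/dp/B001TQ6IHS"));
```

Checking the product-detail patterns before the search patterns keeps a detail URL that happens to carry a `k` query parameter from being misread as a search page.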

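⚠️ Important rules:

  • Best Sellers pages carry no monthly sales (boughtPastMonth) data; Amazon does not show it on list pages
  • To get monthly sales, use a search page (/s?k=KEYWORD) or a product detail page (/dp/ASIN)
  • If the user needs both rank and monthly sales, scrape Best Sellers for the ranks first, then use a search page to fill in monthly sales
  • BSR URLs must use /gp/bestsellers/; /zgbs/ returns Page Not Found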

# Best Sellers (ranks, no monthly sales)
docker run -t --rm amazon-scraper node assets/amazon_handler.js "https://www.amazon.com/gp/bestsellers/electronics"

# Search results (monthly sales, no ranks)
docker run -t --rm amazon-scraper node assets/amazon_handler.js "https://www.amazon.com/s?k=feather+duster"

# Product detail (most complete field set: BSR, brand, bullet points, monthly sales)
docker run -t --rm amazon-scraper node assets/amazon_handler.js "https://www.amazon.com/dp/B001TQ6IHS"

# Scrape multiple pages
docker run -t --rm amazon-scraper node assets/amazon_handler.js "URL" --pages 2

# Save results to a file
docker run -t --rm -v ~/scrapes:/data amazon-scraper node assets/amazon_handler.js "URL" --output result.json

# Override the built-in configuration with your own proxies
docker run -t --rm -e AMAZON_PROXIES="http://user:***@host:8001,..." amazon-scraper node assets/amazon_handler.js "URL"
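
For callers that launch the container from Node rather than a shell, the flags above can be assembled programmatically. The sketch below is a hypothetical wrapper, not part of the skill; `buildDockerArgs` and its option names are made up for illustration, and it assumes `--pages` and `--output` may be combined in one invocation, which the examples above do not show.

```javascript
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper (not part of the skill): builds the argument list for
// `docker run` from the flags shown in the commands above.
function buildDockerArgs({ url, handler = "assets/amazon_handler.js", pages, output, proxies = [] }) {
  const args = ["run", "-t", "--rm"];
  // --output writes inside the container, so mount ~/scrapes at /data as in the examples.
  if (output) args.push("-v", `${path.join(os.homedir(), "scrapes")}:/data`);
  // Override the built-in proxies only when the caller supplies a list.
  if (proxies.length > 0) args.push("-e", `AMAZON_PROXIES=${proxies.join(",")}`);
  args.push("amazon-scraper", "node", handler, url);
  // Assumption: --pages and --output can be used together.
  if (pages) args.push("--pages", String(pages));
  if (output) args.push("--output", output);
  return args;
}

console.log(buildDockerArgs({ url: "https://www.amazon.com/s?k=feather+duster", pages: 2 }).join(" "));
```

The returned array can be handed to `child_process.execFile("docker", args, callback)`, which avoids shell quoting problems with URLs that contain `?` or `&`.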

Output format: JSON

{
  "status": "SUCCESS",
  "type": "bestsellers|search|product-detail",
  "category": "category name",
  "totalProducts": 30,
  "scrapedAt": "ISO timestamp",
  "products": [
    {
      "rank": 1,
      "title": "product title",
      "asin": "B001TQ6IHS",
      "price": 9.94,
      "priceStr": "$9.94",
      "rating": 4.6,
      "reviews": 20547,
      "boughtPastMonth": "1K+",
      "image": "https://...",
      "url": "https://..."
    }
  ]
}
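
Because Best Sellers results carry ranks but no monthly sales, and search results carry the reverse, the two outputs can be joined on `asin`. The following is a minimal sketch assuming both inputs follow the JSON shape above; `mergeRankAndSales` and `parseBoughtPastMonth` are hypothetical helpers, and only the `"1K+"` form of `boughtPastMonth` is confirmed by the sample output, so the handling of other forms is a guess.

```javascript
// Hypothetical helper: turns strings such as "1K+" into a lower-bound number.
// Only "1K+" appears in the documented output; other formats are assumed.
function parseBoughtPastMonth(value) {
  if (typeof value !== "string") return null;
  const match = value.trim().match(/^(\d+(?:\.\d+)?)([KkMm])?\+?/);
  if (!match) return null;
  const multiplier = { k: 1e3, m: 1e6 }[(match[2] || "").toLowerCase()] || 1;
  return Math.round(parseFloat(match[1]) * multiplier);
}

// Hypothetical helper: joins a bestsellers result with a search result by ASIN.
function mergeRankAndSales(bestsellers, search) {
  const salesByAsin = new Map(search.products.map((p) => [p.asin, p.boughtPastMonth]));
  return bestsellers.products.map((p) => {
    const sales = salesByAsin.get(p.asin) ?? null; // null when the ASIN is absent from search
    return {
      rank: p.rank,
      asin: p.asin,
      title: p.title,
      boughtPastMonth: sales,
      boughtPastMonthMin: parseBoughtPastMonth(sales),
    };
  });
}

// Example with made-up data in the documented shape.
const ranked = { products: [{ rank: 1, asin: "B001TQ6IHS", title: "Duster" }] };
const searched = { products: [{ asin: "B001TQ6IHS", boughtPastMonth: "1K+" }] };
console.log(mergeRankAndSales(ranked, searched));
```

Products that appear in the ranking but not in the search results keep a `null` sales value, which makes the gaps visible instead of silently dropping rows.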

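2. Generic mode (main_handler.js)

Triggered when: the URL is not an Amazon URL, or the user asks to crawl or scrape the content of an arbitrary web page

  • Built on the same playwright-extra + Stealth architecture as Amazon mode
  • Built-in proxies are preconfigured; no extra setup is needed
  • Supports saving to a file with --output
  • The built-in proxies can be overridden through environment variables
  • Playwright opens the page and waits for JavaScript to finish loading
  • Extracts document.body.innerText (plain text, with ad noise removed)
  • Output is capped at 10,000 characters
  • Output: {status:"SUCCESS", type:"GENERIC", title, data}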

# Generic scrape (proxies are built in)
docker run -t --rm amazon-scraper node assets/main_handler.js "https://example.com"

# Save to a file
docker run -t --rm -v ~/scrapes:/data \
  amazon-scraper node assets/main_handler.js "https://example.com" --output page.json
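
A consumer of generic mode may want to validate the result and notice when the 10,000-character cap was probably hit. The sketch below assumes the handler's output (from stdout or the `--output` file) is exactly the JSON object described above; `parseGenericOutput` is a hypothetical name, and treating a length at or above the cap as "possibly truncated" is a heuristic, not documented behavior.

```javascript
const GENERIC_LIMIT = 10000; // character cap stated for generic mode

// Hypothetical helper: validates a generic-mode result and flags likely truncation.
function parseGenericOutput(jsonText) {
  const result = JSON.parse(jsonText);
  if (result.status !== "SUCCESS" || result.type !== "GENERIC") {
    throw new Error(`Unexpected result: status=${result.status}, type=${result.type}`);
  }
  const text = String(result.data ?? "");
  return {
    title: result.title,
    text,
    // Heuristic: text that reaches the cap was probably cut off.
    possiblyTruncated: text.length >= GENERIC_LIMIT,
  };
}

const sample = JSON.stringify({ status: "SUCCESS", type: "GENERIC", title: "Example", data: "Hello" });
console.log(parseGenericOutput(sample));
```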

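Agent invocation decision tree

Did the user provide a URL?
├─ Contains amazon.com → use amazon_handler.js
│   ├─ Needs monthly sales? → suggest a search URL (/s?k=) or a detail page (/dp/)
│   └─ Needs rank? → use a Best Sellers URL (/gp/bestsellers/)
└─ Any other site → use main_handler.js (generic mode)

No URL, only a stated need?
├─ "Scrape the top of Amazon category XX" / "XX category ranking" / "XX best sellers" → build https://www.amazon.com/gp/bestsellers/CATEGORY
├─ "Search Amazon for XX" / "XX keyword search" / "find XX products" → build https://www.amazon.com/s?k=KEYWORD
├─ "Analyze an ASIN" / "look at this product" / "details for XX" → build https://www.amazon.com/dp/ASIN
├─ "XX monthly sales" / "how many XX were sold" / "how is XX selling" → use a search page or detail page (both have boughtPastMonth)
├─ "Competitor analysis" / "competitor research" / "what are rivals selling" → search first, then scrape each detail page
├─ "Product selection" / "what sells well" / "category opportunities" / "market research" → combine Best Sellers and search
└─ Any other web page → find the URL with web_search first, then scrape it in generic mode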
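
The two halves of the tree, building a URL from a stated need and picking a handler for a URL, can be sketched as follows. The `intent` object shape and both function names are hypothetical, and the category is assumed to be a valid Amazon category slug such as `electronics`.

```javascript
const AMAZON = "https://www.amazon.com";

// Hypothetical helper: builds the Amazon URL for an already-classified intent.
function buildAmazonUrl(intent) {
  switch (intent.kind) {
    case "bestsellers":
      return `${AMAZON}/gp/bestsellers/${encodeURIComponent(intent.category)}`;
    case "search":
      // Amazon search URLs in the examples use "+" between words.
      return `${AMAZON}/s?k=${encodeURIComponent(intent.keyword).replace(/%20/g, "+")}`;
    case "detail":
      return `${AMAZON}/dp/${encodeURIComponent(intent.asin)}`;
    default:
      throw new Error(`Unknown intent kind: ${intent.kind}`);
  }
}

// Hypothetical helper: first branch of the tree, Amazon URLs vs everything else.
function pickHandler(url) {
  const host = new URL(url).hostname;
  const isAmazon = host === "amazon.com" || host.endsWith(".amazon.com");
  return isAmazon ? "assets/amazon_handler.js" : "assets/main_handler.js";
}

const url = buildAmazonUrl({ kind: "search", keyword: "feather duster" });
console.log(url, pickHandler(url));
```

Mapping free-form user phrasing onto `intent.kind` is left to the agent; the sketch only covers the deterministic steps after that classification.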

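Common user intents → action mapping

| User says | Action |
| --- | --- |
| "Show me Amazon category XX" | Scrape the /gp/bestsellers/CATEGORY Best Sellers list |
| "How is XX selling on Amazon" | Search /s?k=XX and check monthly sales |
| "Analyze this ASIN: BXXXXXXXXX" | Scrape the /dp/ASIN detail page |
| "What opportunities are there in category XX" | Best Sellers + search, combined analysis |
| "Scrape this link for me" | Determine the URL type and pick the matching handler |
| "Grab the content of website XX" | Generic mode |
| "Search for XX's competitors" | Scrape search pages + analyze |
| "What are XX's monthly sales" / "How many XX sell per month" | Search page or detail page |
| "Show me the top 100" / "popular products" | Best Sellers list |
| "What's new" / "What new products launched recently" | /zg/new-releases/ |
| "Which products are rising fast" / "movers and shakers" | /zg/movers-and-shakers/ |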

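Proxy configuration

This skill ships with 5 built-in rotating proxies and works without any extra configuration.

To override the built-in proxies, inject your own through environment variables:

| Variable | Purpose | Format |
| --- | --- | --- |
| AMAZON_PROXY | Single proxy | http://user:pass@host:port |
| AMAZON_PROXIES | Multi-proxy rotation | http://u:p@h1:8001,http://u:p@h2:8002,... |

  • Rotation: when scraping multiple pages, each page automatically switches to the next proxy
  • Failover: when a single page fails, it is retried automatically with the next proxy in the list
  • Proxy settings are stored in config/proxies.json; edit the file directly to update the proxy list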
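
The rotation and failover behavior described above can be sketched as follows. This is an illustration rather than the skill's code: the function names are hypothetical, `fetchPage` stands in for whatever loads a page through a given proxy, and giving AMAZON_PROXIES precedence over AMAZON_PROXY is an assumption.

```javascript
// Hypothetical helper: reads the proxy list from the documented environment variables.
// Assumption: AMAZON_PROXIES wins when both variables are set.
function parseProxies(env = process.env) {
  if (env.AMAZON_PROXIES) {
    return env.AMAZON_PROXIES.split(",").map((s) => s.trim()).filter(Boolean);
  }
  if (env.AMAZON_PROXY) return [env.AMAZON_PROXY.trim()];
  return [];
}

// Rotation: page N uses proxy N modulo the list length.
function proxyForPage(proxies, pageIndex) {
  return proxies[pageIndex % proxies.length];
}

// Failover: start at the page's proxy and try each following proxy once.
// fetchPage(proxy) is a caller-supplied async function (hypothetical).
async function fetchWithFailover(proxies, pageIndex, fetchPage) {
  let lastError = new Error("no proxies configured");
  for (let attempt = 0; attempt < proxies.length; attempt++) {
    const proxy = proxies[(pageIndex + attempt) % proxies.length];
    try {
      return await fetchPage(proxy);
    } catch (err) {
      lastError = err; // remember the failure and move on to the next proxy
    }
  }
  throw lastError;
}

const proxies = parseProxies({ AMAZON_PROXIES: "http://u:p@h1:8001, http://u:p@h2:8002" });
console.log(proxyForPage(proxies, 0), proxyForPage(proxies, 1), proxyForPage(proxies, 2));
```

Bounding the retries by the list length means a page fails for good once every proxy has been tried, rather than looping indefinitely.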

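Anti-bot capabilities

  • playwright-extra + puppeteer-extra-plugin-stealth: automatically patches headless fingerprints such as navigator, WebGL, and Canvas
  • Chrome 123 UserAgent: mimics a real Chrome browser on macOS
  • Full browser fingerprint headers: Accept-Encoding: identity, Sec-Ch-Ua, Sec-Fetch-*, and so on
  • 1920x1080 viewport: avoids mobile/small-screen detection
  • Automatic scrolling to load lazy-loaded content
  • Docker sandbox isolation, with a fresh browser context on every launch
  • Proxy rotation spreads requests across source IPs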

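Limitations

  • Generic mode output is capped at 10,000 characters
  • Amazon mode returns at most about 30-50 products per page
  • Pages that require login are not supported
  • The Docker container has a cold start of ~15 seconds (including stealth plugin initialization)
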
Security recommendations
This skill appears to implement the advertised Amazon scraping functionality, but it ships with built-in proxy credentials in config/proxies.json (plain text URLs with username/password to disp.oxylabs.io). Before installing or running:
  1. Do not rely on or expose embedded proxy credentials; they may belong to someone else and could lead to unknown billing, abuse, or account suspension. Replace or remove config/proxies.json and supply your own proxies via AMAZON_PROXIES or edit the file.
  2. Confirm you understand legal and Amazon Terms of Service implications of scraping.
  3. Verify the package origin/author (homepage is missing) and consider running the image in an isolated environment (VM) if you proceed.
  4. Be aware building the image will run npm install and download Playwright browser binaries.
  5. Fix the metadata inconsistency (Docker is required) or confirm platform requirements before use.
Functional analysis
Type: OpenClaw Skill
Name: amazon-scraper
Version: 3.4.1
The skill is a functional Amazon and generic web scraper utilizing Playwright, Docker, and the stealth plugin to bypass bot detection. The bundle includes a setup script (scripts/setup.sh), a Dockerfile, and JavaScript handlers (amazon_handler.js, main_handler.js) that perform scraping and save results locally. While it contains hardcoded proxy credentials in config/proxies.json, these appear to be provided for immediate utility rather than malicious intent, and the code logic is transparently aligned with the stated purpose of data collection without evidence of exfiltration or unauthorized access.
Capability assessment
Purpose & Capability
The name, description, SKILL.md, and handler scripts consistently implement a Dockerized Playwright-based Amazon scraper. The skill legitimately needs Docker and Playwright/browser binaries (the SKILL.md and scripts require Docker). Minor metadata mismatch: package.json declares a docker requirement and requiredBinaries: ["docker"], while the registry metadata presented earlier listed no required binaries — this is an inconsistency but not itself malicious.
Instruction Scope
SKILL.md and the handlers instruct the agent to build and run a Docker image that launches Playwright to visit URLs and extract page content. That is within the declared purpose. However, the runtime explicitly loads config/proxies.json (included in the bundle) and will use embedded proxies by default; SKILL.md advertises 'built-in proxies' and instructs use of those. Shipping and using embedded third‑party proxy credentials expands the scope to routing traffic through an externally owned service and is not necessary to the stated scraping logic — this is a notable scope/privilege concern.
Install Mechanism
There is no platform install spec in the registry; instead the bundle includes a setup.sh that copies Dockerfile.sh and runs docker build. Dockerfile.sh uses an official Microsoft Playwright base image and runs npm install and npx playwright install inside the image. Pulling npm packages and browser binaries is expected for Playwright but does cause network fetches during image build. Overall install steps are typical for a containerized scraper, not an unusual risk vector, but building the image will download remote packages and browser artifacts.
Credentials
The skill declares no required environment variables in the registry metadata, yet the runtime supports overriding built-in proxies via AMAZON_PROXY / AMAZON_PROXIES. More importantly, the bundle contains config/proxies.json with plaintext proxy URLs and embedded credentials for disp.oxylabs.io. Including active third‑party credentials inside the distributed package is disproportionate: secrets are present that were not declared, and they grant outbound proxy access billed/controlled by some other account. This raises operational, legal, and trust concerns.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide agent settings. The setup script creates ~/scrapes (local output directory) and the container writes to /data if mounted — these are limited, expected behaviors for a scraper.
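How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install amazon-scraper
  3. Once installed, invoke the Skill by name or trigger it with /amazon-scraper
  4. Provide the required inputs described in the Skill's parameter documentation to get structured output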
Version history
v3.4.1
- Added: config/proxies.json with built-in proxy list
- Proxy loaded from config file, no env var needed
v3.4.0
- Added: proxy support (AMAZON_PROXY / AMAZON_PROXIES env vars)
- Added: --output flag to save results to JSON file
- Added: /data volume mount for persistent output
- Changed: Docker image name from clawd-crawlee to amazon-scraper
- Changed: test script URL updated to /gp/bestsellers/electronics
- Updated: setup.sh with proxy usage examples and output instructions
v3.3.4
- Fixed: stealth plugin corrected to puppeteer-extra-plugin-stealth@^2.11.2
- playwright-extra-plugin-stealth only has 0.0.1 on npm; correct package is puppeteer-extra-plugin-stealth
- Updated require() in amazon_handler.js accordingly
v3.3.3
- Fixed: removed crawlee from dependencies (no longer used)
- Fixed: playwright-extra-plugin-stealth version corrected to ^2.11.2 (was ^0.0.1)
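v3.3.2
- Fixed: SKILL.md anti-bot capabilities section updated to describe the stealth plugin's capabilities
- Fixed: removed the leftover crawlee parameter maxRetries; cold-start time updated to 15 seconds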
v3.3.1
- Fixed: SKILL.md description updated from Crawlee to playwright-extra + Stealth
- Fixed: package.json description updated to reflect stealth capability
- No code changes from 3.3.0
v3.3.0
- Changed: replaced crawlee PlaywrightCrawler with playwright-extra + stealth plugin
- Added: stealth mode to bypass Amazon headless detection
- Added: viewport 1920x1080 and Chrome 120 userAgent
- Unchanged: all data extraction logic (bestsellers/search/product-detail/generic)
- Unchanged: output format and ASIN deduplication
- Updated: package.json dependencies (playwright-extra + playwright-extra-plugin-stealth)
- Updated: Dockerfile CMD now points to amazon_handler.js
v3.2.0
- Fixed: Dockerfile renamed to Dockerfile.sh so clawhub server bundles it correctly
- clawhub server filters out files without text extensions; .sh extension bypasses this
- setup.sh now copies Dockerfile.sh -> Dockerfile before docker build
- Users can now run setup.sh after install and docker build will work
v3.1.9
- Correct release (replaces 3.1.6/3.1.7/3.1.8 which had wrong Dockerfile)
- Dockerfile: playwright:v1.40.0-jammy, full chromium install
- setup.sh: simple one-click docker build
- All files match the verified reference bundle
v3.1.8
- Synced Dockerfile and setup.sh to reference version
- Dockerfile: playwright:v1.40.0-jammy, full install with chromium
- setup.sh: simplified one-click docker build
v3.1.7
- Fixed: setup.sh now auto-generates Dockerfile if missing (clawhub cannot bundle files without extensions)
- setup.sh is now fully self-contained: no dependency on Dockerfile being present in the zip
v3.1.6
- Stable release: Dockerfile confirmed included in release bundle
- All previous fixes from v3.1.x consolidated
- Ready for production use
v3.1.5
- Fixed: Added .clawhubignore to ensure Dockerfile is included in release bundle
- Fixed: .dockerignore cleaned up (removed SKILL.md/scripts exclusions that don't belong there)
- setup.sh one-click build now works correctly after download
v3.1.4
- Fixed: package.json main entry changed from main_handler.js to amazon_handler.js
v3.1.3
- Removed: youtube_handler.js (deleted from assets)
- Fixed: package.json description and keywords no longer mention YouTube
- Fixed: SKILL.md subtitle and generic mode trigger no longer mention YouTube/social media
- Skill is now 100% Amazon-focused with generic fallback only
v3.1.2
- Removed: YouTube mode from main_handler.js (out of scope)
- Removed: YouTube/TikTok/Twitter/X trigger keywords from SKILL.md
- Fixed: main_handler.js comment header renamed from 'Deep-Scraper' to 'Amazon-Scraper'
- Fixed: Dockerfile CMD now points to amazon_handler.js as primary entry
- Fixed: SKILL.md frontmatter description now leads with Amazon use case
- Fixed: Decision tree cleaned up (no more YouTube branch)
v3.1.1
- Fixed: package.json name corrected from 'deep-scraper' to 'amazon-scraper'
- Fixed: description updated to highlight Amazon as primary use case
- Fixed: added requiredBinaries docker to registry metadata
- Added: scripts/setup.sh for one-click Docker image build
- Added: .dockerignore to reduce build context
- Added: System requirements section in SKILL.md
- Updated: SKILL.md installation steps to use setup.sh
- Bumped version to 3.1.1
v3.1.0
- Added batch scraping support with the new `scripts/batch-scrape.sh` script, enabling efficient multi-ASIN processing and file-based result output.
- Expanded documentation: clarified batch and file output workflows, added best practices for large-scale/batch analysis, and described browser-based manual extraction methods.
- Updated output management section: strongly recommends direct-to-file output to avoid stdout mixing in multi-process scenarios.
- Enhanced guidance on DOM selectors and inline JS scripts for browser-assisted extraction.
- Minor optimizations and clarifications throughout documentation for a streamlined user experience.
v3.0.6
Add recommended volume mount pattern for writing output to local files
v3.0.5
Fix: upgrade Playwright base image from v1.52.0 to v1.59.1 to fix Chromium executable error
Metadata
Slug amazon-scraper
Version 3.4.1
License MIT-0
Total installs 10
Current installs 10
Versions 26
FAQ

What is Amazon Scraper?

High-performance containerized Amazon scraper (Docker + playwright-extra + Stealth plugin). Bypasses Amazon headless detection. Supports Amazon BSR, search r... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 2225 total downloads to date.

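How do I install Amazon Scraper?

Run the command "/install amazon-scraper" in the OpenClaw or Claude Code chat box to install it in one step. Running the skill also requires Docker and a one-time image build with scripts/setup.sh.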

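Is Amazon Scraper free?

Yes. Amazon Scraper is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Amazon Scraper support?

Amazon Scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Amazon Scraper?

It is developed and maintained by jiafar (@jiafar); the current version is v3.4.1.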
