Description

Collect all customer reviews from an Amazon product URL or product-reviews URL through a logged-in Chrome session on port 9222, export a 14-column factual wo...

README (SKILL.md)

Amazon Review Workbook

Name: Amazon Review Workbook
Author: aduo6668

Turn an Amazon product or review link into a two-phase delivery workbook.

This skill is designed to be portable: the scripts live inside the skill folder and do not depend on dashcamauto or any other local repo.

Quick Path

If this is the first run on a machine, read references/setup.md.
Run a quick health check:

python scripts/amazon_review_workbook.py doctor --url "<amazon-url>"

Run factual collection:

python scripts/amazon_review_workbook.py intake --url "<amazon-url>" --output-dir "<workspace>/amazon-review-output"

If DeepLX is configured and reachable, fill 评论中文版:

python scripts/amazon_review_workbook.py translate --input-json "<workspace>/amazon-review-output/amazon_<asin>_review_rows_factual.json" --output-dir "<workspace>/amazon-review-output"

Check coverage before deciding whether keyword expansion is worth the extra requests:

python scripts/amazon_review_workbook.py coverage-check --url "<amazon-url>" --db-path "<workspace>/amazon-review-output/amazon_review_cache.sqlite3"

Build canonical tags and a lightweight tagging payload:

python scripts/amazon_review_workbook.py taxonomy-bootstrap --input-json "<workspace>/amazon-review-output/amazon_<asin>_review_rows_translated.json" --output-dir "<workspace>/amazon-review-output"
python scripts/amazon_review_workbook.py prepare-tagging --input-json "<workspace>/amazon-review-output/amazon_<asin>_review_rows_translated.json" --output-dir "<workspace>/amazon-review-output" --canonical-tags-json "<workspace>/amazon-review-output/canonical_tags.json"

taxonomy-bootstrap is only for building a stable canonical vocabulary for the batch. prepare-tagging consumes the full factual or translated JSON and emits a trimmed *_tagging_input.json that contains pending rows only plus cache metadata. Do not use that trimmed file as the merge source.

Read references/tagging-guidelines.md, let the model fill only the pending rows in a separate labels JSON, then merge the labels back into the full base JSON and build the final workbook:

python scripts/amazon_review_workbook.py merge-build --base-json "<workspace>/amazon-review-output/amazon_<asin>_review_rows_translated.json" --labels-json "<workspace>/amazon-review-output/amazon_<asin>_labels.json" --output-dir "<workspace>/amazon-review-output" --taxonomy-version "v1" --strict

Workflow

1. Verify prerequisites

Confirm doctor reports a valid asin.
Confirm chrome_debug_ready is true.
If you plan to use translate, confirm deeplx_env_ready is true.
If deeplx_reachable is false, do not block the workflow; let the model fill 评论中文版 during tagging.

If any of these fail, read references/setup.md before continuing.
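The checks above can be expressed as a small gate. This is only a sketch: it assumes the doctor output has already been parsed into a dict keyed by the field names in the checklist (asin, chrome_debug_ready, deeplx_env_ready, deeplx_reachable). The actual output format of doctor is not specified here, and evaluate_doctor_report is a hypothetical helper, not part of the skill.

```python
from typing import Any, Dict, List, Tuple


def evaluate_doctor_report(
    report: Dict[str, Any], use_translate: bool = False
) -> Tuple[bool, List[str]]:
    """Return (ok_to_continue, messages) for a parsed doctor report.

    Assumption: `report` is a dict keyed by the field names in the checklist.
    """
    blockers: List[str] = []
    notes: List[str] = []

    if not report.get("asin"):
        blockers.append("doctor did not report a valid asin")
    if report.get("chrome_debug_ready") is not True:
        blockers.append("chrome_debug_ready is not true")
    if use_translate and report.get("deeplx_env_ready") is not True:
        blockers.append("deeplx_env_ready is not true but translate is planned")

    # An unreachable DeepLX does not block the workflow:
    # the model fills 评论中文版 during tagging instead.
    if report.get("deeplx_reachable") is False:
        notes.append("DeepLX unreachable: fill 评论中文版 in the tagging pass")

    return (not blockers, blockers + notes)
```

If the first element is False, stop and read references/setup.md. A DeepLX note on its own leaves the result True.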

2. Use the smallest command that fits

For raw review collection only: use collect
For factual extraction plus workbook scaffolding: use intake
For deciding whether a keyword pass is still needed: use coverage-check
For rebuilding the tuned keyword state from historical data: use keyword-autotune
For machine translation of 评论中文版: use translate
For canonical tag sampling: use taxonomy-bootstrap
For cache-aware lightweight model input: use prepare-tagging
For writing the final labeled workbook: use merge-build

Examples:

python scripts/amazon_review_workbook.py collect --url "<amazon-url>" --output-dir "<workspace>/amazon-review-output"
python scripts/amazon_review_workbook.py translate --input-json "<workspace>/amazon-review-output/amazon_<asin>_review_rows_factual.json" --output-dir "<workspace>/amazon-review-output"
python scripts/amazon_review_workbook.py coverage-check --url "<amazon-url>" --db-path "<workspace>/amazon-review-output/amazon_review_cache.sqlite3"
python scripts/amazon_review_workbook.py keyword-autotune --output-dir "<workspace>/amazon-review-output" --db-path "<workspace>/amazon-review-output/amazon_review_cache.sqlite3"
python scripts/amazon_review_workbook.py taxonomy-bootstrap --input-json "<workspace>/amazon-review-output/amazon_<asin>_review_rows_translated.json" --output-dir "<workspace>/amazon-review-output"
python scripts/amazon_review_workbook.py prepare-tagging --input-json "<workspace>/amazon-review-output/amazon_<asin>_review_rows_translated.json" --output-dir "<workspace>/amazon-review-output" --canonical-tags-json "<workspace>/amazon-review-output/canonical_tags.json"
python scripts/amazon_review_workbook.py merge-build --base-json "<workspace>/amazon-review-output/amazon_<asin>_review_rows_translated.json" --labels-json "<workspace>/amazon-review-output/amazon_<asin>_labels.json" --output-dir "<workspace>/amazon-review-output" --taxonomy-version "v1" --strict

3. Keep the workbook stable

The factual and final workbooks always use the 14-column schema in references/output-schema.md.

Do not silently add or remove columns. If a field is unavailable from the page, leave it blank rather than inventing a value.
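A minimal sketch of how a row could be held to the fixed schema. The column names are the ones listed in the version history; references/output-schema.md remains the authoritative contract, the column order here is an assumption, and normalize_row is a hypothetical helper rather than the function in scripts/review_delivery_schema.py.

```python
from typing import Dict, List

# Column names as listed in the version history (order assumed).
EXPECTED_COLUMNS: List[str] = [
    "序号", "评论用户名", "国家", "星级评分", "评论原文", "评论中文版", "评论概括",
    "情感倾向", "类别分类", "标签", "重点标记", "评论链接网址", "评论时间", "评论点赞数",
]


def normalize_row(row: Dict[str, object]) -> Dict[str, object]:
    """Project a row onto the 14 columns; unavailable fields become blank."""
    unexpected = [key for key in row if key not in EXPECTED_COLUMNS]
    if unexpected:
        # Refuse to silently add columns to the workbook.
        raise ValueError(f"unexpected columns: {unexpected}")

    normalized: Dict[str, object] = {}
    for col in EXPECTED_COLUMNS:
        value = row.get(col)
        # Leave missing values blank rather than inventing them.
        normalized[col] = "" if value is None else value
    return normalized
```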

4. Tag rows only after grounding on the factual file

The model should not invent tags or summaries from the product page alone. Ground semantic tagging on the factual JSON/workbook created by intake or translate.

Keep the two JSON shapes distinct:

*_tagging_input.json from prepare-tagging is the cropped machine prompt payload for the model
--base-json for merge-build must be the full factual/translated record set, not the cropped tagging payload
--labels-json is the model's completed semantic output for the pending rows only
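The relationship between the full base set and the pending-row labels can be sketched as follows. The real merge logic lives in scripts/label_workflow.py and may differ; this example assumes each record is a dict and that rows are joined on 评论链接网址 (the review link), which is an illustrative choice of key and not a documented one.

```python
from typing import Dict, List

LABEL_FIELDS = ["评论概括", "情感倾向", "类别分类", "标签", "重点标记"]
ROW_KEY = "评论链接网址"  # assumption: the review link is used as the join key


def merge_labels(
    base_rows: List[Dict[str, object]], label_rows: List[Dict[str, object]]
) -> List[Dict[str, object]]:
    """Overlay labels for pending rows onto the full base record set."""
    labels_by_key = {row[ROW_KEY]: row for row in label_rows}
    known_keys = {row[ROW_KEY] for row in base_rows}

    orphans = set(labels_by_key) - known_keys
    if orphans:
        raise ValueError(f"labels refer to rows missing from the base: {sorted(orphans)}")

    merged: List[Dict[str, object]] = []
    for row in base_rows:
        out = dict(row)  # copy so the base rows are not mutated
        labels = labels_by_key.get(row[ROW_KEY])
        if labels:
            for field in LABEL_FIELDS:
                if field in labels:
                    out[field] = labels[field]
        merged.append(out)
    return merged
```

The output always has as many rows as the base, which is why the base must be the full factual or translated file: a cropped payload would drop every row that was already cached.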

If translate prints translation_mode=model_fallback, fill 评论中文版 in the same tagging pass instead of waiting for DeepLX.

Use references/tagging-guidelines.md when filling:

评论概括
情感倾向
类别分类
标签
重点标记

The preferred fast path is:

taxonomy-bootstrap to build a canonical tag vocabulary for this batch
prepare-tagging to create a minimal pending-row payload
model labeling only for pending rows, written into a separate labels JSON
merge-build to update cache and export the final workbook from the full base JSON
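The fast path can be scripted by assembling the three CLI invocations in order. The sketch below only builds argument lists, using the subcommands, flags, and file names shown in this README; fast_path_commands is a hypothetical helper, and the model labeling step stays outside the script.

```python
import sys
from pathlib import Path
from typing import List

SCRIPT = "scripts/amazon_review_workbook.py"


def fast_path_commands(
    output_dir: str, asin: str, taxonomy_version: str = "v1"
) -> List[List[str]]:
    """Build argv lists for taxonomy-bootstrap, prepare-tagging, merge-build."""
    out = Path(output_dir)
    base_json = str(out / f"amazon_{asin}_review_rows_translated.json")
    labels_json = str(out / f"amazon_{asin}_labels.json")
    tags_json = str(out / "canonical_tags.json")
    py = sys.executable

    return [
        [py, SCRIPT, "taxonomy-bootstrap",
         "--input-json", base_json, "--output-dir", str(out)],
        [py, SCRIPT, "prepare-tagging",
         "--input-json", base_json, "--output-dir", str(out),
         "--canonical-tags-json", tags_json],
        # The model fills labels_json for the pending rows before this step.
        [py, SCRIPT, "merge-build",
         "--base-json", base_json, "--labels-json", labels_json,
         "--output-dir", str(out),
         "--taxonomy-version", taxonomy_version, "--strict"],
    ]
```

Each list can be passed to subprocess.run(cmd, check=True). Pause after the second command until the labels JSON exists. Note that merge-build receives the same full file that the first two steps read, never the trimmed tagging payload.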

Collection Defaults

intake and collect no longer run keyword expansion implicitly in deep mode. deep now means the 18 combo pass only.
Run coverage-check after intake to compare current rows vs Amazon's visible reviews count before deciding to spend more requests.
Use --keywords only when you explicitly want a keyword pass.
Use --keywords with no values to run the built-in keyword preset for the selected --keyword-profile.
Use --keywords foo bar baz to provide an explicit keyword list.
Default pacing now inserts a 2.5s gap between combos/keywords to reduce rate-limit risk.
Built-in profiles:
- generic: universal consumer-product terms
- electronics: universal terms + common app/setup/hardware terms
- dashcam: electronics profile + recording/night/parking/GPS/Wi-Fi/mount terms
Default keyword reuse policy is successful: keywords that have produced results before are skipped on later runs; recent zero-result keywords are also suppressed for 72h to avoid immediate retries.
If you really want to brute-force rerun every keyword, use --keyword-reuse-scope none.
A tuned state file at <output-dir>/keyword_tuning_state.json is now read automatically when present, and refreshed after keyword runs so the skill gradually reorders towards higher-yield terms.
keyword-autotune can also ingest old keyword-run JSON reports via --report-glob to seed the tuned state from historical experiments.
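The default reuse policy described above can be illustrated with a small filter. This is a sketch under stated assumptions: the history structure (a dict of keyword to hit count and last run time) is hypothetical and does not reflect the actual layout of the SQLite cache or keyword_tuning_state.json. Only the `successful` and `none` scope names and the 72h window come from the text.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

ZERO_RESULT_COOLDOWN = timedelta(hours=72)


def select_keywords(
    candidates: Iterable[str],
    history: Dict[str, Dict[str, object]],
    now: datetime,
    reuse_scope: str = "successful",
) -> List[str]:
    """Return the keywords worth running under the given reuse scope.

    Assumption: history maps keyword -> {"hits": int, "last_run": datetime}.
    """
    if reuse_scope == "none":
        # Brute-force mode: rerun every keyword.
        return list(candidates)

    selected: List[str] = []
    for keyword in candidates:
        record = history.get(keyword)
        if record is None:
            selected.append(keyword)  # never tried before
        elif record["hits"] > 0:
            continue  # produced results before: skip on later runs
        elif now - record["last_run"] < ZERO_RESULT_COOLDOWN:
            continue  # recent zero-result keyword: suppressed for 72h
        else:
            selected.append(keyword)  # zero-result, but the cooldown has expired
    return selected
```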

Failure Boundaries

Do not claim success if any of these is true:

The script did not reach a real review page.
The expected XLSX/CSV for the current phase was not generated.
Review links, review time, or helpful votes were guessed rather than extracted.
The model tagged rows without first grounding on the factual JSON/workbook.
The cropped *_tagging_input.json was used as --base-json for merge-build.
The model re-labeled rows that were already cached for the same taxonomy version.
The workflow still claims a 13-column contract after 评论用户名 was added as a real output column.

Resources

references/setup.md: first-run machine setup and environment requirements
references/output-schema.md: fixed 14-column workbook contract
references/tagging-guidelines.md: semantic labeling rules after factual collection
scripts/amazon_review_workbook.py: portable CLI for doctor/collect/intake/coverage-check/keyword-autotune/translate/taxonomy-bootstrap/prepare-tagging/merge-build
scripts/review_delivery_schema.py: workbook schema, normalization, and XLSX/CSV writer
scripts/deeplx_translate.py: optional DeepLX translation helper
scripts/label_workflow.py: cache, heuristics, bootstrap, and merge logic for faster labeling

Usage Guidance

This skill appears to do what it claims: scrape Amazon reviews via a locally running, logged-in Chrome session and produce deliverable spreadsheets. Before using it:

1) Understand that you must launch Chrome with remote debugging and a profile logged into Amazon. The script can access that browser session (cookies, authenticated pages), so only run it on a machine/profile you trust to be used for scraping.
2) If you enable automatic translation, you must set DEEPLX_API_URL (and optionally DEEPLX_API_KEY). Translations will be POSTed to that URL, so only configure trusted endpoints and avoid committing real .env files with secrets into git.
3) Install the documented Python dependencies and run unit tests if desired.
4) The registry metadata did not declare the optional DeepLX env vars. Treat that as a minor metadata inconsistency and review the deeplx_translate.py file and any .env before use.

If you want extra assurance, inspect/grep the bundled scripts for network calls (requests, websocket usage) and run the 'doctor' command on a harmless product URL first to observe behavior.
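The inspection step suggested above can be approximated with a short scan. This is a rough sketch, not an audit tool: find_network_references is a hypothetical helper, the pattern list is an assumption about which module names are worth flagging, and a textual match only tells you where to start reading.

```python
import re
from pathlib import Path
from typing import Dict, List

# Module names commonly associated with network access (an assumed, non-exhaustive list).
NETWORK_PATTERN = re.compile(r"\b(requests|websocket|urllib|http\.client|socket)\b")


def find_network_references(root: str) -> Dict[str, List[int]]:
    """Map each .py file under `root` to the line numbers that mention a network module."""
    hits: Dict[str, List[int]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        matched = [
            number
            for number, line in enumerate(lines, start=1)
            if NETWORK_PATTERN.search(line)
        ]
        if matched:
            hits[str(path)] = matched
    return hits
```

Calling find_network_references("scripts") from the skill folder lists the lines to review by hand before pointing the tool at a logged-in browser profile.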

Capability Analysis

Type: OpenClaw Skill
Name: amazon-review-workbook
Version: 1.0.3

The amazon-review-workbook skill is a comprehensive tool for scraping and analyzing Amazon reviews using the Chrome DevTools Protocol (CDP) over Chrome's remote debugging port. It implements a multi-stage workflow including data collection, translation via DeepLX, and AI-assisted semantic tagging. It has high-privilege capabilities, such as executing JavaScript in a logged-in browser session and reading environment variables for API keys, but these are strictly aligned with its stated purpose. The code in scripts/amazon_review_workbook.py and scripts/deeplx_translate.py is well-structured, uses local SQLite caching to reduce redundant network traffic, and shows no indicators of malicious intent such as unauthorized data exfiltration or persistence mechanisms.

Capability Assessment

✓ Purpose & Capability

Name/description match the included scripts: the code scrapes Amazon review pages through a Chrome remote-debugging session (localhost:9222), builds factual JSON/workbooks, offers optional DeepLX translation, and provides tagging/merge tooling. Nothing in the repository requests unrelated cloud credentials or surprising capabilities.

ℹ Instruction Scope

SKILL.md instructs the agent/operator to run the included Python CLI scripts and to launch Chrome with --remote-debugging-port=9222 using a profile logged into Amazon. This is coherent with the scraping use case, but connecting to a logged-in Chrome profile exposes that browser session (cookies, authenticated views) to the script via the Chrome DevTools Protocol — the user should understand that the script will access the pages and session state available to that profile.

✓ Install Mechanism

There is no automated install spec; this is an instruction-only skill with bundled Python scripts. Dependencies are documented (pandas, openpyxl, requests, websocket-client) and must be installed by the operator. No remote binary downloads or installers are present.

ℹ Credentials

Registry metadata lists no required env vars, but the code supports optional DEEPLX_API_URL and DEEPLX_API_KEY (read from environment or .env files) for translation. The scripts will read those specific values and will POST review text to the configured DeepLX host if set. That behavior is expected for optional translation, but the metadata omission is an inconsistency and users must avoid putting sensitive secrets into repository-tracked .env files and should trust any external translation endpoint they configure.

✓ Persistence & Privilege

The skill does not request permanent/always-on inclusion and does not modify other skills. It writes output artifacts and an SQLite cache under the chosen output directory (default amazon-review-output). Those writable files are normal for this workflow.

Version History

v1.0.3

amazon-review-workbook 1.0.3

v1.0.2

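amazon-review-workbook 1.0.1
- Provides an end-to-end workflow for Amazon review scraping, translation, tagging, and a 14-column delivery workbook: 序号 (row number), 评论用户名 (reviewer name), 国家 (country), 星级评分 (star rating), 评论原文 (original review text), 评论中文版 (Chinese translation), 评论概括 (review summary), 情感倾向 (sentiment), 类别分类 (category), 标签 (tags), 重点标记 (priority flag), 评论链接网址 (review URL), 评论时间 (review time), 评论点赞数 (helpful votes)
- Adds coverage-check, keyword profiles, and keyword-autotune to reduce repeated searches and improve collection efficiency
- Completes the Chinese README, release notes, and the repository cleanup needed for public release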

v1.0.1

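amazon-review-workbook 1.0.1
- Provides an end-to-end workflow for Amazon review scraping, translation, tagging, and a 14-column delivery workbook: 序号 (row number), 评论用户名 (reviewer name), 国家 (country), 星级评分 (star rating), 评论原文 (original review text), 评论中文版 (Chinese translation), 评论概括 (review summary), 情感倾向 (sentiment), 类别分类 (category), 标签 (tags), 重点标记 (priority flag), 评论链接网址 (review URL), 评论时间 (review time), 评论点赞数 (helpful votes)
- Adds coverage-check, keyword profiles, and keyword-autotune to reduce repeated searches and improve collection efficiency
- Completes the Chinese README, release notes, and the repository cleanup needed for public release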

Metadata

Slug amazon-review-workbook

Version 1.0.3

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is Amazon Review Workbook?

Collect all customer reviews from an Amazon product URL or product-reviews URL through a logged-in Chrome session on port 9222, export a 14-column factual wo... It is an AI Agent Skill for Claude Code / OpenClaw, with 110 downloads so far.

How do I install Amazon Review Workbook?

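Run "/install amazon-review-workbook" in the OpenClaw or Claude Code chat to install the skill in one step. Installing does not set up the runtime: the first-run requirements in references/setup.md (Chrome with remote debugging on port 9222 and the documented Python dependencies) still apply.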

Is Amazon Review Workbook free?

Yes, Amazon Review Workbook is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Amazon Review Workbook support?

Amazon Review Workbook is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Amazon Review Workbook?

It is built and maintained by aduo6668 (@aduo6668); the current version is v1.0.3.
