Description

Scrape e-commerce homepages from multiple websites in a target market, handle popups automatically, capture screenshots and HTML, extract product data, and g...

Usage Instructions (SKILL.md)

E-commerce Market Analyzer

Name: ecommerce-market-analyzer-skill
Author: nepp-an

Automated workflow for scraping e-commerce websites, handling popups, extracting product data, and generating comprehensive market analysis reports.

Workflow Overview

This skill follows a 4-step workflow:

Setup & Scraping - Run Playwright scraper to capture homepages
Visual Analysis - Analyze screenshots to identify product categories
Data Extraction - Parse HTML to extract specific products and prices
Report Generation - Create comprehensive market analysis report

User provides website list
         ↓
Step 1: Run scraper (handles popups automatically)
         ↓
Step 2: Analyze screenshots visually
         ↓
Step 3: Extract structured data from HTML
         ↓
Step 4: Generate final report

Step 1: Setup & Scraping

Quick Start

When the user provides a list of e-commerce websites, immediately run the scraper:

# Create output directory
mkdir -p screenshots_clean

# Run the scraper
uv run python scripts/scrape_websites.py

Customizing the Website List

Edit scripts/scrape_websites.py and update the WEBSITES list:

WEBSITES = [
    "amazon.de",
    "ebay.de",
    "otto.de",
    # Add more websites...
]
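The entries above are bare domains, so each one has to become a full URL and a pair of output file names before the browser can navigate to it. A minimal sketch of that step, assuming HTTPS and the screenshots_clean/<domain>.html naming used later in this document. The helpers to_url and output_paths are hypothetical and are not confirmed to exist in scripts/scrape_websites.py.

```python
from pathlib import Path

WEBSITES = ["amazon.de", "ebay.de", "otto.de"]


def to_url(site: str) -> str:
    """Prefix a bare domain with https:// (assumption: every site supports HTTPS)."""
    site = site.strip()
    if site.startswith(("http://", "https://")):
        return site
    return f"https://{site}"


def output_paths(site: str, output_dir: str = "screenshots_clean") -> tuple[Path, Path]:
    """Return (screenshot_path, html_path) for a site, named after its domain."""
    domain = to_url(site).split("://", 1)[1].split("/", 1)[0]
    base = Path(output_dir)
    return base / f"{domain}.png", base / f"{domain}.html"


for site in WEBSITES:
    png, html = output_paths(site)
    print(to_url(site), "->", png, html)
```

Naming files after the domain keeps the later extraction steps simple, because a site name maps directly to its saved HTML.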

Key Features

The scraper automatically:

Handles cookie consent popups (German, English, universal selectors)
Handles region/language selection dialogs
Captures full-page screenshots (1920x1080 viewport)
Saves HTML source code
Uses German locale settings (or customize for other markets)
Waits for page stabilization

Important: The script uses popup patterns from references/popup_patterns.md. Consult this file if dealing with new popup types.
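The popup handling described above can be pictured as a loop over candidate selectors that clicks whichever consent button is visible. The sketch below is an assumption about the general shape, not the script's actual code. The three selectors are illustrative, the function name dismiss_popups is hypothetical, and page is expected to behave like a Playwright async Page.

```python
# Illustrative selectors only; the maintained list lives in references/popup_patterns.md.
POPUP_SELECTORS = [
    'button:has-text("Alle akzeptieren")',  # German: "accept all"
    'button:has-text("Accept all")',        # English
    "#onetrust-accept-btn-handler",         # button id used by a common consent manager
]


async def dismiss_popups(page, selectors=POPUP_SELECTORS, click_timeout_ms=2000):
    """Click the first visible match of each selector and return the selectors clicked.

    `page` must provide page.locator(selector).first with awaitable
    is_visible() and click(timeout=...), as a Playwright async Page does.
    """
    clicked = []
    for selector in selectors:
        button = page.locator(selector).first
        try:
            if await button.is_visible():
                await button.click(timeout=click_timeout_ms)
                clicked.append(selector)
        except Exception:
            # A popup that cannot be clicked should not abort the whole scrape.
            continue
    return clicked
```

Returning the list of clicked selectors makes it easy to count how many sites actually showed a popup, which feeds the "popup handling" metric in the report.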

Expected Output

After running, you'll have:

screenshots_clean/*.png - Full-page screenshots
screenshots_clean/*.html - HTML source files
Console output with success/failure summary

Success rate target: 85-95%

Common failures:

Anti-bot protection (requires manual intervention)
HTTP/2 protocol errors (some sites block automation)
Timeout on slow-loading sites
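To check a run against the success-rate target, one option is to look at which output files were actually written. This is a sketch under the assumption that files are named <site>.png and <site>.html inside the output directory; summarize_run is a hypothetical helper, not part of the bundled script.

```python
from pathlib import Path


def summarize_run(sites, output_dir="screenshots_clean"):
    """Split sites into succeeded/failed based on which output files exist."""
    base = Path(output_dir)
    succeeded, failed = [], []
    for site in sites:
        png = base / f"{site}.png"
        html = base / f"{site}.html"
        # Count a site as successful only if both files exist and are non-empty.
        ok = all(p.is_file() and p.stat().st_size > 0 for p in (png, html))
        (succeeded if ok else failed).append(site)
    rate = 100 * len(succeeded) / len(sites) if sites else 0.0
    return {"succeeded": succeeded, "failed": failed, "success_rate": round(rate, 1)}
```

The failed list is the natural input for a second, slower pass with longer timeouts or manual inspection.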

Step 2: Visual Analysis

Read Screenshots

After scraping, read the screenshot files to visually identify:

Product categories
Featured products
Promotional items
Visual design patterns

Example approach:

from pathlib import Path

screenshot_dir = Path("screenshots_clean")
screenshots = sorted(screenshot_dir.glob("*.png"))

# Start with 5 sites; view each file with the Read tool
for screenshot in screenshots[:5]:
    # Note product categories and featured items for this site
    print(screenshot)

What to Look For

Product Categories:

Clothing & Fashion (Bekleidung)
Electronics (Elektronik)
Home & Furniture (Möbel & Wohnen)
Food & Groceries (Lebensmittel)
Books & Media (Bücher)
Beauty & Personal Care (Beauty & Pflege)
Sports & Outdoor (Sport)
Toys & Baby (Spielzeug & Baby)

Featured Products:

Homepage banners
Promotional sections
"Deal of the day" items
New arrivals

Take notes on recurring patterns across multiple sites - these indicate market trends.
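One way to turn those notes into a ranked list is to count on how many sites each category appears. The sketch below uses made-up observations purely for illustration; rank_categories is a hypothetical helper.

```python
from collections import Counter

# Hypothetical notes: site -> categories seen on its homepage screenshot.
observations = {
    "amazon.de": ["Electronics", "Books & Media", "Home & Furniture"],
    "otto.de": ["Clothing & Fashion", "Home & Furniture"],
    "ebay.de": ["Electronics", "Clothing & Fashion"],
}


def rank_categories(notes):
    """Return (category, site_count, percent_of_sites), most widespread first."""
    # set(cats) ensures a category is counted at most once per site.
    counts = Counter(cat for cats in notes.values() for cat in set(cats))
    total = len(notes)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cat, n, round(100 * n / total)) for cat, n in ordered]


for category, n, pct in rank_categories(observations):
    print(f"{category}: {n} sites ({pct}%)")
```

Counting sites rather than raw mentions avoids over-weighting a single homepage that repeats the same category in several banners.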

Step 3: Data Extraction

Strategy Selection

Choose extraction strategy based on site structure. See references/html_parsing_patterns.md for complete patterns.

Quick decision tree:

Try JSON-LD schema extraction (best for structured data)
Fall back to data attribute extraction
Fall back to class-based extraction
Last resort: keyword matching
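The first step of the decision tree, JSON-LD extraction, can be sketched with the standard library alone. This is a simplified illustration, not the patterns from references/html_parsing_patterns.md: it only handles top-level Product objects (or lists of them) and ignores nested structures such as @graph. The function name and the sample HTML are assumptions.

```python
import json
import re

JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_jsonld_products(html):
    """Return (name, price) pairs for schema.org Product entries found in JSON-LD."""
    products = []
    for block in JSON_LD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # Malformed block: skip it and let a later strategy handle the site.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Product":
                offers = item.get("offers") or {}
                if isinstance(offers, list):
                    offers = offers[0] if offers else {}
                price = offers.get("price") if isinstance(offers, dict) else None
                products.append((item.get("name"), price))
    return products


sample = """<script type="application/ld+json">
{"@type": "Product", "name": "Beispielprodukt", "offers": {"price": "19.99"}}
</script>"""
print(extract_jsonld_products(sample))
```

An empty result is the signal to fall back to data attributes, then class names, then keyword matching.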

Example: Extract from REWE.de

import re
from pathlib import Path

html_file = Path("screenshots_clean/rewe.de.html")
content = html_file.read_text(encoding='utf-8')

# REWE-specific patterns
title_pattern = r'data-offer-title="([^"]+)"'
price_pattern = r'<div class="cor-offer-price__tag-price">([^<]+)</div>'

titles = re.findall(title_pattern, content)
prices = re.findall(price_pattern, content)

for i, title in enumerate(titles[:10]):
    price = prices[i] if i < len(prices) else "N/A"
    print(f"{title}: {price}€")

Platform-Specific Parsing

Each e-commerce platform has unique HTML structure. Consult references/html_parsing_patterns.md for:

Amazon.de patterns
eBay.de patterns
Otto.de patterns
Zalando/AboutYou patterns
REWE/Lidl supermarket patterns
And more...

Price Normalization

Always normalize prices:

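import re

def normalize_price(price_str):
    """Convert German format (1.234,56€) to float; return None if unparseable."""
    price_str = price_str.replace('€', '').replace('EUR', '').strip()
    if ',' in price_str and '.' in price_str:
        # Dots are thousands separators, the comma is the decimal mark
        price_str = price_str.replace('.', '').replace(',', '.')
    elif ',' in price_str:
        price_str = price_str.replace(',', '.')
    elif re.fullmatch(r'\d{1,3}(\.\d{3})+', price_str):
        # "1.234" in German notation means 1234, not 1.234
        price_str = price_str.replace('.', '')
    try:
        return float(price_str)
    except ValueError:
        return None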

Handling Large Files

For HTML files >25k tokens:

# Use grep to search for specific patterns
grep -o 'data-product-name="[^"]*"' amazon.de.html | head -20

# Or extract specific sections
grep -A 5 'product-title' ebay.de.html

Extraction Best Practices

Try multiple patterns - Start with JSON-LD, fall back as needed
Validate extractions - Check for reasonable length (10-100 chars)
Remove duplicates - Use sets to track seen products
Limit results - Cap at 10-20 products per site
Handle encoding - Always use encoding='utf-8'
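The validation, de-duplication, and capping practices above fit into one small post-processing pass. A minimal sketch, using the 10-100 character range and the 20-item cap from the list as defaults; clean_products is a hypothetical helper.

```python
def clean_products(names, max_items=20, min_len=10, max_len=100):
    """Normalize whitespace, validate length, de-duplicate, and cap the list."""
    seen = set()
    cleaned = []
    for name in names:
        name = " ".join(name.split())  # collapse runs of whitespace and newlines
        key = name.casefold()          # case-insensitive duplicate detection
        if not (min_len <= len(name) <= max_len) or key in seen:
            continue
        seen.add(key)
        cleaned.append(name)
        if len(cleaned) >= max_items:
            break
    return cleaned


raw = ["  Apple AirPods 4 ", "apple airpods 4", "TV", "Sony PlayStation 5 Konsole"]
print(clean_products(raw))
```

The length check is a cheap filter for regex noise such as stray attribute values or entire paragraphs captured by an overly greedy pattern.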

Step 4: Report Generation

Use the Report Template

Copy and customize assets/report_template.md:

cp assets/report_template.md final_report.md

Report Structure

The template includes these sections:

Executive Summary - Key findings
Top Product Categories - Ranked list with percentages
Verified Product Prices - Extracted data with exact prices
Platform-Specific Analysis - Per-site breakdown
Market Trends - Growth trends and consumer behavior
Seasonal Characteristics - Current and predicted
Technical Implementation - Success metrics and limitations
Business Insights - Opportunities and recommendations
Data Sources - Success/failure breakdown
Conclusions - Actionable takeaways

Filling the Template

Replace placeholder tokens:

{MARKET} → German, UK, US, etc.
{NUM_SITES} → 23, 25, etc.
{DATE} → 2026-03-19
{SUCCESS_RATE} → 92
{CATEGORY_1} → Clothing & Fashion
{PERCENTAGE_1} → 28
And so on...
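Filling the tokens by hand is error-prone for a long template, so the replacement can be scripted. The sketch below assumes tokens are upper-case names in curly braces, as in the list above. It leaves unknown tokens in place and reports them, and it uses a regex rather than str.format so that other braces in the template do not raise errors. fill_template is a hypothetical helper.

```python
import re


def fill_template(template, values):
    """Replace {TOKEN} placeholders; unknown tokens are left in place and reported."""
    missing = set()

    def substitute(match):
        token = match.group(1)
        if token in values:
            return str(values[token])
        missing.add(token)
        return match.group(0)

    filled = re.sub(r"\{([A-Z][A-Z0-9_]*)\}", substitute, template)
    return filled, sorted(missing)


text = "{MARKET} market: {NUM_SITES} sites, {SUCCESS_RATE}% success, top: {CATEGORY_1}"
filled, missing = fill_template(
    text, {"MARKET": "German", "NUM_SITES": 23, "SUCCESS_RATE": 92}
)
print(filled)
print("Unfilled tokens:", missing)
```

A non-empty list of unfilled tokens is a quick completeness check before the report is shared.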

Data Quality Indicators

Include these metrics:

Success rate: % of successfully scraped sites
Popup handling: # of sites with popups handled
Price accuracy: % of verified prices
Screenshot quality: Resolution and file size
HTML completeness: Average file size

Writing Tips

Be bilingual (for German market):

Product names: German + Chinese/English translation
Categories: "Bekleidung / Clothing"
Maintain both languages throughout

Be specific:

❌ "Electronics are popular"
✅ "AirPods 4 (89,90€ on eBay), PlayStation 5, and Samsung smartphones are top electronics"

Include evidence:

Reference screenshot file names
Quote exact prices with sources
Link specific platforms to products

Troubleshooting

Issue: Popup Not Closed

Solution: Check references/popup_patterns.md for the specific site. Add custom selector if needed:

# In scripts/scrape_websites.py, add to popup_selectors list:
popup_selectors = [
    # ... existing selectors ...
    'button:has-text("Neue Popup Text")',  # Add custom
]

Issue: HTML Parsing Returns Empty

Diagnose:

Check if HTML file exists and has content
Verify the pattern with grep: grep -o "your-pattern" file.html
Try alternative patterns from references/html_parsing_patterns.md
Use keyword matching as fallback
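The first three diagnostic steps can be combined into one check that reports whether the file exists, how large it is, and how many matches each candidate pattern produces. A sketch with a hypothetical helper named diagnose; the patterns passed in are whatever you are currently testing.

```python
import re
from pathlib import Path


def diagnose(html_path, patterns):
    """Report file status and match counts for each candidate regex pattern."""
    path = Path(html_path)
    if not path.is_file():
        return {"exists": False, "size": 0, "matches": {}}
    # errors="replace" keeps the check running even if the file has bad bytes.
    content = path.read_text(encoding="utf-8", errors="replace")
    matches = {p: len(re.findall(p, content)) for p in patterns}
    return {"exists": True, "size": len(content), "matches": matches}
```

A file that exists but is only a few hundred characters long usually points to a block page or an unrendered shell rather than a parsing problem.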

Issue: Anti-Bot Detection

Symptoms: CAPTCHA, "Verify you are human", IP blocking

Solutions:

Add delays between requests (already in script)
Customize user agent string
Use browser fingerprinting evasion
For production: consider proxy rotation (not included)

Issue: Timeout Errors

Solution: Adjust timeout in script:

await page.goto(url, wait_until="domcontentloaded", timeout=120000)  # 2min

Or wait for the load event instead (note that load fires later than domcontentloaded, so it is the stricter condition):

await page.goto(url, wait_until="load", timeout=90000)

Market-Specific Configuration

German Market (Default)

context = await browser.new_context(
    locale="de-DE",
    timezone_id="Europe/Berlin",
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."
)

Popup patterns: See references/popup_patterns.md → German Market section

UK Market

context = await browser.new_context(
    locale="en-GB",
    timezone_id="Europe/London",
)

Popup patterns: Use English/International selectors

US Market

context = await browser.new_context(
    locale="en-US",
    timezone_id="America/New_York",
)

Other Markets

Adjust locale and timezone_id accordingly. Update popup selectors in script based on language.
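Keeping the per-market settings in one lookup table avoids editing the new_context call for each run. In the sketch below the DE, UK, and US values come from the sections above; the FR entry is an illustrative addition, and context_options is a hypothetical helper.

```python
MARKET_SETTINGS = {
    "DE": {"locale": "de-DE", "timezone_id": "Europe/Berlin"},
    "UK": {"locale": "en-GB", "timezone_id": "Europe/London"},
    "US": {"locale": "en-US", "timezone_id": "America/New_York"},
    # Illustrative addition, not part of the original skill:
    "FR": {"locale": "fr-FR", "timezone_id": "Europe/Paris"},
}


def context_options(market):
    """Return keyword arguments for browser.new_context() for a market code."""
    try:
        return dict(MARKET_SETTINGS[market.upper()])
    except KeyError:
        raise ValueError(f"No settings for market {market!r}; add an entry first") from None


print(context_options("uk"))
```

The returned dictionary is meant to be unpacked, for example browser.new_context(**context_options("DE")).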

Advanced Usage

Parallel Scraping

For large website lists, modify script to use concurrent scraping:

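import asyncio
from playwright.async_api import async_playwright

async def scrape_all(websites, output_dir="screenshots_clean"):
    # capture_homepage is assumed to be the per-site coroutine in scripts/scrape_websites.py
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        tasks = [capture_homepage(browser, url, output_dir) for url in websites]
        # return_exceptions=True keeps one failing site from aborting the rest
        results = await asyncio.gather(*tasks, return_exceptions=True)
        await browser.close()
    return results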

Note: Be respectful of rate limits. Use delays.
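A plain asyncio.gather starts every site at once, which works against the rate-limit advice. One way to bound it is a semaphore with a short pause per task. This is a generic sketch: gather_limited is a hypothetical helper, worker stands in for whatever per-site coroutine the script provides, and the default limits are arbitrary.

```python
import asyncio


async def gather_limited(items, worker, max_concurrent=3, delay_s=1.0):
    """Run worker(item) for every item with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(item):
        async with semaphore:
            try:
                return await worker(item)
            except Exception as exc:
                return exc  # Record the failure instead of cancelling the other sites.
            finally:
                await asyncio.sleep(delay_s)  # Pause before releasing the slot.

    return await asyncio.gather(*(run_one(item) for item in items))
```

Results come back in input order, with exception objects in place of failed sites, so they can be zipped with the website list for the success/failure summary.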

Custom Analysis

Beyond the standard workflow, you can:

Compare prices across platforms
Track price changes over time (run periodically)
Identify pricing patterns (premium vs discount)
Analyze promotional strategies
Monitor competitor activity
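As an illustration of the first item, cross-platform price comparison reduces to grouping extracted rows by product. In the sketch below only the 89,90€ eBay price for AirPods 4 echoes this document; the other rows are invented sample data, and compare_prices is a hypothetical helper. It assumes product names have already been normalized so that the same item matches across sites.

```python
from collections import defaultdict

# Hypothetical extracted rows: (platform, product, price in EUR).
rows = [
    ("ebay.de", "AirPods 4", 89.90),
    ("amazon.de", "AirPods 4", 99.00),
    ("otto.de", "AirPods 4", 94.99),
    ("amazon.de", "PlayStation 5", 449.00),
]


def compare_prices(rows):
    """Group prices by product and report the cheapest platform and price spread."""
    by_product = defaultdict(list)
    for platform, product, price in rows:
        if price:  # skip missing (None) and zero prices
            by_product[product].append((price, platform))
    report = {}
    for product, offers in by_product.items():
        if len(offers) < 2:
            continue  # Nothing to compare against.
        low, high = min(offers), max(offers)
        report[product] = {
            "cheapest": low[1],
            "min": low[0],
            "max": high[0],
            "spread_pct": round(100 * (high[0] - low[0]) / low[0], 1),
        }
    return report


print(compare_prices(rows))
```

Products seen on only one platform are left out, since a single price says nothing about relative positioning.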

Exporting Data

Consider exporting to structured formats:

CSV: For spreadsheet analysis
JSON: For programmatic access
Database: For long-term tracking

Example CSV export:

import csv

with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Platform', 'Product', 'Price', 'Category'])
    for product in products:
        writer.writerow([product['platform'], product['name'],
                        product['price'], product['category']])

Best Practices

Ethical Scraping

Respect robots.txt - Check before scraping
Rate limiting - Don't overwhelm servers (script includes delays)
Terms of Service - Review site ToS
Personal use - This skill is for market research, not commercial resale

Data Quality

Verify prices - Cross-check suspicious values
Update regularly - E-commerce changes fast
Document assumptions - Note any manual adjustments
Keep raw data - Save screenshots and HTML for reference

Report Quality

Be objective - Base conclusions on data
Show your work - Reference sources
Contextualize - Explain market-specific factors
Actionable - Provide specific recommendations

Resources Reference

scripts/scrape_websites.py

Main scraper with automatic popup handling. Uses Playwright to capture homepages.

Usage: uv run python scripts/scrape_websites.py

references/popup_patterns.md

Comprehensive collection of popup selectors for different markets and platforms.

When to read: When encountering new popup types or troubleshooting popup handling.

references/html_parsing_patterns.md

Platform-specific HTML parsing patterns and extraction strategies.

When to read: When extracting product data from HTML files. Contains patterns for Amazon, eBay, REWE, Otto, Zalando, and generic strategies.

assets/report_template.md

Structured template for the final market analysis report.

Usage: Copy and fill in with analysis results.

Security Recommendations

The skill appears coherent with its stated purpose, but take these precautions before running:

1) Review the included scripts manually; only run code you trust.
2) Run the scraper in an isolated environment (VM/container) to limit risk and disk usage.
3) Install Playwright and dependencies according to the README in a controlled venv.
4) Respect robots.txt and target sites' Terms of Service; avoid scraping behind authentication or paid/content-restricted areas.
5) Start with a small site list and low request rates (the script already has delays) to reduce anti-bot triggers.
6) Be aware the tool saves full HTML/screenshots locally; don't include sites containing sensitive personal data.

If you want higher assurance, ask the maintainer for provenance (homepage, repo) or run the code through your static analysis toolchain before use.

Functional Analysis

Type: OpenClaw Skill
Name: ecommerce-market-analyzer-skill
Version: 1.0.0

The skill bundle is a legitimate tool for e-commerce market analysis. It provides a structured workflow for scraping homepages using Playwright (scripts/scrape_websites.py), handling common cookie banners, and extracting product data via regex and JSON-LD patterns. The instructions in SKILL.md are well-aligned with the stated purpose, and there is no evidence of data exfiltration, malicious execution, or prompt injection intended to subvert the agent's behavior.

Capability Assessment

✓ Purpose & Capability

The name/description (e‑commerce market scraping and analysis) aligns with the included artifacts: SKILL.md, reference pattern files, report template, and a Playwright scraper script that captures screenshots and HTML and contains parsing patterns for targeted e‑commerce sites. No unrelated credentials, binaries, or config paths are requested.

ℹ Instruction Scope

SKILL.md instructs the agent to run the included Playwright scraper, save screenshots and HTML, then analyze files locally (image reading, grep, regex parsing). All of that is within the stated workflow. Note: the instructions tell the agent to 'immediately run the scraper' when given a website list — this will cause network crawling and file writes; users should be aware the skill performs autonomous web requests and local I/O.

✓ Install Mechanism

There is no external install spec or remote download; the skill is instruction + bundled script. Dependencies (Playwright, Python) are documented in README. No suspicious download URLs or archive extraction steps are present.

✓ Credentials

The skill requests no environment variables or credentials. The script operates with local filesystem writes only (screenshots_clean/*.png and .html). No evidence it accesses or requests unrelated secrets or system configs.

✓ Persistence & Privilege

Skill flags are default (always:false) and do not request permanent elevated privileges. It does write files to a local output directory (expected for scraping) but does not modify other skills or global agent configuration.

Version History

v1.0.0

E-commerce Market Analyzer 1.0.0: initial release

Automates scraping of multiple e-commerce homepages, with popup/dialog handling for German and international sites.
Captures full-page screenshots and HTML, normalizes product price formats, and extracts product/category data.
Includes a step-by-step workflow: setup and scrape, visual analysis, data extraction, and comprehensive market report generation.
Provides templates, parsing guides, selector references, and output/report quality metrics for replicable market analysis.
Supports bilingual (e.g., German-English) report generation and offers troubleshooting steps for common scraping issues.

Metadata

Slug ecommerce-market-analyzer-skill

Version 1.0.0

License MIT-0

Total installs 1

Current installs 1

Number of versions 1

Frequently Asked Questions