nepp-an

ecommerce-market-analyzer-skill

Author: nepp · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 192
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install ecommerce-market-analyzer-skill
Description
Scrape e-commerce homepages from multiple websites in a target market, handle popups automatically, capture screenshots and HTML, extract product data, and g...
Usage Instructions (SKILL.md)

E-commerce Market Analyzer

Automated workflow for scraping e-commerce websites, handling popups, extracting product data, and generating comprehensive market analysis reports.

Workflow Overview

This skill follows a 4-step workflow:

  1. Setup & Scraping - Run Playwright scraper to capture homepages
  2. Visual Analysis - Analyze screenshots to identify product categories
  3. Data Extraction - Parse HTML to extract specific products and prices
  4. Report Generation - Create comprehensive market analysis report
User provides website list
         ↓
Step 1: Run scraper (handles popups automatically)
         ↓
Step 2: Analyze screenshots visually
         ↓
Step 3: Extract structured data from HTML
         ↓
Step 4: Generate final report

Step 1: Setup & Scraping

Quick Start

When the user provides a list of e-commerce websites, run the scraper immediately:

# Create output directory
mkdir -p screenshots_clean

# Run the scraper
uv run python scripts/scrape_websites.py

Customizing the Website List

Edit scripts/scrape_websites.py and update the WEBSITES list:

WEBSITES = [
    "amazon.de",
    "ebay.de",
    "otto.de",
    # Add more websites...
]

Key Features

The scraper automatically:

  • Handles cookie consent popups (German, English, universal selectors)
  • Handles region/language selection dialogs
  • Captures full-page screenshots (1920x1080)
  • Saves HTML source code
  • Uses German locale settings (or customize for other markets)
  • Waits for page stabilization

Important: The script uses popup patterns from references/popup_patterns.md. Consult this file if dealing with new popup types.
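
The selector-based popup handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/scrape_websites.py: the `dismiss_popups` helper and the example selector strings are assumptions, and `page` is expected to behave like a Playwright async Page.

```python
# Illustrative selectors only; the real list lives in references/popup_patterns.md.
EXAMPLE_POPUP_SELECTORS = [
    'button:has-text("Alle akzeptieren")',  # German consent button (assumed wording)
    'button:has-text("Accept all")',        # English consent button (assumed wording)
    '#onetrust-accept-btn-handler',         # id used by OneTrust consent banners
]


async def dismiss_popups(page, selectors=EXAMPLE_POPUP_SELECTORS, click_timeout_ms=2000):
    """Click the first visible match for each selector; return the selectors clicked.

    `page` must offer page.locator(sel).first with awaitable
    is_visible() and click(), as a Playwright async Page does.
    """
    clicked = []
    for selector in selectors:
        button = page.locator(selector).first
        try:
            if await button.is_visible():
                await button.click(timeout=click_timeout_ms)
                clicked.append(selector)
        except Exception:
            # A popup that vanished or refused the click should not abort the scrape.
            continue
    return clicked
```

Returning the clicked selectors makes it easy to log which pattern matched on which site, which helps when a new popup type has to be added.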

Expected Output

After running, you'll have:

  • screenshots_clean/*.png - Full-page screenshots
  • screenshots_clean/*.html - HTML source files
  • Console output with success/failure summary

Success rate target: 85-95%

Common failures:

  • Anti-bot protection (requires manual intervention)
  • HTTP/2 protocol errors (some sites block automation)
  • Timeout on slow-loading sites
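
To check a run against the success rate target, the per-site outcomes can be summarized as in the sketch below. The `summarize_results` helper and the shape of its input (a dict mapping each site to True or False) are assumptions; the actual console summary printed by the script may look different.

```python
def summarize_results(results):
    """results maps each site to True (scraped) or False (failed)."""
    total = len(results)
    succeeded = sum(1 for ok in results.values() if ok)
    rate = round(100 * succeeded / total, 1) if total else 0.0
    failed = sorted(site for site, ok in results.items() if not ok)
    return {"total": total, "succeeded": succeeded,
            "success_rate": rate, "failed": failed}


# Invented outcomes, for illustration only
summary = summarize_results({"amazon.de": True, "ebay.de": True, "otto.de": False})
print(f"{summary['succeeded']}/{summary['total']} sites ({summary['success_rate']}%)")
print("Failed:", ", ".join(summary["failed"]) or "none")
```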

Step 2: Visual Analysis

Read Screenshots

After scraping, read the screenshot files to visually identify:

  • Product categories
  • Featured products
  • Promotional items
  • Visual design patterns

Example approach:

from pathlib import Path

screenshot_dir = Path("screenshots_clean")
screenshots = sorted(screenshot_dir.glob("*.png"))

# Read screenshots using the Read tool
for screenshot in screenshots[:5]:  # Start with 5 sites
    # Use the Read tool to view the image at this path,
    # then note product categories and featured items
    print(screenshot)

What to Look For

Product Categories:

  • Clothing & Fashion (Bekleidung)
  • Electronics (Elektronik)
  • Home & Furniture (Möbel & Wohnen)
  • Food & Groceries (Lebensmittel)
  • Books & Media (Bücher)
  • Beauty & Personal Care (Beauty & Pflege)
  • Sports & Outdoor (Sport)
  • Toys & Baby (Spielzeug & Baby)

Featured Products:

  • Homepage banners
  • Promotional sections
  • "Deal of the day" items
  • New arrivals

Take notes on recurring patterns across multiple sites - these indicate market trends.


Step 3: Data Extraction

Strategy Selection

Choose extraction strategy based on site structure. See references/html_parsing_patterns.md for complete patterns.

Quick decision tree:

  1. Try JSON-LD schema extraction (best for structured data)
  2. Fall back to data attribute extraction
  3. Fall back to class-based extraction
  4. Last resort: keyword matching
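
The first branch of the decision tree, JSON-LD extraction, might look like the sketch below. It is an illustration rather than the pattern from references/html_parsing_patterns.md: the `extract_jsonld_products` helper is hypothetical, it only handles top-level schema.org Product objects (no `@graph` nesting), and it reads only the first offer.

```python
import json
import re

JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_jsonld_products(html):
    """Return (name, price) tuples for schema.org Product entries in JSON-LD blocks."""
    products = []
    for block in JSON_LD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # Skip malformed blocks instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offers = item.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            price = offers.get("price") if isinstance(offers, dict) else None
            products.append((item.get("name"), price))
    return products
```

If this returns an empty list, the page probably has no usable JSON-LD and the next strategy in the list (data attributes) is worth trying.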

Example: Extract from REWE.de

import re
from pathlib import Path

html_file = Path("screenshots_clean/rewe.de.html")
content = html_file.read_text(encoding='utf-8')

# REWE-specific patterns
title_pattern = r'data-offer-title="([^"]+)"'
price_pattern = r'<div class="cor-offer-price__tag-price">([^<]+)</div>'

titles = re.findall(title_pattern, content)
prices = re.findall(price_pattern, content)

for i, title in enumerate(titles[:10]):
    price = f"{prices[i]}€" if i < len(prices) else "N/A"
    print(f"{title}: {price}")

Platform-Specific Parsing

Each e-commerce platform has unique HTML structure. Consult references/html_parsing_patterns.md for:

  • Amazon.de patterns
  • eBay.de patterns
  • Otto.de patterns
  • Zalando/AboutYou patterns
  • REWE/Lidl supermarket patterns
  • And more...

Price Normalization

Always normalize prices:

def normalize_price(price_str):
    """Convert German format (1.234,56€) to float; return None if unparseable."""
    price_str = price_str.replace('€', '').replace('EUR', '').strip()
    if ',' in price_str and '.' in price_str:
        # Dot is the thousands separator, comma the decimal separator
        price_str = price_str.replace('.', '').replace(',', '.')
    elif ',' in price_str:
        price_str = price_str.replace(',', '.')
    try:
        return float(price_str)
    except ValueError:
        return None

Handling Large Files

For HTML files >25k tokens:

# Use grep to search for specific patterns
grep -o 'data-product-name="[^"]*"' amazon.de.html | head -20

# Or extract specific sections
grep -A 5 'product-title' ebay.de.html

Extraction Best Practices

  1. Try multiple patterns - Start with JSON-LD, fall back as needed
  2. Validate extractions - Check for reasonable length (10-100 chars)
  3. Remove duplicates - Use sets to track seen products
  4. Limit results - Cap at 10-20 products per site
  5. Handle encoding - Always use encoding='utf-8'
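
Practices 2 to 4 above (validate, de-duplicate, cap) can be combined in one pass, as in this sketch. The `clean_product_names` helper is hypothetical; its default thresholds mirror the numbers given in the list.

```python
def clean_product_names(names, min_len=10, max_len=100, limit=20):
    """Validate, de-duplicate (case-insensitively), and cap extracted product names."""
    seen = set()
    cleaned = []
    for raw in names:
        name = " ".join(raw.split())  # Collapse whitespace left over from HTML
        key = name.casefold()
        if not (min_len <= len(name) <= max_len) or key in seen:
            continue
        seen.add(key)
        cleaned.append(name)
        if len(cleaned) >= limit:
            break
    return cleaned
```

A list is used for the result so that the original page order is preserved; the set only tracks what has already been seen.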

Step 4: Report Generation

Use the Report Template

Copy and customize assets/report_template.md:

cp assets/report_template.md final_report.md

Report Structure

The template includes these sections:

  1. Executive Summary - Key findings
  2. Top Product Categories - Ranked list with percentages
  3. Verified Product Prices - Extracted data with exact prices
  4. Platform-Specific Analysis - Per-site breakdown
  5. Market Trends - Growth trends and consumer behavior
  6. Seasonal Characteristics - Current and predicted
  7. Technical Implementation - Success metrics and limitations
  8. Business Insights - Opportunities and recommendations
  9. Data Sources - Success/failure breakdown
  10. Conclusions - Actionable takeaways

Filling the Template

Replace placeholder tokens:

  • {MARKET} → German, UK, US, etc.
  • {NUM_SITES} → 23, 25, etc.
  • {DATE} → 2026-03-19
  • {SUCCESS_RATE} → 92
  • {CATEGORY_1} → Clothing & Fashion
  • {PERCENTAGE_1} → 28
  • And so on...
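
Token replacement can also be scripted. The sketch below assumes the placeholders are plain `{UPPER_CASE}` tokens as listed above; the `fill_template` helper and the sample template text are illustrative and not taken from assets/report_template.md. Plain `str.replace` is used instead of `str.format` so that other curly braces in the report are left untouched.

```python
import re


def fill_template(template, values):
    """Replace {TOKEN} placeholders; return the filled text and any tokens left over."""
    filled = template
    for token, value in values.items():
        filled = filled.replace("{" + token + "}", str(value))
    remaining = sorted(set(re.findall(r"\{([A-Z][A-Z0-9_]*)\}", filled)))
    return filled, remaining


# Hypothetical two-line template, for illustration only
template = "# {MARKET} Market Report ({DATE})\nSites: {NUM_SITES}, success rate: {SUCCESS_RATE}%"
report, missing = fill_template(
    template, {"MARKET": "German", "DATE": "2026-03-19", "NUM_SITES": 23}
)
print(report)
print("Unfilled tokens:", missing)
```

Reporting the unfilled tokens is a cheap way to catch placeholders that were forgotten before the report is delivered.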

Data Quality Indicators

Include these metrics:

  • Success rate: % of successfully scraped sites
  • Popup handling: # of sites with popups handled
  • Price accuracy: % of verified prices
  • Screenshot quality: Resolution and file size
  • HTML completeness: Average file size

Writing Tips

Be bilingual (for German market):

  • Product names: German + Chinese/English translation
  • Categories: "Bekleidung / Clothing"
  • Maintain both languages throughout

Be specific:

  • ❌ "Electronics are popular"
  • ✅ "AirPods 4 (89,90€ on eBay), PlayStation 5, and Samsung smartphones are top electronics"

Include evidence:

  • Reference screenshot file names
  • Quote exact prices with sources
  • Link specific platforms to products

Troubleshooting

Issue: Popup Not Closed

Solution: Check references/popup_patterns.md for the specific site. Add custom selector if needed:

# In scripts/scrape_websites.py, add to popup_selectors list:
popup_selectors = [
    # ... existing selectors ...
    'button:has-text("Neue Popup Text")',  # Add custom
]

Issue: HTML Parsing Returns Empty

Diagnose:

  1. Check if HTML file exists and has content
  2. Verify the pattern with grep: grep -o "your-pattern" file.html
  3. Try alternative patterns from references/html_parsing_patterns.md
  4. Use keyword matching as fallback
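
The first three diagnostic steps can be automated with a small helper like the one below. It is a sketch under assumptions: `diagnose_patterns` is a hypothetical name, and the candidate patterns would come from references/html_parsing_patterns.md.

```python
import re
from pathlib import Path


def diagnose_patterns(html_path, patterns):
    """Return match counts per pattern, or None if the file is missing or empty."""
    path = Path(html_path)
    if not path.is_file() or path.stat().st_size == 0:
        return None
    content = path.read_text(encoding="utf-8", errors="replace")
    return {pattern: len(re.findall(pattern, content)) for pattern in patterns}
```

A result of None points to a scraping problem, while a dict full of zeros points to patterns that do not fit the site's markup.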

Issue: Anti-Bot Detection

Symptoms: CAPTCHA, "Verify you are human", IP blocking

Solutions:

  1. Add delays between requests (already in script)
  2. Customize user agent string
  3. Use browser fingerprinting evasion
  4. For production: consider proxy rotation (not included)
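
Before parsing, saved HTML can be screened for the symptoms listed above. The heuristic below is only a sketch: `looks_blocked` and the marker list are assumptions ("access denied" is not from the original text), and ordinary pages that embed a CAPTCHA widget would be flagged too, so treat a hit as a prompt for manual review.

```python
# Assumed marker list; extend it with phrases seen on the target market's sites.
BLOCK_MARKERS = ("captcha", "verify you are human", "access denied")


def looks_blocked(html):
    """Heuristic: True if the saved HTML contains a known anti-bot marker."""
    text = html.casefold()
    return any(marker in text for marker in BLOCK_MARKERS)
```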

Issue: Timeout Errors

Solution: Adjust the timeout in the script:

await page.goto(url, wait_until="domcontentloaded", timeout=120000)  # 2min

Or switch the loading strategy. Note that "load" waits for the full load event, which fires later than "domcontentloaded":

await page.goto(url, wait_until="load", timeout=90000)

Market-Specific Configuration

German Market (Default)

context = await browser.new_context(
    locale="de-DE",
    timezone_id="Europe/Berlin",
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."
)

Popup patterns: See references/popup_patterns.md → German Market section

UK Market

context = await browser.new_context(
    locale="en-GB",
    timezone_id="Europe/London",
)

Popup patterns: Use English/International selectors

US Market

context = await browser.new_context(
    locale="en-US",
    timezone_id="America/New_York",
)

Other Markets

Adjust locale and timezone_id accordingly. Update the popup selectors in the script to match the market's language.
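
One way to keep the per-market settings in a single place is a lookup table, sketched below. `MARKET_SETTINGS` and `context_options` are hypothetical names, and the French entry is an added example that is not part of the original skill.

```python
MARKET_SETTINGS = {
    "de": {"locale": "de-DE", "timezone_id": "Europe/Berlin"},
    "uk": {"locale": "en-GB", "timezone_id": "Europe/London"},
    "us": {"locale": "en-US", "timezone_id": "America/New_York"},
    # Example of an additional market (assumption, not in the original skill):
    "fr": {"locale": "fr-FR", "timezone_id": "Europe/Paris"},
}


def context_options(market):
    """Return keyword arguments for browser.new_context() for the given market key."""
    try:
        return dict(MARKET_SETTINGS[market.lower()])
    except KeyError:
        raise ValueError(f"Unknown market {market!r}; add it to MARKET_SETTINGS") from None
```

In the scraper this could be used as `context = await browser.new_context(**context_options("fr"))`.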


Advanced Usage

Parallel Scraping

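For large website lists, modify the script to scrape concurrently:

import asyncio
from playwright.async_api import async_playwright

# capture_homepage is the per-site coroutine, assumed to be defined
# in scripts/scrape_websites.py
async def scrape_all(websites, output_dir="screenshots_clean"):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        tasks = [capture_homepage(browser, url, output_dir) for url in websites]
        # Run all captures concurrently; results keep the order of `websites`
        results = await asyncio.gather(*tasks)
        await browser.close()
    return results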

Note: Be respectful of rate limits. Use delays.
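
A common way to honor rate limits while still scraping concurrently is to cap the number of in-flight tasks with a semaphore. The sketch below is generic: `run_limited` is a hypothetical helper, and `worker` stands in for whatever per-site coroutine the script uses.

```python
import asyncio


async def run_limited(items, worker, max_concurrent=3, delay_s=1.0):
    """Run worker(item) for every item with at most max_concurrent running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with semaphore:
            try:
                return await worker(item)
            finally:
                # Pause before freeing the slot so requests are spaced out
                await asyncio.sleep(delay_s)

    # return_exceptions=True keeps one failed site from cancelling the others
    return await asyncio.gather(*(guarded(item) for item in items),
                                return_exceptions=True)
```

With the parallel example above, the call could look like `await run_limited(websites, lambda url: capture_homepage(browser, url, output_dir))`, assuming `capture_homepage` has that signature.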

Custom Analysis

Beyond the standard workflow, you can:

  • Compare prices across platforms
  • Track price changes over time (run periodically)
  • Identify pricing patterns (premium vs discount)
  • Analyze promotional strategies
  • Monitor competitor activity
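
As an example of the first item, cross-platform price comparison reduces to finding the lowest price per product. The sketch below assumes the same record shape as the CSV export example (dicts with 'platform', 'name', and 'price'); `cheapest_by_product` is a hypothetical helper and the sample records are invented.

```python
def cheapest_by_product(products):
    """Map each product name to the (platform, price) pair with the lowest price."""
    best = {}
    for item in products:
        price = item.get("price")
        if price is None:
            continue  # Skip products whose price could not be normalized
        current = best.get(item["name"])
        if current is None or price < current[1]:
            best[item["name"]] = (item["platform"], price)
    return best


# Invented sample data, for illustration only
sample = [
    {"platform": "shop-a.example", "name": "Widget", "price": 19.99},
    {"platform": "shop-b.example", "name": "Widget", "price": 17.50},
    {"platform": "shop-a.example", "name": "Gadget", "price": None},
]
print(cheapest_by_product(sample))
```

Matching on the exact product name is a simplification; real listings usually need some name normalization before they can be compared.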

Exporting Data

Consider exporting to structured formats:

  • CSV: For spreadsheet analysis
  • JSON: For programmatic access
  • Database: For long-term tracking

Example CSV export:

import csv

# products: list of dicts with 'platform', 'name', 'price', and 'category' keys
with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Platform', 'Product', 'Price', 'Category'])
    for product in products:
        writer.writerow([product['platform'], product['name'],
                        product['price'], product['category']])
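
A JSON export follows the same idea. This sketch assumes `products` is the same list of dicts; `export_json` and the output file name are hypothetical. Setting `ensure_ascii=False` keeps umlauts and the euro sign readable in the file.

```python
import json


def export_json(products, path="products.json"):
    """Write the product list to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(products, f, ensure_ascii=False, indent=2)
```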

Best Practices

Ethical Scraping

  1. Respect robots.txt - Check before scraping
  2. Rate limiting - Don't overwhelm servers (script includes delays)
  3. Terms of Service - Review site ToS
  4. Personal use - This skill is for market research, not commercial resale

Data Quality

  1. Verify prices - Cross-check suspicious values
  2. Update regularly - E-commerce changes fast
  3. Document assumptions - Note any manual adjustments
  4. Keep raw data - Save screenshots and HTML for reference

Report Quality

  1. Be objective - Base conclusions on data
  2. Show your work - Reference sources
  3. Contextualize - Explain market-specific factors
  4. Actionable - Provide specific recommendations

Resources Reference

scripts/scrape_websites.py

Main scraper with automatic popup handling. Uses Playwright to capture homepages.

Usage: uv run python scripts/scrape_websites.py

references/popup_patterns.md

Comprehensive collection of popup selectors for different markets and platforms.

When to read: When encountering new popup types or troubleshooting popup handling.

references/html_parsing_patterns.md

Platform-specific HTML parsing patterns and extraction strategies.

When to read: When extracting product data from HTML files. Contains patterns for Amazon, eBay, REWE, Otto, Zalando, and generic strategies.

assets/report_template.md

Structured template for the final market analysis report.

Usage: Copy and fill in with analysis results.

Security Recommendations
The skill appears coherent with its stated purpose, but take these precautions before running:

  1. Review the included scripts manually; only run code you trust.
  2. Run the scraper in an isolated environment (VM/container) to limit risk and disk usage.
  3. Install Playwright and dependencies according to the README in a controlled venv.
  4. Respect robots.txt and target sites' Terms of Service; avoid scraping behind authentication or paid/content-restricted areas.
  5. Start with a small site list and low request rates (the script already has delays) to reduce anti-bot triggers.
  6. Be aware the tool saves full HTML/screenshots locally; don't include sites containing sensitive personal data.

If you want higher assurance, ask the maintainer for provenance (homepage, repo) or run the code through your static analysis toolchain before use.
Functional Analysis
Type: OpenClaw Skill
Name: ecommerce-market-analyzer-skill
Version: 1.0.0

The skill bundle is a legitimate tool for e-commerce market analysis. It provides a structured workflow for scraping homepages using Playwright (scripts/scrape_websites.py), handling common cookie banners, and extracting product data via regex and JSON-LD patterns. The instructions in SKILL.md are well-aligned with the stated purpose, and there is no evidence of data exfiltration, malicious execution, or prompt injection intended to subvert the agent's behavior.
Capability Assessment
Purpose & Capability
The name/description (e‑commerce market scraping and analysis) aligns with the included artifacts: SKILL.md, reference pattern files, report template, and a Playwright scraper script that captures screenshots and HTML and contains parsing patterns for targeted e‑commerce sites. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md instructs the agent to run the included Playwright scraper, save screenshots and HTML, then analyze files locally (image reading, grep, regex parsing). All of that is within the stated workflow. Note: the instructions tell the agent to 'immediately run the scraper' when given a website list — this will cause network crawling and file writes; users should be aware the skill performs autonomous web requests and local I/O.
Install Mechanism
There is no external install spec or remote download; the skill is instruction + bundled script. Dependencies (Playwright, Python) are documented in README. No suspicious download URLs or archive extraction steps are present.
Credentials
The skill requests no environment variables or credentials. The script operates with local filesystem writes only (screenshots_clean/*.png and .html). No evidence it accesses or requests unrelated secrets or system configs.
Persistence & Privilege
Skill flags are default (always:false) and do not request permanent elevated privileges. It does write files to a local output directory (expected for scraping) but does not modify other skills or global agent configuration.
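How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install ecommerce-market-analyzer-skill
  3. Once installed, invoke the Skill by name or trigger it with /ecommerce-market-analyzer-skill
  4. Provide the required inputs as described in the Skill's parameter documentation to get structured output.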
Version History
v1.0.0
E-commerce Market Analyzer 1.0.0: initial release

  • Automates scraping of multiple e-commerce homepages, with popup/dialog handling for German and international sites.
  • Captures full-page screenshots and HTML, normalizes product price formats, and extracts product/category data.
  • Includes a step-by-step workflow: setup and scrape, visual analysis, data extraction, and comprehensive market report generation.
  • Provides templates, parsing guides, selector references, and output/report quality metrics for replicable market analysis.
  • Supports bilingual (e.g., German-English) report generation and offers troubleshooting steps for common scraping issues.
Metadata
Slug: ecommerce-market-analyzer-skill
Version: 1.0.0
License: MIT-0
Total installs: 1
Current installs: 1
Version count: 1
FAQ

What is ecommerce-market-analyzer-skill?

Scrape e-commerce homepages from multiple websites in a target market, handle popups automatically, capture screenshots and HTML, extract product data, and g... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 192 total downloads to date.

How do I install ecommerce-market-analyzer-skill?

Run the command "/install ecommerce-market-analyzer-skill" in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration.

Is ecommerce-market-analyzer-skill free?

Yes. ecommerce-market-analyzer-skill is completely free under the MIT-0 license and may be freely downloaded, installed, and used.

Which platforms does ecommerce-market-analyzer-skill support?

ecommerce-market-analyzer-skill is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed ecommerce-market-analyzer-skill?

It is developed and maintained by nepp (@nepp-an); the current version is v1.0.0.
