zhongrenfei1-hub

Clean Skill

by RenfeiZhong · GitHub ↗ · v1.1.0
cross-platform ⚠ suspicious
630
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install clean-skill
Description
Cross-reference restaurant recommendations from Xiaohongshu (小红书) and Dianping (大众点评) to validate restaurant quality and consistency. Use when querying resta...
README (SKILL.md)

Restaurant Review Cross-Check

Cross-reference restaurant data from Xiaohongshu and Dianping to provide validated recommendations.

Quick Start

Query restaurants by location and cuisine type:

# Basic query (Jing'an District, Shanghai; Japanese cuisine)
crosscheck-restaurants "上海静安区" "日式料理"

# With filters (Chaoyang District, Beijing; hot pot)
crosscheck-restaurants "北京朝阳区" "火锅" --min-rating 4.5 --min-reviews 100

Workflow

1. Data Collection

Query both platforms simultaneously:

Dianping:

  • Fetch restaurants matching location + cuisine
  • Extract: name, rating, review_count, price_range, address, tags

Xiaohongshu:

  • Search notes/posts matching location + cuisine
  • Extract: restaurant_name, engagement_metrics (likes/saves), sentiment_score
  • Note: Xiaohongshu has no public API, so this data must be scraped

2. Data Matching

Match restaurants across platforms using fuzzy matching:

  • Restaurant name similarity (Levenshtein distance)
  • Location proximity (address matching)
  • Handle name variations (e.g., "银座寿司" vs "银座寿司静安店")

See scripts/match_restaurants.py for matching logic.
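
The matching step can be sketched roughly as follows. This is a minimal illustration, not the logic in scripts/match_restaurants.py: the function names, the 0.5 threshold, and the substring-based address check are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Scale edit distance to a 0-1 similarity (1.0 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_same_restaurant(name_a, addr_a, name_b, addr_b, threshold=0.5):
    """Hypothetical rule: names are close AND one address contains the other."""
    names_close = name_similarity(name_a, name_b) >= threshold
    addrs_close = bool(addr_a) and bool(addr_b) and (
        addr_a in addr_b or addr_b in addr_a
    )
    return names_close and addrs_close
```

With this scaling, "银座寿司" and "银座寿司静安店" differ by three inserted characters out of seven, giving a similarity of about 0.57, which is why a branch suffix alone can push a strict threshold into a false negative.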

3. Consistency Analysis

Calculate consistency score based on:

  • Rating correlation (0-1): Correlation between platform ratings
  • Engagement validation (0-1): Do high ratings correlate with high engagement?
  • Sentiment alignment (0-1): Do user sentiments align across platforms?

Formula: consistency_score = (rating_corr * 0.5) + (engagement_val * 0.3) + (sentiment_align * 0.2)
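
As a worked sketch of the formula above, the function below applies the three weights directly. The input validation is an added assumption; the source only states that each component lies in the 0-1 range.

```python
def consistency_score(rating_corr: float,
                      engagement_val: float,
                      sentiment_align: float) -> float:
    """Weighted sum of the three 0-1 components (weights 0.5 / 0.3 / 0.2)."""
    components = {
        "rating_corr": rating_corr,
        "engagement_val": engagement_val,
        "sentiment_align": sentiment_align,
    }
    for name, value in components.items():
        # Assumption: out-of-range inputs are treated as errors.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return rating_corr * 0.5 + engagement_val * 0.3 + sentiment_align * 0.2
```

For example, inputs of 0.8, 0.6, and 0.5 give 0.40 + 0.18 + 0.10 = 0.68.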

4. Recommendation Score

Calculate final recommendation score:

recommendation_score = (
    ((dianping_rating / 5.0) * 0.4) +   # Dianping rating is on a 0-5 scale; normalize to 0-1
    (xhs_engagement_normalized * 0.3) +
    (consistency_score * 0.3)
) * 10

Output: a score on a 0-10 scale, where a score above 8.0 indicates a high-confidence recommendation
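
A minimal sketch of this calculation is shown below. It assumes the Dianping rating arrives on its native 0-5 scale and is divided by 5 so that all three terms lie in 0-1; without that step the result could not stay within the stated 0-10 range. The rounding to one decimal is also an assumption, chosen to match the X.X/10 display.

```python
def recommendation_score(dianping_rating: float,
                         xhs_engagement_normalized: float,
                         consistency_score: float) -> float:
    """Combine the three signals into a 0-10 score."""
    # Assumption: dianping_rating is 0-5, the other two inputs are 0-1.
    rating_normalized = dianping_rating / 5.0
    score = (
        rating_normalized * 0.4
        + xhs_engagement_normalized * 0.3
        + consistency_score * 0.3
    ) * 10
    return round(score, 1)


def is_high_confidence(score: float) -> bool:
    """Scores above 8.0 count as high-confidence recommendations."""
    return score > 8.0
```

For a 4.5-star restaurant with normalized engagement 0.8 and consistency 0.68, the terms are 0.36 + 0.24 + 0.204 = 0.804, which scales to 8.0 after rounding and therefore sits just at, not above, the high-confidence cutoff.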

Output Format

📍 [Location] [Cuisine Type] 餐厅推荐

1. [Restaurant Name]
   🏆 推荐指数: X.X/10
   ⭐ 大众点评: X.X (Xk评价)
   💬 小红书: X.X⭐ (X笔记)
   📍 地址: [Address]
   💰 人均: ¥[Price]
   ✅ 一致性: [高/中/低] - [Brief explanation]
   
   📊 平台对比:
   - 大众点评标签: [Tags]
   - 小红书热词: [Keywords]
   
   ⚠️ 注意: [Any discrepancies or warnings]

[Continue for top 5-10 restaurants...]

Thresholds

  • Min rating: 4.0/5.0 (configurable)
  • Min reviews: 50 on Dianping, 20 notes on Xiaohongshu (configurable)
  • Max results: Top 10 restaurants by recommendation score
  • High consistency: Score > 0.7
  • Medium consistency: Score 0.5-0.7
  • Low consistency: Score < 0.5 (flag for manual review)
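
The thresholds above could be applied along these lines. This is a sketch only: the dictionary keys (rating, review_count, note_count, recommendation_score) are hypothetical field names, and treating a score of exactly 0.5 or 0.7 as medium is one reading of the stated bands.

```python
def consistency_label(score: float) -> str:
    """Map a 0-1 consistency score to the high / medium / low bands."""
    if score > 0.7:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"  # flagged for manual review


def select_top(restaurants, min_rating=4.0, min_reviews=50,
               min_notes=20, max_results=10):
    """Filter by the minimum thresholds, then keep the best-scoring entries."""
    eligible = [
        r for r in restaurants
        if r["rating"] >= min_rating
        and r["review_count"] >= min_reviews
        and r["note_count"] >= min_notes
    ]
    eligible.sort(key=lambda r: r["recommendation_score"], reverse=True)
    return eligible[:max_results]
```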

API & Data Sources

Dianping

  • Method: Web scraping (Dianping API requires business partnership)
  • Base URL: https://www.dianping.com
  • Rate limiting: at most 1 request every 2 seconds
  • Anti-scraping: Use residential proxies, rotate user agents

See scripts/fetch_dianping.py for implementation.

Xiaohongshu

  • Method: Web scraping (no public API)
  • Base URL: https://www.xiaohongshu.com
  • Rate limiting: at most 1 request every 3 seconds
  • Authentication: Cookies required for full access

See scripts/fetch_xiaohongshu.py for implementation.
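
The per-platform request spacing (2 seconds for Dianping, 3 seconds for Xiaohongshu) could be enforced with a small limiter like the one below. This is an illustrative sketch, not the code in the fetch scripts; the class name and the injectable clock and sleep parameters are assumptions added so the behavior can be tested without real delays.

```python
import time


class MinIntervalLimiter:
    """Ensure at least min_interval seconds pass between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self._min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


# One limiter per platform, using the intervals listed above.
dianping_limiter = MinIntervalLimiter(2.0)
xiaohongshu_limiter = MinIntervalLimiter(3.0)
```

Calling wait() immediately before each request keeps the two platforms throttled independently.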

Configuration

Edit scripts/config.py to set:

DEFAULT_THRESHOLDS = {
    "min_rating": 4.0,
    "min_dianping_reviews": 50,
    "min_xhs_notes": 20,
    "max_results": 10
}

PROXY_CONFIG = {
    "use_proxy": True,
    "proxy_list": ["http://proxy1:port", "http://proxy2:port"]
}
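
One way command-line filters might be layered over these defaults is sketched below. The resolve_thresholds helper is hypothetical, and mapping the --min-reviews flag to min_dianping_reviews is an assumption; the source does not show how the CLI reads config.py.

```python
DEFAULT_THRESHOLDS = {
    "min_rating": 4.0,
    "min_dianping_reviews": 50,
    "min_xhs_notes": 20,
    "max_results": 10,
}


def resolve_thresholds(overrides=None):
    """Return a copy of the defaults with any non-None overrides applied."""
    merged = dict(DEFAULT_THRESHOLDS)
    for key, value in (overrides or {}).items():
        if key not in DEFAULT_THRESHOLDS:
            raise KeyError(f"unknown threshold: {key}")
        if value is not None:  # None means the flag was not supplied
            merged[key] = value
    return merged


# Equivalent of passing: --min-rating 4.5 --min-reviews 100
thresholds = resolve_thresholds(
    {"min_rating": 4.5, "min_dianping_reviews": 100}
)
```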

Error Handling

  • No matches found: Suggest broader search terms or nearby areas
  • Platform timeout: Retry with exponential backoff, max 3 attempts
  • Rate limiting detected: Pause for 60 seconds, rotate proxy
  • Low confidence results: Flag results with consistency < 0.5 for manual review
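
The timeout policy (exponential backoff, at most 3 attempts) might look like the sketch below. The helper name, the 1-second base delay, and the choice to retry only on TimeoutError are assumptions; the skill's own scripts may behave differently.

```python
import time


def retry_with_backoff(operation, max_attempts=3, base_delay=1.0,
                       sleep=time.sleep, retry_on=(TimeoutError,)):
    """Call operation(); on a retryable error wait 1x, 2x, 4x... base_delay."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults, a call that times out twice and then succeeds waits 1 second, then 2 seconds, before returning its result. A third consecutive timeout is re-raised to the caller.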

Advanced Features

Sentiment Analysis

Xiaohongshu posts are analyzed with NLP to extract:

  • Food quality mentions
  • Service quality mentions
  • Atmosphere mentions
  • Price/value mentions

See references/sentiment_analysis.md for methodology.
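
To make the four aspect categories concrete, here is a deliberately simple keyword lookup. It is a stand-in for illustration only, not the methodology in references/sentiment_analysis.md, and the keyword lists are hypothetical examples rather than the skill's actual vocabulary.

```python
# Hypothetical keyword lists, one per aspect named above.
ASPECT_KEYWORDS = {
    "food": ("好吃", "味道", "新鲜"),
    "service": ("服务", "态度"),
    "atmosphere": ("环境", "氛围"),
    "value": ("性价比", "价格", "人均"),
}


def count_aspect_mentions(posts):
    """Count how many posts mention each aspect at least once."""
    counts = {aspect: 0 for aspect in ASPECT_KEYWORDS}
    for text in posts:
        for aspect, keywords in ASPECT_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                counts[aspect] += 1
    return counts
```

A plain substring match like this cannot tell praise from complaint, so it only indicates which aspects a post talks about, not how positively.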

Fuzzy Matching

Handle restaurant name variations:

  • Chain stores (e.g., "海底捞火锅" vs "海底捞静安店")
  • Abbreviations (e.g., "鼎泰丰" vs "鼎泰丰上海店")
  • Translation differences

Uses the thefuzz library for similarity scoring.
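
Branch suffixes are easier to tolerate when the shorter name is compared against every same-length window of the longer one. The sketch below illustrates that idea with the standard library's difflib as a stand-in; it is not thefuzz's API, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def partial_similarity(a: str, b: str) -> float:
    """Best 0-1 match of the shorter string against any window of the longer."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0.0
    window = len(shorter)
    best = 0.0
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        ratio = SequenceMatcher(None, shorter, candidate).ratio()
        best = max(best, ratio)
    return best
```

Under this measure "鼎泰丰" scores 1.0 against "鼎泰丰上海店" because the short name appears verbatim inside the long one, while "海底捞火锅" against "海底捞静安店" scores 0.6 from the shared "海底捞" prefix.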

Dependencies

pip install requests beautifulsoup4 pandas numpy thefuzz selenium lxml

See scripts/requirements.txt for complete list.

Troubleshooting

Issue: Xiaohongshu returns empty results

  • Solution: Check if cookies expired, re-authenticate

Issue: Dianping blocks requests

  • Solution: Reduce request rate, rotate proxies

Issue: Poor matching between platforms

  • Solution: Adjust similarity threshold in match_restaurants.py


Usage Guidance
Key things to consider before installing or running this skill:

  • Dependency and install gap: The code uses Playwright (and Playwright will download browser binaries), but SKILL.md's pip list omits playwright and instead mentions selenium. Check scripts/requirements.txt and add/install playwright before running; expect browser downloads and larger disk/network usage.
  • Session/cookie persistence: The skill uses a session_manager and persistent Playwright contexts that store login cookies on disk. Inspect scripts/session_manager.py to see where session data are stored and ensure file permissions/restrictions are appropriate. Treat those session dirs as sensitive (they contain authentication state).
  • Secrets and proxies: Although the registry metadata lists no required env vars, the skill expects cookies and may require proxy credentials if you enable proxies. Do not paste third-party proxy credentials or cookies into code/config files on multi-user systems; prefer environment variables or a secure secret store and review where the skill will persist them.
  • Legal & ToS risk: The skill explicitly recommends scraping platforms that (per its own docs) prohibit scraping. Use only for personal research and be aware that sustained scraping may violate site terms and local laws. For production/commercial use, obtain official APIs.
  • Test in a sandbox: Run the provided mock/server-friendly variant (crosscheck_simple or the example tests) first to verify behavior without performing real scraping. Review logs and network calls during a real run in an isolated environment.
  • Review code paths: Before giving the skill network access or credentials, read session_manager.py and any remaining omitted files to check how credentials, proxies, and session directories are handled. Ensure no unexpected external endpoints are contacted and that data is not exfiltrated to third-party servers beyond the target platforms.

If you need help auditing specific files (e.g., session_manager.py or requirements.txt), provide them and I can inspect for storage locations, network endpoints, and credential handling.
Capability Analysis
Type: OpenClaw Skill
Name: clean-skill
Version: 1.1.0

The skill is classified as suspicious due to its reliance on advanced web scraping techniques using Playwright with persistent browser sessions (`scripts/fetch_dianping_real.py`, `scripts/fetch_xiaohongshu_real.py`, `scripts/session_manager.py`). While the skill's stated purpose is benign (cross-referencing restaurant reviews), the use of persistent login sessions and browser automation, coupled with explicit acknowledgement of bypassing anti-scraping measures and potential Terms of Service violations (`references/api_limitations.md`), introduces significant risks. These capabilities, if exploited via prompt injection against the AI agent or system compromise, could be misused for unauthorized browser actions or data access beyond the skill's intended scope, even though the skill itself does not exhibit intentional malicious behavior like data exfiltration or arbitrary command execution.
Capability Assessment
Purpose & Capability
The code and SKILL.md align with the stated purpose (cross-referencing Dianping and Xiaohongshu), and the repo contains matching fetch/match/scoring logic. However, metadata claims 'instruction-only' with no install spec while the package includes many Python scripts that require runtime dependencies (Playwright, requests, bs4, thefuzz, etc.). SKILL.md lists selenium in dependencies but the real scrapers use Playwright; Playwright is used in code but not declared in the SKILL.md dependency list. That mismatch between claimed install/packaging and actual runtime needs is an incoherence.
Instruction Scope
Instructions explicitly direct web scraping (including use of residential proxies, cookie-based authentication, and rotating user-agents) and persistence of browser login sessions. This is within the stated purpose but expands the agent's runtime behavior into authenticated scraping and persistent local session storage. The SKILL.md and code ask the agent/user to maintain cookies and proxies (sensitive operational inputs) but do not declare how these should be supplied/secured. The instructions also recommend throttling and proxy rotation—practical but potentially enabling large-scale scraping that may violate third-party ToS or law.
Install Mechanism
There is no install spec in registry metadata (instruction-only), yet the skill contains non-trivial Python code that depends on third-party libraries and Playwright (which downloads browser binaries). SKILL.md lists a pip install line that includes selenium but omits playwright; code calls playwright.async_api. This mismatch means running the skill will likely fail or trigger implicit downloads (Playwright browser installs) with no guidance. Absence of a vetted install mechanism increases risk and friction.
Credentials
The skill requests no environment variables in metadata, but its runtime behavior expects and documents needing cookies (for Xiaohongshu), proxy endpoints/credentials (residential proxy providers), and persistent session directories. Those are effectively sensitive credentials/configs but are not declared as required environment or secret inputs. The skill recommends third-party proxy providers (Bright Data, Smartproxy) which may require credentials; storing those in config files (scripts/config.py) or inserting them without clear secret handling is disproportionate and risky.
Persistence & Privilege
The skill creates and uses persistent browser sessions (Playwright launch_persistent_context with user_data_dir via session_manager). That means it will write cookies/session state to disk under session directories managed by the skill. 'always' is false and the skill does not request system-wide privileges, but persistent session storage and local cookie files are significant persistence vectors and should be reviewed before use.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install clean-skill
  3. After installation, invoke the skill by name or use /clean-skill
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
restaurant-crosscheck-v2 now cross-references restaurant recommendations from Xiaohongshu (小红书) and Dianping (大众点评) for higher-confidence results.

  • Fetches and analyzes ratings, review counts, and sentiment from both platforms
  • Matches restaurants using fuzzy matching and address proximity
  • Calculates a platform consistency score and outputs confidence-based recommendations (0–10 scale)
  • Supports custom filtering by location, cuisine, rating, and review count
  • Integrates error handling, advanced NLP sentiment analysis, and proxy management for scraping robustness
Metadata
Slug clean-skill
Version 1.1.0
License
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Clean Skill?

Cross-reference restaurant recommendations from Xiaohongshu (小红书) and Dianping (大众点评) to validate restaurant quality and consistency. Use when querying resta... It is an AI Agent Skill for Claude Code / OpenClaw, with 630 downloads so far.

How do I install Clean Skill?

Run "/install clean-skill" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Clean Skill free?

Yes, Clean Skill is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Clean Skill support?

Clean Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Clean Skill?

It is built and maintained by RenfeiZhong (@zhongrenfei1-hub); the current version is v1.1.0.
