Description

Cross-reference restaurant recommendations from Xiaohongshu (小红书) and Dianping (大众点评) to validate restaurant quality and consistency. Use when querying resta...

README (SKILL.md)

Restaurant Review Cross-Check

Name: Clean Skill
Author: zhongrenfei1-hub

Cross-reference restaurant data from Xiaohongshu and Dianping to provide validated recommendations.

Quick Start

Query restaurants by location and cuisine type:

# Basic query
crosscheck-restaurants "上海静安区" "日式料理"

# With filters
crosscheck-restaurants "北京朝阳区" "火锅" --min-rating 4.5 --min-reviews 100

Workflow

1. Data Collection

Query both platforms simultaneously:

Dianping:

Fetch restaurants matching location + cuisine
Extract: name, rating, review_count, price_range, address, tags

Xiaohongshu:

Search notes/posts matching location + cuisine
Extract: restaurant_name, engagement_metrics (likes/saves), sentiment_score
Note: Xiaohongshu has no public API, so this data must be scraped

2. Data Matching

Match restaurants across platforms using fuzzy matching:

Restaurant name similarity (Levenshtein distance)
Location proximity (address matching)
Handle name variations (e.g., "银座寿司" vs "银座寿司静安店")

See scripts/match_restaurants.py for matching logic.
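
The name-similarity step above can be sketched as follows. This is a minimal illustration, not the contents of scripts/match_restaurants.py: the function names (`levenshtein`, `name_similarity`, `best_match`) and the 0.5 threshold are assumptions, and the edit distance is written in pure Python so the sketch needs no extra packages.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the names are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def best_match(name: str, candidates: List[str],
               threshold: float = 0.5) -> Optional[str]:
    """Return the most similar candidate at or above the threshold, else None."""
    if not candidates:
        return None
    score, candidate = max((name_similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None
```

With these definitions, "银座寿司" and "银座寿司静安店" differ by three appended characters out of seven, giving a similarity of about 0.57. A branch suffix therefore lowers the score noticeably, which is why the threshold (hypothetical here) usually needs tuning alongside address matching.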

3. Consistency Analysis

Calculate consistency score based on:

Rating correlation (0-1): Correlation between platform ratings
Engagement validation (0-1): Do high ratings correlate with high engagement?
Sentiment alignment (0-1): Do user sentiments align across platforms?

Formula: consistency_score = (rating_corr * 0.5) + (engagement_val * 0.3) + (sentiment_align * 0.2)
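
As a worked sketch of the formula above, the function below applies the same three weights. The function name and the range check are illustrative assumptions; how each input is derived is not shown here.

```python
def consistency_score(rating_corr: float, engagement_val: float,
                      sentiment_align: float) -> float:
    """Weighted consistency score; every input is expected to lie in [0, 1]."""
    inputs = {
        "rating_corr": rating_corr,
        "engagement_val": engagement_val,
        "sentiment_align": sentiment_align,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return rating_corr * 0.5 + engagement_val * 0.3 + sentiment_align * 0.2
```

For example, inputs of 0.8, 0.6, and 0.9 give 0.40 + 0.18 + 0.18 = 0.76.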

4. Recommendation Score

Calculate final recommendation score:

recommendation_score = (
    (dianping_rating / 5.0 * 0.4) +  # assumed: 0-5 rating rescaled to 0-1 so the total stays within 0-10
    (xhs_engagement_normalized * 0.3) +
    (consistency_score * 0.3)
) * 10

Output: 0-10 scale, where >8.0 = high confidence recommendation
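
A minimal sketch of this calculation follows. It assumes the Dianping rating is on a 0-5 scale and divides it by 5 before weighting, because a raw 0-5 rating would push the weighted sum past the stated 0-10 range. The function and parameter names are illustrative.

```python
def recommendation_score(dianping_rating: float,
                         xhs_engagement_normalized: float,
                         consistency: float) -> float:
    """Final score on a 0-10 scale.

    dianping_rating is assumed to be on a 0-5 scale; the other two inputs
    are assumed to be normalized to [0, 1] already.
    """
    rating_normalized = dianping_rating / 5.0  # assumption: rescale 0-5 to 0-1
    weighted = (
        rating_normalized * 0.4
        + xhs_engagement_normalized * 0.3
        + consistency * 0.3
    )
    return weighted * 10


def is_high_confidence(score: float) -> bool:
    """Scores above 8.0 count as high-confidence recommendations."""
    return score > 8.0
```

For a restaurant rated 4.6 on Dianping with normalized engagement 0.7 and consistency 0.76, the result is (0.368 + 0.21 + 0.228) * 10 = 8.06, just above the high-confidence cutoff.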

Output Format

📍 [Location] [Cuisine Type] 餐厅推荐

1. [Restaurant Name]
   🏆 推荐指数: X.X/10
   ⭐ 大众点评: X.X (Xk评价)
   💬 小红书: X.X⭐ (X笔记)
   📍 地址: [Address]
   💰 人均: ¥[Price]
   ✅ 一致性: [高/中/低] - [Brief explanation]
   
   📊 平台对比:
   - 大众点评标签: [Tags]
   - 小红书热词: [Keywords]
   
   ⚠️ 注意: [Any discrepancies or warnings]

[Continue for top 5-10 restaurants...]

Thresholds

Min rating: 4.0/5.0 (configurable)
Min reviews: 50 on Dianping, 20 notes on Xiaohongshu (configurable)
Max results: Top 10 restaurants by recommendation score
High consistency: Score > 0.7
Medium consistency: Score 0.5-0.7
Low consistency: Score < 0.5 (flag for manual review)
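
The thresholds above could be applied roughly as sketched below. The dictionary keys on each restaurant record (`rating`, `dianping_reviews`, `xhs_notes`, `recommendation_score`) are hypothetical field names chosen for illustration. A score of exactly 0.7 or 0.5 is treated as medium, following the ranges listed.

```python
from typing import Dict, List

DEFAULT_THRESHOLDS = {
    "min_rating": 4.0,
    "min_dianping_reviews": 50,
    "min_xhs_notes": 20,
    "max_results": 10,
}


def consistency_level(score: float) -> str:
    """Map a consistency score to high / medium / low."""
    if score > 0.7:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"  # flagged for manual review


def select_top(restaurants: List[Dict],
               thresholds: Dict = DEFAULT_THRESHOLDS) -> List[Dict]:
    """Drop records below the minimums, then keep the best-scoring ones."""
    eligible = [
        r for r in restaurants
        if r["rating"] >= thresholds["min_rating"]
        and r["dianping_reviews"] >= thresholds["min_dianping_reviews"]
        and r["xhs_notes"] >= thresholds["min_xhs_notes"]
    ]
    eligible.sort(key=lambda r: r["recommendation_score"], reverse=True)
    return eligible[: thresholds["max_results"]]
```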

API & Data Sources

Dianping

Method: Web scraping (Dianping API requires business partnership)
Base URL: https://www.dianping.com
Rate limiting: at most 1 request every 2 seconds
Anti-scraping: Use residential proxies, rotate user agents

See scripts/fetch_dianping.py for implementation.

Xiaohongshu

Method: Web scraping (no public API)
Base URL: https://www.xiaohongshu.com
Rate limiting: at most 1 request every 3 seconds
Authentication: Cookies required for full access

See scripts/fetch_xiaohongshu.py for implementation.
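
The per-platform rate limits can be enforced with a small minimum-interval limiter. The sketch below is an assumption about one way to do this, not the code in the fetch scripts. `MinIntervalLimiter` is a hypothetical class, and its clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Ensure at least min_interval seconds pass between consecutive requests."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# One limiter per platform, using the intervals listed for each
dianping_limiter = MinIntervalLimiter(2.0)
xiaohongshu_limiter = MinIntervalLimiter(3.0)
```

Calling `dianping_limiter.wait()` immediately before each Dianping request keeps the spacing at two seconds or more, regardless of how long the request itself takes.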

Configuration

Edit scripts/config.py to set:

DEFAULT_THRESHOLDS = {
    "min_rating": 4.0,
    "min_dianping_reviews": 50,
    "min_xhs_notes": 20,
    "max_results": 10
}

PROXY_CONFIG = {
    "use_proxy": True,
    "proxy_list": ["http://proxy1:port", "http://proxy2:port"]
}

Error Handling

No matches found: Suggest broader search terms or nearby areas
Platform timeout: Retry with exponential backoff, max 3 attempts
Rate limiting detected: Pause for 60 seconds, rotate proxy
Low confidence results: Flag results with consistency < 0.5 for manual review
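
The timeout rule above (exponential backoff, at most 3 attempts) might look like the sketch below. `retry_with_backoff` and the 1-second base delay are assumptions. The sketch catches the built-in `TimeoutError`, so a real fetcher would need to catch or translate whatever timeout exception its HTTP or browser library raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fetch: Callable[[], T],
                       max_attempts: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on TimeoutError with delays of 1s, 2s, 4s, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With the defaults, a fetch that times out twice and then succeeds waits 1 second and then 2 seconds before returning its result. A third consecutive timeout is re-raised.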

Advanced Features

Sentiment Analysis

NLP is applied to Xiaohongshu posts to extract:

Food quality mentions
Service quality mentions
Atmosphere mentions
Price/value mentions

See references/sentiment_analysis.md for methodology.

Fuzzy Matching

Handle restaurant name variations:

Chain stores (e.g., "海底捞火锅" vs "海底捞静安店")
Abbreviations (e.g., "鼎泰丰" vs "鼎泰丰上海店")
Translation differences

Uses thefuzz library for similarity scoring.

Dependencies

pip install requests beautifulsoup4 pandas numpy thefuzz selenium lxml

See scripts/requirements.txt for complete list.

Troubleshooting

Issue: Xiaohongshu returns empty results

Solution: Check whether the cookies have expired and re-authenticate

Issue: Dianping blocks requests

Solution: Reduce request rate, rotate proxies

Issue: Poor matching between platforms

Solution: Adjust similarity threshold in match_restaurants.py

References

Usage Guidance

Key things to consider before installing or running this skill:

Dependency and install gap: The code uses Playwright (and Playwright will download browser binaries), but SKILL.md's pip list omits playwright and instead mentions selenium. Check scripts/requirements.txt and add/install playwright before running; expect browser downloads and larger disk/network usage.
Session/cookie persistence: The skill uses a session_manager and persistent Playwright contexts that store login cookies on disk. Inspect scripts/session_manager.py to see where session data are stored and ensure file permissions/restrictions are appropriate. Treat those session dirs as sensitive (they contain authentication state).
Secrets and proxies: Although the registry metadata lists no required env vars, the skill expects cookies and may require proxy credentials if you enable proxies. Do not paste third-party proxy credentials or cookies into code/config files on multi-user systems; prefer environment variables or a secure secret store and review where the skill will persist them.
Legal & ToS risk: The skill explicitly recommends scraping platforms that (per its own docs) prohibit scraping. Use only for personal research and be aware that sustained scraping may violate site terms and local laws. For production/commercial use, obtain official APIs.
Test in a sandbox: Run the provided mock/server-friendly variant (crosscheck_simple or the example tests) first to verify behavior without performing real scraping. Review logs and network calls during a real run in an isolated environment.
Review code paths: Before giving the skill network access or credentials, read session_manager.py and any remaining omitted files to check how credentials, proxies, and session directories are handled. Ensure no unexpected external endpoints are contacted and that data is not exfiltrated to third-party servers beyond the target platforms.

Capability Analysis

Type: OpenClaw Skill
Name: clean-skill
Version: 1.1.0

The skill is classified as suspicious due to its reliance on advanced web scraping techniques using Playwright with persistent browser sessions (`scripts/fetch_dianping_real.py`, `scripts/fetch_xiaohongshu_real.py`, `scripts/session_manager.py`). While the skill's stated purpose is benign (cross-referencing restaurant reviews), the use of persistent login sessions and browser automation, coupled with explicit acknowledgement of bypassing anti-scraping measures and potential Terms of Service violations (`references/api_limitations.md`), introduces significant risks. These capabilities, if exploited via prompt injection against the AI agent or system compromise, could be misused for unauthorized browser actions or data access beyond the skill's intended scope, even though the skill itself does not exhibit intentional malicious behavior like data exfiltration or arbitrary command execution.

Capability Assessment

⚠ Purpose & Capability

The code and SKILL.md align with the stated purpose (cross-referencing Dianping and Xiaohongshu), and the repo contains matching fetch/match/scoring logic. However, metadata claims 'instruction-only' with no install spec while the package includes many Python scripts that require runtime dependencies (Playwright, requests, bs4, thefuzz, etc.). SKILL.md lists selenium in dependencies but the real scrapers use Playwright; Playwright is used in code but not declared in the SKILL.md dependency list. That mismatch between claimed install/packaging and actual runtime needs is an incoherence.

⚠ Instruction Scope

Instructions explicitly direct web scraping (including use of residential proxies, cookie-based authentication, and rotating user-agents) and persistence of browser login sessions. This is within the stated purpose but expands the agent's runtime behavior into authenticated scraping and persistent local session storage. The SKILL.md and code ask the agent/user to maintain cookies and proxies (sensitive operational inputs) but do not declare how these should be supplied/secured. The instructions also recommend throttling and proxy rotation—practical but potentially enabling large-scale scraping that may violate third-party ToS or law.

⚠ Install Mechanism

There is no install spec in registry metadata (instruction-only), yet the skill contains non-trivial Python code that depends on third-party libraries and Playwright (which downloads browser binaries). SKILL.md lists a pip install line that includes selenium but omits playwright; code calls playwright.async_api. This mismatch means running the skill will likely fail or trigger implicit downloads (Playwright browser installs) with no guidance. Absence of a vetted install mechanism increases risk and friction.

⚠ Credentials

The skill requests no environment variables in metadata, but its runtime behavior expects and documents needing cookies (for Xiaohongshu), proxy endpoints/credentials (residential proxy providers), and persistent session directories. Those are effectively sensitive credentials/configs but are not declared as required environment or secret inputs. The skill recommends third-party proxy providers (Bright Data, Smartproxy) which may require credentials; storing those in config files (scripts/config.py) or inserting them without clear secret handling is disproportionate and risky.

ℹ Persistence & Privilege

The skill creates and uses persistent browser sessions (Playwright launch_persistent_context with user_data_dir via session_manager). That means it will write cookies/session state to disk under session directories managed by the skill. 'always' is false and the skill does not request system-wide privileges, but persistent session storage and local cookie files are significant persistence vectors and should be reviewed before use.

Version History

v1.1.0

restaurant-crosscheck-v2 now cross-references restaurant recommendations from Xiaohongshu (小红书) and Dianping (大众点评) for higher-confidence results.

Fetches and analyzes ratings, review counts, and sentiment from both platforms
Matches restaurants using fuzzy matching and address proximity
Calculates a platform consistency score and outputs confidence-based recommendations (0–10 scale)
Supports custom filtering by location, cuisine, rating, and review count
Integrates error handling, advanced NLP sentiment analysis, and proxy management for scraping robustness

Metadata

Slug clean-skill

Version 1.1.0

License —

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Clean Skill?

Cross-reference restaurant recommendations from Xiaohongshu (小红书) and Dianping (大众点评) to validate restaurant quality and consistency. Use when querying resta... It is an AI Agent Skill for Claude Code / OpenClaw, with 630 downloads so far.

How do I install Clean Skill?

Run "/install clean-skill" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Clean Skill free?

Yes, Clean Skill is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Clean Skill support?

Clean Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Clean Skill?

It is built and maintained by RenfeiZhong (@zhongrenfei1-hub); the current version is v1.1.0.
