Douban Self Taste Skill

Name: Douban Self Taste Skill
Author: xeric7

Author: XEric7 · GitHub · v0.1.2 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 282

Install in OpenClaw

/install douban-self-taste-skill

Description

Collect, refresh, normalize, and analyze the user's own Douban history for taste analysis and recommendation reasoning. Use when the task involves the user's...

Usage instructions (SKILL.md)

Douban Self Taste Skill

Collect the user's own Douban history, keep a local cache fresh, and analyze it carefully.

Scope

Use this skill only for the user's own Douban data, including:

the user's own movie / book / music / game shelves
the user's own ratings, tags, short comments, reviews, and dates
local exports, saved HTML pages, or cached JSON derived from the user's own account
fresh re-crawls of the user's own logged-in pages when cache is missing or stale

Do not use this skill for public-user scraping, whole-site crawling, hidden/private data claims, or MCP-server design.

Storage layout

Use these paths unless the user explicitly asks for a different layout:

cookies: .local/douban-self-taste/cookies/douban_cookies.json
crawl cache: .local/douban-self-taste/cache/collections/
analysis outputs: .local/douban-self-taste/analysis/

Treat the cache as reusable local working data. Do not scatter generated files across the repo.

Read references/storage-layout.md for exact file naming conventions.

Required workflow

Follow this order.

1. Decide whether crawling is needed

Check whether local crawl cache already exists for the requested category.

If no cache exists, crawling is needed.
If cache exists but its fetched_at timestamp is older than 7 days, crawling is needed.
Otherwise, reuse the local cache.
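The freshness rule above can be sketched roughly as follows. This is a minimal illustration, not the code from the bundled scripts: the function name needs_crawl, the assumption that fetched_at is an ISO 8601 string at the top level of the cache JSON, and the choice to treat a malformed cache as stale are all assumptions made for the example.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

CACHE_DIR = Path(".local/douban-self-taste/cache/collections")
MAX_CACHE_AGE = timedelta(days=7)


def needs_crawl(cache_file: Path, now: Optional[datetime] = None) -> bool:
    """Return True when the cache is missing, unreadable, or older than 7 days."""
    now = now or datetime.now(timezone.utc)
    if not cache_file.is_file():
        return True  # no cache exists
    try:
        payload = json.loads(cache_file.read_text(encoding="utf-8"))
        # Normalize a trailing "Z" so older Python versions can parse it.
        fetched_at = datetime.fromisoformat(payload["fetched_at"].replace("Z", "+00:00"))
    except (ValueError, KeyError, TypeError, AttributeError):
        return True  # assumption: a malformed cache is treated as stale
    if fetched_at.tzinfo is None:
        fetched_at = fetched_at.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return now - fetched_at > MAX_CACHE_AGE
```

A call such as needs_crawl(CACHE_DIR / "book.json") would then decide whether the book shelf needs a refresh; the file name here is hypothetical, and the real naming convention is defined in references/storage-layout.md.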

Prefer the smallest sufficient refresh.

If the user asks about books, prioritize book cache.
If the user asks about movies, prioritize movie cache.
You may use small amounts of other categories as weak supplementary context, but keep the requested category primary.

2. If crawling is needed, verify cookie availability

Check whether the cookie file exists and is plausibly usable.

Treat cookies as unavailable when:

the cookie file is missing
the cookie file is empty or malformed
the crawl clearly redirects to login or otherwise fails due to authentication

If cookies are unavailable or expired, ask the user for fresh cookies before crawling.

Do not pretend a crawl succeeded when authentication failed.
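A pre-flight check along these lines might look like the sketch below. It is not taken from the bundled crawler. The two accepted cookie shapes (a list of name/value objects, or a flat name-to-value mapping) are assumptions about what a browser export could contain, and the login-redirect heuristic (host accounts.douban.com or a path containing /login) is likewise an assumption that may need tuning.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

COOKIE_FILE = Path(".local/douban-self-taste/cookies/douban_cookies.json")


def cookies_look_usable(cookie_file: Path = COOKIE_FILE) -> bool:
    """Cheap local check: file exists, is non-empty, and parses into cookies."""
    if not cookie_file.is_file() or cookie_file.stat().st_size == 0:
        return False  # missing or empty
    try:
        data = json.loads(cookie_file.read_text(encoding="utf-8"))
    except ValueError:
        return False  # malformed JSON (or undecodable bytes)
    if isinstance(data, list):
        # Assumed shape 1: [{"name": "...", "value": "..."}, ...]
        return any(
            isinstance(c, dict) and c.get("name") and c.get("value") for c in data
        )
    if isinstance(data, dict):
        # Assumed shape 2: {"cookie_name": "cookie_value", ...}
        return any(bool(v) for v in data.values())
    return False


def looks_like_login_redirect(final_url: str) -> bool:
    """Heuristic (assumption): the crawl ended on a login page."""
    parsed = urlparse(final_url)
    return parsed.hostname == "accounts.douban.com" or "/login" in parsed.path
```

The local check only shows that the file is plausibly usable. Expired cookies can still pass it, so the final URL of the first crawl response should also be checked before any result is reported as successful.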

3. Crawl and persist locally

When cookies are available, crawl the user's own Douban shelves and store the refreshed result in local JSON cache files.

Use scripts/crawl_douban_self_history.py for logged-in crawling. Use scripts/extract_douban_self_history.py when the user already has saved HTML files.

After crawling:

save normalized JSON to the cache directory
include fetched_at
keep category and status explicit
preserve raw comments and rating information
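As an illustration of these persistence rules, the sketch below writes one cache file with explicit category, status, and fetched_at fields. The function name save_cache, the file naming pattern, and the item fields are hypothetical; the authoritative structure is described in references/output-schema.md and references/storage-layout.md.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_cache(cache_dir: Path, category: str, status: str, items: list) -> Path:
    """Write normalized items to a JSON cache file and return its path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        "category": category,  # e.g. "book" or "movie"
        "status": status,      # shelf status, kept explicit
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "items": items,        # raw comments and ratings are stored untouched
    }
    # Hypothetical naming scheme, used only for this example.
    target = cache_dir / f"{category}_{status}.json"
    tmp = target.with_name(target.name + ".tmp")
    # ensure_ascii=False keeps Chinese comments readable in the file.
    tmp.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(target)  # swap in the finished file so readers never see a partial write
    return target
```

Writing to a temporary file and renaming it means an interrupted crawl leaves the previous cache intact rather than a truncated JSON file.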

4. Analyze after data is ready

Only start analysis after confirming that either:

fresh cache exists, or
a successful new crawl has been saved locally

Use scripts/build_taste_profile.py to build an analysis-ready summary when helpful. Write the summary into .local/douban-self-taste/analysis/ when the user wants a reusable analysis artifact.

Analysis priorities

Always pay extra attention to:

items with comments
high-rated items
low-rated items
recent items
category boundaries

For scripts/build_taste_profile.py, use these summary rules:

Do not include the full items array in the profile output; keep full records in the crawl cache.
Keep the rest of the summary reasonably rich; avoid large deletions unless the user asks.
Define recent_items as the newest dated items sorted by date descending, capped at 20 items.
Define high_rated_items as all items tied at the user's highest observed rating within the focused dataset; if there are more than 20, keep only the most recent 20 by date.
Define low_rated_items as all items tied at the user's lowest observed rating within the focused dataset; if there are more than 20, keep only the most recent 20 by date.
Treat game tag analysis separately from creator analysis; games may have useful genre/platform-like tags but often do not have reliable creators.
Filter noisy book creators when obvious publisher / bookstore / distribution-style strings appear.
Prefer category-specific cleaning over one generic parser when extracting tags or creators.
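The selection rules for recent_items, high_rated_items, and low_rated_items can be illustrated with the sketch below. It is not the code of scripts/build_taste_profile.py. It assumes each item is a dict with an optional numeric rating and an optional date stored as an ISO-style string (YYYY-MM-DD), so that string order matches chronological order.

```python
from typing import Callable, Iterable, List

CAP = 20


def recent_items(items: List[dict], cap: int = CAP) -> List[dict]:
    """Newest dated items, sorted by date descending, capped at `cap`."""
    dated = [i for i in items if i.get("date")]
    return sorted(dated, key=lambda i: i["date"], reverse=True)[:cap]


def extreme_rated_items(
    items: List[dict], pick: Callable[[Iterable], float], cap: int = CAP
) -> List[dict]:
    """All items tied at the observed max (pick=max) or min (pick=min) rating."""
    rated = [i for i in items if i.get("rating") is not None]
    if not rated:
        return []
    target = pick(i["rating"] for i in rated)
    tied = [i for i in rated if i["rating"] == target]
    if len(tied) > cap:
        # Too many ties: keep only the most recent `cap` by date.
        tied = sorted(tied, key=lambda i: i.get("date") or "", reverse=True)[:cap]
    return tied


def build_selections(items: List[dict]) -> dict:
    """Profile fragment without the full items array."""
    return {
        "recent_items": recent_items(items),
        "high_rated_items": extreme_rated_items(items, max),
        "low_rated_items": extreme_rated_items(items, min),
    }
```

Using the observed maximum and minimum rather than fixed thresholds means a user who never awards the top score still gets a meaningful high_rated_items list.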

When the user asks about one category, analyze that category first.

Examples:

Book questions → use books as primary evidence; only lightly reference movies/music/games if they add meaningful support.
Movie questions → use movies as primary evidence.

Separate:

stable preferences
weak signals
aversions / anti-preferences
recent shifts

Do not overfit from tiny samples.

Output expectations

Start with factual scope:

what data was used
whether it came from cache or a fresh crawl
cache age
category coverage
obvious data gaps

Then provide analysis. Keep generated profile files compact enough for downstream LLM analysis; prefer concise summaries over repeating the entire dataset.
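The factual scope header could be assembled from a cache payload along these lines. This is a sketch only: describe_scope and the returned keys are hypothetical, and it assumes the payload carries category, fetched_at (ISO 8601), and items fields as in the examples above.

```python
from datetime import datetime, timezone
from typing import Optional


def describe_scope(payload: dict, source: str, now: Optional[datetime] = None) -> dict:
    """Summarize what data the analysis is based on, before any conclusions."""
    now = now or datetime.now(timezone.utc)
    fetched_at = datetime.fromisoformat(payload["fetched_at"])
    if fetched_at.tzinfo is None:
        fetched_at = fetched_at.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    items = payload.get("items", [])
    return {
        "category": payload.get("category"),
        "source": source,  # "cache" or "fresh crawl"
        "cache_age_days": (now - fetched_at).days,
        "item_count": len(items),
        # Obvious data gaps: items that cannot support rating or recency claims.
        "missing_rating": sum(1 for i in items if i.get("rating") is None),
        "missing_date": sum(1 for i in items if not i.get("date")),
    }
```

Reporting the gap counts up front makes it easier to avoid overfitting when only a handful of items carry ratings or dates.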

Bundled resources

Read references/storage-layout.md for local file locations.
Read references/data-sources.md for cache/cookie refresh logic.
Read references/output-schema.md for normalized JSON structure.
Read references/analysis-rubric.md before writing conclusions.
Use scripts/crawl_douban_self_history.py to refresh local cache from logged-in pages.
Use scripts/extract_douban_self_history.py to convert saved HTML files into normalized JSON.
Use scripts/build_taste_profile.py to generate category-aware summaries.

Security recommendations

This skill appears to do what it says, but you should:

(1) Only provide cookies for an account you control, and understand that cookies are equivalent to logged-in access; consider creating a limited or temporary session if you are cautious.
(2) Inspect the cookie file before handing it over (it may contain session tokens like dbcl2).
(3) Ensure Python and the required packages (httpx, BeautifulSoup/lxml) are available in a safe environment or sandbox before running the scripts.
(4) Be aware that the skill will persist cache and analysis files under .local/douban-self-taste; delete them when you no longer want them stored.
(5) Review the included scripts yourself (they only target Douban hosts) if you need higher assurance.

Functional analysis

Type: OpenClaw Skill
Name: douban-self-taste-skill
Version: 0.1.2

The skill bundle is classified as suspicious because it implements authenticated web crawling using session cookies stored in '.local/douban-self-taste/cookies/douban_cookies.json'. These capabilities are aligned with the stated goal of retrieving a user's Douban history, but the handling of sensitive session tokens and the automated network requests in 'scripts/crawl_douban_self_history.py' represent a high-risk capability. Additionally, that script uses an atypical '__import__("re")' pattern for regex operations instead of standard top-level imports, a pattern that is sometimes used to bypass basic static analysis. No clear evidence of malicious intent or data exfiltration to non-Douban domains was found.

Capability assessment

✓ Purpose & Capability

Name/description match included scripts and SKILL.md: crawling logged-in Douban pages, extracting saved HTML, normalizing JSON, and building taste profiles. No unrelated credentials, binaries, or services are requested.

ℹ Instruction Scope

SKILL.md confines actions to the user's own Douban data and specifies exact local paths. It instructs reading a browser-exported cookie JSON and saved HTML or performing logged-in crawls. Note: those cookies are sensitive authentication material and the skill persists crawl/cache files locally; user consent and care are expected.

ℹ Install Mechanism

No install spec (instruction-only) — lowest install risk. However the bundled Python scripts require runtime dependencies (httpx, bs4/lxml) which are not declared; the environment must provide them. No external downloads or unusual installation steps are present.

✓ Credentials

No environment variables or unrelated credentials are requested. The only sensitive input is a browser-exported cookie JSON, which is proportionate to the stated logged-in crawl functionality.

✓ Persistence & Privilege

always is false and the skill only writes to its own .local/douban-self-taste paths by default. It does not request system-wide changes or other skills' configs.

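How to use

Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install douban-self-taste-skill
After installation, invoke the skill by name or trigger it with /douban-self-taste-skill.
Provide the required inputs according to the skill's parameter description to get structured output.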

Version history

v0.1.2

Stop surfacing book publisher metadata as top tags; keep profile output focused on taste-relevant signals.

v0.1.1

Fix false auth detection during crawl; remove full items from profile output; improve category-specific tag and creator cleanup; use observed max/min ratings for high/low item selection.

v0.1.0

Initial release: self-only Douban history crawl, local cache refresh, category-aware analysis, and JSON-based outputs.

Metadata

Slug douban-self-taste-skill

Version 0.1.2

License MIT-0

Total installs 0

Current installs 0

Version count 3

FAQ