qirongzhang

Category Link Collector

by QirongZhang · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 219 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install category-link-collector
Description
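Collects category link information from e-commerce websites, extracts the category hierarchy, and saves it as a CSV file. Use this skill when you need to extract structured data from e-commerce category links.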
README (SKILL.md)

Category Link Collector Skill

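Features

  • Extracts category information from the given category link URLs
  • Parses the category path to extract first- and second-level categories
  • Generates a structured CSV file
  • Supports a custom output directory and file name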

Usage

Basic usage

Collect the following category links:
https://lulumonclick-eu.shop/collections/women-women-clothes-tank-tops
https://lulumonclick-eu.shop/collections/women-women-clothes-bras-underwear

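Parameters

  • Domain variable: the domain is extracted automatically from the link
  • Output directory: defaults to /Users/zhangqirong/工作/caiji; can be customized
  • File name: the domain is used automatically as the file name (e.g. lulumonclick-eu.shop.csv)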
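
A minimal sketch of how the domain and the documented file name could be derived, assuming the standard library's urllib.parse is used. The names domain_of and output_path_for and the "out" directory are illustrative and are not taken from the skill's script.

```python
from pathlib import Path
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host part of a URL, e.g. 'lulumonclick-eu.shop'."""
    return urlparse(url).netloc


def output_path_for(url: str, output_dir: str = "out") -> Path:
    """Build '<output_dir>/<domain>.csv' for the given category link.

    The relative default 'out' is an assumption made for this sketch.
    """
    return Path(output_dir) / f"{domain_of(url)}.csv"


url = "https://lulumonclick-eu.shop/collections/women-women-clothes-tank-tops"
print(domain_of(url))
print(output_path_for(url).name)
```

Passing output_dir explicitly keeps the result independent of any machine-specific default.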

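Data structure

The generated CSV file contains the following columns:

  1. 完整链接 (full link): the original category link
  2. 分类路径 (category path): the category path extracted from the URL (e.g. women-women-clothes-tank-tops)
  3. 域名 (domain): the website domain
  4. 1级分类 (level-1 category): the extracted first-level category name (e.g. Women)
  5. 2级分类 (level-2 category): the extracted second-level category name (e.g. Tank Tops)
  6. 3级分类 (level-3 category): the extracted third-level category name (if present)
  7. 4级分类 (level-4 category): the extracted fourth-level category name (if present)
  8. ...: further category levels (generated dynamically according to the actual depth)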
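
The dynamic column layout can be illustrated with a short sketch. It assumes the three fixed columns come first and that shorter rows are padded with empty strings; build_header and pad_levels are hypothetical helper names, not functions from the skill.

```python
from typing import List

FIXED_COLUMNS = ["完整链接", "分类路径", "域名"]


def build_header(max_depth: int) -> List[str]:
    """Fixed columns followed by one 'N级分类' column per category level."""
    return FIXED_COLUMNS + [f"{i}级分类" for i in range(1, max_depth + 1)]


def pad_levels(levels: List[str], max_depth: int) -> List[str]:
    """Right-pad a row's category levels so every row has max_depth cells."""
    return levels + [""] * (max_depth - len(levels))


print(build_header(3))
print(pad_levels(["Women", "Tank Tops"], 3))
```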

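Multi-level category support

The skill now supports category extraction to an unlimited number of levels:

  • Automatically detects the depth of the category hierarchy
  • Generates CSV columns dynamically (1级分类, 2级分类, 3级分类, ...)
  • Intelligently merges special compound terms (T-shirts, Co-ord, etc.)
  • Correctly handles numeric ranges (0-18 months, etc.)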
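
One way the merging of compound terms and numeric ranges might work is sketched below. The COMPOUNDS table and the merge_tokens function are assumptions for illustration; the README does not describe the actual algorithm, and how merged tokens are then grouped into levels is left open here.

```python
from typing import List

# Hypothetical lookup of hyphenated terms that must survive a split on "-".
COMPOUNDS = {
    ("t", "shirts"): "T-shirts",
    ("t", "shirt"): "T-shirt",
    ("co", "ord"): "Co-ord",
}


def merge_tokens(tokens: List[str]) -> List[str]:
    """Re-join token pairs that form a known compound or a numeric range."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in COMPOUNDS:
            merged.append(COMPOUNDS[pair])      # e.g. t + shirts -> T-shirts
            i += 2
        elif len(pair) == 2 and pair[0].isdigit() and pair[1].isdigit():
            merged.append(f"{pair[0]}-{pair[1]}")  # e.g. 0 + 18 -> 0-18
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_tokens("kids-0-18-months-t-shirts".split("-")))
```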

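Processing logic

  1. Extract the domain from the URL
  2. Extract the category path that follows /collections/
  3. Parse the category path:
    • Split the category path with a smart algorithm
    • Identify the first-level category (Women, Men, Kids, Beauty, etc.)
    • Extract all lower-level categories
    • Intelligently merge special compound terms and numeric ranges
  4. Generate CSV columns dynamically according to the maximum category depth
  5. Generate the CSV file and save it to the specified directory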
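
The steps above can be sketched end to end as follows. This is a simplified illustration, not the skill's code: it uses the standard csv module instead of pandas to stay dependency-free, all function names are hypothetical, and naive_levels is only a placeholder parser. Because the real splitting algorithm is not described, the placeholder does not reproduce the README's Women / Tank Tops result.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, List
from urllib.parse import urlparse

TOP_LEVELS = {"women", "men", "kids", "beauty"}  # first-level names listed above


def category_path(url: str) -> str:
    """Return the text after '/collections/', or '' if the marker is absent."""
    path = urlparse(url).path
    marker = "/collections/"
    if marker not in path:
        return ""
    return path.split(marker, 1)[1].strip("/")


def naive_levels(path: str) -> List[str]:
    """Placeholder parser: a recognised first token is level 1, the rest is one level."""
    tokens = [t for t in path.split("-") if t]
    if not tokens:
        return []
    if tokens[0] in TOP_LEVELS:
        rest = " ".join(tokens[1:]).title()
        return [tokens[0].title()] + ([rest] if rest else [])
    return [" ".join(tokens).title()]


def write_rows(urls: List[str], out_file: Path,
               parse: Callable[[str], List[str]] = naive_levels) -> List[str]:
    """Write one CSV row per URL and return the header that was used."""
    rows = []
    for url in urls:
        path = category_path(url)
        rows.append((url, path, urlparse(url).netloc, parse(path)))
    depth = max((len(r[3]) for r in rows), default=0)
    header = ["完整链接", "分类路径", "域名"] + [f"{i}级分类" for i in range(1, depth + 1)]
    out_file.parent.mkdir(parents=True, exist_ok=True)
    with out_file.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for url, path, domain, levels in rows:
            writer.writerow([url, path, domain] + levels + [""] * (depth - len(levels)))
    return header


# Demo: write into a throwaway directory so nothing lands in a fixed location.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "lulumonclick-eu.shop.csv"
    print(write_rows(
        ["https://lulumonclick-eu.shop/collections/women-women-clothes-tank-tops"],
        demo,
    ))
```

A different parser can be supplied through the parse argument without touching the CSV-writing code.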

Example

Input link:

https://lulumonclick-eu.shop/collections/women-women-clothes-tank-tops

Output CSV row:

| 完整链接 | 分类路径 | 一级分类 | 二级分类 | 域名 |
| --- | --- | --- | --- | --- |
| https://lulumonclick-eu.shop/collections/women-women-clothes-tank-tops | women-women-clothes-tank-tops | Women | Tank Tops | lulumonclick-eu.shop |

File locations

  • Skill main file: SKILL.md
  • Script file: scripts/collect_categories.py
  • Configuration file: config/settings.json (optional)

Dependencies

  • Python 3.x
  • pandas library (used for CSV handling)

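Possible extensions

Features that could be added later:

  1. Batch processing of multiple links
  2. Support for more category levels (third level, fourth level, etc.)
  3. Automatic deduplication and validation
  4. Support for different URL formats
  5. Timestamps and collection status
  6. Integration into automated workflows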
Usage Guidance
This package appears to do what it says (parse /collections/... URLs into hierarchical CSV rows), but there are several red flags you should consider before installing or running it:

  • Default output directory: The code and docs hardcode /Users/zhangqirong/工作/caiji as the default output path. Override output_dir on first use or edit config/settings.json to avoid writing files into an unexpected location.
  • Inconsistencies in packaging: Tests and README/SKILL.md expect different CSV column names and filenames than the implementation actually produces (e.g., tests expect '一级分类' and filenames like example_com.csv, while code produces '1级分类' and filenames like example_com_multilevel.csv). This indicates sloppy packaging and means bundled tests may fail; review or adjust the code or tests before trusting results.
  • No network calls found: The scripts parse given URLs but do not fetch pages. If you planned to fetch remote pages, the code does not do that; check for additional 'fetch' logic if needed.
  • Dependencies: Ensure Python 3.x and pandas are installed in a controlled environment before running.
  • Domains in examples: Example links reference domains like zaraoutlet.top and lulumonclick-eu.shop. Those are only example inputs; the code won't contact them, but double-check any example data you reuse.

Recommended actions: run the unit tests locally after fixing the column/filename mismatches or update the test expectations; change the hardcoded default output_dir to a sensible relative or configurable default; inspect and run the scripts in an isolated environment (temporary directory) the first time to confirm behavior. If you plan to integrate this into an agent, ensure the agent won't expose these CSV files to external endpoints (the skill itself does not transmit data externally).
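
As an illustration of the "sensible relative or configurable default" recommendation, the sketch below resolves the output directory from an explicit argument, then an environment variable, then a relative fallback. The function resolve_output_dir, the variable name CATEGORY_COLLECTOR_OUTPUT, and the fallback "output" are all hypothetical and do not exist in the package.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def resolve_output_dir(output_dir: Optional[str] = None) -> Path:
    """Pick the output directory without relying on a user-specific absolute path.

    Order of precedence (an assumption of this sketch):
    explicit argument, CATEGORY_COLLECTOR_OUTPUT environment variable, './output'.
    """
    chosen = output_dir or os.environ.get("CATEGORY_COLLECTOR_OUTPUT") or "output"
    path = Path(chosen).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    return path


# First run in an isolated temporary directory, as the guidance suggests.
with tempfile.TemporaryDirectory() as tmp:
    target = resolve_output_dir(os.path.join(tmp, "first-run"))
    print(target.is_dir())
```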
Capability Analysis
Type: OpenClaw Skill
Name: category-link-collector
Version: 1.0.0

The skill bundle is a specialized tool for parsing e-commerce category hierarchies from URLs and saving the structured data into CSV files. While it contains hardcoded local file paths specific to the developer's environment (e.g., `/Users/zhangqirong/工作/caiji` in `scripts/collect_categories.py` and `config/settings.json`), which is a functional flaw regarding portability, the code logic is transparent and strictly follows its stated purpose. There are no indicators of data exfiltration, malicious network activity, or unauthorized system modifications.
Capability Assessment
Purpose & Capability
The name/description (collect category links and produce CSVs) matches the actual scripts: functions extract_domain, extract_category_path, parse_category_hierarchy and collect_category_links implement that. However the package hardcodes a user-specific default output directory (/Users/zhangqirong/工作/caiji) in multiple places (SKILL.md, config/settings.json, collect_categories.collect_category_links default). That absolute path is unrelated to the skill's purpose and is surprising for a generic skill.
Instruction Scope
SKILL.md and README describe only local parsing and CSV generation; the runtime instructions do not request any credentials or network access. The code likewise performs purely local parsing and file writes. The only scope concern is the hardcoded default output directory (will write files to /Users/zhangqirong/工作/caiji unless overridden), which is a surprising side-effect but not external exfiltration.
Install Mechanism
There is no install spec (instruction-only from the platform's perspective). Provided code uses standard Python libraries and pandas; nothing is downloaded from arbitrary URLs or installed automatically by the skill bundle.
Credentials
The skill requests no environment variables or credentials (good). But it writes files by default to a fixed absolute path in a particular user's home; this implicit filesystem access is disproportionate to an innocuous parser unless the user explicitly overrides output_dir. The bundle also depends on pandas (declared in SKILL.md).
Persistence & Privilege
always is false and the skill does not request any platform-level persistent privileges. It writes CSV files to disk (its own data), which is normal for this utility. There is no evidence it modifies other skills or system settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install category-link-collector
  3. After installation, invoke the skill by name or use /category-link-collector
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
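Initial version: supports collecting e-commerce category links, automatically extracting multi-level categories, and saving them as a CSV file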
Metadata
Slug category-link-collector
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Category Link Collector?

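Category Link Collector collects category link information from e-commerce websites, extracts the category hierarchy, and saves it as a CSV file. Use it when you need to extract structured data from e-commerce category links. It is an AI Agent Skill for Claude Code / OpenClaw, with 219 downloads so far.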

How do I install Category Link Collector?

Run "/install category-link-collector" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Category Link Collector free?

Yes, Category Link Collector is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Category Link Collector support?

Category Link Collector is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Category Link Collector?

It is built and maintained by QirongZhang (@qirongzhang); the current version is v1.0.0.
