mx2013713828

Image-crawler

Author: MagicWolf · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 131 · Favorites: 0 · Current installs: 0 · Versions: 1
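Install in OpenClaw
/install image-crawler
Description
An image collection (crawler) tool that supports the Baidu and Bing image search engines. Use it when the user asks to collect, crawl, download, or gather images. It supports keyword expansion, image deduplication (URL plus content hash, persisted across runs), progress monitoring, and stall detection. Trigger phrases: 采集图片 (collect images), 爬取图片 (crawl images), 下载图片 (download images), 图片爬虫 (image crawler), 抓取图片 (scrape images).
Usage (SKILL.md)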

Image Crawler

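Collects images in bulk through Baidu or Bing image search, with built-in deduplication, keyword expansion, and progress monitoring.

Quick workflow

1. Confirm requirements → 2. Generate expanded keywords → 3. Build the command → 4. Run and monitor → 5. Report results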

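Step 1: Confirm the collection requirements

Extract the following from the user's request:

  • Keywords (required): what kind of images to collect
  • Count (default 100): how many images are needed
  • Output directory (default ./crawled_images): where to store the images
  • Engine (default baidu): Baidu is usually more stable and gives better results for Chinese queries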
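
The four fields above can be captured in a small record before any command is built. The following is a minimal sketch, not part of the skill: `CrawlRequest` is a hypothetical name, and the engine value `"bing"` is an assumption (only `baidu` appears in the documented commands).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrawlRequest:
    """Hypothetical container for the Step 1 fields (not part of the skill)."""
    keywords: List[str]                   # required: what to collect
    count: int = 100                      # documented default
    output_dir: str = "./crawled_images"  # documented default
    engine: str = "baidu"                 # documented default

    def __post_init__(self) -> None:
        # Drop blank keywords so an empty request fails early.
        self.keywords = [k.strip() for k in self.keywords if k.strip()]
        if not self.keywords:
            raise ValueError("at least one keyword is required")
        if self.count <= 0:
            raise ValueError("count must be positive")
        # "bing" as a flag value is an assumption, see the lead-in.
        if self.engine not in ("baidu", "bing"):
            raise ValueError(f"unsupported engine: {self.engine!r}")


request = CrawlRequest(keywords=["挖掘机"])
print(request)
```

Validating at this point keeps a missing keyword or a nonsensical count from reaching the crawler at all.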

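Step 2: Keyword expansion

Use the LLM to generate 5-15 expanded keywords and pass them via --expand-terms.

Expansion strategies (choose by domain):

Equipment/products: brand + model + usage scenario

User says "挖掘机" (excavator) → 三一,卡特,小松,沃尔沃,日立,临工,大型,小型,施工现场,工地 (Sany, Caterpillar, Komatsu, Volvo, Hitachi, Lingong, large, small, construction site, work site)

Animals/plants: breed + environment + state

User says "猫" (cat) → 橘猫,英短,布偶,暹罗,黑猫,可爱,睡觉,户外 (orange tabby, British Shorthair, Ragdoll, Siamese, black cat, cute, sleeping, outdoors)

Architecture/scenes: style + location + time

User says "别墅" (villa) → 欧式,中式,现代,豪华,花园,室内,外观,夜景 (European style, Chinese style, modern, luxury, garden, interior, exterior, night view)

General principle: expanded terms should add variety rather than repeat one another. Mixing Chinese and English terms can broaden search coverage.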
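
Because expanded terms should add variety rather than repetition, it can help to clean the LLM's suggestions before passing them on. The sketch below is one possible approach under stated assumptions: `prepare_expand_terms` is a hypothetical helper, and treating the 5-15 range as a hard limit is a choice made here, not something the skill enforces.

```python
from typing import Iterable, List


def prepare_expand_terms(base: Iterable[str], candidates: Iterable[str],
                         min_terms: int = 5, max_terms: int = 15) -> str:
    """Build a comma-separated value for --expand-terms (hypothetical helper).

    Drops blank terms, case-insensitive duplicates, and terms identical to a
    base keyword, then keeps at most max_terms entries in their original order.
    """
    seen = {b.strip().lower() for b in base}
    kept: List[str] = []
    for term in candidates:
        cleaned = term.strip()
        key = cleaned.lower()
        if not cleaned or key in seen:
            continue
        seen.add(key)
        kept.append(cleaned)
    kept = kept[:max_terms]
    if len(kept) < min_terms:
        raise ValueError(f"need at least {min_terms} distinct terms, got {len(kept)}")
    return ",".join(kept)


terms = prepare_expand_terms(
    ["猫"], ["橘猫", "英短", "布偶", "暹罗", "黑猫", "猫", "橘猫 "]
)
print(terms)
```

The repeated "橘猫" and the base keyword "猫" are filtered out, so only distinct terms reach the command line.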

Step 3: Build and run the command

Script location: scripts/image_crawler.py (relative to this SKILL.md)

python {skill_dir}/scripts/image_crawler.py \
  -k "keyword1" -k "keyword2" \
  -n COUNT \
  -o OUTPUT_DIR \
  -e baidu \
  --expand --expand-terms "term1,term2,..." \
  --json

Always use --json mode so the output can be parsed.

Typical example:

# Collect 200 excavator images
python scripts/image_crawler.py \
  -k "挖掘机" -k "excavator" \
  -n 200 -o ./excavator_images \
  --expand --expand-terms "三一,卡特,小松,沃尔沃,临工,大型,施工现场" \
  --json
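
When the command is assembled programmatically, building an argument list avoids shell-quoting problems with non-ASCII keywords. This is a minimal sketch that uses only the flags shown above; `build_command` is a hypothetical helper, not something shipped with the skill.

```python
import shlex
import sys
from typing import List, Sequence


def build_command(script: str, keywords: Sequence[str], count: int,
                  output_dir: str, engine: str = "baidu",
                  expand_terms: Sequence[str] = ()) -> List[str]:
    """Assemble an argv list for image_crawler.py (hypothetical helper)."""
    argv = [sys.executable, script]
    for keyword in keywords:
        argv += ["-k", keyword]          # -k is repeated once per keyword
    argv += ["-n", str(count), "-o", output_dir, "-e", engine]
    if expand_terms:
        argv += ["--expand", "--expand-terms", ",".join(expand_terms)]
    argv.append("--json")                # always request JSON output
    return argv


cmd = build_command(
    "scripts/image_crawler.py",
    ["挖掘机", "excavator"],
    200,
    "./excavator_images",
    expand_terms=["三一", "卡特", "小松"],
)
print(shlex.join(cmd))  # shell-safe rendering, for logging only
```

The list can be handed to `subprocess.Popen` directly; `shlex.join` is used here only to show the equivalent shell line.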

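Step 4: Monitor the crawl

Run the script in the background and check its output periodically:

  1. Start the script with exec (background: true)
  2. Fetch the latest output with process(poll)
  3. Parse each JSON line and watch for the following events:

| type | Meaning | Agent action |
| --- | --- | --- |
| progress | Download progress | Report progress and the estimated time to the user |
| stall | Crawl has stalled | Warn the user that something may be wrong |
| error | Fatal error | Stop immediately and tell the user (anti-scraping measures or network problems) |
| done | Crawl finished | Report the summary statistics |

Stall check: if poll returns no new progress output for a long time (>60s), proactively check the process status.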
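
The event handling and the 60-second stall check can be sketched as follows. Only the `type` field and its four values come from the table above; every other name here (`parse_event`, `decide_action`, `StallWatch`, the action labels) is hypothetical, and the full JSON schema is documented in references/customization.md rather than assumed here.

```python
import json
import time
from typing import Callable, Optional

STALL_SECONDS = 60.0  # threshold taken from the stall check above

# Maps each documented event type to a hypothetical action label.
ACTIONS = {
    "progress": "report_progress",
    "stall": "warn_user",
    "error": "abort_and_notify",
    "done": "report_summary",
}


def parse_event(line: str) -> Optional[dict]:
    """Parse one output line; return None for blanks or non-JSON noise."""
    line = line.strip()
    if not line:
        return None
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if isinstance(event, dict) and "type" in event:
        return event
    return None


def decide_action(event: dict) -> str:
    """Return the action label for an event, or 'ignore' for unknown types."""
    return ACTIONS.get(event["type"], "ignore")


class StallWatch:
    """Tracks the time elapsed since the last progress event."""

    def __init__(self, limit: float = STALL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._clock = clock
        self._last = clock()

    def saw_progress(self) -> None:
        self._last = self._clock()

    def is_stalled(self) -> bool:
        return self._clock() - self._last > self._limit


watch = StallWatch()
for raw in ['{"type": "progress"}', "not json", '{"type": "done"}']:
    event = parse_event(raw)
    if event is None:
        continue
    if event["type"] == "progress":
        watch.saw_progress()
    print(event["type"], "->", decide_action(event))
```

Injecting the clock keeps the stall logic testable without waiting a real minute; `time.monotonic` is used so wall-clock adjustments do not trigger false stalls.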

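Step 5: Report results

When the crawl finishes, report the following to the user:

  • Images successfully downloaded / target count
  • Number of duplicates removed
  • Total elapsed time
  • Output directory path
  • If there were failures, the likely causes (anti-scraping measures, network problems, source site unavailable)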
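
The report items above can be rendered with a small formatter. This sketch takes plain arguments on purpose, so it makes no assumption about the field names in the script's `done` event; `format_report` is a hypothetical helper and the numbers in the usage call are made up for illustration.

```python
from typing import Sequence


def format_report(downloaded: int, target: int, duplicates_removed: int,
                  elapsed_seconds: float, output_dir: str,
                  failure_reasons: Sequence[str] = ()) -> str:
    """Render the Step 5 summary as plain text (hypothetical helper)."""
    minutes, seconds = divmod(int(round(elapsed_seconds)), 60)
    lines = [
        f"Downloaded: {downloaded} / {target}",
        f"Duplicates removed: {duplicates_removed}",
        f"Elapsed: {minutes}m {seconds:02d}s",
        f"Output directory: {output_dir}",
    ]
    # Mention causes only when the target was missed and causes are known.
    if downloaded < target and failure_reasons:
        lines.append("Possible causes of the shortfall: " + ", ".join(failure_reasons))
    return "\n".join(lines)


# Illustrative values only; they do not come from a real run.
report = format_report(187, 200, 23, 312.4, "./excavator_images",
                       ["anti-scraping measures", "network problems"])
print(report)
```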

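Incremental collection

The script deduplicates across runs. If the user needs more images, run it again with the same output directory:

  • .dedup_hashes.json lets the script skip images that were already downloaded
  • File numbering continues automatically, so existing files are not overwritten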
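
To illustrate how URL plus content-hash deduplication can persist across runs, here is a minimal sketch. It is not the skill's implementation: `DedupStore` is a hypothetical class, the JSON layout with `urls` and `content` keys is an assumption, and the real format of .dedup_hashes.json is described in references/customization.md.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class DedupStore:
    """Hypothetical persistent record of seen URLs and image contents."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.url_hashes = set()
        self.content_hashes = set()
        if path.exists():
            # Assumed layout: {"urls": [...], "content": [...]}
            data = json.loads(path.read_text(encoding="utf-8"))
            self.url_hashes = set(data.get("urls", []))
            self.content_hashes = set(data.get("content", []))

    @staticmethod
    def _md5(data: bytes) -> str:
        return hashlib.md5(data).hexdigest()

    def seen_url(self, url: str) -> bool:
        """Cheap check that can be made before downloading anything."""
        return self._md5(url.encode("utf-8")) in self.url_hashes

    def add(self, url: str, content: bytes) -> bool:
        """Record an image; return False if the same bytes were stored before."""
        self.url_hashes.add(self._md5(url.encode("utf-8")))
        digest = self._md5(content)
        if digest in self.content_hashes:
            return False
        self.content_hashes.add(digest)
        return True

    def save(self) -> None:
        payload = {"urls": sorted(self.url_hashes),
                   "content": sorted(self.content_hashes)}
        self.path.write_text(json.dumps(payload), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / ".dedup_hashes.json"
    first_run = DedupStore(store_path)
    first_new = first_run.add("https://example.com/a.jpg", b"image-bytes")
    first_run.save()
    # A later run reloads the file and rejects the same bytes from another URL.
    second_run = DedupStore(store_path)
    second_new = second_run.add("https://example.com/mirror.jpg", b"image-bytes")

print(first_new, second_new)  # True False
```

The URL check saves bandwidth because it runs before a download, while the content hash catches the same image served from a different address.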

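Detailed interface and customization

See references/customization.md

  • Full CLI parameter table
  • Detailed description of the JSON output format
  • How deduplication works
  • Guide to adding a new search engine
  • Troubleshooting common problems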

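Script templates

scripts/ contains two standalone engine templates, suitable for learning or further development:

  • baidu_crawler.py: Baidu image search, with a clear interface and good results for Chinese queries
  • bing_crawler.py: Bing image search, with broad coverage for English queries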
Safety recommendations
This skill appears to do what it says (scrape images from Baidu/Bing and deduplicate). Before installing or running it:

  1. Run it in a controlled environment (sandbox or non-privileged account), because it will download many files and use network bandwidth.
  2. Install Python and the 'requests' package (pip install requests); the skill doesn't declare this dependency in its metadata.
  3. Set a safe output directory and disk quota to avoid filling your disk.
  4. Respect website terms of service and robots.txt, and be aware of the legal and ethical issues with mass scraping.
  5. Consider lowering concurrency and increasing delays (the code already exposes sleep/timeouts) to reduce anti-scraping risk.
  6. Review the scripts for any changes if you plan to run them on sensitive hosts. Although no hidden network sinks or credential access were found, the crawler will fetch arbitrary external URLs, which can host unexpected content.
  7. Do not run as root/administrator, and avoid supplying any unrelated credentials to the skill.

If you want higher assurance, ask the publisher to update the metadata to declare Python and requests as required, and to provide an explicit dependency/install instruction.
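
Some of these recommendations can be checked automatically before a run. The sketch below is one possible preflight, not part of the skill: `preflight` and the 1 GiB free-space threshold are assumptions chosen for illustration.

```python
import importlib.util
import os
import shutil
from pathlib import Path
from typing import List


def preflight(output_dir: str, min_free_bytes: int = 1024 ** 3) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []

    # The scripts need the third-party 'requests' package.
    if importlib.util.find_spec("requests") is None:
        problems.append("requests is not installed (pip install requests)")

    # os.geteuid does not exist on Windows, so guard the call.
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        problems.append("running as root; use a non-privileged account")

    # Measure free space on the nearest existing ancestor of the output directory.
    probe = Path(output_dir).resolve()
    while not probe.exists():
        probe = probe.parent
    free = shutil.disk_usage(probe).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free, below the {min_free_bytes} byte minimum")

    return problems


for problem in preflight("./crawled_images"):
    print("WARNING:", problem)
```

A check like this does not replace sandboxing or a disk quota; it only surfaces the most common misconfigurations early.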
Functional analysis
Type: OpenClaw Skill
Name: image-crawler
Version: 1.0.0

The image-crawler skill bundle is a functional tool designed for batch downloading images from Baidu and Bing. The core logic in scripts/image_crawler.py and its engine-specific templates (baidu_crawler.py, bing_crawler.py) focuses on legitimate web scraping, including features like MD5-based deduplication, keyword expansion, and progress reporting via JSON. No evidence of data exfiltration, malicious execution, or prompt injection was found; all network activity is directed at well-known search engines, and file operations are restricted to the user-specified output directory.
Capability assessment
Purpose & Capability
Name/description match the provided scripts: the package contains crawler implementations for Baidu and Bing and a wrapper script that coordinates search, download, deduplication and progress reporting. However, the registry metadata declared no required binaries or environment variables while the SKILL.md and scripts assume a Python runtime and the 'requests' library; that runtime dependency is not declared in the metadata.
Instruction Scope
SKILL.md instructs the agent to extract keywords, expand them, run the bundled Python script in JSON mode and monitor its line-delimited JSON output. The instructions stay within the crawler's scope and do not request unrelated files, system credentials, or external endpoints beyond search engines and target image hosts. Use of the LLM to expand keywords is intentional for coverage and is documented.
Install Mechanism
This is an instruction-only skill (no install spec). The included code runs as Python scripts and makes network calls. There is no remote download/installation of code at install time and no obscure third-party install URLs. Note: the script will exit if 'requests' is not installed and prints instructions to pip install it — the dependency should be declared.
Credentials
The skill requests no environment variables or credentials and does not attempt to access system config paths beyond writing to the user-specified output directory. Network access to Bing, Baidu, and arbitrary image hosts is required and expected for its purpose.
Persistence & Privilege
The skill does not request permanent 'always' inclusion, nor does it modify other skills or system-wide settings. It persists deduplication hashes to a file under the chosen output directory (.dedup_hashes.json), which is consistent with stated behavior.
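How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install image-crawler
  3. Once installed, invoke the Skill by name or trigger it with /image-crawler
  4. Provide the inputs described in the Skill's parameter documentation to get structured output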
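Version history
v1.0.0
image-crawler v1.0.0, initial release:
  • Bulk image collection by keyword through Baidu and Bing image search
  • Built-in smart keyword expansion to improve image variety
  • Image deduplication (URL and content hash, with persistence)
  • Progress monitoring, stall detection, and automated error handling
  • Standard JSON script output for easy integration and result tracking
  • Incremental collection that automatically skips already-downloaded images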
Metadata
Slug: image-crawler
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Number of versions: 1
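FAQ

What is Image-crawler?

An image collection (crawler) tool that supports the Baidu and Bing image search engines, with keyword expansion, image deduplication (URL plus content hash, persisted across runs), progress monitoring, and stall detection. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 131 times so far.

How do I install Image-crawler?

Run the command "/install image-crawler" in the OpenClaw or Claude Code chat box. The install step itself needs no further configuration, but the scripts require a Python runtime and the requests package, as noted in the safety recommendations.

Is Image-crawler free?

Yes. Image-crawler is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Image-crawler support?

Image-crawler is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Image-crawler?

It is developed and maintained by MagicWolf (@mx2013713828). The current version is v1.0.0.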
