yuzhihui886

Corpus Search

Author: yuzhihui886 · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 97
Favorites: 0
Current installs: 0
Versions: 2
Install in OpenClaw
/install corpus-search
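Description
Corpus retrieval tool, used together with corpus-builder. Supports semantic search and metadata filtering (scene/emotion/pace/quality). Use when: you need to search a corpus for novel excerpts, filter by scene type, find passages with a specific emotion or pace, or retrieve high-quality writing material.
Usage (SKILL.md)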

Corpus Search - Corpus Retrieval Tool

A corpus retrieval tool designed to work with corpus-builder. It supports semantic search and metadata filtering.

Quick Start

cd ~/.openclaw/workspace/skills/corpus-search

# Basic search
python3 scripts/search_corpus.py -q "紧张的打斗场景" -c xuanhuan-full --limit 10

# Filter by scene
python3 scripts/search_corpus.py -q "围攻" -c xuanhuan-full --scene 打斗 --limit 5

# Filter by emotion
python3 scripts/search_corpus.py -q "修炼" -c xuanhuan-full --emotion 紧张 --limit 10

# JSON output
python3 scripts/search_corpus.py -q "突破" -c xuanhuan-full --json
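
To call the same commands from another Python program, they can be wrapped in a small helper. The following is a minimal sketch, not part of the skill. The names `build_command` and `run_search` are hypothetical. The skill directory is taken from the `cd` line above, and only flags shown in this document are used. Because the shape of the `--json` output is not documented here, the helper returns raw stdout and leaves parsing to the caller.

```python
import subprocess
import sys
from pathlib import Path

# Skill location taken from the Quick Start `cd` line; adjust if installed elsewhere.
SKILL_DIR = Path.home() / ".openclaw" / "workspace" / "skills" / "corpus-search"


def build_command(query, collection, limit=10, scene=None, emotion=None, as_json=False):
    """Build the argv list for scripts/search_corpus.py (hypothetical helper)."""
    cmd = [
        sys.executable, "scripts/search_corpus.py",
        "-q", query,
        "-c", collection,
        "--limit", str(limit),
    ]
    if scene:
        cmd += ["--scene", scene]
    if emotion:
        cmd += ["--emotion", emotion]
    if as_json:
        cmd.append("--json")
    return cmd


def run_search(**kwargs):
    """Run the script from the skill directory and return its raw stdout."""
    result = subprocess.run(
        build_command(**kwargs),
        cwd=SKILL_DIR,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list rather than as one shell string avoids quoting problems with non-ASCII queries such as "紧张的打斗场景".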

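Command-Line Options

Option              Description
-q, --query         Search query (required)
-c, --collection    Corpus (collection) name (required)
--limit             Number of results to return (default 10)
--scene             Scene filter (打斗 fight / 修炼 cultivation / 对话 dialogue / 探险 exploration, etc.)
--emotion           Emotion filter (紧张 tense / 轻松 relaxed / 悲伤 sad / 热血 fired-up, etc.)
--min-quality       Minimum quality score (1-10)
--json              Output in JSON format
--export            Export results to a file
--verbose           Verbose output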
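
For readers who want to see how such an interface might be declared, here is a sketch of an `argparse` parser that mirrors the options listed above. It is an illustration only; the real parser in scripts/search_corpus.py may differ. In particular, treating `--min-quality` as an integer and `--export` as taking a file path are assumptions.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented options (not the script's own code)."""
    parser = argparse.ArgumentParser(
        prog="search_corpus.py",
        description="Semantic search with metadata filters over a corpus.",
    )
    parser.add_argument("-q", "--query", required=True, help="search query")
    parser.add_argument("-c", "--collection", required=True, help="corpus (collection) name")
    parser.add_argument("--limit", type=int, default=10, help="number of results to return")
    parser.add_argument("--scene", help="scene filter, e.g. 打斗")
    parser.add_argument("--emotion", help="emotion filter, e.g. 紧张")
    # Assumption: quality scores are integers in the documented 1-10 range.
    parser.add_argument("--min-quality", type=int, choices=range(1, 11), metavar="1-10",
                        help="minimum quality score")
    parser.add_argument("--json", action="store_true", help="output in JSON format")
    # Assumption: --export takes the destination file path as its value.
    parser.add_argument("--export", metavar="FILE", help="export results to a file")
    parser.add_argument("--verbose", action="store_true", help="verbose output")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```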

Sample Output

🔍 搜索结果:紧张的打斗场景
   语料库:xuanhuan-full
   返回数量:5

1. 相似度:87.5%
   场景:打斗
   情绪:紧张,热血
   节奏:快节奏
   来源:没钱修什么仙_第 1-10 章.txt

   内容预览:
   张羽只觉胸口一痛,低头看去,只见一柄长剑已刺入...

Dependencies

pip3 install -r requirements.txt --user

Configuration

Edit configs/default_config.yml to change the corpus path.

Related Files

  • scripts/search_corpus.py - main program
  • configs/default_config.yml - configuration file

Version: 1.0.0

Security Recommendations
This skill appears to do what it says: local semantic search over a ChromaDB corpus produced by corpus-builder. Before installing or running:
  1. Ensure the configured persist_directory points to the corpus you expect (inspect configs/default_config.yml).
  2. Be aware that model loading (sentence-transformers) may download large weights from the internet; run in an environment with sufficient disk space and a network policy you control.
  3. Verify that the corpus directory contains only data you are willing to let the skill read (it will access files under the corpus-builder path).
  4. Optionally, run the script in a sandbox or inspect the full script if you want to confirm its behavior.
One minor issue: requirements.txt includes diskcache although the code currently uses in-memory caching. This is harmless but worth noting.
Functional Analysis
Type: OpenClaw Skill
Name: corpus-search
Version: 1.0.1
The corpus-search skill bundle is a legitimate utility for performing semantic searches and metadata filtering on text corpora. The primary script, scripts/search_corpus.py, uses standard libraries like ChromaDB and SentenceTransformers to query a vector database, with explicit privacy measures such as disabling anonymized telemetry. No indicators of malicious intent, data exfiltration, or prompt injection were found; the code aligns with its stated purpose of retrieving writing materials.
Capability Assessment
Purpose & Capability
Name/description (corpus retrieval, used together with corpus-builder) matches the files and behavior: it opens a ChromaDB persistent client in the corpus-builder corpus path, computes embeddings via sentence-transformers, and supports metadata filters. The storage path in default_config.yml explicitly points to the corpus-builder corpus directory, which is expected for this purpose.
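
The flow described above (persistent ChromaDB client, sentence-transformers embedding, metadata filter) can be sketched roughly as follows. This is not the skill's actual code. The metadata field names `scene`, `emotion`, and `quality` are assumptions, as are the helper names `build_where` and `search`. Exact-equality matching on `emotion` is a simplification, since the sample output shows entries carrying several emotions at once. The `chromadb` and `sentence_transformers` imports sit inside `search` so that the filter builder can be used and tested without those packages installed.

```python
def build_where(scene=None, emotion=None, min_quality=None):
    """Build a ChromaDB `where` filter from optional criteria (field names are assumed)."""
    conditions = []
    if scene is not None:
        conditions.append({"scene": {"$eq": scene}})
    if emotion is not None:
        conditions.append({"emotion": {"$eq": emotion}})
    if min_quality is not None:
        conditions.append({"quality": {"$gte": min_quality}})
    if not conditions:
        return None  # no filter at all
    if len(conditions) == 1:
        return conditions[0]  # $and needs at least two sub-expressions
    return {"$and": conditions}


def search(persist_directory, collection_name, query, limit=10, **filters):
    """Embed the query and run a filtered nearest-neighbour search (illustrative)."""
    import chromadb
    from chromadb.config import Settings
    from sentence_transformers import SentenceTransformer

    client = chromadb.PersistentClient(
        path=persist_directory,
        settings=Settings(anonymized_telemetry=False),  # telemetry off, as the review notes
    )
    collection = client.get_collection(collection_name)
    # Model name taken from the review text; weights may be downloaded on first use.
    model = SentenceTransformer("BAAI/bge-small-zh-v1.5")
    embedding = model.encode(query, normalize_embeddings=True).tolist()
    return collection.query(
        query_embeddings=[embedding],
        n_results=limit,
        where=build_where(**filters),
    )
```

A call such as `search("/path/to/corpus", "xuanhuan-full", "围攻", limit=5, scene="打斗")` would correspond to the scene-filtered command in the Quick Start, with the path being a placeholder for the configured persist_directory.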
Instruction Scope
SKILL.md only instructs running the included Python script and editing the config to point to the corpus. The script operates on the configured local persist_directory and does not reference unrelated system paths or require environment secrets. Note: loading the specified embedding model (SentenceTransformer with model name 'BAAI/bge-small-zh-v1.5') will typically download model weights from the model host (internet access) unless already cached.
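
To find out in advance whether running the script would trigger a download, one can check whether the model is already in the local Hugging Face cache. The sketch below assumes the standard `huggingface_hub` cache layout (a `models--<org>--<name>` directory under the hub cache root) and ignores `XDG_CACHE_HOME` for brevity. Older sentence-transformers releases used a different cache location, so treat a negative result as a hint rather than proof. The helper names are hypothetical.

```python
import os
from pathlib import Path

MODEL_ID = "BAAI/bge-small-zh-v1.5"  # model name quoted in the review above


def hub_cache_root():
    """Resolve the Hugging Face hub cache root from the environment (simplified)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def model_cache_dir(model_id=MODEL_ID):
    """Directory where the hub cache would store this model."""
    return hub_cache_root() / ("models--" + model_id.replace("/", "--"))


def is_probably_cached(model_id=MODEL_ID):
    """True if a cache directory for the model exists locally."""
    return model_cache_dir(model_id).is_dir()


if __name__ == "__main__":
    state = "found" if is_probably_cached() else "not found"
    print(f"{MODEL_ID}: cache directory {state} at {model_cache_dir()}")
```

Once the model is cached, setting the environment variable `HF_HUB_OFFLINE=1` before running the script tells the Hugging Face libraries not to contact the network.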
Install Mechanism
There is no install hook; dependencies are declared in requirements.txt (pip). Those packages are plausible for the task (chromadb, sentence-transformers, pyyaml, rich, tqdm). No archives or external install URLs are used. The only minor mismatch: requirements.txt lists diskcache but the code currently uses only an in-memory cache (comment indicates diskcache was removed).
Credentials
The skill requests no environment variables or credentials and does not require unrelated secrets. The only notable external access is model download via sentence-transformers/HuggingFace (public model name provided) which does not require credentials for a public model; if a private model were used the user would need to provide HF credentials separately (not requested by this skill).
Persistence & Privilege
always is false and the skill is user-invocable. It does not modify other skills' configs or require persistent system-wide privileges. It reads from a local corpus directory (expected).
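How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install corpus-search
  3. Once installation completes, invoke the Skill by name or trigger it with /corpus-search
  4. Provide the required inputs as described in the Skill's options to get structured output.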
Version History
v1.0.1
Code quality improvements
v1.0.0
Corpus retrieval tool
Metadata
Slug: corpus-search
Version: 1.0.1
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 2
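FAQ

What is Corpus Search?

A corpus retrieval tool, used together with corpus-builder. It supports semantic search and metadata filtering (scene/emotion/pace/quality). Use it when you need to search a corpus for novel excerpts, filter by scene type, find passages with a specific emotion or pace, or retrieve high-quality writing material. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 97 times so far.

How do I install Corpus Search?

Run the command "/install corpus-search" in the OpenClaw or Claude Code chat box for a one-step install, with no additional configuration required.

Is Corpus Search free?

Yes. Corpus Search is completely free and released under the MIT-0 license; you may download, install, and use it freely.

Which platforms does Corpus Search support?

Corpus Search is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Corpus Search?

It is developed and maintained by yuzhihui886 (@yuzhihui886). The current version is v1.0.1.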
