Corpus Search

Name: Corpus Search
Author: yuzhihui886

By yuzhihui886 · GitHub · v1.0.1 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install corpus-search

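Description

A corpus retrieval tool designed to work with corpus-builder. It supports semantic search and metadata filtering (scene/emotion/pace/quality). Use when: you need to search a corpus for novel excerpts, filter by scene type, find passages with a particular emotion or pace, or retrieve high-quality writing material.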

Usage (SKILL.md)

Corpus Search - Corpus Retrieval Tool

A corpus retrieval tool that works with corpus-builder and supports semantic search and metadata filtering.

Quick start

cd ~/.openclaw/workspace/skills/corpus-search

# Basic search (query: "tense fight scene")
python3 scripts/search_corpus.py -q "紧张的打斗场景" -c xuanhuan-full --limit 10

# Filter by scene
python3 scripts/search_corpus.py -q "围攻" -c xuanhuan-full --scene 打斗 --limit 5

# Filter by emotion
python3 scripts/search_corpus.py -q "修炼" -c xuanhuan-full --emotion 紧张 --limit 10

# JSON output
python3 scripts/search_corpus.py -q "突破" -c xuanhuan-full --json
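
For callers that want to drive the script from Python, the invocations above can be assembled programmatically. The following is a minimal sketch and not part of the skill: `build_search_command` and `run_search` are hypothetical helper names, and the sketch assumes that `--json` prints a single JSON document to stdout, which the listing does not spell out.

```python
import json
import subprocess
from typing import Any, List, Optional

SCRIPT = "scripts/search_corpus.py"  # path relative to the skill directory


def build_search_command(
    query: str,
    collection: str,
    scene: Optional[str] = None,
    emotion: Optional[str] = None,
    limit: Optional[int] = None,
    as_json: bool = False,
) -> List[str]:
    """Assemble the argument list using only flags documented in SKILL.md."""
    cmd = ["python3", SCRIPT, "-q", query, "-c", collection]
    if scene:
        cmd += ["--scene", scene]
    if emotion:
        cmd += ["--emotion", emotion]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if as_json:
        cmd.append("--json")
    return cmd


def run_search(skill_dir: str, **kwargs: Any) -> Any:
    """Run the script inside skill_dir and parse stdout as JSON.

    Assumption: with --json the script writes only JSON to stdout.
    """
    cmd = build_search_command(as_json=True, **kwargs)
    done = subprocess.run(
        cmd, cwd=skill_dir, capture_output=True, text=True, check=True
    )
    return json.loads(done.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with queries that contain spaces or non-ASCII characters.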

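Command-line options

Option	Description
`-q, --query`	Search query (required)
`-c, --collection`	Corpus name (required)
`--limit`	Number of results to return (default 10)
`--scene`	Scene filter (e.g. 打斗 fight / 修炼 cultivation / 对话 dialogue / 探险 adventure)
`--emotion`	Emotion filter (e.g. 紧张 tense / 轻松 relaxed / 悲伤 sad / 热血 fired-up)
`--min-quality`	Minimum quality score (1-10)
`--json`	Output in JSON format
`--export`	Export results to a file
`--verbose`	Verbose output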
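
The option table maps naturally onto an `argparse` parser. The sketch below is an illustration of that mapping, not the script's actual source: the help strings are paraphrased, and it assumes that `--export` takes a file path and that `--min-quality` is an integer, neither of which the table states explicitly.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="Search a corpus-builder corpus")
    parser.add_argument("-q", "--query", required=True, help="search query")
    parser.add_argument("-c", "--collection", required=True, help="corpus name")
    parser.add_argument("--limit", type=int, default=10, help="number of results")
    parser.add_argument("--scene", help="scene filter, e.g. 打斗")
    parser.add_argument("--emotion", help="emotion filter, e.g. 紧张")
    parser.add_argument(
        "--min-quality",
        type=int,
        choices=range(1, 11),  # documented range is 1-10
        metavar="1-10",
        help="minimum quality score",
    )
    parser.add_argument("--json", action="store_true", help="JSON output")
    # Assumption: --export expects the destination file path as its value.
    parser.add_argument("--export", metavar="FILE", help="export results to FILE")
    parser.add_argument("--verbose", action="store_true", help="verbose output")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```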

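Sample output

The tool prints Chinese labels: 相似度 = similarity, 场景 = scene, 情绪 = emotion, 节奏 = pace, 来源 = source, 内容预览 = content preview.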

🔍 搜索结果：紧张的打斗场景
   语料库：xuanhuan-full
   返回数量：5

1. 相似度：87.5%
   场景：打斗
   情绪：紧张，热血
   节奏：快节奏
   来源：没钱修什么仙_第 1-10 章.txt

   内容预览:
   张羽只觉胸口一痛，低头看去，只见一柄长剑已刺入...

Dependencies

pip3 install -r requirements.txt --user

Configuration

Edit configs/default_config.yml to change the corpus path.
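
Before running a search it can be useful to confirm which corpus path the config actually points to. The sketch below is one way to do that. The key name `persist_directory` is taken from the security notes on this page, but the layout of the YAML file is not shown here, so the code searches the whole document for that key rather than assuming a fixed nesting. `find_key` and `configured_corpus_path` are hypothetical helper names.

```python
from typing import Any, Optional


def find_key(node: Any, key: str) -> Optional[Any]:
    """Depth-first search for the first occurrence of key in nested dicts/lists."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def configured_corpus_path(
    config_file: str = "configs/default_config.yml",
) -> Optional[str]:
    """Return the persist_directory value from the config, if present."""
    import yaml  # PyYAML; imported lazily so find_key works without it

    with open(config_file, encoding="utf-8") as fh:
        config = yaml.safe_load(fh)
    return find_key(config, "persist_directory")
```

Run from the skill directory, `configured_corpus_path()` returns the configured path, or `None` if the key is absent.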

Related files

scripts/search_corpus.py - main program
configs/default_config.yml - configuration file

Version: 1.0.0

Security recommendations

This skill appears to do what it says: local semantic search over a ChromaDB corpus produced by corpus-builder. Before installing or running it: 1) check that the configured persist_directory points to the corpus you expect (inspect configs/default_config.yml); 2) be aware that model loading (sentence-transformers) may download large weights from the internet, so run it in an environment with sufficient disk space and a network policy you control; 3) verify that the corpus directory contains only data you are willing to let the skill read (it accesses files under the corpus-builder path); 4) optionally run the script in a sandbox, or inspect the full script, if you want to confirm its behavior. One minor issue: requirements.txt includes diskcache although the code currently uses in-memory caching. This is harmless but worth noting.

Functional analysis

Type: OpenClaw Skill
Name: corpus-search
Version: 1.0.1

The corpus-search skill bundle is a legitimate utility for performing semantic searches and metadata filtering on text corpora. The primary script, scripts/search_corpus.py, uses standard libraries like ChromaDB and SentenceTransformers to query a vector database, with explicit privacy measures such as disabling anonymized telemetry. No indicators of malicious intent, data exfiltration, or prompt injection were found; the code aligns with its stated purpose of retrieving writing materials.

Capability assessment

✓ Purpose & Capability

Name/description (语料检索，与 corpus-builder 配合) matches the files and behavior: it opens a ChromaDB persistent client in the corpus-builder corpus path, computes embeddings via sentence-transformers, and supports metadata filters. The storage path in default_config.yml explicitly points to the corpus-builder corpus directory, which is expected for this purpose.
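
The flow described above (persistent ChromaDB client, sentence-transformers embedding, metadata filter) can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual code: the metadata field names `scene` and `quality` are guesses, `build_where` and `search` are hypothetical helpers, and the real script may filter differently (for example, the sample output shows multi-valued emotions, which a simple equality filter would not match).

```python
from typing import Any, Dict, List, Optional


def build_where(
    scene: Optional[str] = None, min_quality: Optional[int] = None
) -> Optional[Dict[str, Any]]:
    """Build a ChromaDB `where` filter; field names are assumptions."""
    clauses: List[Dict[str, Any]] = []
    if scene:
        clauses.append({"scene": {"$eq": scene}})
    if min_quality is not None:
        clauses.append({"quality": {"$gte": min_quality}})
    if not clauses:
        return None
    # ChromaDB expects multiple conditions to be wrapped in $and.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


def search(
    persist_directory: str,
    collection_name: str,
    query: str,
    limit: int = 10,
    where: Optional[Dict[str, Any]] = None,
) -> Any:
    """Embed the query and run a filtered nearest-neighbour search."""
    # Heavy dependencies are imported lazily; they come from requirements.txt.
    import chromadb
    from chromadb.config import Settings
    from sentence_transformers import SentenceTransformer

    client = chromadb.PersistentClient(
        path=persist_directory,
        settings=Settings(anonymized_telemetry=False),  # telemetry disabled
    )
    collection = client.get_collection(collection_name)
    # May download weights on first use unless the model is already cached.
    model = SentenceTransformer("BAAI/bge-small-zh-v1.5")
    embedding = model.encode(query, normalize_embeddings=True).tolist()
    return collection.query(
        query_embeddings=[embedding], n_results=limit, where=where
    )
```

Keeping the filter construction separate from the query call makes it possible to check the filter logic without loading the model or opening the database.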

✓ Instruction Scope

SKILL.md only instructs running the included Python script and editing the config to point to the corpus. The script operates on the configured local persist_directory and does not reference unrelated system paths or require environment secrets. Note: loading the specified embedding model (SentenceTransformer with model name 'BAAI/bge-small-zh-v1.5') will typically download model weights from the model host (internet access) unless already cached.

✓ Install Mechanism

There is no install hook; dependencies are declared in requirements.txt (pip). Those packages are plausible for the task (chromadb, sentence-transformers, pyyaml, rich, tqdm). No archives or external install URLs are used. The only minor mismatch: requirements.txt lists diskcache but the code currently uses only an in-memory cache (comment indicates diskcache was removed).

✓ Credentials

The skill requests no environment variables or credentials and does not require unrelated secrets. The only notable external access is model download via sentence-transformers/HuggingFace (public model name provided) which does not require credentials for a public model; if a private model were used the user would need to provide HF credentials separately (not requested by this skill).

✓ Persistence & Privilege

always is false and the skill is user-invocable. It does not modify other skills' configs or require persistent system-wide privileges. It reads from a local corpus directory (expected).

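How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install corpus-search
Once installed, invoke the Skill by name or trigger it with /corpus-search
Provide the required inputs as described in the Skill's parameter documentation to get structured output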

Version history

v1.0.1

Code quality improvements

v1.0.0

Corpus retrieval tool

Metadata

Slug corpus-search

Version 1.0.1

License MIT-0

Total installs 0

Current installs 0

Versions 2

FAQ

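What is Corpus Search?

A corpus retrieval tool designed to work with corpus-builder. It supports semantic search and metadata filtering (scene/emotion/pace/quality). Use when: you need to search a corpus for novel excerpts, filter by scene type, find passages with a particular emotion or pace, or retrieve high-quality writing material. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 97 times to date.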

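How do I install Corpus Search?

Run the command "/install corpus-search" in the OpenClaw or Claude Code chat box to install it in one step. The install command itself needs no extra configuration; the Python dependencies and the corpus path are set up separately, as described in the usage instructions.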

Is Corpus Search free?

Yes. Corpus Search is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Corpus Search support?

Corpus Search is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Corpus Search?

It is developed and maintained by yuzhihui886 (@yuzhihui886). The current version is v1.0.1.