Corpus Search

Name: Corpus Search
Author: yuzhihui886

by yuzhihui886 · GitHub ↗ · v1.0.1 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install corpus-search

Description

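A corpus retrieval tool designed to work with corpus-builder. It supports semantic search and metadata filtering (scene / emotion / pacing / quality). Use when: you need to search a corpus for novel passages, filter by scene type, find descriptions with a specific emotion or pacing, or retrieve high-quality writing material.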

README (SKILL.md)

Corpus Search - Corpus Retrieval Tool

A corpus retrieval tool used together with corpus-builder, supporting semantic search and metadata filtering.

Quick Start

cd ~/.openclaw/workspace/skills/corpus-search

# Basic search
python3 scripts/search_corpus.py -q "紧张的打斗场景" -c xuanhuan-full --limit 10

# Filter by scene
python3 scripts/search_corpus.py -q "围攻" -c xuanhuan-full --scene 打斗 --limit 5

# Filter by emotion
python3 scripts/search_corpus.py -q "修炼" -c xuanhuan-full --emotion 紧张 --limit 10

# JSON output
python3 scripts/search_corpus.py -q "突破" -c xuanhuan-full --json

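Command-Line Options

Option	Description
`-q, --query`	Search query (required)
`-c, --collection`	Corpus (collection) name (required)
`--limit`	Number of results to return (default 10)
`--scene`	Scene filter (打斗 fight / 修炼 cultivation / 对话 dialogue / 探险 exploration, etc.)
`--emotion`	Emotion filter (紧张 tense / 轻松 relaxed / 悲伤 sad / 热血 passionate, etc.)
`--min-quality`	Minimum quality score (1-10)
`--json`	Output in JSON format
`--export`	Export results to a file
`--verbose`	Verbose output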
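
The options above can also be driven from another Python program. The following is a minimal sketch, not part of the skill itself: `build_command` and `run_search` are hypothetical helper names, the range check on `min_quality` is an added safeguard based on the documented 1-10 range, and the sketch assumes that `--json` prints a single JSON document to stdout (the README does not describe the JSON schema).

```python
import json
import subprocess
from typing import List, Optional

# Script path as given in the README, relative to the skill directory.
SCRIPT = "scripts/search_corpus.py"


def build_command(query: str, collection: str, limit: int = 10,
                  scene: Optional[str] = None, emotion: Optional[str] = None,
                  min_quality: Optional[int] = None) -> List[str]:
    """Assemble an argv list using only flags documented in the options table."""
    cmd = ["python3", SCRIPT, "-q", query, "-c", collection,
           "--limit", str(limit), "--json"]
    if scene is not None:
        cmd += ["--scene", scene]
    if emotion is not None:
        cmd += ["--emotion", emotion]
    if min_quality is not None:
        # Added safeguard (assumption): reject values outside the documented 1-10 range.
        if not 1 <= min_quality <= 10:
            raise ValueError("min_quality must be between 1 and 10")
        cmd += ["--min-quality", str(min_quality)]
    return cmd


def run_search(**kwargs):
    """Run the script and parse stdout as JSON.

    Assumes --json writes exactly one JSON document to stdout; the actual
    output schema is not documented in the README.
    """
    result = subprocess.run(build_command(**kwargs), capture_output=True,
                            text=True, check=True)
    return json.loads(result.stdout)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with Chinese queries and filter values. `run_search` must be called from the skill directory so that the relative script path resolves.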

Example Output

🔍 搜索结果：紧张的打斗场景
   语料库：xuanhuan-full
   返回数量：5

1. 相似度：87.5%
   场景：打斗
   情绪：紧张，热血
   节奏：快节奏
   来源：没钱修什么仙_第 1-10 章.txt

   内容预览:
   张羽只觉胸口一痛，低头看去，只见一柄长剑已刺入...

Dependencies

pip3 install -r requirements.txt --user

Configuration

Edit configs/default_config.yml to change the corpus path.

Related Files

scripts/search_corpus.py - Main program
configs/default_config.yml - Configuration file

Version: 1.0.0

Usage Guidance

This skill appears to do what it says: local semantic search over a ChromaDB corpus produced by corpus-builder. Before installing or running:
1) Ensure the configured persist_directory points to the corpus you expect (inspect configs/default_config.yml).
2) Be aware that model loading (sentence-transformers) may download large weights from the internet; run in an environment with sufficient disk space and a network policy you control.
3) Verify that the corpus directory contains only data you are willing to let the skill read (it will access files under the corpus-builder path).
4) Optionally run the script in a sandbox, or inspect the full script, if you want to confirm its behavior.
One minor issue: requirements.txt includes diskcache although the code currently uses in-memory caching. This is harmless but worth noting.

Capability Analysis

Type: OpenClaw Skill
Name: corpus-search
Version: 1.0.1

The corpus-search skill bundle is a legitimate utility for performing semantic searches and metadata filtering on text corpora. The primary script, scripts/search_corpus.py, uses standard libraries like ChromaDB and SentenceTransformers to query a vector database, with explicit privacy measures such as disabling anonymized telemetry. No indicators of malicious intent, data exfiltration, or prompt injection were found; the code aligns perfectly with its stated purpose of retrieving writing materials.

Capability Assessment

✓ Purpose & Capability

Name/description (corpus retrieval, used together with corpus-builder) matches the files and behavior: it opens a ChromaDB persistent client in the corpus-builder corpus path, computes embeddings via sentence-transformers, and supports metadata filters. The storage path in default_config.yml explicitly points to the corpus-builder corpus directory, which is expected for this purpose.
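
To illustrate how metadata filters of this kind are typically expressed for ChromaDB, here is a minimal sketch of a filter builder. It is not taken from scripts/search_corpus.py: `build_where` is a hypothetical helper, and the metadata keys `scene`, `emotion`, and `quality` are assumptions about how corpus-builder labels its chunks. The operators `$eq`, `$gte`, and `$and` are part of ChromaDB's documented `where` filter syntax.

```python
from typing import Any, Dict, List, Optional


def build_where(scene: Optional[str] = None,
                emotion: Optional[str] = None,
                min_quality: Optional[int] = None) -> Optional[Dict[str, Any]]:
    """Build a ChromaDB-style `where` filter from optional criteria.

    The metadata keys used here ("scene", "emotion", "quality") are
    assumptions; the real corpus may use different names.
    """
    conditions: List[Dict[str, Any]] = []
    if scene:
        conditions.append({"scene": {"$eq": scene}})
    if emotion:
        # Exact match only; a record tagged with several emotions in one
        # string would not match this condition.
        conditions.append({"emotion": {"$eq": emotion}})
    if min_quality is not None:
        conditions.append({"quality": {"$gte": min_quality}})

    if not conditions:
        return None            # no filtering requested
    if len(conditions) == 1:
        return conditions[0]   # a single condition needs no wrapper
    return {"$and": conditions}  # multiple conditions must be combined explicitly
```

The resulting dictionary could then be passed as the `where` argument of a ChromaDB collection's `query` method, alongside the query embedding and `n_results`.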

✓ Instruction Scope

SKILL.md only instructs running the included Python script and editing the config to point to the corpus. The script operates on the configured local persist_directory and does not reference unrelated system paths or require environment secrets. Note: loading the specified embedding model (SentenceTransformer with model name 'BAAI/bge-small-zh-v1.5') will typically download model weights from the model host (internet access) unless already cached.
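
Whether a download will happen can be checked ahead of time. The sketch below is an illustration under stated assumptions, not part of the skill: it assumes the model is stored in the standard Hugging Face Hub cache layout (`models--<org>--<name>` under the hub cache directory), which recent sentence-transformers releases use; older releases may cache elsewhere. The helper names are hypothetical.

```python
import os
from pathlib import Path

# Model name as reported in the review above.
MODEL_NAME = "BAAI/bge-small-zh-v1.5"


def hub_cache_dir() -> Path:
    """Resolve the Hugging Face Hub cache directory from the environment."""
    explicit = os.environ.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit)
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        return Path(hf_home) / "hub"
    # Default location when neither variable is set.
    return Path.home() / ".cache" / "huggingface" / "hub"


def model_cache_path(model_name: str = MODEL_NAME) -> Path:
    """Expected cache folder for a model, assuming the standard hub layout."""
    return hub_cache_dir() / ("models--" + model_name.replace("/", "--"))


def is_model_cached(model_name: str = MODEL_NAME) -> bool:
    """True if a cache folder for the model already exists on disk."""
    return model_cache_path(model_name).is_dir()
```

If `is_model_cached()` returns True, setting the environment variable `HF_HUB_OFFLINE=1` before running the script should prevent huggingface_hub from making network requests. A True result only shows that the folder exists, not that the download completed.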

✓ Install Mechanism

There is no install hook; dependencies are declared in requirements.txt (pip). Those packages are plausible for the task (chromadb, sentence-transformers, pyyaml, rich, tqdm). No archives or external install URLs are used. The only minor mismatch: requirements.txt lists diskcache but the code currently uses only an in-memory cache (comment indicates diskcache was removed).

✓ Credentials

The skill requests no environment variables or credentials and does not require unrelated secrets. The only notable external access is model download via sentence-transformers/HuggingFace (public model name provided) which does not require credentials for a public model; if a private model were used the user would need to provide HF credentials separately (not requested by this skill).

✓ Persistence & Privilege

always is false and the skill is user-invocable. It does not modify other skills' configs or require persistent system-wide privileges. It reads from a local corpus directory (expected).

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install corpus-search
After installation, invoke the skill by name or use /corpus-search
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

Code quality improvements

v1.0.0

Corpus retrieval tool

Metadata

Slug corpus-search

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Corpus Search?

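Corpus Search is a corpus retrieval tool designed to work with corpus-builder. It supports semantic search and metadata filtering (scene / emotion / pacing / quality). Use it when you need to search a corpus for novel passages, filter by scene type, find descriptions with a specific emotion or pacing, or retrieve high-quality writing material. It is an AI Agent Skill for Claude Code / OpenClaw, with 97 downloads so far.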

How do I install Corpus Search?

Run "/install corpus-search" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Corpus Search free?

Yes, Corpus Search is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Corpus Search support?

Corpus Search is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Corpus Search?

It is built and maintained by yuzhihui886 (@yuzhihui886); the current version is v1.0.1.
