Corpus Search

by yuzhihui886 · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ✓ Security Clean
Downloads: 97 · Stars: 0 · Active Installs: 0 · Versions: 2
Install in OpenClaw
/install corpus-search
Description
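A corpus search tool designed to work with corpus-builder. Supports semantic search and metadata filtering (scene, emotion, pacing, quality). Use when: you need to search a corpus for novel excerpts, filter by scene type, find passages with a particular emotion or pacing, or retrieve high-quality writing material.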
README (SKILL.md)

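Corpus Search - Corpus Retrieval Tool

A corpus retrieval tool that works with corpus-builder and supports semantic search and metadata filtering.

Quick Start

cd ~/.openclaw/workspace/skills/corpus-search

# Basic search (query: "a tense fight scene")
python3 scripts/search_corpus.py -q "紧张的打斗场景" -c xuanhuan-full --limit 10

# Filter by scene (query: "siege"; scene: 打斗, "fight")
python3 scripts/search_corpus.py -q "围攻" -c xuanhuan-full --scene 打斗 --limit 5

# Filter by emotion (query: "cultivation"; emotion: 紧张, "tense")
python3 scripts/search_corpus.py -q "修炼" -c xuanhuan-full --emotion 紧张 --limit 10

# JSON output (query: "breakthrough")
python3 scripts/search_corpus.py -q "突破" -c xuanhuan-full --json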

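Command-Line Options

Option             Description
-q, --query        Search query (required)
-c, --collection   Corpus name (required)
--limit            Number of results to return (default: 10)
--scene            Scene filter (打斗 fight / 修炼 cultivation / 对话 dialogue / 探险 adventure, etc.)
--emotion          Emotion filter (紧张 tense / 轻松 relaxed / 悲伤 sad / 热血 fired-up, etc.)
--min-quality      Minimum quality score (1-10)
--json             Output in JSON format
--export           Export results to a file
--verbose          Verbose output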
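
The option table above can be mirrored with a small argparse definition. This is only a sketch of how such a command line might be declared, not the code in scripts/search_corpus.py: the integer type for --min-quality and the path argument for --export are assumptions, since the listing does not show them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the skill's own code)."""
    p = argparse.ArgumentParser(prog="search_corpus.py",
                                description="Semantic corpus search")
    p.add_argument("-q", "--query", required=True, help="search query")
    p.add_argument("-c", "--collection", required=True, help="corpus name")
    p.add_argument("--limit", type=int, default=10, help="number of results")
    p.add_argument("--scene", help="scene filter, e.g. 打斗")
    p.add_argument("--emotion", help="emotion filter, e.g. 紧张")
    # Assumption: the quality score is an integer between 1 and 10.
    p.add_argument("--min-quality", type=int, choices=range(1, 11),
                   metavar="1-10", help="minimum quality score")
    p.add_argument("--json", action="store_true", help="JSON output")
    # Assumption: --export takes the destination file path.
    p.add_argument("--export", metavar="PATH", help="export results to a file")
    p.add_argument("--verbose", action="store_true", help="verbose output")
    return p


args = build_parser().parse_args(
    ["-q", "围攻", "-c", "xuanhuan-full", "--scene", "打斗", "--limit", "5"]
)
print(args.query, args.collection, args.scene, args.limit)
```

Marking -q and -c as required makes argparse reject an incomplete invocation before any model or database is loaded.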

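Output Example

The sample below is shown as printed, with Chinese labels: 相似度 (similarity), 场景 (scene), 情绪 (emotion), 节奏 (pacing), 来源 (source), 内容预览 (content preview).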

🔍 搜索结果:紧张的打斗场景
   语料库:xuanhuan-full
   返回数量:5

1. 相似度:87.5%
   场景:打斗
   情绪:紧张,热血
   节奏:快节奏
   来源:没钱修什么仙_第 1-10 章.txt

   内容预览:
   张羽只觉胸口一痛,低头看去,只见一柄长剑已刺入...
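
The listing does not say how the 相似度 (similarity) percentage is derived. One common approach, shown here purely as an assumption, is to treat the vector store's result as a cosine distance and report 1 minus that distance; the function name is hypothetical.

```python
def similarity_percent(distance: float) -> str:
    """Map a cosine distance (0 = identical) to a percentage string.

    Assumption: similarity = 1 - distance, clamped to [0, 1].
    """
    similarity = max(0.0, min(1.0, 1.0 - distance))
    return f"{similarity * 100:.1f}%"


print(similarity_percent(0.125))  # prints 87.5%
```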

Dependencies

pip3 install -r requirements.txt --user

Configuration

Edit configs/default_config.yml to change the corpus path.

Related Files

  • scripts/search_corpus.py - main program
  • configs/default_config.yml - configuration file

Version: 1.0.0

Usage Guidance
This skill appears to do what it says: local semantic search over a ChromaDB corpus produced by corpus-builder. Before installing or running:
  1. Ensure the configured persist_directory points to the corpus you expect (inspect configs/default_config.yml).
  2. Be aware that model loading (sentence-transformers) may download large weights from the internet; run in an environment with sufficient disk space and a network policy you control.
  3. Verify that the corpus directory contains only data you are willing to let the skill read (it will access files under the corpus-builder path).
  4. Optionally, run the script in a sandbox or inspect the full script to confirm its behavior.
One minor issue: requirements.txt includes diskcache although the code currently uses in-memory caching. This is harmless but worth noting.
Capability Analysis
Type: OpenClaw Skill
Name: corpus-search
Version: 1.0.1
The corpus-search skill bundle is a legitimate utility for performing semantic searches and metadata filtering on text corpora. The primary script, scripts/search_corpus.py, uses standard libraries like ChromaDB and SentenceTransformers to query a vector database, with explicit privacy measures such as disabling anonymized telemetry. No indicators of malicious intent, data exfiltration, or prompt injection were found; the code aligns with its stated purpose of retrieving writing materials.
Capability Assessment
Purpose & Capability
Name/description (corpus search, used together with corpus-builder) matches the files and behavior: it opens a ChromaDB persistent client in the corpus-builder corpus path, computes embeddings via sentence-transformers, and supports metadata filters. The storage path in default_config.yml explicitly points to the corpus-builder corpus directory, which is expected for this purpose.
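
As an illustration of the metadata filtering described above, the sketch below builds a ChromaDB-style where filter from the optional CLI filters. It is not the skill's own code: the metadata keys scene, emotion, and quality are assumed names, and the helper is hypothetical.

```python
from typing import Any, Optional


def build_where(scene: Optional[str] = None,
                emotion: Optional[str] = None,
                min_quality: Optional[int] = None) -> Optional[dict[str, Any]]:
    """Build a ChromaDB `where` filter; the metadata key names are assumptions."""
    clauses: list[dict[str, Any]] = []
    if scene:
        clauses.append({"scene": {"$eq": scene}})
    if emotion:
        clauses.append({"emotion": {"$eq": emotion}})
    if min_quality is not None:
        clauses.append({"quality": {"$gte": min_quality}})
    if not clauses:
        return None        # no filter: search the whole collection
    if len(clauses) == 1:
        return clauses[0]  # $and needs at least two clauses
    return {"$and": clauses}


# The result would be passed to a query roughly like:
#   collection.query(query_embeddings=[vec], n_results=10, where=build_where(...))
print(build_where(scene="打斗", min_quality=7))
```

If a chunk stores several emotions in one string (the sample output shows 紧张,热血), an exact-match filter like this would miss it, so post-filtering the results in Python may be needed.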
Instruction Scope
SKILL.md only instructs running the included Python script and editing the config to point to the corpus. The script operates on the configured local persist_directory and does not reference unrelated system paths or require environment secrets. Note: loading the specified embedding model (SentenceTransformer with model name 'BAAI/bge-small-zh-v1.5') will typically download model weights from the model host (internet access) unless already cached.
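
To see in advance whether a first run is likely to trigger a download, one can look for the model in the local Hugging Face hub cache. The sketch below assumes the standard hub cache layout (models--ORG--NAME/snapshots) and ignores less common overrides such as XDG_CACHE_HOME; older sentence-transformers releases used a different cache directory, so a negative result is a hint rather than proof. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_root() -> Path:
    """Return the Hugging Face hub cache directory (simplified lookup)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME",
                             str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def is_model_cached(model_name: str, cache_root: Optional[Path] = None) -> bool:
    """True if at least one snapshot of the model exists in the cache."""
    root = cache_root if cache_root is not None else hub_cache_root()
    folder = "models--" + model_name.replace("/", "--")
    snapshots = root / folder / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


if not is_model_cached("BAAI/bge-small-zh-v1.5"):
    print("Model not found in cache; the first run will likely download weights.")
```

Once the model is cached, setting the environment variable HF_HUB_OFFLINE=1 before running the script should keep the Hugging Face libraries from contacting the network.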
Install Mechanism
There is no install hook; dependencies are declared in requirements.txt (pip). Those packages are plausible for the task (chromadb, sentence-transformers, pyyaml, rich, tqdm). No archives or external install URLs are used. The only minor mismatch: requirements.txt lists diskcache but the code currently uses only an in-memory cache (comment indicates diskcache was removed).
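
A mismatch like the diskcache one can be spotted by comparing the declared requirements with the modules a script imports. The following is a rough sketch using only the standard library; it assumes bare package names without version specifiers, and the name mapping covers only the packages mentioned in this review.

```python
import ast

# Distribution name -> import name, where the two differ.
IMPORT_NAMES = {"pyyaml": "yaml", "sentence-transformers": "sentence_transformers"}


def imported_modules(source: str) -> set[str]:
    """Collect the top-level module names imported by a Python source string."""
    mods: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            mods.add(node.module.split(".")[0])
    return mods


def unused_requirements(requirements: list[str], source: str) -> set[str]:
    """Return the requirement names that the source never imports."""
    used = imported_modules(source)
    unused: set[str] = set()
    for req in requirements:
        name = req.strip().lower()
        import_name = IMPORT_NAMES.get(name, name.replace("-", "_"))
        if import_name not in used:
            unused.add(name)
    return unused


# Illustrative input only; this is not the skill's actual source.
sample = (
    "import chromadb\n"
    "import yaml\n"
    "from sentence_transformers import SentenceTransformer\n"
)
print(unused_requirements(
    ["chromadb", "sentence-transformers", "pyyaml", "diskcache"], sample))
# prints {'diskcache'}
```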
Credentials
The skill requests no environment variables or credentials and does not require unrelated secrets. The only notable external access is model download via sentence-transformers/HuggingFace (public model name provided) which does not require credentials for a public model; if a private model were used the user would need to provide HF credentials separately (not requested by this skill).
Persistence & Privilege
The always flag is false and the skill is user-invocable. It does not modify other skills' configs or require persistent system-wide privileges. It reads from a local corpus directory (expected).
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install corpus-search
  3. After installation, invoke the skill by name or use /corpus-search
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
Code quality improvements
v1.0.0
Corpus search tool
Metadata
Slug corpus-search
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is Corpus Search?

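Corpus Search is a corpus search tool designed to work with corpus-builder. It supports semantic search and metadata filtering (scene, emotion, pacing, quality). Use it when you need to search a corpus for novel excerpts, filter by scene type, find passages with a particular emotion or pacing, or retrieve high-quality writing material. It is an AI Agent Skill for Claude Code / OpenClaw, with 97 downloads so far.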

How do I install Corpus Search?

Run "/install corpus-search" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Corpus Search free?

Yes, Corpus Search is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Corpus Search support?

Corpus Search is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Corpus Search?

It is built and maintained by yuzhihui886 (@yuzhihui886); the current version is v1.0.1.
