Byted Bytehouse Hybrid Search
Author: volcengine-skills · v1.0.0 · MIT-0
Total downloads: 96 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install byted-bytehouse-hybrid-search
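Description
A ByteHouse hybrid search Skill that combines full-text search and vector search, then re-ranks the merged results with the Reciprocal Rank Fusion (RRF) algorithm for more accurate retrieval. Use this Skill when a user needs combined full-text and vector search over a ByteHouse database.
Usage (SKILL.md)
ByteHouse Hybrid Search Skill
🚀 Quick Start
Environment setup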
pip install clickhouse-connect volcengine-python-sdk[ark] numpy scipy
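Environment variables
Configuration is read from environment variables first. Do not hard-code sensitive values in plaintext:
# ByteHouse configuration
export BYTEHOUSE_HOST="<your ByteHouse host address>"
export BYTEHOUSE_PORT="<ByteHouse port>"
export BYTEHOUSE_USER="<ByteHouse username>"
export BYTEHOUSE_PASSWORD="<ByteHouse password>"
export BYTEHOUSE_DATABASE="<default database, optional, defaults to default>"
export BYTEHOUSE_SECURE="<enable encryption, optional, defaults to true>"
# Volcengine Ark API configuration
export ARK_API_KEY="<Volcengine Ark API key>"
export ARK_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export EMBEDDING_MODEL="doubao-embedding-vision-251215" # text embedding model
export EMBEDDING_DIMENSIONS="1536" # embedding dimension, optional, defaults to 1536
If an environment variable is not set, the user is prompted to enter it.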
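📚 Core Capabilities
1. Text embedding
Generates text embeddings with the Doubao text embedding model; supports Chinese text of any length.
2. Dual index construction
| Index type | Description | Use case |
|---|---|---|
| Full-text inverted index | BM25-based full-text search with keyword matching | Precise keyword recall |
| Vector index | HNSW-based vector similarity search with semantic matching | Semantic similarity recall |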
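3. Core functions
| Function | Method | Description |
|---|---|---|
| Full-text search | fulltext_search() | BM25-based full-text search; returns BM25 scores |
| Vector search | vector_search() | Cosine-similarity vector search; returns similarity scores |
| Hybrid search + RRF re-ranking | hybrid_search() | Runs both retrieval paths, re-ranks with RRF, and returns the final results |
| Automatic embedding generation | insert_document() / batch_insert_documents() | Generates and stores the embedding when a document is inserted; no manual handling needed |
| Single-document embedding update | update_document_embedding() | Regenerates and updates the embedding for one document |
| Batch backfill of missing embeddings | batch_update_missing_embeddings() | Scans the table for documents without embeddings, then generates and backfills them in batches |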
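For intuition about the similarity score that vector_search() is described as returning, here is a minimal pure-Python sketch of cosine similarity. The function name cosine_similarity and the sample vectors are illustrative assumptions, not part of the Skill's code; in the Skill itself the comparison presumably happens on the ByteHouse side through the vector index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score 1.0, orthogonal vectors score 0.0.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

Because the score depends only on direction, scaling a vector does not change its similarity to another vector.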
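4. RRF re-ranking algorithm
Reciprocal Rank Fusion (RRF) combines the rankings from full-text search and vector search. For each document, the score is summed over every result list that contains it:
score = Σ 1 / (k + rank)
Here rank is the document's position in each result list. The default is k=60, and it can be adjusted.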
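The formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Skill's actual hybrid_search() code: the function name rrf_fuse, the 1-based rank convention, the tie-breaking rule, and the sample document IDs are all hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=10):
    """Fuse several ranked lists of document IDs (best first) with RRF.

    Assumes ranks start at 1. Returns (doc_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return ordered[:top_k]


fulltext_hits = [3, 1, 2]  # hypothetical doc_ids from the full-text path
vector_hits = [1, 4, 3]    # hypothetical doc_ids from the vector path
fused = rrf_fuse([fulltext_hits, vector_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that appear in both lists (1 and 3 here) accumulate two terms and therefore rise above documents that only one path returned.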
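📖 Code implementation
The full example implementation lives in the scripts/ directory:
- scripts/embedding.py - text embedding module
- scripts/hybrid_search_client.py - ByteHouse hybrid search client
- scripts/examples.py - usage examples
Quick usage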
from scripts import ByteHouseHybridSearch
# Initialize the client
search = ByteHouseHybridSearch(connection_type="http")
# Create the hybrid search table (builds the full-text and vector indexes automatically)
search.create_hybrid_table("my_hybrid_index")
# Insert a document (generates the embedding and stores the original text)
search.insert_document("my_hybrid_index", doc_id=1,
                       title="ByteHouse 混合检索",
                       content="ByteHouse 支持全文检索和向量检索,可实现混合检索能力")
# Hybrid search (runs full-text and vector search, then returns RRF re-ranked results)
results = search.hybrid_search("my_hybrid_index", query="ByteHouse检索能力", top_k=10)
⚙️ Best practices
Table configuration
CREATE TABLE {table_name} (
    `doc_id` UInt64,
    `title` String,
    `content` String,
    `embedding` Array(Float32),
    -- Full-text inverted index (version v2 supports BM25 scores)
    INDEX content_idx content TYPE inverted('standard', '{"version":"v2"}') GRANULARITY 1,
    -- Vector index (HNSW algorithm, cosine similarity)
    INDEX embedding_idx embedding TYPE HNSW_SQ('DIM={vec_dimensions}', 'metric=COSINE', 'M=32', 'EF_CONSTRUCTION=256') GRANULARITY 1
)
ENGINE = MergeTree()
ORDER BY doc_id
SETTINGS
    index_granularity = 1024,
    enable_vector_index_preload = 1
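RRF parameter tuning
- When full-text search results matter more, lower the rrf_k value (recommended 30-60)
- When vector search results matter more, raise the rrf_k value (recommended 60-100)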
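To get a feel for what changing rrf_k does, recall that each result list contributes 1 / (k + rank) per document. The sketch below compares the contribution of a rank-1 hit with that of a rank-10 hit for several k values. The helper names are hypothetical and ranks are assumed to start at 1.

```python
def rrf_contribution(rank, k):
    """Score contributed by one result list for a document at the given rank."""
    return 1.0 / (k + rank)


def top_rank_advantage(k, low_rank=10):
    """How many times more a rank-1 hit contributes than a hit at low_rank."""
    return rrf_contribution(1, k) / rrf_contribution(low_rank, k)


for k in (30, 60, 100):
    print(f"k={k}: rank 1 counts {top_rank_advantage(k):.3f}x as much as rank 10")
```

The ratio shrinks as k grows: a smaller k lets top positions dominate the fused score, while a larger k flattens the differences between ranks.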
Security recommendations
This skill appears to implement the advertised hybrid search, but the package metadata and README disagree with the code about what it needs. Before installing or running it:
- Treat it as requiring sensitive credentials: you must provide BYTEHOUSE_HOST and BYTEHOUSE_PASSWORD (ByteHouse DB access) and ARK_API_KEY (embedding API). Don't assume the registry declared these. Use least-privilege credentials and a test account.
- Note the discrepant defaults: SKILL.md suggests an embedding model and dimension (doubao-embedding-vision-251215 / 1536) but embedding.py sets a different default model and dimensions (doubao-embedding-text-240715 / 2560). Verify you want the model/dimension used by the code.
- SKILL.md's pip instructions omit the openai package that embedding.py imports; ensure your environment installs the openai client (or adapt the code to the volcengine SDK) to avoid runtime surprises.
- Run in an isolated environment (container/VM) and review network traffic if you need to be sure where data is sent; the code contacts the configured ARK_BASE_URL and the ByteHouse host only.
- If you don't trust the source, ask the publisher to correct the registry metadata (declare required env vars and primary credential) and to align SKILL.md with the code. If anything else looks unexpected after that, reconsider installation.
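Given the model and dimension mismatch noted above, a guard like the following can catch vectors whose length does not match the DIM configured on the table index before they are inserted. check_embedding_dimension is a hypothetical helper, not part of the Skill; the 1536 fallback mirrors the default stated in SKILL.md.

```python
import os


def check_embedding_dimension(embedding, expected=None):
    """Raise ValueError if the vector length differs from the expected dimension."""
    if expected is None:
        # Assumption: fall back to the SKILL.md default when the variable is unset.
        expected = int(os.environ.get("EMBEDDING_DIMENSIONS", "1536"))
    if len(embedding) != expected:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, expected {expected}"
        )
    return embedding


# A 2560-dimensional vector is rejected when the table expects 1536.
try:
    check_embedding_dimension([0.0] * 2560, expected=1536)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Running the check right after embedding generation keeps a misconfigured model from silently writing vectors the index cannot use.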
Functional analysis
Type: OpenClaw Skill
Name: byted-bytehouse-hybrid-search
Version: 1.0.0
The skill contains SQL injection vulnerabilities in `scripts/hybrid_search_client.py` where the `table_name` and `top_k` parameters are interpolated directly into SQL strings using f-strings in methods such as `create_hybrid_table`, `fulltext_search`, and `vector_search`. While the code's logic is consistent with its stated purpose of providing ByteHouse hybrid search capabilities, the lack of sanitization for these identifiers represents a significant security flaw that could be exploited if the agent passes untrusted input.
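One possible mitigation is to validate identifiers and numeric limits before they reach an f-string. The sketch below is an assumption about how a caller could guard these parameters, not the Skill's actual code: safe_identifier, safe_top_k, build_select, the allowed-character pattern, and the 1000-row cap are all hypothetical.

```python
import re

# Assumption: table names are plain identifiers, optionally prefixed by a database.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def safe_identifier(name):
    """Return name unchanged if it is a plain SQL identifier, else raise."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def safe_top_k(top_k, maximum=1000):
    """Return top_k if it is an int within [1, maximum], else raise."""
    if isinstance(top_k, bool) or not isinstance(top_k, int):
        raise ValueError("top_k must be an integer")
    if not 1 <= top_k <= maximum:
        raise ValueError(f"top_k must be between 1 and {maximum}")
    return top_k


def build_select(table_name, top_k):
    """Build a query string only from validated parts."""
    return (
        f"SELECT doc_id, title FROM {safe_identifier(table_name)} "
        f"LIMIT {safe_top_k(top_k)}"
    )


query = build_select("my_hybrid_index", 10)
```

Identifiers cannot be bound as ordinary query parameters in most SQL drivers, which is why an allow-list check is used here; user-supplied search text should still go through the driver's parameter binding.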
Capability assessment
Purpose & Capability
The name/description (ByteHouse hybrid search with RRF) matches the code and SKILL.md: the code implements full-text + vector search, embedding generation, and RRF re-ranking. However the registry metadata lists no required environment variables or primary credential while the code and SKILL.md clearly require ByteHouse connection info and an Ark/OpenAI API key — this is an internal inconsistency.
Instruction Scope
SKILL.md instructs installing clickhouse-connect and volcengine SDK and to set environment variables for BYTEHOUSE_* and ARK_API_KEY. The runtime code enforces these (it raises ValueError if BYTEHOUSE_HOST/BYTEHOUSE_PASSWORD or ARK_API_KEY are missing), generates embeddings, and executes SQL (CREATE TABLE, INSERT, queries) against the target ByteHouse instance. The instructions do not request or read any unrelated system files, but they do prompt the agent to collect and use secrets (DB password and API key).
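The enforcement described here can be pictured with a small loader that fails fast when a required variable is missing. This is a sketch under assumptions, not a copy of the Skill's code: load_config, the returned dictionary keys, and the set of accepted truthy strings are hypothetical, while the variable names and the documented defaults come from SKILL.md.

```python
import os

REQUIRED_VARS = ("BYTEHOUSE_HOST", "BYTEHOUSE_PASSWORD", "ARK_API_KEY")


def load_config(env=None):
    """Collect connection settings, raising ValueError if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return {
        "host": env["BYTEHOUSE_HOST"],
        "port": env.get("BYTEHOUSE_PORT"),  # no default is documented, so none is assumed
        "user": env.get("BYTEHOUSE_USER"),
        "password": env["BYTEHOUSE_PASSWORD"],
        "database": env.get("BYTEHOUSE_DATABASE", "default"),
        "secure": env.get("BYTEHOUSE_SECURE", "true").lower() in ("1", "true", "yes"),
        "ark_api_key": env["ARK_API_KEY"],
    }


# Passing a plain dict instead of os.environ makes the loader easy to test.
example = load_config(
    {"BYTEHOUSE_HOST": "host", "BYTEHOUSE_PASSWORD": "secret", "ARK_API_KEY": "key"}
)
```

Keeping secrets out of the returned dictionary's string representation (for example, never logging it) is left to the caller.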
Install Mechanism
This is instruction-only with no install spec; there is no packaged install step that would download arbitrary artifacts. SKILL.md recommends pip install of clickhouse-connect, volcengine-python-sdk[ark], numpy, scipy. No extracted downloads or obscure URLs are used.
Credentials
The registry claims no required env vars, but the code requires BYTEHOUSE_HOST, BYTEHOUSE_PASSWORD (and ARK_API_KEY) and will fail without them. That mismatch is meaningful: sensitive credentials are necessary for the skill to function but are not declared in metadata/primary credential fields. Also SKILL.md and code disagree on embedding model name and default dimension, and SKILL.md's pip install list does not include the openai package that the code uses.
Persistence & Privilege
The skill does not request persistent 'always' inclusion and does not modify other skills or system-wide settings. It opens network connections to ByteHouse and the configured embedding API (Ark via OpenAI-compatible client), which is expected for its purpose.
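How to use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install byted-bytehouse-hybrid-search
- Once installed, invoke the Skill by name or trigger it with /byted-bytehouse-hybrid-search
- Provide the inputs described in the Skill's parameter documentation to get structured output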
Version history
v1.0.0
- Initial release of byted-bytehouse-hybrid-search.
- Supports hybrid search on ByteHouse using both full-text (BM25) and vector (HNSW) indexes.
- Implements Reciprocal Rank Fusion (RRF) algorithm for reranking combined search results.
- Provides automatic embedding generation, single and batch document insertion, and updating of missing embeddings.
- Example code and usage instructions included.
FAQ
What is Byted Bytehouse Hybrid Search?
It is an AI Agent Skill plugin for Claude Code / OpenClaw that runs full-text search and vector search on ByteHouse and re-ranks the combined results with RRF. It has been downloaded 96 times so far.
How do I install Byted Bytehouse Hybrid Search?
Run "/install byted-bytehouse-hybrid-search" in the OpenClaw or Claude Code chat box. Before using it, set the ByteHouse and Ark environment variables described in the usage instructions.
Is Byted Bytehouse Hybrid Search free?
Yes. It is released under the MIT-0 license and can be downloaded, installed, and used freely.
Which platforms does Byted Bytehouse Hybrid Search support?
It is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Byted Bytehouse Hybrid Search?
It is developed and maintained by volcengine-skills (@volcengine-skills). The current version is v1.0.0.