Hybrid Retrieval (BM25 + Vector + Graph)

Author: vnesin-sarai

Description

Design and build a hybrid retrieval system combining BM25 keyword search, vector embeddings, and knowledge graph traversal for AI agent memory. Use when buil...

README (SKILL.md)

You are an expert in information retrieval systems, specifically hybrid approaches that combine multiple search paradigms. Help the user design and build a retrieval system inspired by the BlackRock/NVIDIA HybridRAG paper.

Core Insight

No single retrieval method works for everything:

Method	Strength	Weakness
BM25 (keyword)	Exact matches, names, IDs, codes	Misses synonyms and semantic meaning
Vector (embedding)	Semantic similarity, paraphrases	Struggles with exact terms, numbers, names
Graph (knowledge graph)	Relationships, multi-hop reasoning	Requires structured extraction, maintenance

The hybrid approach: Run all three in parallel, then fuse results with weighted scoring. Each method catches what the others miss.

Architecture Pattern

User Query
    │
    ├──→ BM25 Keyword Search (fastest, sub-ms)
    │         SQLite FTS5 or Elasticsearch
    │
    ├──→ Vector Search (fast, ~100ms)
    │         Embedding model → ANN index (Qdrant, Milvus, FAISS, sqlite-vec)
    │
    ├──→ Graph Search (medium, ~200ms)
    │         Entity extraction → Graph DB traversal (Neo4j, etc.)
    │
    └──→ Fusion Layer
              Weighted merge → Deduplication → Reranking → Top-K results
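
The fan-out in the diagram can be sketched with a thread pool and a shared deadline. This is a minimal illustration, not a reference implementation: `run_parallel`, the 0.2 s timeout, and the three stub searchers are all hypothetical names and values chosen for the example. Each layer is assumed to be a callable that takes the query and returns a list of result dicts.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_parallel(query, searchers, timeout_s=0.2):
    """Run every searcher concurrently; a slow or failing layer yields [].

    `searchers` maps a layer name to a callable(query) -> list of results.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(searchers)))
    futures = {name: pool.submit(fn, query) for name, fn in searchers.items()}
    deadline = time.monotonic() + timeout_s
    results = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = future.result(timeout=remaining)
        except FutureTimeout:
            results[name] = []  # too slow: answer with the other layers
        except Exception:
            results[name] = []  # layer crashed: degrade instead of failing
    pool.shutdown(wait=False)  # do not block on layers that are still running
    return results


# Hypothetical stand-ins for the three real search functions
def fake_bm25(query):
    return [{"path": "notes/deploy.md", "text": "deploy status", "score": 0.9}]


def fake_vector(query):
    return [{"path": "notes/release.md", "text": "release plan", "score": 0.7}]


def slow_graph(query):
    time.sleep(1.0)  # simulates a graph lookup that misses the deadline
    return [{"path": "graph/entity", "text": "late result", "score": 1.0}]


layers = {"bm25": fake_bm25, "vector": fake_vector, "graph": slow_graph}
results = run_parallel("deployment status", layers)
```

Here the graph stub misses the deadline, so `results["graph"]` is an empty list while the other two layers still contribute. Treating a timeout as "no results" keeps the slowest layer from setting the latency of the whole query.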

Step-by-Step Design

Step 1: Choose Your Document Store

Your chunks need to live somewhere. Options:

SQLite + FTS5 + vec0 — Single file, zero infrastructure, good up to ~100K chunks
PostgreSQL + pgvector — Production-ready, handles millions
Qdrant / Milvus — Purpose-built vector DBs, best for scale
Elasticsearch — If you already use it, it does BM25 + vector natively

Recommendation for most projects: Start with SQLite (FTS5 for keywords, vec0 for vectors). Migrate when you hit performance limits.

Step 2: Choose Your Embedding Model

Model	Dimensions	Quality	Speed	Cost
OpenAI text-embedding-3-small	1536	Good	Fast	$0.02/1M tokens
Voyage AI voyage-3	1024	Very good	Fast	$0.06/1M tokens
NV-Embed-v2 (self-hosted)	4096	Excellent	Medium	Free (GPU needed)
nomic-embed-text (Ollama)	768	Good	Fast	Free (CPU ok)

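Key decision: Self-hosted models have no per-token fee but need hardware (a GPU for larger models such as NV-Embed-v2; nomic-embed-text runs on CPU). Cloud APIs are easy to adopt but carry a recurring cost. For production agent memory that embeds large volumes continuously, self-hosting can pay for itself; estimate your monthly token volume before deciding.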

Step 3: Chunking Strategy

Bad chunking ruins everything. Rules:

Chunk by semantic unit — sections, paragraphs, conversations. NOT fixed-size windows.
Include metadata — file path, date, source type. You'll filter on this later.
Overlap sparingly — 10-20% overlap prevents losing context at boundaries.
Keep chunks 200-600 tokens — too small = no context, too large = noise.
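
The rules above can be sketched as a paragraph-based chunker. This is a simplified illustration under stated assumptions: `chunk_paragraphs` and `count_tokens` are hypothetical helpers, tokens are approximated by whitespace-separated words (a real system would use the embedding model's tokenizer), and overlap is done by repeating whole trailing paragraphs rather than an exact 10-20%.

```python
def count_tokens(text):
    # Assumption: whitespace-separated words approximate tokens
    return len(text.split())


def chunk_paragraphs(text, max_tokens=600, overlap_paragraphs=1, metadata=None):
    """Group paragraphs into chunks that stay under max_tokens where possible.

    A single paragraph longer than max_tokens is kept whole rather than split.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    groups, current = [], []
    for para in paragraphs:
        too_big = count_tokens(" ".join(current + [para])) > max_tokens
        if current and too_big:
            groups.append(current)
            # Carry the last paragraph(s) over so context survives the boundary
            current = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
        current = current + [para]
    if current:
        groups.append(current)
    base = dict(metadata or {})
    return [
        {**base, "chunk_index": i, "text": "\n\n".join(group)}
        for i, group in enumerate(groups)
    ]


doc = "alpha one two\n\nbeta three four\n\ngamma five six\n\ndelta seven eight"
chunks = chunk_paragraphs(doc, max_tokens=7, metadata={"path": "notes.md"})
```

With the tiny `max_tokens=7` used here, the four paragraphs become three chunks, each sharing one paragraph with its neighbour, and every chunk carries the `path` metadata for later filtering.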

Step 4: BM25 Layer

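-- SQLite FTS5 example
-- All three columns are indexed, so file paths and source names are searchable too
CREATE VIRTUAL TABLE chunks_fts USING fts5(path, text, source);

-- Search
-- rank is the BM25 score by default; more negative means a better match,
-- so the default ascending ORDER BY returns the best matches first
SELECT path, text, rank
FROM chunks_fts
WHERE chunks_fts MATCH 'query terms'
ORDER BY rank
LIMIT 20;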

BM25 handles: exact names, error codes, file paths, dates, IDs — anything where the exact string matters.
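
Exact strings such as error codes often contain punctuation that FTS5 can misread as query syntax. The sketch below shows one way to call the table above from Python's built-in `sqlite3` module, quoting each term first. It assumes your SQLite build includes FTS5 (most do, but it is a compile-time option); `to_fts5_query` and `bm25_search` are hypothetical helper names, and the sample rows are made up for the example.

```python
import sqlite3


def to_fts5_query(user_text):
    """Wrap each term in double quotes so '-', ':' etc. are treated as text."""
    terms = user_text.split()
    return " ".join('"' + term.replace('"', '""') + '"' for term in terms)


def bm25_search(conn, user_text, limit=20):
    match_expr = to_fts5_query(user_text)
    if not match_expr:
        return []  # nothing to search for
    rows = conn.execute(
        "SELECT path, text, rank FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY rank LIMIT ?",
        (match_expr, limit),
    ).fetchall()
    # FTS5 rank is more negative for better matches; negate it so that
    # a higher score means a better match, as the fusion step expects
    return [{"path": p, "text": t, "score": -rank} for p, t, rank in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(path, text, source)")
conn.executemany(
    "INSERT INTO chunks_fts (path, text, source) VALUES (?, ?, ?)",
    [
        ("logs/deploy.md", "Deploy failed with ERR-4012 on port 8034", "logs"),
        ("notes/status.md", "The deployment finished successfully", "notes"),
    ],
)
hits = bm25_search(conn, "ERR-4012")
```

Quoted terms are still combined with an implicit AND, so a multi-word query only matches chunks that contain every term.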

Step 5: Vector Layer

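import struct

# Embed the query; embed() stands in for your embedding model's client
# and is assumed to return a list of floats
query_vec = embed("What is the deployment status?")

# sqlite-vec accepts vectors as compact float32 blobs
query_vec_blob = struct.pack(f"{len(query_vec)}f", *query_vec)

# Nearest-neighbour search (sqlite-vec example); db is an open sqlite3
# connection with the sqlite-vec extension loaded
results = db.execute(
    "SELECT id, distance FROM chunks_vec "
    "WHERE embedding MATCH ? AND k = ? ORDER BY distance",
    (query_vec_blob, 20)
)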

Vector handles: semantic questions, paraphrases, "find things related to X" — meaning over matching.

Step 6: Graph Layer (Optional but Powerful)

// Neo4j: Find entity and its connections
MATCH (n) WHERE n.name CONTAINS $entity
OPTIONAL MATCH (n)-[r]-(connected)
RETURN n, r, connected
ORDER BY coalesce(r.weight, 1.0) DESC
LIMIT 10

Graph handles: "Who works with X?", "What's related to Y?", multi-hop reasoning — relationships that flat search can't find.
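
To make "multi-hop" concrete without a graph database, here is a toy in-memory version of the same idea: a breadth-first walk from the matched entity, with scores that shrink per hop. The adjacency dict, the `neighbors_within` name, and the 0.5 decay are assumptions made for illustration; a real deployment would run the Cypher query above against Neo4j or similar.

```python
from collections import deque


def neighbors_within(graph, start, max_hops=2, decay=0.5):
    """Breadth-first walk from `start`; score = decay ** number_of_hops."""
    scores = {start: 1.0}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in scores:  # first visit is the shortest path
                scores[neighbour] = decay ** (hops + 1)
                queue.append((neighbour, hops + 1))
    return scores


# Hypothetical knowledge graph: who is connected to what
graph = {
    "Alice": ["ProjectX"],
    "ProjectX": ["Alice", "Bob"],
    "Bob": ["ProjectX", "Carol"],
    "Carol": ["Bob"],
}
related = neighbors_within(graph, "Alice")
# related == {"Alice": 1.0, "ProjectX": 0.5, "Bob": 0.25}
```

"Who works with Alice?" is answered by the two-hop path Alice → ProjectX → Bob, which neither keyword nor vector search over isolated chunks would surface. Carol is three hops away and is excluded by the default `max_hops=2`.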

Step 7: Fusion

The critical part — merging results from all three methods:

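def fuse_results(bm25_results, vector_results, graph_results,
                 bm25_weight=0.3, vector_weight=0.5, graph_weight=0.8):
    """Merge three result lists into one ranking by weighted score.

    Each result is a dict with "path", "text", and "score" keys. Scores are
    assumed to be normalized to a common scale (e.g. 0 to 1, higher is better).
    """
    # Keyed by path plus the first 100 characters of text, so the same chunk
    # found by several methods accumulates one combined score
    all_results = {}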

    for r in bm25_results:
        key = r["path"] + ":" + r["text"][:100]
        all_results[key] = {**r, "score": r["score"] * bm25_weight}

    for r in vector_results:
        key = r["path"] + ":" + r["text"][:100]
        if key in all_results:
            all_results[key]["score"] += r["score"] * vector_weight
        else:
            all_results[key] = {**r, "score": r["score"] * vector_weight}

    for r in graph_results:
        key = r["path"] + ":" + r["text"][:100]
        if key in all_results:
            all_results[key]["score"] += r["score"] * graph_weight
        else:
            all_results[key] = {**r, "score": r["score"] * graph_weight}

    return sorted(all_results.values(), key=lambda x: x["score"], reverse=True)

Weight tuning:

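Graph results get the highest weight: when the KG matches an entity named in the query, its connected facts are usually relevant, provided entity extraction is accurate
Vector gets medium weight: good general recall
BM25 gets lowest weight: precise but narrow

These defaults are a starting point. The weights need not sum to 1, but any score threshold applied after fusion depends on their scale.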
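
Weighted fusion only makes sense if the three score lists are on a comparable scale, and raw scores usually are not: FTS5 rank is negative, vector search returns distances where lower is better, and graph weights are arbitrary. One common remedy, sketched here as an assumption rather than as part of the original design, is min-max normalization of each list before fusing. `normalize_scores` is a hypothetical helper name.

```python
def normalize_scores(results, lower_is_better=False):
    """Rescale the "score" of each result to 0..1, where 1 is the best."""
    if not results:
        return []
    # Flip the sign for distance-like scores so that bigger is always better
    raw = [-r["score"] if lower_is_better else r["score"] for r in results]
    lowest, highest = min(raw), max(raw)
    span = highest - lowest
    normalized = []
    for result, value in zip(results, raw):
        # A list where every score is equal gets 1.0 across the board
        score = 1.0 if span == 0 else (value - lowest) / span
        normalized.append({**result, "score": score})
    return normalized


# Vector search returns distances: smaller means more similar
vector_hits = [
    {"path": "a.md", "text": "closest", "score": 0.2},
    {"path": "b.md", "text": "middle", "score": 0.6},
    {"path": "c.md", "text": "farthest", "score": 1.0},
]
scaled = normalize_scores(vector_hits, lower_is_better=True)
```

One caveat of min-max scaling is that the weakest result in every list drops to 0 even if it was a decent match, so it works best when each layer returns a reasonably long candidate list.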

Step 8: Deduplication and Reranking

After fusion:

Deduplicate by text content (not path — same file can have multiple relevant chunks)
MMR reranking (optional): Maximal Marginal Relevance reduces redundancy by penalising results too similar to already-selected ones
Score threshold: drop anything whose fused score falls below a cutoff (0.3 is a starting point; retune it whenever you change the fusion weights or your data)
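
The first two steps can be sketched in a few lines of plain Python. The assumptions here are that each fused result carries an `embedding` list alongside its `score`, that duplicates are detected by hashing whitespace-normalized text, and that `lambda_=0.7` is a reasonable balance between relevance and diversity; the function names are hypothetical.

```python
import hashlib
import math


def dedupe_by_text(results):
    """Keep the first result for each distinct (whitespace-normalized) text."""
    seen, unique = set(), []
    for result in results:
        normalized = " ".join(result["text"].split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(result)
    return unique


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(results, top_k=5, lambda_=0.7):
    """Greedy MMR: trade relevance against similarity to already-picked items."""
    candidates = list(results)
    selected = []
    while candidates and len(selected) < top_k:
        def mmr_score(result):
            redundancy = max(
                (cosine(result["embedding"], s["embedding"]) for s in selected),
                default=0.0,
            )
            return lambda_ * result["score"] - (1 - lambda_) * redundancy
        best_index = max(range(len(candidates)),
                         key=lambda i: mmr_score(candidates[i]))
        selected.append(candidates.pop(best_index))
    return selected


fused = [
    {"id": "A", "text": "Deploy is green", "score": 0.90, "embedding": [1.0, 0.0]},
    {"id": "B", "text": "Deploy is  green", "score": 0.88, "embedding": [1.0, 0.0]},
    {"id": "C", "text": "Deploy looks green", "score": 0.85, "embedding": [1.0, 0.01]},
    {"id": "D", "text": "Bob owns the pipeline", "score": 0.60, "embedding": [0.0, 1.0]},
]
top = mmr(dedupe_by_text(fused), top_k=2)
```

In this example B is dropped as a whitespace variant of A, and MMR then prefers the lower-scored but different D over the near-duplicate C, so the final two results are A and D.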

Common Mistakes

Using only vector search: misses exact matches. A query like "Port 8034" depends on the literal token, which embeddings represent poorly.
Fixed-size chunking: splitting mid-sentence destroys context.
No graph layer when the questions are relational: flat retrieval hits a ceiling on "who works with X?" style queries.
Reranking with the same model: if you rerank with the same embeddings you searched with, you're just re-sorting the same biases.
Ignoring BM25: it's the fastest layer and catches what vectors miss. Always include it.

When to Add Complexity

If you have...	You need...
< 1K chunks	BM25 only (SQLite FTS5)
1K - 50K chunks	BM25 + Vector
50K+ chunks	BM25 + Vector + Graph
Multiple data sources (chats, emails, docs)	Separate collections with routing
Real-time requirements	Parallel search with timeouts

Output

Help the user:

Assess their data volume and types
Choose appropriate layers (BM25, vector, graph)
Select embedding model and storage backend
Design their chunking strategy
Implement fusion with appropriate weights
Set up a simple evaluation (test queries → expected results)
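
For the last item, a simple evaluation can be as small as a list of test queries with the paths you expect to see, scored by hit rate and mean reciprocal rank. The sketch below is one possible harness, not a prescribed one: `evaluate`, the metric names, and the canned `fake_search` function are assumptions for illustration, and `search_fn` is assumed to return result dicts with a `path` key, best first.

```python
def evaluate(search_fn, test_cases, k=5):
    """Score a retriever on (query, expected_paths) pairs.

    hit_rate: share of queries with at least one expected path in the top k.
    mrr: mean of 1 / rank of the first expected path (0 when none is found).
    """
    hits, reciprocal_ranks = 0, 0.0
    for query, expected in test_cases:
        paths = [r["path"] for r in search_fn(query)[:k]]
        rank = next(
            (i for i, path in enumerate(paths, start=1) if path in expected),
            None,
        )
        if rank is not None:
            hits += 1
            reciprocal_ranks += 1.0 / rank
    total = len(test_cases)
    if total == 0:
        return {"hit_rate": 0.0, "mrr": 0.0}
    return {"hit_rate": hits / total, "mrr": reciprocal_ranks / total}


# Hypothetical retriever with canned answers, standing in for the real pipeline
CANNED = {
    "deploy status": ["notes/deploy.md", "notes/misc.md"],
    "who owns billing": ["notes/misc.md", "team/billing.md"],
    "port 8034": ["notes/misc.md"],
}


def fake_search(query):
    return [{"path": p} for p in CANNED.get(query, [])]


test_cases = [
    ("deploy status", {"notes/deploy.md"}),    # found at rank 1
    ("who owns billing", {"team/billing.md"}),  # found at rank 2
    ("port 8034", {"logs/ports.md"}),           # not found
]
report = evaluate(fake_search, test_cases, k=5)
```

Running the same test cases before and after each change (new weights, a new chunk size, an added layer) shows whether the change actually helped.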

This skill is an instructional guide for building a hybrid retrieval system and appears internally coherent. Before using it: (1) be aware that implementing the examples will require you to provide credentials for embedding APIs and DBs—do not paste secrets into chat or shared contexts; (2) evaluate data privacy: indexing private documents and sending them to cloud embedding providers may expose sensitive data—consider self-hosted models if that matters; (3) expect resource costs (GPU, vector DB hosting, API costs) and plan accordingly; (4) if you let an agent invoke this skill autonomously, restrict what data the agent can access and which credentials it can use. If you want a deeper security review, provide any install scripts, connector code, or explicit calls the agent will run so those can be inspected for risky behavior.

Capability Analysis

Type: OpenClaw Skill
Name: hybrid-retrieval
Version: 1.0.0

The skill bundle is an educational guide for designing hybrid retrieval systems (RAG) using BM25, vector embeddings, and knowledge graphs. It contains architectural patterns, design steps, and illustrative code snippets (SQL, Python, Cypher) in SKILL.md that are entirely consistent with its stated purpose and show no signs of malicious intent or data exfiltration.

Capability Assessment

✓ Purpose & Capability

The name and description match the SKILL.md content: it describes BM25, vector embeddings, and knowledge-graph search, plus fusion/reranking. All required pieces (SQLite/pgvector/Qdrant, embedding models, Neo4j) are appropriate for building the described hybrid retrieval system.

ℹ Instruction Scope

The SKILL.md gives implementation examples (SQL, Python, Cypher) and high-level operational guidance. It references chunking source data (files, paths, metadata) which is expected for a retrieval system, but it does not instruct the agent to read arbitrary system files or exfiltrate data. Note: examples reference using third-party embedding services and vector DBs; following those examples will require the user/agent to connect to external services (not done automatically by the SKILL.md itself).

✓ Install Mechanism

This is an instruction-only skill with no install spec and no code files. That minimizes risk: nothing is downloaded or written to disk by the skill itself.

ℹ Credentials

The skill declares no required environment variables or credentials, yet recommends using external embedding APIs (OpenAI, Voyage AI) and external DBs (Qdrant, Neo4j). This is not inherently malicious, but it's an omission: users will need to supply API keys/DB credentials when implementing the architecture—those are not requested by the skill but will be needed in practice.

✓ Persistence & Privilege

The skill does not request always:true, has no install actions, and does not modify agent/system configuration. It does not request permanent presence or escalate privileges.

Version History

v1.0.0

Initial release — design and build hybrid RAG systems combining BM25, vector embeddings, and knowledge graph traversal

Metadata

Slug hybrid-retrieval

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Hybrid Retrieval (BM25 + Vector + Graph)?

Design and build a hybrid retrieval system combining BM25 keyword search, vector embeddings, and knowledge graph traversal for AI agent memory. Use when buil... It is an AI Agent Skill for Claude Code / OpenClaw, with 104 downloads so far.

How do I install Hybrid Retrieval (BM25 + Vector + Graph)?

Run "/install hybrid-retrieval" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Hybrid Retrieval (BM25 + Vector + Graph) free?

Yes, Hybrid Retrieval (BM25 + Vector + Graph) is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Hybrid Retrieval (BM25 + Vector + Graph) support?

Hybrid Retrieval (BM25 + Vector + Graph) is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Hybrid Retrieval (BM25 + Vector + Graph)?

It is built and maintained by SARAI Defence (@vnesin-sarai); the current version is v1.0.0.
