lookupmark

Local RAG

Author: LookUpMark · GitHub ↗ · v1.9.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 208
Favorites: 0
Current installs: 0
Versions: 14
Install in OpenClaw
/install lookupmark-local-rag
Description
Semantic search over local files using all-MiniLM-L6-v2 embeddings and ms-marco-MiniLM-L-6-v2 cross-encoder reranking with ChromaDB and parent-child chunking...
Usage (SKILL.md)

Local RAG

Semantic search over indexed local files with parent-child chunking for precise retrieval with full context.

Architecture

| Component | Model | Size |
|---|---|---|
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 | ~80MB |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 | ~80MB |
| Vector DB | ChromaDB (persistent, cosine similarity, HNSW) | varies |
| Chunking | Parent-child | |

Memory strategy: the embedding model is loaded first and freed with gc.collect(); the reranker is then loaded and freed after scoring. This keeps peak RAM at about 400MB on ARM.
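
The load, use, free sequence described above can be sketched as follows. This is a minimal illustration, not the skill's actual code: `run_stage` is a hypothetical helper, and `load` stands in for any model constructor (for example a callable that builds a `SentenceTransformer` or `CrossEncoder`). The key assumption is that `use` returns plain data that holds no reference to the model, so the model can actually be reclaimed.

```python
import gc
from typing import Callable, TypeVar

M = TypeVar("M")
R = TypeVar("R")


def run_stage(load: Callable[[], M], use: Callable[[M], R]) -> R:
    """Load a model, run one stage with it, then release it before returning.

    Hypothetical helper: the model only lives inside this function, so once
    the local reference is deleted and gc.collect() runs, its memory can be
    reclaimed before the next model is loaded.
    """
    model = load()
    try:
        # `use` must return plain data (lists, floats, strings), not the model.
        return use(model)
    finally:
        del model
        gc.collect()


def two_stage(query, load_embedder, search, load_reranker, rerank):
    """Run retrieval, free the embedder, then run reranking, free the reranker."""
    candidates = run_stage(load_embedder, lambda m: search(m, query))
    return run_stage(load_reranker, lambda m: rerank(m, query, candidates))
```

Because only one model is resident at a time, peak memory is bounded by the larger of the two stages rather than their sum.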

Chunking Strategy

  • Child chunks: 128 words, 24 overlap → embedded for semantic search
  • Parent chunks: 768 words → stored as full context, returned to user
  • When a child matches → its parent is returned, giving surrounding context
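
The parent-child scheme above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and it assumes children are cut from within each parent (so a child never spans two parents) and that parents do not overlap, neither of which the description confirms.

```python
from dataclasses import dataclass

CHILD_WORDS = 128    # child chunk size, from the list above
CHILD_OVERLAP = 24   # words shared between consecutive children
PARENT_WORDS = 768   # parent chunk size


@dataclass
class Child:
    text: str
    parent_id: int   # index of the parent chunk this child was cut from


def windows(words, size, overlap=0):
    """Split a word list into windows of `size` words, sharing `overlap` words."""
    step = size - overlap
    out = []
    for start in range(0, len(words), step):
        out.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return out


def chunk(text):
    """Return (parents, children); each child records its parent's index."""
    words = text.split()
    parents = windows(words, PARENT_WORDS)
    children = []
    for pid, parent_words in enumerate(parents):
        for child_words in windows(parent_words, CHILD_WORDS, CHILD_OVERLAP):
            children.append(Child(" ".join(child_words), pid))
    return [" ".join(p) for p in parents], children
```

At query time only the children are embedded and searched; a matching child's `parent_id` is then used to look up the larger parent text that is returned to the user.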

Running

All scripts must use the venv Python:

VENV=~/.local/share/local-rag/venv/bin/python

Indexing

# Incremental index (default — skips unchanged files via SHA-256 hash)
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/index.py

# Re-index from scratch
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/index.py --reindex

# Custom paths
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/index.py --paths ~/Documenti ~/Progetti

# Batch indexing (per-subfolder with git checkpoints, for low-RAM systems)
bash ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/index-batch.sh
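
The incremental mode above skips files whose SHA-256 hash is unchanged. A minimal sketch of that kind of check is shown here; the function names and the in-memory `known_hashes` mapping are assumptions for illustration, and index.py may store and compare hashes differently.

```python
import hashlib


def sha256_of(path, block_size=1 << 20):
    """Hash a file in 1 MiB blocks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_index(paths, known_hashes):
    """Return (changed, current_hashes).

    `known_hashes` maps path -> hash recorded at the previous run (hypothetical
    storage; a real index might keep this in chunk metadata instead).
    A file counts as changed when it is new or its hash differs.
    """
    changed = []
    current = {}
    for path in paths:
        key = str(path)
        current[key] = sha256_of(path)
        if known_hashes.get(key) != current[key]:
            changed.append(key)
    return changed, current
```

On the next run, passing the returned `current` mapping back in as `known_hashes` yields only the files that were added or edited in between.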

Querying

# Basic query
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/query.py "what are the termination clauses?"

# More results
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/query.py "Falcon LLM" --top-k 30 --top-n 5

# JSON output for programmatic use
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/query.py "transformer architecture" --json

# With timeout
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/query.py "deep learning" --timeout 60

Options:

  • --top-k N — Child candidates from vector search (default: 20)
  • --top-n N — Final parent results after reranking (default: 3)
  • --json — JSON output
  • --timeout N — Max seconds per query (default: 120)
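
To make the relationship between --top-k and --top-n concrete, here is a sketch of a retrieve-then-rerank step. It is not query.py's code: `Hit`, `rerank_to_parents`, and the `score` callable are hypothetical, and collapsing several matching children into one result per parent (keeping the best score) is an assumption.

```python
from typing import Callable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    child_text: str
    parent_id: str
    distance: float   # vector-search distance, lower is closer


def rerank_to_parents(
    query: str,
    hits: List[Hit],
    score: Callable[[str, str], float],
    top_k: int = 20,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Keep the top_k nearest children, rescore them, return top_n parents."""
    # Stage 1: the cheap vector distance narrows the field to top_k children.
    candidates = sorted(hits, key=lambda h: h.distance)[:top_k]

    # Stage 2: the more expensive scorer (a cross-encoder in the real tool)
    # rates each (query, child) pair; each parent keeps its best child score.
    best = {}
    for hit in candidates:
        s = score(query, hit.child_text)
        if hit.parent_id not in best or s > best[hit.parent_id]:
            best[hit.parent_id] = s

    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]
```

Raising `top_k` gives the reranker more candidates to choose from at the cost of more scoring work, while `top_n` only controls how many parent contexts are finally returned.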

Monitoring

$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py              # Status
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py --watch      # Auto-refresh
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py --log        # Logs
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py --errors     # Errors only
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py --git        # Git checkpoints

Supported Formats

Documents only (no code files):

  • Text: .txt, .md, .csv, .json, .yaml, .yml, .toml, .tex, .bib
  • Documents: .pdf (pdfminer.six), .docx (python-docx), .pptx

Excluded: .py, .js, .sh, .ipynb, .html, .css and all code files.

Limits (for 4GB ARM)

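  • PDF max size: 5MB (larger PDFs cause OOM with pdfminer); the v1.9.0 and v1.9.1 release notes mention a 25MB PDF limit, so check scripts/index.py for the value in effect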
  • Max file size: 30MB
  • Embedding batch size: 1 (conservative)
  • Excluded dirs: .git, .venv, node_modules, __pycache__, labs, exercises, src, scripts, ablation, test*, fixtures
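
The format and size rules above amount to a per-file filter. The sketch below shows one way to express it; `should_index` is a hypothetical function, the constants are copied from this page rather than from index.py, and treating 1MB as 1024 × 1024 bytes and matching `test*` with shell-style globbing are assumptions.

```python
import fnmatch
from pathlib import Path

TEXT_EXT = {".txt", ".md", ".csv", ".json", ".yaml", ".yml", ".toml", ".tex", ".bib"}
DOC_EXT = {".pdf", ".docx", ".pptx"}
EXCLUDED_DIRS = [
    ".git", ".venv", "node_modules", "__pycache__", "labs", "exercises",
    "src", "scripts", "ablation", "test*", "fixtures",
]
MAX_FILE_BYTES = 30 * 1024 * 1024   # general limit listed above
MAX_PDF_BYTES = 5 * 1024 * 1024     # stricter PDF limit listed above


def should_index(path, size_bytes):
    """Return True if a file passes the extension, directory, and size rules."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext not in TEXT_EXT | DOC_EXT:
        return False   # code files and unknown formats are skipped
    # Reject the file if any directory component matches an excluded pattern.
    for part in p.parts[:-1]:
        if any(fnmatch.fnmatchcase(part, pat) for pat in EXCLUDED_DIRS):
            return False
    limit = MAX_PDF_BYTES if ext == ".pdf" else MAX_FILE_BYTES
    return size_bytes <= limit
```

Checking the extension and directory before the size keeps the filter cheap, since most rejected files never need a size lookup.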

Storage

| Path | Purpose |
|---|---|
| ~/.local/share/local-rag/chromadb/ | ChromaDB data (git repo for rollback) |
| ~/.local/share/local-rag/venv/ | Python venv with dependencies |
| ~/.local/share/local-rag/index.lock | Prevents concurrent indexing |
| ~/.local/share/local-rag/index-batch.log | Batch indexing log |
| ~/.local/share/local-rag/queries.log | Query history log |

Security

  • ALLOWED_ROOTS: Only ~/Documenti/github/thesis, ~/Documenti/github/polito, ~/Documenti, ~/Scaricati
  • BLOCKED_PATTERNS: .ssh, .gnupg, .env, credentials, tokens, .config/openclaw
  • Credentials directory is blacklisted — never indexed
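
An allow-list plus block-list check of the kind described above might look like the following. This is a sketch under stated assumptions, not the logic in scripts/index.py: the constant values are copied from this page, `is_allowed` is a hypothetical name, and blocked patterns are matched as plain substrings of the resolved path.

```python
from pathlib import Path

HOME = Path.home()
ALLOWED_ROOTS = [
    HOME / "Documenti" / "github" / "thesis",
    HOME / "Documenti" / "github" / "polito",
    HOME / "Documenti",
    HOME / "Scaricati",
]
BLOCKED_PATTERNS = [".ssh", ".gnupg", ".env", "credentials", "tokens", ".config/openclaw"]


def is_allowed(path, roots=ALLOWED_ROOTS, blocked=BLOCKED_PATTERNS):
    """Return True only if `path` is under an allowed root and hits no blocked pattern."""
    # Resolve first so that ".." segments and symlinks cannot escape a root.
    resolved = Path(path).expanduser().resolve()
    under_root = False
    for root in roots:
        root = Path(root).expanduser().resolve()
        if resolved == root or root in resolved.parents:
            under_root = True
            break
    if not under_root:
        return False
    # Substring matching is an assumption; it is deliberately broad and may
    # also reject harmless names that merely contain e.g. "tokens".
    text = resolved.as_posix()
    return not any(pattern in text for pattern in blocked)
```

Resolving the path before comparing it is the important design choice here: a check on the raw string would accept something like `~/Documenti/../.ssh/config`.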

Workflow

  1. Run index.py — builds/rebuilds the index (incremental via SHA-256 hash check)
  2. Run periodically to pick up new/changed files (daily cron recommended)
  3. Use query.py to search with natural language
  4. Results include: file path, relevance score, matched snippet, full parent context
  5. Check monitor.py for stats and queries.log for query history
Safe usage recommendations
This appears to be a legitimate local-document RAG tool. Before installing or running it:

  1. Ensure you have or will create the indicated Python venv and install dependencies (sentence-transformers, chromadb, pdfminer, python-docx, etc.).
  2. Be aware that the first run downloads ML models from Hugging Face (internet access required).
  3. The tool creates a ChromaDB and logs under ~/.local/share/local-rag; check that path and, if desired, change it or back it up.
  4. The skill appends queries (truncated) to ~/.local/share/local-rag/queries.log, so avoid searching for highly sensitive secrets, or adjust/disable logging.
  5. Verify that ALLOWED_ROOTS and BLOCKED_PATTERNS in scripts/index.py match what you want indexed (by default only ~/Documenti, ~/Scaricati, and the listed subpaths are indexed).
  6. Note the small documentation mismatches (supported extensions list vs code) and that the skill provides no automated installer; review and run the scripts manually the first time to confirm behavior.
Functional analysis
Type: OpenClaw Skill
Name: lookupmark-local-rag
Version: 1.9.1

The 'lookupmark-local-rag' skill is a legitimate implementation of a local Retrieval-Augmented Generation (RAG) system optimized for low-RAM ARM devices. It includes robust security controls in `scripts/index.py`, such as directory allow-listing (`ALLOWED_ROOTS`) and sensitive file blacklisting (`BLOCKED_PATTERNS` for `.ssh`, `.env`, etc.) to prevent accidental indexing of credentials. The use of Git in `scripts/index-batch.sh` for database checkpoints is a functional recovery mechanism for OOM errors, and no evidence of data exfiltration, unauthorized network access, or malicious prompt injection was found.
Capability tags
crypto
Capability assessment
Purpose & Capability
Name/description (Local RAG for semantic search) match the included scripts (index.py, query.py, monitor.py, batch scripts). The code only operates on local files and a local ChromaDB, and uses expected components (sentence-transformers, cross-encoder, ChromaDB). No unrelated credentials, external service tokens, or cloud SDKs are requested.
Instruction Scope
SKILL.md and scripts limit indexing to specified home directories and explicitly blacklist typical credential dirs. The code reads system state (/proc/meminfo) and runs local commands (ps, du, git) for monitoring — appropriate for monitoring/indexing. Two minor scope mismatches: SKILL.md's 'Supported Formats' lists some extensions (e.g., .json, .yaml, .pptx) that are not all reflected in index.py's TEXT_EXTENSIONS/ALL_EXTENSIONS, and the SKILL.md expects a venv at ~/.local/share/local-rag/venv but the skill does not include an installer to create it. Also, queries are appended to a local queries.log (question truncated to 80 chars) — this is local logging and a privacy consideration but coherent with the tool's purpose.
Install Mechanism
No install spec is provided (instruction-only), minimizing install-time risk. The code will download models from Hugging Face when first run (network activity) and expects a Python venv with dependencies; those are normal for this functionality and there are no suspicious external URLs or archive downloads in the files.
Credentials
The skill requests no environment variables or external credentials. It does require read access to user document directories and write access to ~/.local/share/local-rag for DB, venv, and logs — appropriate for its purpose. Consider that it will download ML models (internet access) and maintains a local queries.log (may contain user queries, albeit truncated).
Persistence & Privilege
The skill does not request permanent platform-level privileges (always=false). It writes/creates files only under ~/.local/share/local-rag and does not modify other skills or system agent config. The included batch scripts use git for checkpointing inside the DB dir (local) which is reasonable for rollback behavior.
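How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install lookupmark-local-rag
  3. Once installed, invoke the Skill by name or trigger it with /lookupmark-local-rag
  4. Provide the inputs described in the Skill's parameter notes to get structured output
Version history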
v1.9.1
Two-phase batch indexing, 25MB PDF limit, text-only/pdf-only modes, orphan detection fix
v1.9.0
Two-phase indexing: text first (25/batch), then PDFs (1/batch, up to 25MB). --text-only/--pdf-only flags.
v1.8.2
Batch indexing (--max-files), memory guards, GC, orphan detection fix, batch wrapper script
v1.8.1
Detailed indexing log: track added/updated/removed files with totals
v1.8.0
Memory guard, cached reranker model
v1.7.1
Fixed: args.json passed to timeout parameter (query.py main bug). Removed .html/.css from TEXT_EXTENSIONS to match SKILL.md documentation.
v1.7.0
Security: removed trust_remote_code=True. Fixed docstring (BGE-M3 → all-MiniLM). Narrowed pkill in batch script. Truncated query log for privacy.
v1.6.0
Security: removed auto pip install. Efficiency: dynamic batch size, embedding model caching, input sanitization.
v1.5.0
SKILL.md updated (correct models, limits, storage map). query.py: query logging, timeout flag. index.py: silenced FontBBox warnings.
v1.4.0
Added monitoring script (progress, logs, errors, git checkpoints, system stats)
v1.3.0
Security: path whitelist, blocked patterns, disk check. Exclude code files, max 30MB. pdfminer.six instead of pdfplumber.
v1.2.0
Fixed ChromaDB v1 compat ( filter), disabled reranker on ARM, batch size 1 for low RAM
v1.1.0
Parent-child chunking, GPU support, file lock, skip code files, Italian paths
v1.0.0
Initial release of local-rag: semantic search for local files.
  • Enables semantic search across local PDFs, DOCX, TXT, MD, and common code/text files.
  • Uses BGE-M3 embeddings and BGE-RERANKER-LARGE reranker for accurate retrieval.
  • Employs parent-child chunking: searches small chunks, returns larger context chunks.
  • Persistent local vector database (ChromaDB); CLI tools provided for indexing and searching.
  • Supports natural language queries and custom file path indexing.
Metadata
Slug: lookupmark-local-rag
Version: 1.9.1
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 14
FAQ

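What is Local RAG?

Semantic search over local files using all-MiniLM-L6-v2 embeddings and ms-marco-MiniLM-L-6-v2 cross-encoder reranking with ChromaDB and parent-child chunking... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 208 total downloads to date.

How do I install Local RAG?

Run the command "/install lookupmark-local-rag" in the OpenClaw or Claude Code chat box to install it in one step. Note that the install does not create the Python venv or its dependencies; those must be set up manually.

Is Local RAG free?

Yes. Local RAG is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Local RAG support?

Local RAG is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Local RAG?

It is developed and maintained by LookUpMark (@lookupmark); the current version is v1.9.1.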
