Description

Create and manage modular portable database pods (SQLite + metadata + embeddings). Includes document ingestion with embeddings for semantic search. Full auto...

README (SKILL.md)

Data Pods

Name: data-pods
Author: init-v

Overview

Create and manage portable, consent-scoped database pods. Handles document ingestion with embeddings and semantic search.

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Ingestion  │ ──► │   DB Pods   │ ──► │  Generation │
│  (ingest)   │     │  (storage)  │     │   (query)   │
└─────────────┘     └─────────────┘     └─────────────┘

Triggers

"create a pod" / "new pod"
"list my pods" / "what pods do I have"
"add to pod" / "add note" / "add content"
"query pod" / "search pod"
"ingest documents" / "add files"
"semantic search" / "find相关内容"
"export pod" / "pack pod"

Core Features

1. Create Pod

When user asks to create a pod:

Ask for pod name and type (scholar/health/shared/projects)
Run: python3 .../scripts/pod.py create <name> --type <type>
Confirm creation
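The Capability Analysis on this page flags shell commands built from unsanitized user input. A minimal sketch of one way to run the create step without a shell is shown here. It assumes a hypothetical naming policy (`NAME_RE`) and a `script_path` argument supplied by the caller; neither comes from the skill itself, and the real `pod.py` may accept other names or types.

```python
import re
import subprocess

# Pod types listed in the "Create Pod" step above.
ALLOWED_TYPES = {"scholar", "health", "shared", "projects"}

# Hypothetical naming policy (an assumption, not taken from pod.py):
# letters, digits, underscore and hyphen only, at most 64 characters.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_create_command(script_path, name, pod_type):
    """Return an argv list for `pod.py create <name> --type <type>`."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid pod name: {name!r}")
    if pod_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown pod type: {pod_type!r}")
    # A list of arguments is passed straight to the program, so shell
    # metacharacters in user input are never interpreted by a shell.
    return ["python3", script_path, "create", name, "--type", pod_type]


def create_pod(script_path, name, pod_type):
    """Run the create command; raises CalledProcessError on failure."""
    argv = build_create_command(script_path, name, pod_type)
    return subprocess.run(argv, check=True, capture_output=True, text=True)
```

The same pattern (validate, then pass an argument list with no `shell=True`) would apply to the `add`, `ingest`, `search`, and `pack` commands.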

2. Add Content (Manual)

When user asks to add content:

Ask for pod name, title, content, tags
Run: python3 .../scripts/pod.py add <pod> --title "<title>" --content "<content>" --tags "<tags>"
Confirm

3. Ingest Documents (Automated)

When user wants to ingest files:

Ask for pod name and folder path
Run: python3 .../scripts/ingest.py ingest <pod> <folder>
Supports: PDF, TXT, MD, DOCX, PNG, JPG
Auto-embeds text (if sentence-transformers installed)
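The Notes section says ingestion auto-chunks large documents before embedding. The skill's actual chunking strategy is not documented here, so the following is only a sketch of one common approach: fixed-size character windows with a small overlap. The function name and the default sizes are assumptions.

```python
def chunk_text(text, max_chars=1000, overlap=100):
    """Split text into windows of up to max_chars, overlapping by `overlap`.

    Illustrative only: ingest.py may chunk by tokens, sentences, or pages.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks = []
    step = max_chars - overlap  # how far each new window advances
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated storage.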

4. Semantic Search

When user wants to search:

Ask for pod name and query
Run: python3 .../scripts/ingest.py search <pod> "<query>"
Returns ranked results with citations
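To illustrate what "ranked results with citations" can mean, here is a sketch of cosine-similarity ranking over pre-computed embedding vectors. It uses plain Python lists so it runs without sentence-transformers. The `(source, text, vector)` record layout and the function names are assumptions, not the format `ingest.py` uses.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, docs, top_k=3):
    """Rank docs by similarity to query_vec.

    docs is a list of (source, text, vector) tuples; the source field
    stands in for the citation attached to each result.
    """
    scored = [(cosine(query_vec, vec), source, text) for source, text, vec in docs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

In a real pod the vectors would come from an embedding model, and the query would be embedded with the same model before ranking.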

5. Query (Basic)

When user asks to search notes:

Run: python3 .../scripts/pod.py query <pod> --text "<query>"

6. Export

When user asks to export:

Run: python3 .../scripts/podsync.py pack <pod>

Dependencies

pip install PyPDF2 python-docx pillow pytesseract sentence-transformers

Storage Location

~/.openclaw/data-pods/

Key Commands

# Create pod
python3 .../scripts/pod.py create research --type scholar

# Add note
python3 .../scripts/pod.py add research --title "..." --content "..." --tags "..."

# Ingest folder
python3 .../scripts/ingest.py ingest research ./documents/

# Semantic search
python3 .../scripts/ingest.py search research "transformers"

# List documents
python3 .../scripts/ingest.py list research

# Query notes
python3 .../scripts/pod.py query research --text "..."

Notes

Ingestion auto-chunks large documents
Embeddings enable semantic search
File hash prevents duplicate ingestion
All data stored locally in SQLite
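As an illustration of hash-based duplicate prevention, the sketch below hashes file contents with SHA-256 and records the digest in SQLite. The `ingested_files` table and both function names are hypothetical; the skill's real schema and hash algorithm are not shown on this page.

```python
import hashlib
import sqlite3


def file_sha256(path, block_size=65536):
    """Hash a file's contents in blocks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def register_file(conn, path):
    """Record a file by content hash.

    Returns True if the file is new, False if the same content was
    already registered (under any filename).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ingested_files ("
        "sha256 TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    cursor = conn.execute(
        "INSERT OR IGNORE INTO ingested_files (sha256, path) VALUES (?, ?)",
        (file_sha256(path), str(path)),
    )
    conn.commit()
    # rowcount is 0 when the primary key already existed and the row was ignored.
    return cursor.rowcount == 1
```

Because the key is the content hash rather than the filename, a renamed copy of an already ingested document would be skipped as well.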

Usage Guidance

This skill appears to implement local data pods and a consent layer as advertised, but review it before installing or running:

- Inconsistency: There are two consent implementations that store grants in different places (~/.config/data-pods/consents/grants.json vs ~/.openclaw/consent/consent.db). Verify which consent manager your agent will call so you don't accidentally bypass consent checks.
- Sensitive exports: The tool can export pods (.vpod/.zip) and pack entire pods as a single Markdown file intended for pasting into LLMs. Do not export or paste pods containing sensitive data (health, personal, or confidential research) into external services unless you explicitly intend to share.
- Raw SQL: pod.py supports a --sql option that executes arbitrary SQL against the local DB. Be cautious when running it in contexts where results might be returned to an agent or transmitted elsewhere.
- Dependencies: sentence-transformers is optional but required for semantic search; installing it can pull heavy model data. Because there's no automated install, manually review and install only the dependencies you need.
- Audit and sandbox: If you want to test, run the scripts in a disposable environment (temporary user account or VM), check where files and consent records are created, and inspect outputs before integrating into daily workflows.

If you plan to use this skill in production or with sensitive pods, ask the author to clarify which consent implementation is canonical, add a single consistent consent gateway, and consider removing or gating the 'pack for LLM' guidance to avoid accidental exfiltration.

Capability Analysis

Type: OpenClaw Skill
Name: initv-data-pods
Version: 0.2.0

This skill bundle is classified as suspicious due to multiple critical vulnerabilities that could lead to arbitrary code execution and arbitrary file system access. The `SKILL.md` and `README.md` instruct the AI agent to construct shell commands using unsanitized user input, creating a shell injection vulnerability. The `scripts/pod.py` file contains a severe SQL injection vulnerability in its `query_pod` function, allowing direct execution of user-provided SQL. Additionally, `scripts/podsync.py` is vulnerable to a Zip Slip attack in its `import_pod` function, which could lead to arbitrary file overwrite. While there is no clear evidence of intentional malicious behavior (e.g., data exfiltration or backdoor installation), these vulnerabilities are severe enough to enable such attacks if exploited by a malicious user or a prompt-injected agent.
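For readers unfamiliar with Zip Slip, the sketch below shows one defensive pattern: reject any archive member whose resolved path would land outside the destination directory before extracting anything. This is not the skill's `import_pod` code, which is not reproduced on this page; `safe_extract` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def safe_extract(archive_path, dest_dir):
    """Extract a zip archive only if every member stays inside dest_dir."""
    dest = Path(dest_dir).resolve()
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.infolist():
            target = (dest / member.filename).resolve()
            # Entries such as "../evil.txt" or absolute paths resolve to a
            # location outside dest and are rejected up front.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.filename!r}")
        archive.extractall(dest)
    return dest
```

Python's own `ZipFile.extractall` already strips `..` components and leading slashes, so the vulnerability typically appears when code joins member names to a directory and writes the files itself. The explicit check makes the rejection visible rather than silent.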

Capability Assessment

ℹ Purpose & Capability

The code implements the advertised features: pod creation, notes, ingestion, optional embeddings (sentence-transformers), local storage under ~/.openclaw/data-pods, export/pack and a consent layer. However there are internal inconsistencies: the repository contains two consent implementations that use different storage locations (root consent.py writes grants to ~/.config/data-pods/consents/grants.json, while scripts/consent.py uses ~/.openclaw/consent/consent.db). README/usage also reference different paths (e.g. /home/claudio/.openclaw/workspace...). These mismatches could be harmless (old vs new code) but are disproportionate to a clean single-purpose skill and can create confusion about which consent check is actually used.
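One quick way to see which consent store is actually in use on a machine is to check which of the two files exists after running the skill. The sketch below does only that. The relative paths are the two locations named above; the helper name and the `home` parameter (useful for testing in a sandbox directory) are assumptions.

```python
from pathlib import Path

# The two consent locations described above, relative to the home directory.
CONSENT_STORES = {
    "consent.py (root)": ".config/data-pods/consents/grants.json",
    "scripts/consent.py": ".openclaw/consent/consent.db",
}


def find_consent_stores(home=None):
    """Report which consent stores exist under the given home directory."""
    base = Path(home) if home is not None else Path.home()
    return {
        label: (base / relative).exists()
        for label, relative in CONSENT_STORES.items()
    }


if __name__ == "__main__":
    for label, present in find_consent_stores().items():
        print(f"{label}: {'present' if present else 'absent'}")
```

If both stores are present, grants recorded in one will not be visible to code that reads the other, which is the bypass risk the review describes.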

⚠ Instruction Scope

SKILL.md instructs the agent to run the included Python scripts (pod.py, ingest.py, podsync.py, consent scripts). Those scripts operate on local files and databases only, which aligns with the description. Concerns:

1) pod.py supports a --sql option that executes raw SQL against the SQLite DB. This can leak arbitrary data if results are included in agent responses or exported.
2) podsync.pack writes a single Markdown file labeled 'Ready to paste into ChatGPT!'. Exporting sensitive data into a format intended for pasting into external LLMs increases exfiltration risk if users follow that guidance.
3) There are two different consent implementations/paths (see Purpose & Capability), so an agent following SKILL.md could call one path while a user expects a different consent store. This creates potential for bypassing intended consent checks.
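As a contrast to executing raw SQL, the sketch below combines two standard SQLite safeguards: opening the database read-only through a URI, and binding the search text as a parameter instead of splicing it into the statement. The `notes` table with `title` and `content` columns is a hypothetical schema for illustration; the pod's real schema is not shown on this page.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open an existing SQLite database so that writes are refused."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def search_notes(conn, text, limit=20):
    """Substring search with bound parameters (hypothetical `notes` table)."""
    pattern = f"%{text}%"
    rows = conn.execute(
        "SELECT title, content FROM notes "
        "WHERE title LIKE ? OR content LIKE ? LIMIT ?",
        (pattern, pattern, limit),
    )
    return rows.fetchall()
```

With bound parameters the user's text is treated as data only, so a query such as `'; DROP TABLE notes; --` simply matches nothing. The read-only connection adds a second layer in case some statement is still built unsafely.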

ℹ Install Mechanism

No formal install spec is declared. SKILL.md lists pip dependencies (PyPDF2, python-docx, pillow, pytesseract, sentence-transformers). That's proportionate for document parsing and embeddings, but sentence-transformers is a heavy dependency that will download large models. Because there's no install automation, users must manually install dependencies; this reduces supply-chain risk but requires care when installing large ML packages.

✓ Credentials

The skill requests no environment variables, no credentials, and interacts with local file paths only. That is proportional to the stated purpose. There are no hard-coded network endpoints or secret exfiltration calls in the provided code.

✓ Persistence & Privilege

The skill does not request 'always: true' or any elevated platform privilege. It writes data under user dirs (~/.openclaw and ~/.config) which is expected for a local data management tool. Export and import functions create files under ~/.openclaw/sync; those are normal for export/sync functionality.

Version History

v0.2.0

v0.2 - Document ingestion with embeddings, semantic search

v0.1.0

v0.1 - Modular portable database pods with SQLite + metadata

Metadata

Slug initv-data-pods

Version 0.2.0

License —

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is data-pods?

Create and manage modular portable database pods (SQLite + metadata + embeddings). Includes document ingestion with embeddings for semantic search. Full auto... It is an AI Agent Skill for Claude Code / OpenClaw, with 375 downloads so far.

How do I install data-pods?

Run "/install initv-data-pods" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is data-pods free?

Yes, data-pods is completely free (open-source). You can download, install and use it at no cost.

Which platforms does data-pods support?

data-pods is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created data-pods?

It is built and maintained by init-v (@init-v); the current version is v0.2.0.
