Data Vault

Author: vitorhugoze · GitHub ↗ · v1.0.18 · MIT-0
cross-platform · ⚠ suspicious
Total downloads: 171 · Favorites: 0 · Current installs: 1 · Versions: 4
Install in OpenClaw
/install data-vault
Description
Persist and retrieve structured data using the Lance columnar format. Use when you need to store, query, or analyze data across sessions — such as saving ski...
Usage (SKILL.md)

Data Vault

Installation

uv pip install pylance pandas

A persistent data store using the Lance columnar format for fast ML data access.

Quick Start

# List all datasets and their metadata
python3 scripts/command.py list-datasets-info

# Create a dataset
python3 scripts/command.py create-dataset <name> <field1> <field2> ...

# Append data
python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...

# Read all records from a dataset
python3 scripts/command.py read-dataset <name>

Note: list-datasets-info shows dataset metadata (schema, field types, record count) — it does not return the actual data rows. Use read-dataset to retrieve records.

Storage Location

Datasets are created and stored in the current working directory ('.').

Critical Behavior: Data Type Strictness

⚠️ Lance is strict about data types — they CANNOT change after the first record

When you append the first record to a dataset, Lance infers the data type for each field. All subsequent records MUST use the same types.

Example — this FAILS:

# First record: age as STRING
append-to-dataset users "John" "25" "[email protected]"

# Second record: age as INTEGER (will FAIL!)
append-to-dataset users "Jane" 30 "[email protected]"
# Error: `age` should have type large_string but type was int64

Correct approach — maintain consistent types:

# First record: age as STRING
append-to-dataset users "John" "25" "[email protected]"

# Second record: age as STRING
append-to-dataset users "Jane" "30" "[email protected]"

Why This Matters

Unlike traditional databases that may coerce types, Lance rejects type mismatches. If you store numbers as strings initially, you must always pass strings. Plan your schema carefully.
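
One way to avoid the failure shown above is to compare a record against the locked types before appending it. The following is only a sketch: check_record and the LANCE_TO_PYTHON mapping are hypothetical and are not part of the Data Vault scripts, the type names are the ones this document lists, and the email addresses are placeholders.

```python
# Hypothetical pre-append check; not part of the Data Vault scripts.
# Type names follow the ones this document shows for Lance.
LANCE_TO_PYTHON = {
    "large_string": str,
    "int64": int,
    "float64": float,
    "bool": bool,
}


def check_record(field_types, fields, values):
    """Return a list of human-readable mismatches (empty if the record fits)."""
    problems = []
    for field, value in zip(fields, values):
        expected = field_types.get(field)
        if expected is None:
            continue  # type not locked yet, nothing to compare against
        py_type = LANCE_TO_PYTHON.get(expected)
        if py_type is None:
            continue  # unknown type name: leave the decision to Lance
        # bool is a subclass of int in Python, so compare exact types
        if type(value) is not py_type:
            problems.append(
                f"`{field}` should have type {expected} "
                f"but got {type(value).__name__}"
            )
    return problems


locked = {"name": "large_string", "age": "large_string", "email": "large_string"}
fields = ["name", "age", "email"]

bad = check_record(locked, fields, ["Jane", 30, "jane@example.com"])
good = check_record(locked, fields, ["Jane", "30", "jane@example.com"])
print(bad)
print(good)
```

The locked dictionary has the same shape as the field_types entry reported by list-datasets-info, so it could be taken from that output instead of being written by hand.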

Initialization Workflow

When starting a session, always initialize by listing existing datasets first:

# This command returns ALL datasets with their structure
python3 scripts/command.py list-datasets-info

Example output:

{
    "skill": "data-vault",
    "operation": "list_datasets_info",
    "status": "success",
    "data": [
        {
            "dataset_name": "users",
            "path": "/data/users",
            "fields": ["name", "age", "email"],
            "field_types": {
                "_id": "large_string",
                "_updated_at": "timestamp[us]",
                "name": "large_string",
                "age": "large_string",
                "email": "large_string"
            },
            "record_count": 2,
            "columns": ["_id", "_updated_at", "name", "age", "email"],
            "last_updated": "2026-03-21T17:57:44.595628"
        }
    ],
    "error": null
}

Understanding field_types

| State | Meaning |
| --- | --- |
| {} (empty) | Dataset exists but has no records yet; types are not yet defined |
| populated | Types are locked; appends must match |

Important: If field_types is empty, the first append will define types. Be deliberate about the first record's types.
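
The empty versus populated distinction can be checked programmatically at session start. The sketch below parses a response in the shape of the example output and splits datasets by whether their types are locked. The function name split_by_type_state and the second dataset ("events") are made up for illustration.

```python
import json

# Trimmed sample in the shape of the example output; the second dataset
# ("events") is made up to show the empty-field_types case.
raw = """
{
  "skill": "data-vault",
  "operation": "list_datasets_info",
  "status": "success",
  "data": [
    {"dataset_name": "users",
     "fields": ["name", "age", "email"],
     "field_types": {"name": "large_string", "age": "large_string",
                     "email": "large_string"},
     "record_count": 2},
    {"dataset_name": "events",
     "fields": ["kind", "payload"],
     "field_types": {},
     "record_count": 0}
  ],
  "error": null
}
"""


def split_by_type_state(response):
    """Return (locked, unlocked) dataset names based on field_types."""
    locked, unlocked = [], []
    for info in response["data"]:
        # An empty dict means no record has been appended yet
        target = locked if info.get("field_types") else unlocked
        target.append(info["dataset_name"])
    return locked, unlocked


locked, unlocked = split_by_type_state(json.loads(raw))
print("types locked:", locked)
print("first append defines types:", unlocked)
```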

Commands Reference

Create Dataset

python3 scripts/command.py create-dataset <name> <field1> <field2> ...

Creates a metadata entry. Fields have no types until first append.

Append Record

python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...

Appends one record. Types are inferred from first record.

Batch Append

python3 scripts/command.py batch-append-to-dataset <name> '<json-array>'

Example: batch-append-to-dataset users '[["Alice", "22", "[email protected]"], ["Bob", "35", "[email protected]"]]'
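
When the rows come from a program rather than being typed by hand, the JSON array is easier to build with json.dumps than to quote manually. This is a minimal sketch that only constructs the command line and does not run it; the rows and email addresses are placeholders.

```python
import json
import shlex

# Rows in the same order as the dataset's fields (name, age, email).
# Ages stay strings so they match a dataset whose age column is large_string.
rows = [
    ["Alice", "22", "alice@example.com"],
    ["Bob", "35", "bob@example.com"],
]

payload = json.dumps(rows)

# shlex.join quotes each argument so the JSON survives a POSIX shell intact
command = shlex.join(
    ["python3", "scripts/command.py", "batch-append-to-dataset", "users", payload]
)
print(command)
```

If the command is launched from Python with subprocess.run and an argument list, the shell quoting step is unnecessary and payload can be passed as is.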

Update Record

python3 scripts/command.py update-dataset-record <name> <record_id> <value1> <value2> ...

Updates fields for a specific record by ID.

Delete Record

python3 scripts/command.py delete-dataset-record <name> <record_id>

List All Datasets

python3 scripts/command.py list-datasets

Get Dataset Info

python3 scripts/command.py get-dataset-info <name>

Returns schema, field types (if data exists), and record count.

List All Datasets with Full Info

python3 scripts/command.py list-datasets-info

Recommended for initialization. Returns all datasets with complete metadata.

Get Dataset Path

python3 scripts/command.py get-dataset-path-info <name>

Backup Dataset

python3 scripts/command.py backup-dataset <name> <backup_path>

Count Records

python3 scripts/command.py count-records <name>

Read All Records

Returns all records from the dataset as a list of objects.

python3 scripts/command.py read-dataset <name>

Drop Dataset

Deletes the entire dataset and its metadata. Requires confirmation if you have not created a backup beforehand.

python3 scripts/command.py drop-dataset <name>

Internal fields available in every dataset:

| Field | Type | Description |
| --- | --- | --- |
| _id | string | UUID; unique record identifier |
| _updated_at | timestamp | When the record was last inserted or updated |

List Records (Paginated)

python3 scripts/command.py list-records <name> --limit 10 --offset 0

Returns records with optional pagination.
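
To walk a whole dataset page by page, the --limit and --offset values can be derived from a record count. The sketch below only builds the argument lists and assumes the total comes from count-records; page_offsets and list_records_commands are hypothetical helper names.

```python
def page_offsets(total, limit):
    """Yield (limit, offset) pairs that cover `total` records."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for offset in range(0, total, limit):
        yield limit, offset


def list_records_commands(name, total, limit=10):
    """Build one list-records argument list per page."""
    return [
        ["python3", "scripts/command.py", "list-records", name,
         "--limit", str(lim), "--offset", str(off)]
        for lim, off in page_offsets(total, limit)
    ]


# 25 records with a page size of 10 need three calls
for cmd in list_records_commands("users", total=25, limit=10):
    print(" ".join(cmd))
```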

Get Single Record

python3 scripts/command.py get-record <name> <record_id>

Retrieves a specific record by its UUID.

Response Format

All commands return JSON:

{
  "skill": "data-vault",
  "operation": "<operation_name>",
  "status": "success|error",
  "data": <result_data_or_null>,
  "error": <error_message_or_null>
}
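
Because every command uses the same envelope, a caller can handle success and failure in one place. The following is a sketch of such a wrapper: unwrap and DataVaultError are hypothetical names, and the "append_to_dataset" operation name and its error text are assumptions made for the example.

```python
import json


class DataVaultError(RuntimeError):
    """Raised when a response reports a non-success status (hypothetical name)."""


def unwrap(raw):
    """Parse a response envelope and return its data, or raise on error."""
    response = json.loads(raw)
    if response.get("status") != "success":
        raise DataVaultError(
            f'{response.get("operation")}: {response.get("error")}'
        )
    return response.get("data")


ok = (
    '{"skill": "data-vault", "operation": "list_datasets_info", '
    '"status": "success", "data": [], "error": null}'
)
# Assumed operation name and error text, for illustration only
failed = (
    '{"skill": "data-vault", "operation": "append_to_dataset", '
    '"status": "error", "data": null, "error": "type mismatch"}'
)

print(unwrap(ok))
try:
    unwrap(failed)
except DataVaultError as exc:
    print("failed:", exc)
```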

Internal Fields

Every dataset automatically includes:

  • _id — UUID for each record
  • _updated_at — timestamp of last insert/update

These are managed automatically — when appending, only provide your defined fields.
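
A practical consequence is that a record read back from a dataset has to be stripped of these fields before its values are appended elsewhere. The sketch below assumes that records come back as objects keyed by field name with the internal fields included; user_values is a hypothetical helper and the record values are made up.

```python
INTERNAL_FIELDS = {"_id", "_updated_at"}


def user_values(record, fields):
    """Return only the user-defined values, in the dataset's field order."""
    return [record[name] for name in fields if name not in INTERNAL_FIELDS]


# A record shaped like one that read-dataset might return (values are made up)
record = {
    "_id": "00000000-0000-0000-0000-000000000000",
    "_updated_at": "2026-03-21T17:57:44.595628",
    "name": "John",
    "age": "25",
    "email": "john@example.com",
}

print(user_values(record, ["name", "age", "email"]))
```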

Data Type Inference

Lance infers types from the first record:

| Python Type | Lance Type |
| --- | --- |
| "string" | large_string |
| 25 (int) | int64 |
| 25.5 (float) | float64 |
| True/False | bool |

CLI caveat: Values passed on the command line arrive as strings. To guarantee integer types, write the first record from a script using actual integers rather than from the CLI.
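
The mapping in the table above can be restated as a small lookup, which also makes the CLI caveat concrete. This is an illustrative sketch only: lance_type_name is a hypothetical helper, and the mapping is copied from the table rather than taken from Lance itself.

```python
def lance_type_name(value):
    """Map a Python value to the Lance type name listed in the table above."""
    # bool must be tested before int because bool is a subclass of int
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "large_string"
    raise TypeError(f"no mapping listed for {type(value).__name__}")


# Values as a script would hold them versus how they arrive from argv
from_script = ["John", 25, 25.5, True]
from_argv = ["John", "25", "25.5", "True"]

print([lance_type_name(v) for v in from_script])
print([lance_type_name(v) for v in from_argv])
```

Every value in from_argv maps to large_string, which is why a first record appended from the command line locks all of its fields as strings.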

Tips

  1. Initialize at session start: Run list-datasets-info to understand what data already exists
  2. Plan your schema: First record determines types for the entire dataset
  3. Use batch append when adding multiple records: More efficient than individual appends

Requirements

Dependencies are declared in frontmatter (metadata.openclaw.install) and handled by the OpenClaw install system via uv. The Python packages required are:

  • pylance — The Lance columnar format library.

    ⚠️ Naming note: Despite the PyPI package being named pylance, the library is imported as import lance in Python code. This is the official Lance project naming convention — it is NOT the VS Code "pylance" language server. See lance.org for details.

  • pandas — Data manipulation

Security Guidance
This skill's code implements a local Lance-based dataset CLI that stores data under the agent's current directory, which aligns with its description. The main red flag is the install instructions: they use a nonstandard 'uv' installer and insist on a PyPI package named 'pylance' (claimed to provide the 'lance' module). Before installing or running, do one of the following: (1) verify on PyPI that the intended Lance package name is correct (installing the wrong package could pull unrelated code), (2) ask the author to confirm the use of 'pylance' vs 'lance' and why 'uv' is required, or (3) run the skill in a disposable/sandboxed environment (container or VM) and inspect what pip installs. Also run the CLI from an isolated directory to avoid accidental reads/writes to other files. If the author confirms the package names and installer are intentional and you can inspect the installed packages, this would likely move the assessment toward benign.
Functional Analysis
Type: OpenClaw Skill Name: data-vault Version: 1.0.18 The 'data-vault' skill is a legitimate tool for persisting and querying structured data using the Lance columnar format. The implementation in scripts/manage.py, scripts/read.py, and scripts/write.py follows standard practices for data management and includes proactive security measures, such as path traversal validation in the validate_dataset_name and validate_backup_path functions. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the package dependencies (pylance and pandas) and installation steps are consistent with the stated functionality.
Capability Assessment
Purpose & Capability
The code files implement a local CLI for creating, reading, updating, deleting, and backing up Lance datasets under the current directory — this directly matches the 'Data Vault' description. However, the packaging/install metadata uses unusual package names (e.g., 'pylance' rather than 'lance') and declares 'uv' as a required binary/installer which is not standard for a simple Python CLI; that mismatch is unexplained.
Instruction Scope
SKILL.md instructs installation of Python dependencies and running the included scripts. At runtime the skill reads and writes files under the current working directory ('.') and uses a local 'metadata.lance' file and per-dataset folders. It does not request or transmit environment variables, network endpoints, or external tokens in the instructions. Note: storing data on '.' means the skill will read/write any files within the agent's working directory—so run it in an isolated directory if you care about data separation.
Install Mechanism
The install steps bootstrap pip and run 'pip install --upgrade uv' and then use 'uv' to install 'pylance' and 'pandas'. This is nonstandard: 'uv' as an installer/binary is uncommon and may be platform-specific. More importantly, the skill repeatedly insists the Lance PyPI package is named 'pylance' (and warns not to replace it with 'lance'), which is suspicious because 'pylance' is commonly known as a VSCode language server package — this could be a typo/author confusion or it could cause installation of the wrong package. No external download URLs are used, but the package name mismatch creates a supply-chain risk (you may end up installing an unrelated package).
Credentials
The skill declares no required environment variables, no credentials, and no config paths beyond using the current working directory. That is appropriate for a local data store. There are no requests for unrelated secrets.
Persistence & Privilege
The skill is not forced-always and is user-invocable. It persists data to disk under the current directory (per its design) but does not attempt to modify other skills or system-wide agent settings. Autonomous invocation is allowed (platform default) but not unusual here.
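How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install data-vault
  3. Once installed, call the Skill by name or trigger it with /data-vault
  4. Provide the required inputs according to the Skill's parameter description to get structured output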
Version History
v1.0.18
- Added 1 file(s).
- Updated SKILL.md and bundle contents.
v1.0.16
- Added a .gitignore file to the project.
- No changes to the code or documentation content.
- No impact on functionality; version number remains unchanged in documentation.
v1.0.15
- Updated install instructions to use `uv pip install pylance pandas` for clearer setup.
- Adjusted dependencies metadata to clarify use of `uv` and explicit pip type.
- No changes to code functionality; this is a documentation update.
v1.0.14
- Improved documentation in SKILL.md with detailed instructions for installation, usage, commands, and critical behavior notes.
- Added explicit callouts on Lance's strict data type enforcement and schema initialization.
- Provided examples for every command including dataset creation, appending, updates, deletion, record retrieval, and backups.
- Explained response format and internal fields included with each dataset.
- Included practical CLI usage tips and a comprehensive quick start.
- Declared installation requirements in metadata for streamlined setup.
Metadata
Slug: data-vault
Version: 1.0.18
License: MIT-0
Total installs: 1
Current installs: 1
Version count: 4
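FAQ

What is Data Vault?

Persist and retrieve structured data using the Lance columnar format. Use when you need to store, query, or analyze data across sessions — such as saving ski... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 171 total downloads to date.

How do I install Data Vault?

Run the command "/install data-vault" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is Data Vault free?

Yes. Data Vault is completely free and is released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Data Vault support?

Data Vault is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Data Vault?

It is developed and maintained by vitorhugoze (@vitorhugoze); the current version is v1.0.18.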
