Description

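An assistant specialized in the high-frequency, complex data analysis and processing tasks of everyday office work. It uses a local code-execution model (SQL, or Python + SQLite) to handle data import, cleaning, querying, extraction, merging and splitting, and report generation, supporting large datasets while keeping data private. Use this Skill when the user needs to process Excel/CSV files, query across tables, generate charts, or produce data analysis reports.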

Usage Instructions (SKILL.md)

Data Analysis Assistant Workflow

Name: data-skill
Author: lgwanai

This skill transforms the agent into a powerful local data analysis assistant, strictly adhering to a Local Code Execution paradigm.

Core Architecture & Principles

Local Execution First: NEVER read large datasets directly into the context window. Always generate Python scripts or SQL commands and execute them locally using RunCommand.
SQLite as the Engine: All CSV/Excel files should be imported into a local SQLite database (default: workspace.db). Rely on SQL for robust data manipulation (filtering, joining, grouping).
Non-Destructive Operations (Undo Mechanism): Do not overwrite original tables. When modifying data, create a new table (e.g., CREATE TABLE table_v2 AS SELECT ...) or a View. Because the previous version is never altered, the user can always say "undo the last step" and get it back.
Data Privacy: Keep data local. Only send aggregated statistics or schema info into the context window.

Scenarios & Procedures

Scenario 1: Data Import & Auto-Cleaning

Trigger: User uploads or specifies a CSV/Excel/WPS(.et)/Numbers file. Action:

Run the built-in importer script (supports .csv, .xlsx, .xls, .et, .numbers):
```
python scripts/data_importer.py "path/to/file.xlsx" --db workspace.db
```
Note: This script calculates the MD5 hash of the file. If an identical file was already imported, it skips the import and returns the existing table name. It also automatically handles merged cells, detects the real header row, chunks large CSVs, and sanitizes column names for SQLite.
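The importer's internals are not shown here. The sketch below only illustrates two of the ideas the note describes, MD5-based duplicate detection and column-name sanitization, under its own assumptions: the `_import_registry` table and all function names are hypothetical, not necessarily what `scripts/data_importer.py` uses.

```python
import hashlib
import re
import sqlite3

_REGISTRY_DDL = "CREATE TABLE IF NOT EXISTS _import_registry (md5 TEXT PRIMARY KEY, table_name TEXT)"


def file_md5(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large files never load fully into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_previous_import(conn, md5):
    """Return the table created by an earlier import of identical content, if any."""
    conn.execute(_REGISTRY_DDL)
    row = conn.execute("SELECT table_name FROM _import_registry WHERE md5 = ?", (md5,)).fetchone()
    return row[0] if row else None


def record_import(conn, md5, table_name):
    """Remember which table holds the content with this hash."""
    conn.execute(_REGISTRY_DDL)
    conn.execute("INSERT OR REPLACE INTO _import_registry VALUES (?, ?)", (md5, table_name))
    conn.commit()


def sanitize_columns(names):
    """Turn raw header cells into unique, SQLite-friendly column names."""
    used, result = set(), []
    for i, raw in enumerate(names):
        text = "" if raw is None else str(raw).strip()
        # \W is Unicode-aware, so CJK headers such as 销售额 are preserved.
        name = re.sub(r"\W+", "_", text).strip("_") or f"col_{i + 1}"
        if name[0].isdigit():
            name = "c_" + name
        candidate, n = name, 2
        while candidate.lower() in used:  # SQLite treats ASCII identifiers case-insensitively
            candidate, n = f"{name}_{n}", n + 1
        used.add(candidate.lower())
        result.append(candidate)
    return result
```

An importer following this pattern would call `find_previous_import(conn, file_md5(path))` first and skip the load whenever it returns a table name.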

Once imported, run a quick check to understand the schema and data:

```
sqlite3 workspace.db "PRAGMA table_info(table_name);"
sqlite3 -header -column workspace.db "SELECT * FROM table_name LIMIT 3;"
```

Ask the user if they want to perform standard cleaning (e.g., handling missing values, deduplication). Execute these via SQL.
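As an illustration of running the cleaning in SQL while honoring the undo principle, here is a sketch that fills missing values and removes duplicate rows into a new table. The function name and the per-column defaults are assumptions for the example; which columns to fill, and with what, should come from the user's answer.

```python
import sqlite3


def clean_table(conn, src, dst, fill_defaults):
    """Copy `src` into a new table `dst`, filling NULLs and dropping duplicate rows.

    fill_defaults maps column name -> replacement value for NULLs. The source
    table is never modified, so the step can be undone by dropping `dst`.
    """
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{src}")')]
    if not cols:
        raise ValueError(f"table {src!r} does not exist")
    select_items, params = [], []
    for col in cols:
        if col in fill_defaults:
            select_items.append(f'COALESCE("{col}", ?)')
            params.append(fill_defaults[col])
        else:
            select_items.append(f'"{col}"')
    # Create an empty table with the same columns, then insert the cleaned rows.
    # DISTINCT is applied after COALESCE, so rows that differed only by a NULL collapse.
    conn.execute(f'CREATE TABLE "{dst}" AS SELECT * FROM "{src}" WHERE 0')
    conn.execute(f'INSERT INTO "{dst}" SELECT DISTINCT {", ".join(select_items)} FROM "{src}"', params)
    conn.commit()
```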

Scenario 2: Continuous Queries & Manipulation

Trigger: User asks to filter, sort, aggregate, or add columns. Action:

Formulate the SQL query.
Execute it via RunCommand: sqlite3 workspace.db "SELECT ..."
For structural changes, remember the Undo principle: CREATE TABLE table_name_step2 AS SELECT ...
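A sketch of one such round trip with Python's built-in `sqlite3` module: add a derived column as a new `_step2` table, then pull back only a grouped summary, which is all that needs to enter the context window. The helper names and the `revenue` expression are illustrative assumptions; `expr` is inserted into the SQL verbatim, so it must be agent-written SQL, not raw user input.

```python
import sqlite3


def add_column_step(conn, src, dst, expr, alias):
    """Create `dst` as a copy of `src` plus one computed column (src is untouched)."""
    conn.execute(f'CREATE TABLE "{dst}" AS SELECT *, {expr} AS "{alias}" FROM "{src}"')
    conn.commit()


def summarize(conn, table, group_col, value_col, limit=20):
    """Return a small grouped summary instead of raw rows."""
    sql = (
        f'SELECT "{group_col}", COUNT(*) AS n, ROUND(SUM("{value_col}"), 2) AS total '
        f'FROM "{table}" GROUP BY "{group_col}" ORDER BY total DESC LIMIT ?'
    )
    return conn.execute(sql, (limit,)).fetchall()
```

With a `sales(region, price, qty)` table, `add_column_step(conn, "sales", "sales_step2", "price * qty", "revenue")` followed by `summarize(conn, "sales_step2", "region", "revenue")` returns one row per region.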

Scenario 3: Semantic Extraction & Fuzzy Join

Trigger: User wants to split addresses, do sentiment analysis, or join tables with mismatched keys (e.g., "Beijing Branch" vs "Beijing Office"). Action:

Generate a Python script using pandas and sqlite3.
For Fuzzy Joins, use libraries like thefuzz or difflib in the Python script to match keys, then write the mapping back to SQLite.
For Semantic extraction, use regex or heuristic rules in Python. If LLM analysis is strictly required, write a script that processes the column locally, or ask the user for permission before sending a sample to the model.
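A minimal fuzzy-join sketch using the standard library's `difflib` (one of the two libraries named above; `thefuzz` would slot in the same way). It matches each distinct left key to its closest right key, stores the mapping in a helper table, and lets SQLite perform the actual join. The `key_map` table name, the function names, and the 0.6 cutoff (difflib's default) are assumptions to tune per dataset; borderline matches are worth showing to the user before trusting them.

```python
import difflib
import sqlite3


def build_key_mapping(left_keys, right_keys, cutoff=0.6):
    """Map each left key to its most similar right key (SequenceMatcher ratio >= cutoff)."""
    mapping = {}
    for key in left_keys:
        best = difflib.get_close_matches(key, right_keys, n=1, cutoff=cutoff)
        if best:
            mapping[key] = best[0]
    return mapping


def fuzzy_join(conn, left, left_key, right, right_key, cutoff=0.6):
    """Store left->right key matches in key_map and return the joined rows."""
    def distinct(table, col):
        rows = conn.execute(f'SELECT DISTINCT "{col}" FROM "{table}" WHERE "{col}" IS NOT NULL')
        return [str(r[0]) for r in rows]

    mapping = build_key_mapping(distinct(left, left_key), distinct(right, right_key), cutoff)
    conn.execute("CREATE TABLE IF NOT EXISTS key_map (left_key TEXT PRIMARY KEY, right_key TEXT)")
    conn.executemany("INSERT OR REPLACE INTO key_map VALUES (?, ?)", mapping.items())
    conn.commit()
    return conn.execute(
        f'SELECT l.*, r.* FROM "{left}" AS l '
        f'JOIN key_map AS m ON l."{left_key}" = m.left_key '
        f'JOIN "{right}" AS r ON r."{right_key}" = m.right_key'
    ).fetchall()
```

Keys with no match above the cutoff (such as an unrelated branch name) simply drop out of the inner join; switching to a `LEFT JOIN` would keep them visible for review.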

Scenario 4: Chart Generation

Trigger: User requests a visualization (bar, pie, line, scatter, map, funnel, 3D charts, etc.). Action:

Do NOT write custom Python scripts from scratch.
We have a powerful template-based rendering engine. Use the built-in scripts/chart_generator.py script.
First, identify the required chart type. Look in the references/prompts/ directory for the Prompt skeleton that corresponds to the exact chart type (e.g., references/prompts/line/stacked_area.md), and read it to understand the data structure requirements.
Formulate the SQL query that aggregates the data correctly according to the prompt's requirements.
Generate the custom_js and echarts_option based on the prompt template.

Construct a JSON configuration file (save it in outputs/configs/) matching this structure:

```
{
    "db_path": "workspace.db",
    "query": "SELECT category, SUM(value) AS val FROM table_name GROUP BY category",
    "title": "Chart Title",
    "output_path": "outputs/html/output_chart.html",
    "echarts_option": { ... },
    "custom_js": "..."
}
```

Here echarts_option holds the option generated from the prompt template, and custom_js is optional JavaScript logic for complex data binding. JSON does not allow comments, so keep these notes out of the actual file.

Note: For map charts requiring coordinates, use the built-in Geocoding capabilities or ECharts native geo coordinate systems. Output files MUST be stored in the isolated outputs/html/ directory.
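When the agent assembles the configuration programmatically, a sketch like the following keeps the file valid JSON and confines the chart to `outputs/html/`. The function name and the name check are assumptions; it also assumes `chart_generator.py` resolves relative `db_path` and `output_path` values against the workspace root, which should be confirmed against the script. The `echarts_option` passed in should come from the matching template in `references/prompts/`.

```python
import json
import re
from pathlib import Path


def write_chart_config(name, query, title, echarts_option, custom_js=None,
                       db_path="workspace.db", root="."):
    """Write outputs/configs/<name>.json and return its path.

    `name` is restricted to word characters and "-" so the HTML output
    cannot escape outputs/html/ via "../" or an absolute path.
    """
    if not re.fullmatch(r"[\w-]+", name):
        raise ValueError(f"invalid chart name: {name!r}")
    config = {
        "db_path": db_path,
        "query": query,
        "title": title,
        "output_path": (Path("outputs") / "html" / f"{name}.html").as_posix(),
        "echarts_option": echarts_option,
    }
    if custom_js:  # optional JS for complex data binding
        config["custom_js"] = custom_js
    cfg_path = Path(root) / "outputs" / "configs" / f"{name}.json"
    cfg_path.parent.mkdir(parents=True, exist_ok=True)
    cfg_path.write_text(json.dumps(config, ensure_ascii=False, indent=4), encoding="utf-8")
    return cfg_path
```

The returned path is what gets passed to `python scripts/chart_generator.py --config ...`.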

Execute the command:

```
python scripts/chart_generator.py --config outputs/configs/your_config.json
```

The script will automatically start a local HTTP server and return an access URL. Provide this URL to the user to view the interactive chart.
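The security review of this skill asks whether the chart server binds only to localhost. For comparison, here is a minimal standard-library sketch of a static server that serves `outputs/html/` on `127.0.0.1` only and lets the OS pick a free port; it is not the bundled `server.py`, whose binding behavior should be checked separately.

```python
import functools
import http.server
import threading


def serve_charts(directory="outputs/html", host="127.0.0.1", port=0):
    """Serve `directory` over HTTP on a loopback address in a background thread.

    port=0 asks the OS for any free port. Returns (server, base_url); call
    server.shutdown() and server.server_close() when finished.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://{host}:{server.server_address[1]}/"
```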

Scenario 5: File Merging & Splitting

Trigger: User needs to combine multiple identical reports or split a master sheet by department. Action:

Merge: Iterate over the files and run data_importer.py on each one, targeting the same table name (the script appends automatically if the table already exists). Alternatively, write a custom Python script.
Split: Generate a Python script that reads the master table from SQLite and exports it into multiple Excel files using pandas.DataFrame.to_excel() inside a loop.
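A standard-library sketch of both operations, assuming identical CSV headers for the merge. To stay runnable without `openpyxl`, the split writes one CSV per value instead of calling `DataFrame.to_excel()`; the loop structure is the same either way. The `source_file` column and the function names are additions for illustration.

```python
import csv
import re
import sqlite3
from pathlib import Path


def merge_csvs(conn, paths, table):
    """Append several same-layout CSV files into one table, tagging each row's origin."""
    for path in paths:
        with open(path, newline="", encoding="utf-8-sig") as f:
            reader = csv.reader(f)
            cols = next(reader) + ["source_file"]
            col_sql = ", ".join(f'"{c}"' for c in cols)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
            placeholders = ", ".join("?" * len(cols))
            conn.executemany(
                f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
                (row + [Path(path).name] for row in reader if row),  # skip blank lines
            )
    conn.commit()


def split_by(conn, table, column, out_dir):
    """Write one CSV per distinct value of `column`; returns the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    values = [r[0] for r in conn.execute(f'SELECT DISTINCT "{column}" FROM "{table}"')]
    written = []
    for value in values:
        cur = conn.execute(f'SELECT * FROM "{table}" WHERE "{column}" IS ? ORDER BY rowid', (value,))
        header = [d[0] for d in cur.description]
        safe = re.sub(r'[\\/:*?"<>|]+', "_", str(value))  # replace characters invalid in file names
        path = out / f"{safe}.csv"
        # utf-8-sig writes a BOM so Excel detects UTF-8 (e.g. Chinese text) correctly.
        with open(path, "w", newline="", encoding="utf-8-sig") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cur)
        written.append(path)
    return written
```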

Scenario 6: Export & Reporting

Trigger: User wants to download the final result or generate a summary report. Action:

Export CSV/Excel: Use the built-in exporter script to dump a table or query result to .csv or .xlsx:

```
# Export an entire table
python scripts/data_exporter.py "outputs/final_result.csv" --table "final_table"

# Export a specific query
python scripts/data_exporter.py "outputs/final_result.xlsx" --query "SELECT category, SUM(value) FROM sales GROUP BY category"
```

Report Generation: Write a Markdown file summarizing the analysis steps, key metrics (retrieved via SQL), and referencing any generated charts. Provide the user with the path to the report.
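A sketch of the report step, assuming each key metric is a single-value SQL query: the numbers are computed inside SQLite and only the results are written into the Markdown. The function name, section layout, and argument shapes are assumptions, not a template defined by the skill.

```python
import sqlite3
from pathlib import Path


def write_report(conn, path, title, metrics, steps=(), charts=()):
    """Write a Markdown summary. `metrics` maps a label to a single-value SQL query."""
    lines = [f"# {title}", ""]
    if steps:
        lines += ["## Analysis steps", ""]
        lines += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
        lines.append("")
    lines += ["## Key metrics", "", "| Metric | Value |", "| --- | --- |"]
    for label, sql in metrics.items():
        value = conn.execute(sql).fetchone()[0]
        lines.append(f"| {label} | {value} |")
    if charts:
        lines += ["", "## Charts", ""]
        lines += [f"- [{Path(c).name}]({c})" for c in charts]
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out
```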

Scenario 7: Data Cleanup

Trigger: Routine maintenance or user request to clean up old data. Action:

Run the cleaner script to remove tables and metadata not accessed in the last 30 days:
```
python scripts/data_cleaner.py --db workspace.db --days 30
```
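How `data_cleaner.py` tracks access times is not documented here. The sketch below assumes a hypothetical `_table_access` bookkeeping table holding an ISO-8601 UTC timestamp per table, refreshed whenever a table is read; a `dry_run` flag lists what would be dropped before anything is deleted.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

_DDL = "CREATE TABLE IF NOT EXISTS _table_access (table_name TEXT PRIMARY KEY, last_accessed TEXT)"


def touch(conn, table, when=None):
    """Record that `table` was just used (call this on every read)."""
    when = when or datetime.now(timezone.utc)
    conn.execute(_DDL)
    conn.execute("INSERT OR REPLACE INTO _table_access VALUES (?, ?)",
                 (table, when.isoformat(timespec="seconds")))
    conn.commit()


def clean_stale(conn, days=30, now=None, dry_run=False):
    """Drop tables (and their bookkeeping rows) not accessed within `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat(timespec="seconds")
    conn.execute(_DDL)
    # Every timestamp uses the same fixed-width UTC format, so string order is time order.
    stale = [r[0] for r in conn.execute(
        "SELECT table_name FROM _table_access WHERE last_accessed < ? ORDER BY table_name",
        (cutoff,))]
    if not dry_run:
        for name in stale:
            conn.execute(f'DROP TABLE IF EXISTS "{name}"')
            conn.execute("DELETE FROM _table_access WHERE table_name = ?", (name,))
        conn.commit()
    return stale
```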

Scenario 8: Metrics Management

Trigger: User describes or defines the calculation logic or business definition (scope) of a specific metric. Action:

When the user provides a metric definition, save it to the local markdown file references/metrics.md to build up context for future SQL generation.

Use the built-in script scripts/metrics_manager.py to append the metric:

```
python scripts/metrics_manager.py --name "Metric Name" --desc "Metric calculation logic or business description"
```

When generating SQL queries later, ALWAYS read references/metrics.md to ensure the generated SQL aligns with the saved business definitions.
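The on-disk format of `references/metrics.md` is not specified. Assuming one `## Metric Name` heading per metric followed by its description, the two helpers below sketch appending a definition and loading all definitions back before SQL generation; both function names and the format are hypothetical, and `metrics_manager.py` may write something different.

```python
from pathlib import Path


def append_metric(path, name, desc):
    """Append one metric definition as a '## name' section."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    if not p.exists():
        p.write_text("# Metrics\n", encoding="utf-8")
    with p.open("a", encoding="utf-8") as f:
        f.write(f"\n## {name}\n\n{desc.strip()}\n")


def load_metrics(path):
    """Return {metric name: description}; a later definition of the same name wins."""
    p = Path(path)
    if not p.exists():
        return {}
    metrics, current, buf = {}, None, []
    for line in p.read_text(encoding="utf-8").splitlines():
        if line.startswith("## "):
            if current is not None:
                metrics[current] = "\n".join(buf).strip()
            current, buf = line[3:].strip(), []
        elif current is not None:
            buf.append(line)
    if current is not None:
        metrics[current] = "\n".join(buf).strip()
    return metrics
```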

Security Recommendations

This skill appears to implement a local data-import/clean/visualization workflow and includes many helper scripts and chart templates, but there are several red flags you should check before installing or running it with real or sensitive data:
1) Verify runtimes and dependencies: The skill's SKILL.md assumes Python and sqlite3, and there is a requirements.txt (pandas, thefuzz, openpyxl, etc.), but the registry metadata does not declare required binaries or an install step. Install dependencies in an isolated virtualenv and do not run anything until you have reviewed the code.
2) Audit the code (especially server.py, data_exporter.py, and data_importer.py): Look for any network code or server binding (check whether any HTTP server binds to 0.0.0.0), hard-coded external endpoints, calls that might send data off-host, or attempts to read unexpected system paths. If the server binds to a network interface, restrict it to localhost or run it behind a firewall.
3) Search for encoded/obfuscated content: The scanner found a base64 block in SKILL.md. Search SKILL.md and the scripts for base64 strings or other obfuscated payloads and decode them to verify intent before execution.
4) Correct hard-coded paths: Examples in SKILL.md include absolute paths (e.g., /Users/wuliang/...). Update them to relative workspace paths or confirm they won't overwrite user files.
5) Test with non-sensitive data in an isolated environment: Run the skill on synthetic data in a sandbox/container to confirm its behavior, verify the undo/non-destructive mechanisms, and observe whether the local HTTP server serves only local files.
6) If you lack the capacity to audit the code, treat it as untrusted: do not run it on confidential data. Consider asking the author for a minimal install/run guide that declares the required binaries and explains how the server binds and what URLs it returns.
If you confirm the above (no hidden network exfiltration, server bound to localhost, no obfuscated payloads), the skill's functionality is coherent with its stated purpose. Until then, proceed cautiously.

Functional Analysis

Type: OpenClaw Skill
Name: data-skill
Version: 1.0.0

The 'data-skill' bundle is a comprehensive and well-documented tool designed for local data analysis and visualization using SQLite and ECharts. It explicitly follows a 'Local Code Execution' paradigm to ensure data privacy by keeping datasets out of the LLM's context window. The Python scripts (e.g., `data_importer.py`, `chart_generator.py`, and `server.py`) are functional, transparently documented, and align with the stated purpose of processing Excel/CSV files and generating interactive charts. While the bundle includes a local HTTP server to serve visualizations and executes agent-generated SQL queries (behaviors that involve standard technical risks), there is no evidence of malicious intent, such as credential theft, data exfiltration to external domains, or hidden backdoors.

Capability Assessment

ℹ Purpose & Capability

The name/description align with included scripts (data_importer.py, data_cleaner.py, chart_generator.py, exporter) and many ECharts templates/assets; those files are consistent with a local data-analysis/charting assistant. However, the registry metadata does not declare required runtimes/binaries even though SKILL.md and the scripts expect Python and sqlite3 (and the requirements.txt lists pandas/thefuzz/etc.). That mismatch (no declared required binaries or install steps) is an incoherence to be aware of.

⚠ Instruction Scope

SKILL.md explicitly instructs the agent to generate and run local Python/SQLite commands (python scripts/... and sqlite3 commands) and to start a local HTTP server to serve charts. Those instructions are within the described purpose but the file contains examples with a hard-coded absolute path (e.g., /Users/wuliang/...) which is inappropriate and brittle. The SKILL.md also instructs to keep data local and only surface aggregates — good practice — but the presence of a pre-scan 'base64-block' prompt-injection signal in SKILL.md is concerning (could hide encoded instructions). You should review SKILL.md and the scripts for any hidden/encoded content and confirm the server binding behavior (does it bind localhost only or 0.0.0.0?).

⚠ Install Mechanism

The skill has no install spec (instruction-only), yet it contains many executable scripts and a requirements.txt. Because nothing is declared to be installed automatically, an agent or user would need to run pip/other commands manually; the absence of an install mechanism but presence of code and dependency list is inconsistent. There are no external download URLs in the provided metadata (which is lower risk), but lack of guidance increases the chance of accidental insecure setup (e.g., running scripts without vetting).

ℹ Credentials

The skill requests no environment variables or credentials, which is appropriate for a purely local data tool. Still, SKILL.md's chart generator auto-starts a local HTTP server and returns an access URL; you should confirm that server.py does not expose the service to external networks or attempt to transmit data outward. Also check that the scripts do not reference unexpected environment variables or config files at runtime.

✓ Persistence & Privilege

The skill is not marked always:true and requests no special platform privileges. It does not claim to alter other skills or system-wide agent settings. Running the included local HTTP server and writing outputs to an outputs/ directory is normal for this type of tool, but you should verify server binding and file paths before use.

Version History

v1.0.0

data-skill v1.0.0 initial release:
- Introduces a local data analysis assistant focused on secure, high-frequency data tasks in office settings.
- Implements a workflow for importing, cleaning, querying, splitting, merging, and exporting Excel/CSV files using SQL and Python with SQLite as the backend.
- Ensures data privacy by processing all data locally and only sharing metadata or aggregated results.
- Features template-based chart generation, semantic extraction, fuzzy joins, and undo-safe table operations.
- Adds modular scripts for data import/export, charting, cleanup, and business metric management to streamline analysis and reporting.

Metadata

Slug: data-skill

Version: 1.0.0

License: MIT-0

Total installs: 0

Current installs: 0

Number of versions: 1

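FAQ

What is data-skill?

An assistant specialized in the high-frequency, complex data analysis and processing tasks of everyday office work. It uses a local code-execution model (SQL, or Python + SQLite) to handle data import, cleaning, querying, extraction, merging and splitting, and report generation, supporting large datasets while keeping data private. Use this Skill when the user needs to process Excel/CSV files, query across tables, generate charts, or produce data analysis reports. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 134 times to date.

How do I install data-skill?

Run the command "/install data-skill" in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration.

Is data-skill free?

Yes. data-skill is completely free and released under the MIT-0 license; you may download, install, and use it freely.

Which platforms does data-skill support?

data-skill is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops data-skill?

It is developed and maintained by lgwanai (@lgwanai); the current version is v1.0.0.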
