/install data-skill
Data Analysis Assistant Workflow
This skill transforms the agent into a powerful local data analysis assistant, strictly adhering to a Local Code Execution paradigm.
Core Architecture & Principles
- Local Execution First: NEVER read large datasets directly into the context window. Always generate Python scripts or SQL commands and execute them locally using
RunCommand. - SQLite as the Engine: All CSV/Excel files should be imported into a local SQLite database (default:
workspace.db). Rely on SQL for robust data manipulation (filtering, joining, grouping). - Non-Destructive Operations (Undo Mechanism): Do not overwrite original tables. When modifying data, create a new table (e.g.,
CREATE TABLE table_v2 AS SELECT ...) or a View. This guarantees the user can always say "undo the last step". - Data Privacy: Keep data local. Only send aggregated statistics or schema info into the context window.
Scenarios & Procedures
Scenario 1: Data Import & Auto-Cleaning
Trigger: User uploads or specifies a CSV/Excel/WPS(.et)/Numbers file. Action:
- Run the built-in importer script (supports
.csv,.xlsx,.xls,.et,.numbers):
Note: This script calculates the MD5 hash of the file. If an identical file was already imported, it skips the import and returns the existing table name. It also automatically handles merged cells, detects the real header row, chunks large CSVs, and sanitizes column names for SQLite.python scripts/data_importer.py "path/to/file.xlsx" --db workspace.db - Once imported, run a quick check to understand the schema and data:
sqlite3 workspace.db "PRAGMA table_info(table_name);" sqlite3 workspace.db "SELECT * FROM table_name LIMIT 3;" -header -column - Ask the user if they want to perform standard cleaning (e.g., handling missing values, deduplication). Execute these via SQL.
Scenario 2: Continuous Queries & Manipulation
Trigger: User asks to filter, sort, aggregate, or add columns. Action:
- Formulate the SQL query.
- Execute it via
RunCommand:sqlite3 workspace.db "SELECT ..." - For structural changes, remember the Undo principle:
CREATE TABLE table_name_step2 AS SELECT ...
Scenario 3: Semantic Extraction & Fuzzy Join
Trigger: User wants to split addresses, do sentiment analysis, or join tables with mismatched keys (e.g., "Beijing Branch" vs "Beijing Office"). Action:
- Generate a Python script using
pandasandsqlite3. - For Fuzzy Joins, use libraries like
thefuzzordifflibin the Python script to match keys, then write the mapping back to SQLite. - For Semantic extraction, use regex or heuristic rules in Python. If LLM analysis is strictly required, write a script that processes the column locally or prompts the user for permission to send a sample.
Scenario 4: Chart Generation
Trigger: User requests a visualization (bar, pie, line, scatter, map, funnel, 3D charts, etc.). Action:
- Do NOT write custom Python scripts from scratch.
- We have a powerful template-based rendering engine. Use the built-in
scripts/chart_generator.pyscript. - First, identify the required chart type. Look into
references/prompts/directory to find the corresponding Prompt skeleton for the exact chart type (e.g.,references/prompts/line/stacked_area.md). Read the prompt to understand the data structure requirements. - Formulate the SQL query that aggregates the data correctly according to the prompt's requirements.
- Generate the
custom_jsandecharts_optionbased on the prompt template. - Construct a JSON configuration file (save it in
outputs/configs/) matching this structure:
Note: For map charts requiring coordinates, use the built-in Geocoding capabilities or ECharts native{ "db_path": "workspace.db", "query": "SELECT category, SUM(value) as val FROM table GROUP BY category", "title": "Chart Title", "output_path": "/Users/wuliang/workspace/data-skill/outputs/html/output_chart.html", "echarts_option": { ... }, // Generated option from prompt "custom_js": "..." // Optional JS logic for complex data binding }geocoordinate systems. Output files MUST be stored in the isolatedoutputs/html/directory. - Execute the command:
python scripts/chart_generator.py --config outputs/configs/your_config.json - The script will automatically start a local HTTP server and return an access URL. Provide this URL to the user to view the interactive chart.
Scenario 5: File Merging & Splitting
Trigger: User needs to combine multiple identical reports or split a master sheet by department. Action:
- Merge: Iterate over the files and run
data_importer.pypointing to the same table name (the script appends automatically if the table exists, or write a custom Python script). - Split: Generate a Python script that reads the master table from SQLite and exports it into multiple Excel files using
pandas.DataFrame.to_excel()inside a loop.
Scenario 6: Export & Reporting
Trigger: User wants to download the final result or generate a summary report. Action:
- Export CSV/Excel: Use the built-in exporter script to dump a table or query result to
.csvor.xlsx:# Export an entire table python scripts/data_exporter.py "outputs/final_result.csv" --table "final_table" # Export a specific query python scripts/data_exporter.py "outputs/final_result.xlsx" --query "SELECT category, SUM(value) FROM sales GROUP BY category" - Report Generation: Write a Markdown file summarizing the analysis steps, key metrics (retrieved via SQL), and referencing any generated charts. Provide the user with the path to the report.
Scenario 7: Data Cleanup
Trigger: Routine maintenance or user request to clean up old data. Action:
- Run the cleaner script to remove tables and metadata not accessed in the last 30 days:
python scripts/data_cleaner.py --db workspace.db --days 30
Scenario 8: Metrics Management
Trigger: User describes or defines a specific metric calculation logic or business definition (口径). Action:
- When the user provides a metric definition, save it to the local markdown file
references/metrics.mdto build up context for future SQL generation. - Use the built-in script
scripts/metrics_manager.pyto append the metric:python scripts/metrics_manager.py --name "Metric Name" --desc "Metric calculation logic or business description" - When generating SQL queries later, ALWAYS read
references/metrics.mdto ensure the generated SQL aligns with the saved business definitions.
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install data-skill - 安装完成后,直接呼叫该 Skill 的名称或使用
/data-skill触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
data-skill 是什么?
专门处理日常办公场景下的高频、复杂数据分析与处理的助手。使用本地代码执行模式(SQL 或 Python + SQLite)来处理数据导入、清洗、查询、提取、合并拆分及报告生成,支持大数据量且保障数据隐私安全。当用户需要处理 Excel/CSV 文件、跨表查询、生成图表或输出数据分析报告时使用此 Skill。 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 134 次。
如何安装 data-skill?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install data-skill」即可一键安装,无需额外配置。
data-skill 是免费的吗?
是的,data-skill 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
data-skill 支持哪些平台?
data-skill 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 data-skill?
由 lgwanai(@lgwanai)开发并维护,当前版本 v1.0.0。