/install data-analyst-pipeline
Data Analyst (数据分析师)
AI-powered data analysis workflow. Covers the full pipeline from data ingestion to interactive HTML report generation.
When to Use
Trigger when the user asks to:
- Analyze a dataset (CSV / Excel / JSON / SQLite)
- Generate a data analysis report
- Do exploratory data analysis (EDA)
- Clean or preprocess data
- Create data visualizations
- Understand data distributions and relationships
Workflow Overview
The skill follows a 7-phase, CRISP-DM-style pipeline, executed automatically:
- Data Loading — Auto-detect format, load into DataFrame
- Data Audit — 4-layer defense: health check, structure, business rules, model readiness
- Data Cleaning — Missing values, outliers, type conversion, dedup
- EDA — Distribution analysis, correlation, group aggregation
- Statistical Analysis — Descriptive stats, hypothesis tests, trend detection
- Visualization — Charts for distributions, correlations, category breakdowns
- Report Generation — Interactive HTML report with scorecards, charts, and insights
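The cleaning phase listed above names four operations: missing values, outliers, type conversion, and dedup. A minimal pandas sketch of what such a pass could look like is shown here. The function name `clean_sketch`, the 1.5 * IQR clipping, and the median fill are illustrative assumptions, not the documented behavior of the skill's `clean_data`.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def clean_sketch(df):
    """Illustrative cleaning pass (hypothetical; not the skill's clean_data)."""
    # Dedup: drop exact duplicate rows and work on a copy.
    out = df.drop_duplicates().copy()

    for col in out.columns:
        # Type conversion: try to parse text columns as numbers.
        if is_object_dtype(out[col]) or is_string_dtype(out[col]):
            converted = pd.to_numeric(out[col], errors="coerce")
            # Accept the conversion only if it created no new missing values.
            if converted.notna().sum() == out[col].notna().sum():
                out[col] = converted

    for col in out.select_dtypes(include="number").columns:
        # Outliers: clip values to the 1.5 * IQR fences.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
        # Missing values: fill numeric gaps with the column median.
        out[col] = out[col].fillna(out[col].median())

    return out.reset_index(drop=True)
```

Clipping and median filling are only one possible policy; dropping rows or flagging them for review may suit some datasets better.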
Usage
Quick Start
To analyze a data file:
python {baseDir}/scripts/run_analysis.py <data_file> [--output report.html]
The script auto-detects the file format and runs the full pipeline.
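How the script detects the format is not described in this document. One plausible approach is dispatching on the file suffix, sketched below. The names `FORMAT_BY_SUFFIX`, `detect_format`, `load_any`, and the `table` parameter are hypothetical, and the suffix list is an assumption. Note that `pandas.read_excel` needs an Excel engine such as openpyxl for `.xlsx` files, which is not in the dependency list below.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Hypothetical suffix-to-format map; the real script may recognize other suffixes.
FORMAT_BY_SUFFIX = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
    ".db": "sqlite",
    ".sqlite": "sqlite",
}


def detect_format(path):
    """Return a format name based on the file suffix (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMAT_BY_SUFFIX:
        raise ValueError(f"Unsupported file type: {str(path)!r}")
    return FORMAT_BY_SUFFIX[suffix]


def load_any(path, table=None):
    """Load a supported file into a pandas DataFrame."""
    import pandas as pd  # imported lazily so detect_format works without pandas

    fmt = detect_format(path)
    if fmt == "csv":
        return pd.read_csv(path)
    if fmt == "excel":
        return pd.read_excel(path)
    if fmt == "json":
        return pd.read_json(path)
    # SQLite: read one table, defaulting to the first table name alphabetically.
    with closing(sqlite3.connect(path)) as conn:
        if table is None:
            row = conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name LIMIT 1"
            ).fetchone()
            if row is None:
                raise ValueError("SQLite file contains no tables")
            table = row[0]
        return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
```

Suffix-based detection is simple but trusts the file name; content sniffing would be more robust for mislabeled files.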
Module-Level Usage
Each module can be used independently:
# Load data
from data_loader import load_data
df = load_data("sales.csv")
# Audit data quality
from data_auditor import audit_data
report = audit_data(df)
# Clean data
from data_cleaner import clean_data
df_clean = clean_data(df)
# Run EDA
from eda_runner import run_eda
eda_results = run_eda(df_clean)
# Generate report
from report_builder import build_report
build_report(df_clean, eda_results, "report.html")
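The structure of the dict returned by `audit_data` is not documented here. As a hedged illustration of what a "health check" layer might compute, the sketch below gathers a few basic quality metrics with pandas. The function name `health_check` and every key in the returned dict are assumptions.

```python
import pandas as pd


def health_check(df):
    """Return simple data-quality metrics (hypothetical stand-in for one audit layer)."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        # Fraction of missing values per column.
        "missing_ratio": df.isna().mean().round(4).to_dict(),
        # Columns with at most one distinct non-null value.
        "constant_columns": [
            c for c in df.columns if df[c].nunique(dropna=True) <= 1
        ],
        "dtypes": {c: str(t) for c, t in df.dtypes.items()},
    }
```

A dict of plain Python values like this is easy to serialize or pass to a report template.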
Scripts Reference
| Script | Purpose | Input | Output |
|---|---|---|---|
| scripts/run_analysis.py | Main entry: orchestrates full pipeline | data file path | HTML report |
| scripts/data_loader.py | Multi-format data loading | file path | pandas DataFrame |
| scripts/data_auditor.py | 4-layer quality defense | DataFrame | audit dict |
| scripts/data_cleaner.py | Data cleaning & preprocessing | DataFrame | cleaned DataFrame |
| scripts/eda_runner.py | Exploratory data analysis | DataFrame | EDA results dict |
| scripts/visualizer.py | Chart generation | DataFrame + config | saved .png charts |
| scripts/report_builder.py | HTML report generation | Data + results | HTML report |
Templates
templates/report.html: Jinja2 template for the final HTML report
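The contents of the template are not shown in this document. To illustrate the general idea of rendering a report with Jinja2, here is a small sketch using an inline template. The template text, the `render_report` function, and the `title` and `scorecards` variables are all hypothetical.

```python
from jinja2 import Environment

# Autoescaping guards against HTML injection from values found in the data.
env = Environment(autoescape=True)

# Hypothetical inline template; the real templates/report.html may differ entirely.
TEMPLATE = env.from_string(
    "<h1>{{ title }}</h1>\n"
    "<ul>{% for name, value in scorecards.items() %}"
    "<li>{{ name }}: {{ value }}</li>"
    "{% endfor %}</ul>"
)


def render_report(title, scorecards):
    """Render the report body as an HTML string."""
    return TEMPLATE.render(title=title, scorecards=scorecards)
```

For a template stored on disk, `jinja2.FileSystemLoader` with `env.get_template(...)` would be the usual replacement for `from_string`.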
Config
config/business_rules.yaml: Optional business validation rules
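The schema of this file is not specified in this document. The sketch below assumes a simple layout (a `rules` list with `column`, `min`, `max`, and `allowed` keys) purely for illustration, and shows how such rules could be checked against a DataFrame. Both the schema and the `check_rules` function are hypothetical.

```python
import pandas as pd
import yaml

# Hypothetical rule schema; the real business_rules.yaml layout may differ.
RULES_YAML = """
rules:
  - column: price
    min: 0
  - column: status
    allowed: [paid, refunded, pending]
"""


def check_rules(df, rules):
    """Return a dict mapping each rule description to its count of violating rows."""
    violations = {}
    for rule in rules:
        col = rule["column"]
        if col not in df.columns:
            violations[f"{col}: column missing"] = len(df)
            continue
        values = df[col].dropna()  # missing values are not counted as violations
        if "min" in rule:
            violations[f"{col} >= {rule['min']}"] = int((values < rule["min"]).sum())
        if "max" in rule:
            violations[f"{col} <= {rule['max']}"] = int((values > rule["max"]).sum())
        if "allowed" in rule:
            bad = ~values.isin(rule["allowed"])
            violations[f"{col} in allowed set"] = int(bad.sum())
    return violations


rules = yaml.safe_load(RULES_YAML)["rules"]
```

Calling `check_rules(df, rules)` on a loaded DataFrame returns one entry per rule, so a count of zero confirms the rule was actually evaluated.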
Dependencies
Install before first use:
pip install pandas numpy matplotlib seaborn scipy jinja2 pyyaml missingno
Notes
- For files > 100MB, the audit module uses sampling (n=50000) to stay performant
- Business rules in config/business_rules.yaml are optional; skip if no domain-specific rules exist
- All charts are saved to a charts/ subdirectory in the output folder before embedding in HTML
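The large-file note can be sketched as a size check followed by a row sample. The code below is an assumption about how that might work, not the audit module's actual logic: it reads "100MB" as 100 MiB, samples uniformly without replacement, and uses the hypothetical names `needs_sampling` and `sample_for_audit`.

```python
import os

SIZE_THRESHOLD_BYTES = 100 * 1024 * 1024  # "100MB" read as 100 MiB (assumption)
SAMPLE_ROWS = 50_000


def needs_sampling(path, threshold=SIZE_THRESHOLD_BYTES):
    """Return True when the file on disk is larger than the threshold."""
    return os.path.getsize(path) > threshold


def sample_for_audit(df, n=SAMPLE_ROWS, seed=0):
    """Return at most n rows of a pandas DataFrame, sampled uniformly."""
    if len(df) <= n:
        return df  # small enough: audit the full data
    return df.sample(n=n, random_state=seed)
```

A fixed `seed` keeps audit results reproducible between runs on the same file.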
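Installation
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install data-analyst-pipeline
- After installation, invoke the Skill by name or trigger it with /data-analyst-pipeline
- Provide the inputs described in the Skill's parameter documentation to get structured output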
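What is the Data Analyst skill?
An automated workflow for data analysts. It covers the full data analysis pipeline: data loading, quality audit, data cleaning, exploratory data analysis (EDA), statistical modeling, and visual HTML report generation. It accepts CSV / Excel / JSON / SQLite input and has a built-in 4-layer data defense system. Trigger phrases: 分析数据, 数据分析, 帮我分析数据, 数据报告, EDA, data analysis, analyz... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 38 times to date.
How do I install the Data Analyst skill?
Run the command "/install data-analyst-pipeline" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.
Is the Data Analyst skill free?
Yes. The Data Analyst skill is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.
Which platforms does the Data Analyst skill support?
The Data Analyst skill is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed the Data Analyst skill?
It is developed and maintained by bettermen (@bettermen). The current version is v1.0.0.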