Description

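Automated data-analyst workflow covering the complete data analysis pipeline: data loading, quality auditing, data cleaning, exploratory data analysis (EDA), and statistical modeling, through to visual HTML report generation. Supports CSV/Excel/JSON/SQLite input and includes a built-in 4-layer data defense system. Trigger words: 分析数据 (analyze data), 数据分析 (data analysis), 帮我分析数据 (help me analyze data), 数据报告 (data report), EDA, data analysis, analyz...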

Usage Instructions (SKILL.md)

数据分析师 (Data Analyst)

Name: 数据分析师skill
Author: bettermen

AI-powered data analysis workflow that covers the full pipeline, from data ingestion to interactive HTML report generation.

When to Use

Trigger when the user asks to:

Analyze a dataset (CSV / Excel / JSON / SQLite)
Generate a data analysis report
Do exploratory data analysis (EDA)
Clean or preprocess data
Create data visualizations
Understand data distributions and relationships

Workflow Overview

The skill automatically runs a 7-phase pipeline loosely modeled on CRISP-DM (which itself defines six phases):

Data Loading — Auto-detect format, load into DataFrame
Data Audit — 4-layer defense: health check, structure, business rules, model readiness
Data Cleaning — Missing values, outliers, type conversion, dedup
EDA — Distribution analysis, correlation, group aggregation
Statistical Analysis — Descriptive stats, hypothesis tests, trend detection
Visualization — Charts for distributions, correlations, category breakdowns
Report Generation — Interactive HTML report with scorecards, charts, and insights
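As a rough illustration of the first audit layer (health check) and of the missing-value and dedup steps in cleaning, the stdlib-only sketch below counts missing values per column and exact duplicate rows. It rests on assumptions: rows are plain dicts (the skill itself works on pandas DataFrames), `None`, absent keys, and empty strings count as missing, and the name `health_check` and its output shape are hypothetical rather than the skill's `audit_data` API.

```python
# Hypothetical health-check sketch; not the skill's data_auditor implementation.
from typing import Any


def health_check(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Count missing values per column and exact duplicate rows."""
    columns = sorted({key for row in rows for key in row})
    # Assumption: None, absent keys, and "" count as missing (pandas would use NaN/None).
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }
    # A row is a duplicate if an identical row (same value in every column) appeared earlier.
    seen: set[tuple[Any, ...]] = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"n_rows": len(rows), "missing": missing, "duplicate_rows": duplicates}


if __name__ == "__main__":
    rows = [
        {"region": "north", "sales": "120"},
        {"region": "south", "sales": ""},
        {"region": "north", "sales": "120"},
    ]
    print(health_check(rows))
    # {'n_rows': 3, 'missing': {'region': 0, 'sales': 1}, 'duplicate_rows': 1}
```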

Usage

Quick Start

To analyze a data file:

python {baseDir}/scripts/run_analysis.py <data_file> [--output report.html]

The script auto-detects the file format and runs the full pipeline.
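The page does not document how format detection works. A minimal sketch, assuming detection by file extension alone, maps suffixes to format labels; the `detect_format` function and the suffix table are hypothetical, and the real loader may also inspect file contents.

```python
# Hypothetical sketch of extension-based format detection (not data_loader.py itself).
from pathlib import Path

# Assumed mapping from file suffix to a format label.
SUFFIX_TO_FORMAT = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
    ".db": "sqlite",
    ".sqlite": "sqlite",
    ".sqlite3": "sqlite",
}


def detect_format(path: str) -> str:
    """Return a format label for path, or raise ValueError if the suffix is unsupported."""
    suffix = Path(path).suffix.lower()
    try:
        return SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None


if __name__ == "__main__":
    print(detect_format("sales.CSV"))    # csv
    print(detect_format("orders.xlsx"))  # excel
```

In a pandas-based loader, each label could then dispatch to `pandas.read_csv`, `pandas.read_excel`, `pandas.read_json`, or `pandas.read_sql_query` over a `sqlite3` connection; SQLite also needs a table name or query, which this sketch does not cover.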

Module-Level Usage

Each module can be used independently:

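# Make the modules in scripts/ importable (assumes this runs from {baseDir})
import sys
sys.path.insert(0, "scripts")

# Load data: the format is auto-detected from the file
from data_loader import load_data
df = load_data("sales.csv")

# Audit data quality: returns a dict with the 4-layer audit results
from data_auditor import audit_data
audit = audit_data(df)

# Clean data: missing values, outliers, type conversion, dedup
from data_cleaner import clean_data
df_clean = clean_data(df)

# Run EDA: returns a dict of EDA results
from eda_runner import run_eda
eda_results = run_eda(df_clean)

# Generate the HTML report
from report_builder import build_report
build_report(df_clean, eda_results, "report.html")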
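`clean_data` is described as handling outliers, but its method is not stated. One common choice is Tukey's IQR fences, which flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. The stdlib sketch below assumes that method; `iqr_outliers` is a hypothetical helper, not part of the skill.

```python
# Hypothetical sketch: flag outliers with Tukey's IQR fences.
# The skill's clean_data() does not document its outlier method; IQR is an assumed choice.
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]. Needs at least two values."""
    # quantiles(n=4) returns the three quartile cut points; "inclusive" treats
    # the data as the whole population rather than a sample.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
```

Whether flagged values are then dropped, capped, or kept is a separate cleaning decision that the sketch leaves open.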

Scripts Reference

Script	Purpose	Input	Output
`scripts/run_analysis.py`	Main entry — orchestrates full pipeline	data file path	HTML report
`scripts/data_loader.py`	Multi-format data loading	file path	pandas DataFrame
`scripts/data_auditor.py`	4-layer quality defense	DataFrame	audit dict
`scripts/data_cleaner.py`	Data cleaning & preprocessing	DataFrame	cleaned DataFrame
`scripts/eda_runner.py`	Exploratory data analysis	DataFrame	EDA results dict
`scripts/visualizer.py`	Chart generation	DataFrame + config	saved .png charts
`scripts/report_builder.py`	HTML report generation	Data + results	HTML report

Templates

templates/report.html — Jinja2 template for the final HTML report

Config

config/business_rules.yaml — Optional business validation rules
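The schema of `config/business_rules.yaml` is not documented here. The sketch below assumes a hypothetical format in which each column maps to optional `min` and `max` bounds; loading a YAML file of that shape with PyYAML's `yaml.safe_load` would produce the same nested dict as `RULES`. `check_rules` is likewise a hypothetical name.

```python
# Hypothetical sketch of a business-rules layer; the rule format is assumed, not the skill's.
from typing import Any

# Assumed rule shape: column -> optional "min" / "max" bounds.
RULES: dict[str, dict[str, float]] = {
    "price": {"min": 0},
    "discount": {"min": 0, "max": 1},
}


def check_rules(rows: list[dict[str, Any]], rules: dict[str, dict[str, float]]) -> list[str]:
    """Return human-readable violations such as 'row 1: price=-5 < min 0'."""
    violations = []
    for i, row in enumerate(rows):
        for col, bounds in rules.items():
            value = row.get(col)
            if value is None:
                continue  # missing values belong to the health-check layer
            if "min" in bounds and value < bounds["min"]:
                violations.append(f"row {i}: {col}={value} < min {bounds['min']}")
            if "max" in bounds and value > bounds["max"]:
                violations.append(f"row {i}: {col}={value} > max {bounds['max']}")
    return violations


if __name__ == "__main__":
    rows = [{"price": 10, "discount": 0.2}, {"price": -5, "discount": 1.5}]
    print(check_rules(rows, RULES))
    # ['row 1: price=-5 < min 0', 'row 1: discount=1.5 > max 1']
```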

Dependencies

Install before first use:

pip install pandas numpy matplotlib seaborn scipy jinja2 pyyaml missingno
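An optional pre-flight check, not part of the skill, can list which of these packages are still missing. Note that the pip name and the import name differ for PyYAML (`pyyaml` is imported as `yaml`); the `REQUIRED` mapping and `missing_packages` are illustrative names.

```python
# Sketch: report which listed dependencies cannot be imported.
import importlib.util

# pip name -> import name for the packages in the install command above
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "jinja2": "jinja2",
    "pyyaml": "yaml",
    "missingno": "missingno",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names whose import name cannot be found."""
    return [pip for pip, mod in required.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

The printed command installs only the packages that are absent.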

Notes

For files larger than 100 MB, the audit module works on a sample (n=50000) to stay performant
Business rules in config/business_rules.yaml are optional; skip them if no domain-specific rules exist
All charts are saved to a charts/ subdirectory in the output folder before being embedded in the HTML report
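The sampling rule in the first note could look roughly like this. The 100 MB threshold and n = 50,000 come from the note; treating 100 MB as 100 MiB, sampling rows without replacement, and seeding for reproducibility are assumptions, and `needs_sampling` and `sample_rows` are hypothetical names.

```python
# Hypothetical sketch of size-triggered sampling before an audit.
import os
import random

SIZE_THRESHOLD = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB
SAMPLE_N = 50_000


def needs_sampling(path: str, threshold: int = SIZE_THRESHOLD) -> bool:
    """True if the file on disk is larger than the threshold."""
    return os.path.getsize(path) > threshold


def sample_rows(rows: list, n: int = SAMPLE_N, seed: int = 0) -> list:
    """Return up to n rows drawn without replacement; the fixed seed makes it reproducible."""
    if len(rows) <= n:
        return list(rows)
    return random.Random(seed).sample(rows, n)
```

On a pandas DataFrame, the equivalent call would be `df.sample(n=50_000, random_state=0)`.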

Security Recommendations

Install only if you are comfortable giving the skill access to the datasets you explicitly point it at. Treat generated HTML reports, charts, summary JSON, and terminal output as potentially sensitive, and avoid using untrusted datasets because report text is built from dataset-derived names and values.
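The page does not say whether the report template escapes dataset-derived text. If report fragments are assembled as strings, escaping column names and values with `html.escape` keeps a crafted name such as `<script>` from becoming live markup; with Jinja2, creating the `Environment` with `autoescape=True` does the same for template variables. `render_column_list` is a hypothetical helper for illustration.

```python
# Sketch: escape dataset-derived text before placing it in HTML.
import html


def render_column_list(columns: list[str]) -> str:
    """Render column names as an HTML list with every name escaped."""
    items = "".join(f"<li>{html.escape(str(c))}</li>" for c in columns)
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    print(render_column_list(["revenue", "<script>alert(1)</script>"]))
    # <ul><li>revenue</li><li>&lt;script&gt;alert(1)&lt;/script&gt;</li></ul>
```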

Capability Assessment

✓ Purpose & Capability

The files coherently implement the stated data-analysis workflow: load CSV/Excel/JSON/SQLite data, audit and clean it, run EDA, generate charts, and build an HTML report.

ℹ Instruction Scope

The trigger phrases are broad, but they are all data-analysis related; users should invoke it intentionally with explicit dataset paths.

✓ Install Mechanism

No hidden installer, hook, or auto-start mechanism was found. Dependencies are disclosed as ordinary Python packages to be installed manually.

ℹ Credentials

Filesystem reads and writes are proportionate to the purpose: it reads the chosen data file and writes reports, charts, and summary JSON. Outputs may reveal dataset schema, values, and analysis results.

✓ Persistence & Privilege

No persistence, privilege escalation, credential access, network access, or destructive behavior was found. It only creates local report artifacts and chart files.

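Version History

v1.0.0

Initial release: complete data analysis pipeline with CSV/Excel/JSON/SQLite input, a 4-layer data quality audit, automated EDA, 7 chart types, and an interactive HTML report.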

Metadata

Slug data-analyst-pipeline

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions published 1

FAQ

What is 数据分析师skill?

It is an AI Agent Skill plugin for Claude Code / OpenClaw that automates the full data analysis pipeline described at the top of this page, from data loading and a 4-layer quality audit through cleaning, EDA, and statistical modeling to an HTML report, with CSV/Excel/JSON/SQLite input. It has been downloaded 38 times so far.

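How do I install 数据分析师skill?

Run `/install data-analyst-pipeline` in an OpenClaw or Claude Code conversation to install it in one step. No further configuration is needed, but the Python packages listed under Dependencies must be installed before first use.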

Is 数据分析师skill free?

Yes. 数据分析师skill is free and released under the MIT-0 license, so you may download, install, and use it without restriction.

Which platforms does 数据分析师skill support?

数据分析师skill is cross-platform: it runs in any environment where OpenClaw or Claude Code is deployed.

Who developed 数据分析师skill?

It is developed and maintained by bettermen (@bettermen). The current version is v1.0.0.
