Description

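Automated data-analyst workflow. Covers the full data analysis pipeline, from data loading, quality auditing, data cleaning, and exploratory data analysis (EDA) through statistical modeling to a visual HTML report. Supports CSV/Excel/JSON/SQLite input and includes a built-in 4-layer data defense system. Trigger phrases: 分析数据 (analyze data), 数据分析 (data analysis), 帮我分析数据 (help me analyze data), 数据报告 (data report), EDA, data analysis, analyz...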

README (SKILL.md)

数据分析师 (Data Analyst)

Name: 数据分析师skill
Author: bettermen

AI-powered data analysis workflow. Covers the full pipeline from data ingestion to interactive HTML report generation.

When to Use

Trigger when the user asks to:

Analyze a dataset (CSV / Excel / JSON / SQLite)
Generate a data analysis report
Do exploratory data analysis (EDA)
Clean or preprocess data
Create data visualizations
Understand data distributions and relationships

Workflow Overview

The skill runs a 7-phase pipeline loosely based on CRISP-DM, executed automatically:

Data Loading — Auto-detect format, load into DataFrame
Data Audit — 4-layer defense: health check, structure, business rules, model readiness
Data Cleaning — Missing values, outliers, type conversion, dedup
EDA — Distribution analysis, correlation, group aggregation
Statistical Analysis — Descriptive stats, hypothesis tests, trend detection
Visualization — Charts for distributions, correlations, category breakdowns
Report Generation — Interactive HTML report with scorecards, charts, and insights
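The cleaning phase (missing values, outliers, type conversion, dedup) can be pictured with a short pandas sketch. This is not the skill's `clean_data`, whose internals are not documented here; the name `basic_clean`, the median/mode fill strategy, and the 1.5 * IQR clipping rule are assumptions chosen for illustration.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype, is_string_dtype


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pass: dedup, type coercion, gap filling, outlier clipping."""
    out = df.drop_duplicates().copy()

    # Type conversion: turn text columns that are fully numeric (e.g. "12.5") into numbers.
    for col in out.columns:
        if is_string_dtype(out[col].dtype):
            converted = pd.to_numeric(out[col], errors="coerce")
            # Accept only if coercion did not turn any existing value into NaN.
            if converted.notna().sum() == out[col].notna().sum():
                out[col] = converted

    for col in out.columns:
        if is_numeric_dtype(out[col].dtype) and not is_bool_dtype(out[col].dtype):
            # Missing values: fill numeric gaps with the column median.
            out[col] = out[col].fillna(out[col].median())
            # Outliers: clip to the 1.5 * IQR fences instead of dropping rows.
            q1, q3 = out[col].quantile([0.25, 0.75])
            iqr = q3 - q1
            out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
        else:
            # Missing values in other columns: fill with the most frequent value.
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out.reset_index(drop=True)
```

Clipping keeps every row but caps extreme values; dropping outlier rows is an equally common choice, at the cost of changing row counts.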

Usage

Quick Start

To analyze a data file:

python {baseDir}/scripts/run_analysis.py <data_file> [--output report.html]

The script auto-detects the file format and runs the full pipeline.
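How the format is detected is not documented beyond this sentence. A minimal sketch, assuming detection by file extension, could look like the following; the function name `load_by_extension`, the extension map, and the choice of reading the alphabetically first table from a SQLite file are all assumptions.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

import pandas as pd


def load_by_extension(path: str) -> pd.DataFrame:
    """Hypothetical loader: choose a pandas reader based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # pandas needs openpyxl for .xlsx and xlrd for .xls files.
        return pd.read_excel(path)
    if suffix == ".json":
        return pd.read_json(path)
    if suffix in {".db", ".sqlite", ".sqlite3"}:
        with closing(sqlite3.connect(path)) as conn:
            # Assumption: analyze the alphabetically first user table.
            row = conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name LIMIT 1"
            ).fetchone()
            if row is None:
                raise ValueError(f"no tables found in {path}")
            # Double any quote characters so unusual table names stay valid SQL.
            table = row[0].replace('"', '""')
            return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")
```

Sniffing file contents would be more robust than trusting extensions, but an extension map is the simplest reading of "auto-detects the file format".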

Module-Level Usage

Each module can be used independently:

# Load data (assumes the skill's scripts/ directory is on sys.path)
from data_loader import load_data
df = load_data("sales.csv")

# Audit data quality
from data_auditor import audit_data
report = audit_data(df)

# Clean data
from data_cleaner import clean_data
df_clean = clean_data(df)

# Run EDA
from eda_runner import run_eda
eda_results = run_eda(df_clean)

# Generate report
from report_builder import build_report
build_report(df_clean, eda_results, "report.html")
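The module-level calls only promise that `run_eda` returns an "EDA results dict". To illustrate what such a dict might hold, here is a minimal pandas sketch of the distribution, correlation, and group-aggregation steps from the workflow overview; `quick_eda`, its result keys, and the `group_col` parameter are hypothetical.

```python
from typing import Optional

import pandas as pd


def quick_eda(df: pd.DataFrame, group_col: Optional[str] = None, top_n: int = 5) -> dict:
    """Hypothetical EDA pass: numeric summary, strongest correlations, group means."""
    numeric = df.select_dtypes(include="number")
    # Distribution analysis: count, mean, std, min, quartiles, max per numeric column.
    results = {"summary": numeric.describe().to_dict()}

    # Correlation: rank numeric column pairs by absolute Pearson r.
    corr = numeric.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if pd.notna(r):  # constant columns yield NaN correlations
                pairs.append((a, b, float(r)))
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    results["top_correlations"] = pairs[:top_n]

    # Group aggregation: mean of each numeric column per category.
    if group_col is not None:
        value_cols = [c for c in numeric.columns if c != group_col]
        results["group_means"] = (
            df.groupby(group_col)[value_cols].mean().to_dict(orient="index")
        )
    return results
```

For example, `quick_eda(df_clean, group_col="region")` would return per-region means alongside the correlation ranking, assuming the data has a `region` column.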

Scripts Reference

Script	Purpose	Input	Output
`scripts/run_analysis.py`	Main entry — orchestrates full pipeline	data file path	HTML report
`scripts/data_loader.py`	Multi-format data loading	file path	pandas DataFrame
`scripts/data_auditor.py`	4-layer quality defense	DataFrame	audit dict
`scripts/data_cleaner.py`	Data cleaning & preprocessing	DataFrame	cleaned DataFrame
`scripts/eda_runner.py`	Exploratory data analysis	DataFrame	EDA results dict
`scripts/visualizer.py`	Chart generation	DataFrame + config	saved .png charts
`scripts/report_builder.py`	HTML report generation	Data + results	HTML report
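In the table, `visualizer.py` saves .png charts that `report_builder.py` later places in the HTML report. One common way to keep such a report self-contained is to inline each PNG as a base64 data URI. Whether the skill does exactly this is not stated, and `png_to_img_tag` is a hypothetical helper.

```python
import base64
import html
from pathlib import Path


def png_to_img_tag(png_path: str, alt: str = "") -> str:
    """Hypothetical helper: inline a saved PNG chart as a base64 <img> tag."""
    data = Path(png_path).read_bytes()
    encoded = base64.b64encode(data).decode("ascii")
    # Escape the alt text, since chart titles may come from dataset column names.
    return f'<img src="data:image/png;base64,{encoded}" alt="{html.escape(alt)}">'
```

Inlining makes the HTML file portable at the cost of roughly a one-third size increase per image, which is inherent to base64 encoding.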

Templates

templates/report.html — Jinja2 template for the final HTML report

Config

config/business_rules.yaml — Optional business validation rules
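The schema of `config/business_rules.yaml` is not documented here. As a sketch, assume each rule names a column plus optional `min`, `max`, or `allowed` keys, so that `yaml.safe_load` would return a list of dicts. The rules below are written inline as that Python structure, and both the schema and `check_rules` are hypothetical.

```python
import pandas as pd

# Assumed shape of the parsed YAML, e.g. yaml.safe_load(f)["rules"] (hypothetical key).
EXAMPLE_RULES = [
    {"column": "price", "min": 0},
    {"column": "discount", "min": 0, "max": 1},
    {"column": "region", "allowed": ["north", "south"]},
]


def check_rules(df: pd.DataFrame, rules: list) -> dict:
    """Hypothetical validator: count violating rows per column (one rule per column)."""
    violations = {}
    for rule in rules:
        col = rule["column"]
        if col not in df.columns:
            continue  # rules are optional: skip columns the dataset lacks
        series = df[col]
        bad = pd.Series(False, index=df.index)
        if "min" in rule:
            bad |= series < rule["min"]
        if "max" in rule:
            bad |= series > rule["max"]
        if "allowed" in rule:
            # Missing values are left to the missing-value checks, not flagged here.
            bad |= ~series.isin(rule["allowed"]) & series.notna()
        violations[col] = int(bad.sum())
    return violations
```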

Dependencies

Install before first use:

pip install pandas numpy matplotlib seaborn scipy jinja2 pyyaml missingno
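Because `pyyaml` is imported as `yaml`, pip names and import names do not always match. A small pre-flight check (a hypothetical helper, not part of the skill) can map one to the other and report what still needs installing:

```python
import importlib.util

# pip package name -> import name for the dependencies listed above.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "jinja2": "jinja2",
    "pyyaml": "yaml",
    "missingno": "missingno",
}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return pip names of packages whose import name cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```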

Notes

For files > 100MB, the audit module uses sampling (n=50000) to stay performant
Business rules in config/business_rules.yaml are optional; skip if no domain-specific rules exist
All charts are saved to a charts/ subdirectory in the output folder before being embedded in the HTML report
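The sampling note can be sketched together with the first audit layer ("health check" in the workflow overview). Only the 100MB threshold and n=50000 come from the notes. The names `audit_sample` and `health_check`, the fixed `random_state`, reading "100MB" as 100 MiB on disk via `os.path.getsize`, and the specific checks are assumptions.

```python
import os

import pandas as pd

SIZE_THRESHOLD_BYTES = 100 * 1024 * 1024  # "100MB" read as 100 MiB (assumption)
SAMPLE_ROWS = 50_000


def audit_sample(
    df: pd.DataFrame,
    source_path: str,
    threshold: int = SIZE_THRESHOLD_BYTES,
    n: int = SAMPLE_ROWS,
) -> pd.DataFrame:
    """Return the frame to audit: a fixed-seed row sample when the source file is large."""
    if os.path.getsize(source_path) > threshold and len(df) > n:
        # A fixed random_state keeps repeated audits reproducible.
        return df.sample(n=n, random_state=0)
    return df


def health_check(df: pd.DataFrame) -> dict:
    """Hypothetical layer-1 checks: row count, missing ratios, duplicates, constant columns."""
    return {
        "rows": len(df),
        "missing_ratio": df.isna().mean().round(4).to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=False) <= 1],
    }
```

In this sketch only the audit sees the sample; the full frame remains available for the later steps.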

Usage Guidance

Install only if you are comfortable giving the skill access to the datasets you explicitly point it at. Treat generated HTML reports, charts, summary JSON, and terminal output as potentially sensitive, and avoid using untrusted datasets because report text is built from dataset-derived names and values.
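Since report text is built from dataset-derived names and values, a column named something like `<script>...` could end up as live markup. If you adapt the report builder, one precaution (an assumption about hardening, not a description of the shipped code) is to escape every dataset-derived string before it reaches the HTML. Inside Jinja2 templates, enabling the environment's `autoescape` option serves the same purpose. `safe_header_row` is a hypothetical helper.

```python
import html
from typing import Iterable


def safe_header_row(columns: Iterable[object]) -> str:
    """Hypothetical helper: render column names as an escaped HTML table row."""
    cells = "".join(f"<th>{html.escape(str(col))}</th>" for col in columns)
    return f"<tr>{cells}</tr>"
```

Called as `safe_header_row(df.columns)`, it renders markup-like column names as inert text.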

Capability Assessment

✓ Purpose & Capability

The files coherently implement the stated data-analysis workflow: load CSV/Excel/JSON/SQLite data, audit and clean it, run EDA, generate charts, and build an HTML report.

ℹ Instruction Scope

The trigger phrases are broad, but they are all data-analysis related; users should invoke it intentionally with explicit dataset paths.

✓ Install Mechanism

No hidden installer, hooks, or auto-start mechanisms were found. Dependencies are disclosed as normal Python packages to install manually.

ℹ Credentials

Filesystem reads and writes are proportionate to the purpose: it reads the chosen data file and writes reports, charts, and summary JSON. Outputs may reveal dataset schema, values, and analysis results.

✓ Persistence & Privilege

No persistence, privilege escalation, credential access, network access, or destructive behavior was found. It only creates local report artifacts and chart files.

Version History

v1.0.0

Initial release: complete data analysis pipeline. Supports CSV/Excel/JSON/SQLite input, a 4-layer data quality audit, automated EDA, 7 chart types, and interactive HTML reports.

Metadata

Slug data-analyst-pipeline

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is 数据分析师skill?

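Automated data-analyst workflow. Covers the full data analysis pipeline, from data loading, quality auditing, data cleaning, and exploratory data analysis (EDA) through statistical modeling to a visual HTML report. Supports CSV/Excel/JSON/SQLite input and includes a built-in 4-layer data defense system. Trigger phrases: 分析数据 (analyze data), 数据分析 (data analysis), 帮我分析数据 (help me analyze data), 数据报告 (data report), EDA, data analysis, analyz... It is an AI Agent Skill for Claude Code / OpenClaw, with 38 downloads so far.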

How do I install 数据分析师skill?

Run "/install data-analyst-pipeline" in the OpenClaw or Claude Code chat to install it in one step. The Python packages listed under Dependencies still have to be installed with pip before first use.

Is 数据分析师skill free?

Yes, 数据分析师skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does 数据分析师skill support?

数据分析师skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created 数据分析师skill?

It is built and maintained by bettermen (@bettermen); the current version is v1.0.0.
