bettermen

Data Analyst Skill (数据分析师skill)

by bettermen · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 38 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install data-analyst-pipeline
Description
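Automated data analyst workflow. Covers the full data analysis pipeline: data loading, quality auditing, data cleaning, exploratory data analysis (EDA), statistical modeling, and visual HTML report generation. Supports CSV/Excel/JSON/SQLite input and includes a built-in 4-layer data defense system. Trigger phrases: 分析数据 ("analyze data"), 数据分析 ("data analysis"), 帮我分析数据 ("help me analyze data"), 数据报告 ("data report"), EDA, data analysis, analyz...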
README (SKILL.md)

Data Analyst (数据分析师)

AI-powered data analysis workflow. Covers the full pipeline from data ingestion to interactive HTML report generation.

When to Use

Trigger when the user asks to:

  • Analyze a dataset (CSV / Excel / JSON / SQLite)
  • Generate a data analysis report
  • Do exploratory data analysis (EDA)
  • Clean or preprocess data
  • Create data visualizations
  • Understand data distributions and relationships

Workflow Overview

The skill runs a 7-phase pipeline, modeled on CRISP-DM, automatically:

  1. Data Loading — Auto-detect format, load into DataFrame
  2. Data Audit — 4-layer defense: health check, structure, business rules, model readiness
  3. Data Cleaning — Missing values, outliers, type conversion, dedup
  4. EDA — Distribution analysis, correlation, group aggregation
  5. Statistical Analysis — Descriptive stats, hypothesis tests, trend detection
  6. Visualization — Charts for distributions, correlations, category breakdowns
  7. Report Generation — Interactive HTML report with scorecards, charts, and insights
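
The audit phase above lists a "health check" as its first layer. As a rough illustration of what such a check might compute, here is a minimal sketch using pandas. The function name `health_check` and the returned keys are hypothetical; this is not the skill's `audit_data` implementation, which may measure different things.

```python
import pandas as pd


def health_check(df: pd.DataFrame) -> dict:
    """Summarize basic data-quality signals (hypothetical helper, not the skill's API)."""
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        # Fraction of missing values per column
        "missing_ratio": df.isna().mean().to_dict(),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Columns with at most one distinct non-null value carry no signal
        "constant_columns": [
            col for col in df.columns if df[col].nunique(dropna=True) <= 1
        ],
    }


if __name__ == "__main__":
    sample = pd.DataFrame(
        {
            "id": [1, 2, 2, 4],
            "price": [10.0, 5.0, 5.0, None],
            "region": ["N", "N", "N", "N"],
        }
    )
    print(health_check(sample))
```

A summary like this could feed the later layers (structure, business rules, model readiness), but how the skill actually chains them is not documented here.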

Usage

Quick Start

To analyze a data file:

python {baseDir}/scripts/run_analysis.py <data_file> [--output report.html]

The script auto-detects the file format and runs the full pipeline.
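
How the script detects the format is not documented. One common approach is to dispatch on the file extension, sketched below. The extension table and the `detect_format` name are assumptions for illustration; the actual script may accept other extensions or inspect file contents instead.

```python
from pathlib import Path

# Hypothetical extension-to-format table; not taken from the skill's source.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
    ".sqlite": "sqlite",
    ".db": "sqlite",
}


def detect_format(path: str) -> str:
    """Return a format label based on the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


if __name__ == "__main__":
    for name in ("sales.csv", "data/Report.XLSX", "orders.db"):
        print(name, "->", detect_format(name))
```

Failing loudly on an unknown extension keeps a mistyped path from silently falling through to the wrong loader.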

Module-Level Usage

Each module can be used independently:

# Load data
from data_loader import load_data
df = load_data("sales.csv")

# Audit data quality
from data_auditor import audit_data
report = audit_data(df)

# Clean data
from data_cleaner import clean_data
df_clean = clean_data(df)

# Run EDA
from eda_runner import run_eda
eda_results = run_eda(df_clean)

# Generate report
from report_builder import build_report
build_report(df_clean, eda_results, "report.html")

Scripts Reference

| Script | Purpose | Input | Output |
| --- | --- | --- | --- |
| scripts/run_analysis.py | Main entry: orchestrates full pipeline | data file path | HTML report |
| scripts/data_loader.py | Multi-format data loading | file path | pandas DataFrame |
| scripts/data_auditor.py | 4-layer quality defense | DataFrame | audit dict |
| scripts/data_cleaner.py | Data cleaning & preprocessing | DataFrame | cleaned DataFrame |
| scripts/eda_runner.py | Exploratory data analysis | DataFrame | EDA results dict |
| scripts/visualizer.py | Chart generation | DataFrame + config | saved .png charts |
| scripts/report_builder.py | HTML report generation | Data + results | HTML report |

Templates

  • templates/report.html — Jinja2 template for the final HTML report

Config

  • config/business_rules.yaml — Optional business validation rules

Dependencies

Install before first use:

pip install pandas numpy matplotlib seaborn scipy jinja2 pyyaml missingno

Notes

  • For files larger than 100 MB, the audit module works on a sample (n=50000) instead of the full dataset to stay performant
  • Business rules in config/business_rules.yaml are optional; skip if no domain-specific rules exist
  • All charts are saved to a charts/ subdirectory in the output folder before embedding in HTML
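
The sampling note above can be sketched as follows. This is an illustration only: the function name `sample_for_audit`, the reading of "100MB" as 100 MiB, and the fixed random seed are all assumptions, and the audit module may sample differently (for example, stratified or while reading the file).

```python
import pandas as pd

SIZE_THRESHOLD_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB
SAMPLE_ROWS = 50_000  # n=50000 from the note above


def sample_for_audit(
    df: pd.DataFrame,
    file_size_bytes: int,
    threshold: int = SIZE_THRESHOLD_BYTES,
    n: int = SAMPLE_ROWS,
    seed: int = 0,
) -> pd.DataFrame:
    """Return a random sample of n rows for large files, else the frame unchanged."""
    if file_size_bytes <= threshold or len(df) <= n:
        # Small file, or fewer rows than the sample size: audit everything.
        return df
    # Fixed seed so repeated audits of the same file see the same rows.
    return df.sample(n=n, random_state=seed)


if __name__ == "__main__":
    frame = pd.DataFrame({"x": range(10)})
    # Tiny threshold and sample size so the sampling branch is exercised.
    print(len(sample_for_audit(frame, file_size_bytes=200, threshold=100, n=5)))
```

In practice the file size could come from `os.path.getsize(path)` before loading.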
Usage Guidance
Install only if you are comfortable giving the skill access to the datasets you explicitly point it at. Treat generated HTML reports, charts, summary JSON, and terminal output as potentially sensitive, and avoid using untrusted datasets because report text is built from dataset-derived names and values.
Capability Assessment
Purpose & Capability
The files coherently implement the stated data-analysis workflow: load CSV/Excel/JSON/SQLite data, audit and clean it, run EDA, generate charts, and build an HTML report.
Instruction Scope
The trigger phrases are broad, but they are all data-analysis related; users should invoke it intentionally with explicit dataset paths.
Install Mechanism
No hidden installer, hooks, or auto-start mechanism were found. Dependencies are disclosed as normal Python packages to install manually.
Credentials
Filesystem reads and writes are proportionate to the purpose: it reads the chosen data file and writes reports, charts, and summary JSON. Outputs may reveal dataset schema, values, and analysis results.
Persistence & Privilege
No persistence, privilege escalation, credential access, network access, or destructive behavior was found. It only creates local report artifacts and chart files.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install data-analyst-pipeline
  3. After installation, invoke the skill by name or use /data-analyst-pipeline
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: complete data analysis pipeline. Supports CSV/Excel/JSON/SQLite input, 4-layer data quality audit, automated EDA, 7 chart types, and an interactive HTML report.
Metadata
Slug data-analyst-pipeline
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is 数据分析师skill?

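An automated data analyst workflow. It covers the full data analysis pipeline: data loading, quality auditing, data cleaning, exploratory data analysis (EDA), statistical modeling, and visual HTML report generation. It supports CSV/Excel/JSON/SQLite input and includes a built-in 4-layer data defense system. Trigger phrases: 分析数据 ("analyze data"), 数据分析 ("data analysis"), 帮我分析数据 ("help me analyze data"), 数据报告 ("data report"), EDA, data analysis, analyz... It is an AI Agent Skill for Claude Code / OpenClaw, with 38 downloads so far.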

How do I install 数据分析师skill?

Run "/install data-analyst-pipeline" in the OpenClaw or Claude Code chat to install it in one step. The Python packages listed under Dependencies must still be installed manually with pip before first use.

Is 数据分析师skill free?

Yes, 数据分析师skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does 数据分析师skill support?

数据分析师skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created 数据分析师skill?

It is built and maintained by bettermen (@bettermen); the current version is v1.0.0.
