Data Analyzer
/install zcx-data-analyzer
Data Analyzer — 数据分析工具
Load, analyze, and report on structured data from CSV, Excel, and JSON files. Compute statistics, detect anomalies, identify trends, and generate reports with visualization recommendations.
Workflow
1. Load data → Read the file, inspect structure
2. Profile → Column types, missing values, basic stats
3. Analyze → Statistics, trends, anomalies, correlations
4. Report → Summary with visual recommendations
Step 1 — Data Loading
Supported Formats
| Format | How to Read | Notes |
|---|---|---|
| CSV | Read the file directly, parse header row + data rows | Check delimiter (comma, tab, semicolon). Handle quoted fields. |
| Excel (.xlsx) | Read via openpyxl or pandas. If unavailable, convert to CSV first. |
Handle multiple sheets. Note which sheet was used. |
| JSON | Parse as structured objects. Detect if array-of-objects or object-of-arrays. | Flatten nested structures where possible. |
| TSV | Same as CSV with tab delimiter. |
If Python is available (recommended for large datasets):
pip install pandas openpyxl # if missing
python3 -c "
import pandas as pd
df = pd.read_csv('data.csv')
print(df.info())
print(df.describe())
print(df.head())
"
If Python is not available, parse manually:
- Read the file line by line
- Identify headers (first row)
- Identify column types (numeric vs text vs date)
- Store as an array of rows or objects
Initial Inspection
After loading, always answer these questions:
- Shape: How many rows and columns?
- Column names: What are they and what data types?
- Missing values: Which columns have gaps, and how many?
- Date/time columns: Are they parsed as datetime objects?
- Unique values: For categorical columns, how many unique categories?
Step 2 — Descriptive Statistics
Numeric Columns
Compute and report:
| Statistic | What It Tells You |
|---|---|
| Count | Number of non-null values |
| Mean | Average value |
| Median | Midpoint (50th percentile) — more robust than mean for skewed data |
| Std Dev | Spread around the mean |
| Min / Max | Full range |
| 25th / 75th Percentile | Interquartile range bounds |
| Skewness | Symmetry of the distribution. Positive = right tail, negative = left tail. |
Formula reference (manual calculation):
Mean = sum(x) / n
Median = middle value when sorted
Std Dev = sqrt(sum((x - mean)^2) / (n-1))
Percentile = sort values, take value at position (p/100 * n)
Categorical Columns
| Statistic | What It Tells You |
|---|---|
| Count | Total non-null values |
| Unique | Number of distinct categories |
| Top | Most frequent category |
| Frequency | How often the top category appears |
| Distribution | Share of each category (as percentages) |
Step 3 — Analysis
3a. Anomaly Detection
Method: IQR (Interquartile Range)
Q1 = 25th percentile
Q3 = 75th percentile
IQR = Q3 - Q1
Lower fence = Q1 - 1.5 * IQR
Upper fence = Q3 + 1.5 * IQR
Anomaly = any value outside [Lower fence, Upper fence]
Method: Z-Score (for approximately normal distributions)
z = (x - mean) / std_dev
Anomaly = |z| > 3 (values more than 3 std devs from mean)
Output anomalies: For each detected anomaly, report:
- Row index
- Column name
- Anomalous value
- Distance from expected (how many IQRs or std devs)
3b. Trend Analysis
For time-series data (data with a date/time column):
- Identify the time column — Sort by date
- Aggregate by period — Group by day/week/month/quarter/year
- Direction — Is the metric increasing, decreasing, or flat?
- Rate of change — Period-over-period percentage change
- Seasonality — Recurring patterns (monthly, quarterly, yearly)
- Breakout — Sudden jumps or drops (potential regime changes)
Output format:
📈 Trend: [Metric Name]
Period: [Date Range]
Direction: [Up/Down/Flat] (slope: ±X%)
Key Points:
- [Date]: Value = X (↗/↘/→)
- Highest point: [Date] = X
- Lowest point: [Date] = X
For non-time-series data, analyze rank order and distribution shape:
Top 5 by [metric]:
1. [Category] = X
2. [Category] = Y
...
Bottom 5 by [metric]:
3c. Correlation Analysis
Pearson correlation coefficient (for linear relationships between two numeric variables):
r = sum((x - mean_x) * (y - mean_y)) / (n * std_x * std_y)
Interpretation:
| r value | Strength | Direction |
|---|---|---|
| 0.7 to 1.0 | Strong | Positive (both rise together) |
| 0.3 to 0.7 | Moderate | Positive |
| 0 to 0.3 | Weak | Positive |
| -0.3 to 0 | Weak | Negative (one rises, other falls) |
| -0.7 to -0.3 | Moderate | Negative |
| -1.0 to -0.7 | Strong | Negative |
Caveats:
- Correlation ≠ causation. Always note this.
- Pearson only captures linear relationships.
- Outliers can distort correlation heavily — check after removing anomalies.
Step 4 — Report Generation
Visualization Recommendations
For each finding, recommend the best chart type:
| Analysis Type | Recommended Chart | Why |
|---|---|---|
| Distribution of one variable | Histogram | Shows shape, skew, peaks |
| Comparison across categories | Bar chart | Easy to compare magnitudes |
| Trend over time | Line chart | Emphasizes direction and continuity |
| Relationship between 2 variables | Scatter plot | Shows correlation, clusters, outliers |
| Part of a whole | Pie / Donut chart | Use only for 2-5 categories |
| Composition over time | Stacked area chart | Shows both total and parts |
| Rank order | Horizontal bar chart | Easy to read sorted values |
| Comparing multiple distributions | Box plot | Shows median, IQR, outliers |
| Heatmap (correlation matrix) | Heatmap | Quick visual of many correlations |
Full Report Template
# Data Analysis Report: [Dataset Name]
Date: [YYYY-MM-DD]
## 1. Overview
- Rows: X | Columns: Y
- Missing data: X cells (X%)
- Key columns: [list with types]
## 2. Descriptive Statistics
### Numeric Columns
[Table: col_name, count, mean, median, std, min, 25%, 75%, max]
### Categorical Columns
[Table: col_name, unique_count, top_value, frequency%]
## 3. Key Findings
### Finding 1: [Title]
[Description of finding]
📊 Recommended chart: [Chart type]
Supporting data: [stats/view]
### Finding 2: [Title]
...
## 4. Anomalies Detected
[Table: row, column, value, severity]
## 5. Correlations
[Notable correlations >|0.3| or \x3C -|0.3|]
## 6. Recommendations
[Data-driven suggestions based on analysis]
One-Page Summary (Quick)
For quick results, use this compact format:
📊 [Dataset]: [N] rows × [M] cols
📈 Key metrics:
- [metric1]: mean=X, median=Y, range=[min, max]
- [metric2]: ...
🔍 Top findings:
1. [Finding] — [chart recommendation]
2. [Finding] — [chart recommendation]
⚠️ Anomalies: X detected
Python Script (Optional)
For complex analysis, create and run a Python script:
import csv, json, statistics
from collections import Counter
# Load data
with open('data.csv') as f:
reader = csv.DictReader(f)
rows = list(reader)
# Get numeric columns
# (column name → list of float values, filtering out blanks)
# Compute mean, median, stdev, percentiles
# Detect outliers via IQR
# Compute correlations between pairs
# Print formatted results
Run with:
python3 analysis.py
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install zcx-data-analyzer - 安装完成后,直接呼叫该 Skill 的名称或使用
/zcx-data-analyzer触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Data Analyzer 是什么?
Load structured CSV, Excel, or JSON data to compute stats, detect anomalies, analyze trends and correlations, and generate summary reports with chart suggest... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 37 次。
如何安装 Data Analyzer?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install zcx-data-analyzer」即可一键安装,无需额外配置。
Data Analyzer 是免费的吗?
是的,Data Analyzer 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Data Analyzer 支持哪些平台?
Data Analyzer 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Data Analyzer?
由 zhaocaixia888(@zhaocaixia888)开发并维护,当前版本 v1.1.1。