Description

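Data analysis: loads CSV/JSON and automatically computes descriptive statistics (mean, median, standard deviation, min/max), with outlier detection, trend analysis, and local persistence of results.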

README (SKILL.md)

Data Analyzer - Data Analysis Engine

Name: Data Analysis Engine (数据分析引擎)
Author: 534422530

Activation phrases: 分析数据 (analyze data) / data analyze / 统计 (statistics)

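Features

CSV/JSON data loading and parsing
Automatic descriptive statistics: mean, median, standard deviation, min/max
Outlier detection (IQR/z-score)
Trend classification (up/down/stable)
Results saved to a local JSON file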
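
The feature list names z-score alongside IQR, while `describe` in this README computes IQR fences only. As a minimal sketch of what a z-score check could look like (the function name `zscore_outliers` and the default threshold of 3.0 are assumptions for illustration, not part of the skill), a value is flagged when its distance from the mean exceeds a chosen number of standard deviations:

```python
import statistics
from typing import List


def zscore_outliers(values: List[float], threshold: float = 3.0) -> List[float]:
    """Return values whose absolute z-score exceeds `threshold`.

    Hypothetical helper: the name and the default threshold are assumptions.
    """
    if len(values) < 2:
        return []  # a sample standard deviation needs at least two points
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        return []  # all values identical, nothing can stand out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Illustrative readings with one obviously large value.
readings = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 50.0]
print(zscore_outliers(readings, threshold=2.0))  # [50.0]
```

With very small samples a single extreme value cannot reach a z-score of 3 (for n points the maximum is (n-1)/sqrt(n)), so a lower threshold or the IQR method may be the more practical choice there.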

Python implementation

import csv, json, statistics, math
from datetime import datetime
from typing import List, Dict, Any

class DataAnalyzer:
    def __init__(self):
        self.data: List[Dict[str, Any]] = []
        self.numeric_cols: List[str] = []
    
    def load_csv(self, path: str, delimiter: str = ",") -> int:
        """Load data from a CSV file."""
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f, delimiter=delimiter)
            self.data = list(reader)
        self._detect_numeric()
        return len(self.data)
    
    def load_json(self, path: str) -> int:
        """Load data from JSON (a top-level list of records, or an object wrapping one)."""
        with open(path, encoding="utf-8") as f:
            raw = json.load(f)
        if isinstance(raw, list):
            self.data = raw
        elif isinstance(raw, dict):
            # Use the first list-valued field found in the object
            for v in raw.values():
                if isinstance(v, list):
                    self.data = v
                    break
        self._detect_numeric()
        return len(self.data)
    
    def _detect_numeric(self):
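        """Detect numeric columns by trying float() on the first row."""
        self.numeric_cols = []  # reset so repeated loads do not add duplicates
        if not self.data:
            return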
        for col in self.data[0]:
            try:
                float(self.data[0][col])
                self.numeric_cols.append(col)
            except (ValueError, TypeError):
                pass
    
    def describe(self, col: str) -> dict:
        """Descriptive statistics for a numeric column."""
        if col not in self.numeric_cols:
            return {"error": f"'{col}' is not numeric"}
        # Skip missing/empty cells only; a plain truthiness test would also drop 0
        vals = [float(r[col]) for r in self.data if r.get(col) not in (None, "")]
        if not vals:
            return {"error": f"'{col}' has no values"}

        n = len(vals)
        mean = statistics.mean(vals)
        median = statistics.median(vals)
        stdev = statistics.stdev(vals) if n > 1 else 0

        # Outlier detection (IQR method, quartiles approximated by sorted index)
        sorted_vals = sorted(vals)
        q1 = sorted_vals[n // 4]
        q3 = sorted_vals[3 * n // 4]
        iqr = q3 - q1
        lower = q1 - 1.5 * iqr
        upper = q3 + 1.5 * iqr
        outliers = [v for v in vals if v < lower or v > upper]

        # Trend: compare the mean of the second half with the first half (5% band)
        half = n // 2
        first_half = statistics.mean(vals[:half]) if half > 0 else mean
        second_half = statistics.mean(vals[half:]) if half > 0 else mean
        trend = "up" if second_half > first_half * 1.05 else "down" if second_half < first_half * 0.95 else "stable"
        
        return {
            "column": col,
            "count": n,
            "mean": round(mean, 2),
            "median": round(median, 2),
            "stdev": round(stdev, 2),
            "min": round(min(vals), 2),
            "max": round(max(vals), 2),
            "range": round(max(vals) - min(vals), 2),
            "q1": round(q1, 2),
            "q3": round(q3, 2),
            "iqr": round(iqr, 2),
            "outliers": len(outliers),
            "outlier_values": [round(v, 2) for v in outliers[:10]],
            "trend": trend,
        }
    
    def correlation(self, col1: str, col2: str) -> float:
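        """Pearson correlation coefficient; None if a column is not numeric or fewer than 3 pairs exist."""
        if col1 not in self.numeric_cols or col2 not in self.numeric_cols:
            return None
        # Skip missing/empty cells only, so that legitimate zeros are kept
        pairs = [(float(r[col1]), float(r[col2])) for r in self.data
                 if r.get(col1) not in (None, "") and r.get(col2) not in (None, "")]
        n = len(pairs)
        if n < 3: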
            return None
        sum_x = sum(p[0] for p in pairs)
        sum_y = sum(p[1] for p in pairs)
        sum_xy = sum(p[0] * p[1] for p in pairs)
        sum_x2 = sum(p[0] ** 2 for p in pairs)
        sum_y2 = sum(p[1] ** 2 for p in pairs)
        num = n * sum_xy - sum_x * sum_y
        den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
        return round(num / den, 3) if den else 0
    
    def report(self, output: str = None) -> dict:
        """Build the full analysis report, optionally saving it as JSON."""
        report = {
            "rows": len(self.data),
            "columns": list(self.data[0].keys()) if self.data else [],
            "numeric_columns": self.numeric_cols,
            "statistics": {col: self.describe(col) for col in self.numeric_cols},
            "timestamp": datetime.now().isoformat(),
        }
        # Correlation matrix (each pair of numeric columns once)
        if len(self.numeric_cols) >= 2:
            report["correlations"] = {}
            for i, c1 in enumerate(self.numeric_cols):
                for c2 in self.numeric_cols[i+1:]:
                    corr = self.correlation(c1, c2)
                    if corr is not None:
                        report["correlations"][f"{c1}_vs_{c2}"] = corr
        
        if output:
            with open(output, "w", encoding="utf-8") as f:
                json.dump(report, f, ensure_ascii=False, indent=2)
        return report

# Usage example
analyzer = DataAnalyzer()

# Sample data
sample_data = [
    {"date": "2026-05-01", "revenue": 1200, "users": 45, "conversion": 0.12},
    {"date": "2026-05-02", "revenue": 1350, "users": 52, "conversion": 0.14},
    {"date": "2026-05-03", "revenue": 1100, "users": 38, "conversion": 0.11},
    {"date": "2026-05-04", "revenue": 1600, "users": 61, "conversion": 0.13},
    {"date": "2026-05-05", "revenue": 900,  "users": 30, "conversion": 0.09},
    {"date": "2026-05-06", "revenue": 1450, "users": 55, "conversion": 0.15},
    {"date": "2026-05-07", "revenue": 1300, "users": 48, "conversion": 0.11},
]
analyzer.data = sample_data
analyzer._detect_numeric()

# Descriptive statistics
desc = analyzer.describe("revenue")
print(f"Revenue: mean={desc['mean']}, median={desc['median']}, trend={desc['trend']}")
print(f"Outliers: {desc['outliers']}")

# Correlation
corr = analyzer.correlation("revenue", "users")
print(f"Revenue-users correlation: {corr}")

# Full report
report = analyzer.report("analysis_results.json")
print(f"Analysis complete: {report['rows']} records, {len(report['statistics'])} numeric columns")

Example output (abridged)

{
  "rows": 7,
  "columns": ["date", "revenue", "users", "conversion"],
  "statistics": {
    "revenue": {
      "mean": 1271.43,
      "median": 1300.0,
      "stdev": 230.68,
      "min": 900.0,
      "max": 1600.0,
      "trend": "up"
    }
  },
  "correlations": {
    "revenue_vs_users": 0.995,
    "revenue_vs_conversion": 0.809
  }
}
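
Once a report has been saved, it can be read back with the `json` module and post-processed. The following is a rough sketch that keeps only the strongly correlated column pairs. The helper `strong_correlations`, the 0.8 cutoff, and the column names `x`, `y`, `z` are hypothetical; only the `"correlations"` key and the `a_vs_b` naming come from the report format shown above.

```python
import json
from typing import Dict

# Hypothetical report fragment in the same shape as the "correlations" field.
REPORT_JSON = '{"correlations": {"x_vs_y": 0.91, "x_vs_z": -0.2, "y_vs_z": -0.85}}'


def strong_correlations(report: dict, min_abs: float = 0.8) -> Dict[str, float]:
    """Keep only the pairs whose |r| is at least `min_abs`."""
    pairs = report.get("correlations", {})  # the key is absent with < 2 numeric columns
    return {name: r for name, r in pairs.items() if abs(r) >= min_abs}


loaded = json.loads(REPORT_JSON)
print(strong_correlations(loaded))  # {'x_vs_y': 0.91, 'y_vs_z': -0.85}
```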

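Use cases

Business reporting: automated analysis of monthly or weekly operational data
A/B testing: comparing key metrics between the experiment and control groups
Data quality: outlier detection to surface data collection problems
Trend monitoring: continuously tracking the direction in which a metric is moving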
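
For the A/B testing case, a comparison could be sketched along these lines using only the `statistics` module. The helper `compare_groups` and the sample rates are hypothetical. The sketch reports means and relative lift only and makes no claim about statistical significance.

```python
import statistics
from typing import Dict, List


def compare_groups(control: List[float], treatment: List[float]) -> Dict[str, float]:
    """Summarize two samples and the relative lift of treatment over control."""
    mean_c = statistics.mean(control)
    mean_t = statistics.mean(treatment)
    return {
        "control_mean": round(mean_c, 4),
        "treatment_mean": round(mean_t, 4),
        # Relative change of the treatment mean; NaN when the control mean is 0.
        "lift": round((mean_t - mean_c) / mean_c, 4) if mean_c else float("nan"),
    }


# Hypothetical daily conversion rates for each group.
control = [0.10, 0.12, 0.11, 0.09]
treatment = [0.12, 0.14, 0.13, 0.11]
print(compare_groups(control, treatment))
```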

Dependencies

Python 3.8+
Standard library only (csv, json, statistics, math, datetime, typing)
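
One consequence of relying on the standard `csv` module is that `csv.DictReader` returns every field as a string, which is why numeric columns have to be detected by attempting `float()`. A small sketch of that behavior follows. The in-memory CSV text and the helper name `numeric_columns` are illustrative assumptions that mirror the first-row detection used by the class.

```python
import csv
import io
from typing import Dict, List

# Hypothetical CSV content standing in for a file on disk.
CSV_TEXT = """date,revenue,users
2026-05-01,1200,45
2026-05-02,1350,52
"""


def numeric_columns(rows: List[Dict[str, str]]) -> List[str]:
    """Return the columns whose first-row value parses as a float."""
    if not rows:
        return []
    cols = []
    for name, value in rows[0].items():
        try:
            float(value)
        except (ValueError, TypeError):
            continue  # not a number, e.g. a date string
        cols.append(name)
    return cols


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
print(rows[0])                # every value is a str, e.g. '1200'
print(numeric_columns(rows))  # ['revenue', 'users']
```

Because only the first row is inspected, a column with a non-numeric value further down would still be treated as numeric. Checking more rows is a possible refinement.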

Usage Guidance

Before installing, users should understand that analyzing sensitive datasets may expose derived information in the agent’s context, and that saving a report leaves a local JSON file behind. Use it only with files you intend to analyze, and choose output paths deliberately.
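
One way to make the choice of output path deliberate is to refuse to overwrite an existing file. The sketch below is an assumption-laden illustration rather than part of the skill: `save_report_safely` is a hypothetical helper, and it relies on Python's exclusive-creation file mode `"x"`, which raises `FileExistsError` when the target already exists.

```python
import json
import os
import tempfile


def save_report_safely(report: dict, path: str) -> bool:
    """Write `report` as JSON only if `path` does not exist yet.

    Returns True when the file was written, False when it already existed.
    """
    try:
        # Mode "x" creates the file and fails if it is already there.
        with open(path, "x", encoding="utf-8") as f:
            json.dump(report, f, ensure_ascii=False, indent=2)
    except FileExistsError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "analysis_results.json")
    print(save_report_safely({"rows": 7}, target))  # True: file created
    print(save_report_safely({"rows": 8}, target))  # False: existing file left untouched
```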

Capability Assessment

✓ Purpose & Capability

The advertised purpose is data analysis, and the artifact implements matching capabilities: CSV/JSON loading, statistical summaries, outlier detection, trend checks, correlations, and optional JSON report output.

ℹ Instruction Scope

The activation phrases are broad, so the skill could be invoked for ordinary data-analysis requests, but the behavior remains aligned with those requests and does not add unrelated authority.

✓ Install Mechanism

The package contains only a markdown skill file with embedded example Python code; there are no executable install scripts, dependency installs, background workers, or package registry dependencies.

✓ Credentials

File access is limited to user-specified CSV/JSON input paths and an optional user-specified report path, using Python standard-library modules only.

ℹ Persistence & Privilege

The skill can persist derived analysis results to a local JSON file, including column names, statistics, outlier samples, correlations, and a timestamp; this is disclosed in the description, feature list, and example output flow.

Version History

v1.0.0

Initial release of laosi-data-analyzer.
- Supports loading and parsing CSV/JSON data
- Automatically computes statistical descriptions: mean, median, standard deviation, min/max
- Includes outlier detection (IQR/z-score) and trend analysis
- Generates correlations between numeric columns
- Saves analysis results locally as JSON
- Suitable for business reports, A/B testing, data quality checks, and trend monitoring

Metadata

Slug laosi-data-analyzer

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is 数据分析引擎?

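数据分析引擎 (Data Analysis Engine) loads CSV/JSON and automatically computes descriptive statistics (mean, median, standard deviation, min/max), with outlier detection, trend analysis, and local persistence of results. It is an AI Agent Skill for Claude Code / OpenClaw, with 24 downloads so far.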

How do I install 数据分析引擎?

Run "/install laosi-data-analyzer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is 数据分析引擎 free?

Yes, 数据分析引擎 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does 数据分析引擎 support?

数据分析引擎 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created 数据分析引擎?

It is built and maintained by 534422530 (@534422530); the current version is v1.0.0.
