← 返回 Skills 市场
oyi77

Data Analyst

作者 Paijo · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
18930
总下载
62
收藏
193
当前安装
1
版本数
在 OpenClaw 中安装
/install data-analyst
功能描述
Data visualization, report generation, SQL queries, and spreadsheet automation. Transform your AI agent into a data-savvy analyst that turns raw data into actionable insights.
使用说明 (SKILL.md)

Data Analyst Skill 📊

Turn your AI agent into a data analysis powerhouse.

Query databases, analyze spreadsheets, create visualizations, and generate insights that drive decisions.


What This Skill Does

SQL Queries — Write and execute queries against databases ✅ Spreadsheet Analysis — Process CSV, Excel, Google Sheets data ✅ Data Visualization — Create charts, graphs, and dashboards ✅ Report Generation — Automated reports with insights ✅ Data Cleaning — Handle missing data, outliers, formatting ✅ Statistical Analysis — Descriptive stats, trends, correlations


Quick Start

  1. Configure your data sources in TOOLS.md:
### Data Sources
- Primary DB: [Connection string or description]
- Spreadsheets: [Google Sheets URL / local path]
- Data warehouse: [BigQuery/Snowflake/etc.]
  1. Set up your workspace:
./scripts/data-init.sh
  1. Start analyzing!

SQL Query Patterns

Common Query Templates

Basic Data Exploration

-- Row count
SELECT COUNT(*) FROM table_name;

-- Sample data
SELECT * FROM table_name LIMIT 10;

-- Column statistics
SELECT 
    column_name,
    COUNT(*) as count,
    COUNT(DISTINCT column_name) as unique_values,
    MIN(column_name) as min_val,
    MAX(column_name) as max_val
FROM table_name
GROUP BY column_name;

Time-Based Analysis

-- Daily aggregation
SELECT 
    DATE(created_at) as date,
    COUNT(*) as daily_count,
    SUM(amount) as daily_total
FROM transactions
GROUP BY DATE(created_at)
ORDER BY date DESC;

-- Month-over-month comparison
SELECT 
    DATE_TRUNC('month', created_at) as month,
    COUNT(*) as count,
    LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)) as prev_month,
    (COUNT(*) - LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at))) / 
        NULLIF(LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)), 0) * 100 as growth_pct
FROM transactions
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

Cohort Analysis

-- User cohort by signup month
SELECT 
    DATE_TRUNC('month', u.created_at) as cohort_month,
    DATE_TRUNC('month', o.created_at) as activity_month,
    COUNT(DISTINCT u.id) as users
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY cohort_month, activity_month
ORDER BY cohort_month, activity_month;

Funnel Analysis

-- Conversion funnel
WITH funnel AS (
    SELECT
        COUNT(DISTINCT CASE WHEN event = 'page_view' THEN user_id END) as views,
        COUNT(DISTINCT CASE WHEN event = 'signup' THEN user_id END) as signups,
        COUNT(DISTINCT CASE WHEN event = 'purchase' THEN user_id END) as purchases
    FROM events
    WHERE date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT 
    views,
    signups,
    ROUND(signups * 100.0 / NULLIF(views, 0), 2) as signup_rate,
    purchases,
    ROUND(purchases * 100.0 / NULLIF(signups, 0), 2) as purchase_rate
FROM funnel;

Data Cleaning

Common Data Quality Issues

Issue Detection Solution
Missing values IS NULL or empty string Impute, drop, or flag
Duplicates GROUP BY with HAVING COUNT(*) > 1 Deduplicate with rules
Outliers Z-score > 3 or IQR method Investigate, cap, or exclude
Inconsistent formats Sample and pattern match Standardize with transforms
Invalid values Range checks, referential integrity Validate and correct

Data Cleaning SQL Patterns

-- Find duplicates
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Find nulls
SELECT 
    COUNT(*) as total,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) as null_emails,
    SUM(CASE WHEN name IS NULL THEN 1 ELSE 0 END) as null_names
FROM users;

-- Standardize text
UPDATE products
SET category = LOWER(TRIM(category));

-- Remove outliers (IQR method)
WITH stats AS (
    SELECT 
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY value) as q1,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY value) as q3
    FROM data
)
SELECT * FROM data, stats
WHERE value BETWEEN q1 - 1.5*(q3-q1) AND q3 + 1.5*(q3-q1);

Data Cleaning Checklist

# Data Quality Audit: [Dataset]

## Row-Level Checks
- [ ] Total row count: [X]
- [ ] Duplicate rows: [X]
- [ ] Rows with any null: [X]

## Column-Level Checks
| Column | Type | Nulls | Unique | Min | Max | Issues |
|--------|------|-------|--------|-----|-----|--------|
| [col] | [type] | [n] | [n] | [v] | [v] | [notes] |

## Data Lineage
- Source: [Where data came from]
- Last updated: [Date]
- Known issues: [List]

## Cleaning Actions Taken
1. [Action and reason]
2. [Action and reason]

Spreadsheet Analysis

CSV/Excel Processing with Python

import pandas as pd

# Load data
df = pd.read_csv('data.csv')  # or pd.read_excel('data.xlsx')

# Basic exploration
print(df.shape)  # (rows, columns)
print(df.info())  # Column types and nulls
print(df.describe())  # Numeric statistics

# Data cleaning
df = df.drop_duplicates()
df['date'] = pd.to_datetime(df['date'])
df['amount'] = df['amount'].fillna(0)

# Analysis
summary = df.groupby('category').agg({
    'amount': ['sum', 'mean', 'count'],
    'quantity': 'sum'
}).round(2)

# Export
summary.to_csv('analysis_output.csv')

Common Pandas Operations

# Filtering
filtered = df[df['status'] == 'active']
filtered = df[df['amount'] > 1000]
filtered = df[df['date'].between('2024-01-01', '2024-12-31')]

# Aggregation
by_category = df.groupby('category')['amount'].sum()
pivot = df.pivot_table(values='amount', index='month', columns='category', aggfunc='sum')

# Window functions
df['running_total'] = df['amount'].cumsum()
df['pct_change'] = df['amount'].pct_change()
df['rolling_avg'] = df['amount'].rolling(window=7).mean()

# Merging
merged = pd.merge(df1, df2, on='id', how='left')

Data Visualization

Chart Selection Guide

Data Type Best Chart Use When
Trend over time Line chart Showing patterns/changes over time
Category comparison Bar chart Comparing discrete categories
Part of whole Pie/Donut Showing proportions (≤5 categories)
Distribution Histogram Understanding data spread
Correlation Scatter plot Relationship between two variables
Many categories Horizontal bar Ranking or comparing many items
Geographic Map Location-based data

Python Visualization with Matplotlib/Seaborn

import matplotlib.pyplot as plt
import seaborn as sns

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Line chart (trends)
plt.figure(figsize=(10, 6))
plt.plot(df['date'], df['value'], marker='o')
plt.title('Trend Over Time')
plt.xlabel('Date')
plt.ylabel('Value')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('trend.png', dpi=150)

# Bar chart (comparisons)
plt.figure(figsize=(10, 6))
sns.barplot(data=df, x='category', y='amount')
plt.title('Amount by Category')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('comparison.png', dpi=150)

# Heatmap (correlations)
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix')
plt.tight_layout()
plt.savefig('correlation.png', dpi=150)

ASCII Charts (Quick Terminal Visualization)

When you can't generate images, use ASCII:

Revenue by Month (in $K)
========================
Jan: ████████████████ 160
Feb: ██████████████████ 180
Mar: ████████████████████████ 240
Apr: ██████████████████████ 220
May: ██████████████████████████ 260
Jun: ████████████████████████████ 280

Report Generation

Standard Report Template

# [Report Name]
**Period:** [Date range]
**Generated:** [Date]
**Author:** [Agent/Human]

## Executive Summary
[2-3 sentences with key findings]

## Key Metrics

| Metric | Current | Previous | Change |
|--------|---------|----------|--------|
| [Metric] | [Value] | [Value] | [+/-X%] |

## Detailed Analysis

### [Section 1]
[Analysis with supporting data]

### [Section 2]
[Analysis with supporting data]

## Visualizations
[Insert charts]

## Insights
1. **[Insight]**: [Supporting evidence]
2. **[Insight]**: [Supporting evidence]

## Recommendations
1. [Actionable recommendation]
2. [Actionable recommendation]

## Methodology
- Data source: [Source]
- Date range: [Range]
- Filters applied: [Filters]
- Known limitations: [Limitations]

## Appendix
[Supporting data tables]

Automated Report Script

#!/bin/bash
# generate-report.sh

# Pull latest data
python scripts/extract_data.py --output data/latest.csv

# Run analysis
python scripts/analyze.py --input data/latest.csv --output reports/

# Generate report
python scripts/format_report.py --template weekly --output reports/weekly-$(date +%Y-%m-%d).md

echo "Report generated: reports/weekly-$(date +%Y-%m-%d).md"

Statistical Analysis

Descriptive Statistics

Statistic What It Tells You Use Case
Mean Average value Central tendency
Median Middle value Robust to outliers
Mode Most common Categorical data
Std Dev Spread around mean Variability
Min/Max Range Data boundaries
Percentiles Distribution shape Benchmarking

Quick Stats with Python

# Full descriptive statistics
stats = df['amount'].describe()
print(stats)

# Additional stats
print(f"Median: {df['amount'].median()}")
print(f"Mode: {df['amount'].mode()[0]}")
print(f"Skewness: {df['amount'].skew()}")
print(f"Kurtosis: {df['amount'].kurtosis()}")

# Correlation
correlation = df['sales'].corr(df['marketing_spend'])
print(f"Correlation: {correlation:.3f}")

Statistical Tests Quick Reference

Test Use Case Python
T-test Compare two means scipy.stats.ttest_ind(a, b)
Chi-square Categorical independence scipy.stats.chi2_contingency(table)
ANOVA Compare 3+ means scipy.stats.f_oneway(a, b, c)
Pearson Linear correlation scipy.stats.pearsonr(x, y)

Analysis Workflow

Standard Analysis Process

  1. Define the Question

    • What are we trying to answer?
    • What decisions will this inform?
  2. Understand the Data

    • What data is available?
    • What's the structure and quality?
  3. Clean and Prepare

    • Handle missing values
    • Fix data types
    • Remove duplicates
  4. Explore

    • Descriptive statistics
    • Initial visualizations
    • Identify patterns
  5. Analyze

    • Deep dive into findings
    • Statistical tests if needed
    • Validate hypotheses
  6. Communicate

    • Clear visualizations
    • Actionable insights
    • Recommendations

Analysis Request Template

# Analysis Request

## Question
[What are we trying to answer?]

## Context
[Why does this matter? What decision will it inform?]

## Data Available
- [Dataset 1]: [Description]
- [Dataset 2]: [Description]

## Expected Output
- [Deliverable 1]
- [Deliverable 2]

## Timeline
[When is this needed?]

## Notes
[Any constraints or considerations]

Scripts

data-init.sh

Initialize your data analysis workspace.

query.sh

Quick SQL query execution.

# Run query from file
./scripts/query.sh --file queries/daily-report.sql

# Run inline query
./scripts/query.sh "SELECT COUNT(*) FROM users"

# Save output to file
./scripts/query.sh --file queries/export.sql --output data/export.csv

analyze.py

Python analysis toolkit.

# Basic analysis
python scripts/analyze.py --input data/sales.csv

# With specific analysis type
python scripts/analyze.py --input data/sales.csv --type cohort

# Generate report
python scripts/analyze.py --input data/sales.csv --report weekly

Integration Tips

With Other Skills

Skill Integration
Marketing Analyze campaign performance, content metrics
Sales Pipeline analytics, conversion analysis
Business Dev Market research data, competitor analysis

Common Data Sources

  • Databases: PostgreSQL, MySQL, SQLite
  • Warehouses: BigQuery, Snowflake, Redshift
  • Spreadsheets: Google Sheets, Excel, CSV
  • APIs: REST endpoints, GraphQL
  • Files: JSON, Parquet, XML

Best Practices

  1. Start with the question — Know what you're trying to answer
  2. Validate your data — Garbage in = garbage out
  3. Document everything — Queries, assumptions, decisions
  4. Visualize appropriately — Right chart for right data
  5. Show your work — Methodology matters
  6. Lead with insights — Not just data dumps
  7. Make it actionable — "So what?" → "Now what?"
  8. Version your queries — Track changes over time

Common Mistakes

Confirmation bias — Looking for data to support a conclusion ❌ Correlation ≠ causation — Be careful with claims ❌ Cherry-picking — Using only favorable data ❌ Ignoring outliers — Investigate before removing ❌ Over-complicating — Simple analysis often wins ❌ No context — Numbers without comparison are meaningless


License

License: MIT — use freely, modify, distribute.


"The goal is to turn data into information, and information into insight." — Carly Fiorina

安全使用建议
This skill appears to implement a reasonable local data-analysis toolkit, but the package metadata omits important runtime requirements. Before installing or running it: 1) Inspect the scripts (data-init.sh and query.sh) yourself — they will create files under $HOME/.openclaw and run local DB clients. 2) Do not provide production DB credentials to this skill; if you run queries, use a read-only account and limit network access. 3) Ensure required binaries (sqlite3, psql, mysql) and Python + pandas are present and trusted; the skill does not install them. 4) Consider running the skill in a sandbox or test environment first. 5) If you plan to use cloud warehouses or Google Sheets, confirm how credentials will be stored and used (the skill does not declare or manage those secrets). If the owner/source is unknown, exercise extra caution and prefer manual review and least-privilege credentials.
功能分析
Type: OpenClaw Skill Name: data-analyst Version: 1.0.0 The skill bundle is benign, providing comprehensive documentation and utility scripts for data analysis tasks. The `SKILL.md` serves as a detailed guide for an AI agent, outlining SQL queries, spreadsheet analysis, data visualization, and report generation, without any evidence of prompt injection or instructions for malicious actions. The `scripts/data-init.sh` script safely initializes a dedicated workspace within the agent's home directory, creating standard directories and template files. The `scripts/query.sh` script is a utility for executing SQL queries against various database types, relying on user-provided connection strings and queries, and does not exhibit any shell injection vulnerabilities or malicious intent in its implementation.
能力评估
Purpose & Capability
The skill's name, description, SKILL.md, and scripts all describe database querying, spreadsheet analysis, and report generation — that purpose is consistent. However, the metadata declares no required environment variables or binaries, yet the provided scripts expect DB_CONNECTION/DB_TYPE and call sqlite3/psql/mysql and create a Python analysis template (pandas). The missing declarations are disproportionate to the stated package manifest.
Instruction Scope
SKILL.md stays within the data-analyst purpose and provides SQL/Pandas examples. The runtime instructions tell the agent to run ./scripts/data-init.sh and optionally use ./scripts/query.sh to execute arbitrary SQL. Those scripts will create files under $HOME/.openclaw/workspace/data-analysis and run local DB clients; they do not instruct exfiltration to external endpoints. However, query.sh executes arbitrary SQL provided by the user/agent and will use DB_CONNECTION if set, so it can access sensitive database data if credentials are supplied.
Install Mechanism
There is no install spec and all code is included in the skill bundle (two shell scripts + SKILL.md). No external downloads or extract operations are performed by the skill itself, which minimizes supply-chain risk.
Credentials
The skill metadata lists no required env vars, but scripts reference DB_CONNECTION and DB_TYPE and the SKILL.md suggests configuring various data sources (including cloud warehouses that require credentials). The skill also implicitly needs DB client binaries (sqlite3/psql/mysql) and a Python environment with pandas for the template — none of which are declared. Requesting database connection strings and potentially credentials without declaring them in the metadata is an incoherence and increases risk.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide settings. It creates a workspace directory under the user's home ($HOME/.openclaw/workspace/data-analysis), which is reasonable for its purpose and does not indicate elevated or unexpected privileges.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install data-analyst
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /data-analyst 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
Initial release of the Data Analyst skill: - Provides SQL query patterns for common analyses, including cohort and funnel analysis. - Enables spreadsheet processing and data cleaning techniques. - Offers Python code samples for data analysis and visualization. - Includes guides for chart selection and terminal-friendly ASCII charts. - Delivers templates and checklists for data audits and report generation.
元数据
Slug data-analyst
版本 1.0.0
许可证
累计安装 221
当前安装数 193
历史版本数 1
常见问题

Data Analyst 是什么?

Data visualization, report generation, SQL queries, and spreadsheet automation. Transform your AI agent into a data-savvy analyst that turns raw data into actionable insights. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 18930 次。

如何安装 Data Analyst?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install data-analyst」即可一键安装,无需额外配置。

Data Analyst 是免费的吗?

是的,Data Analyst 完全免费(开源免费),可自由下载、安装和使用。

Data Analyst 支持哪些平台?

Data Analyst 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Data Analyst?

由 Paijo(@oyi77)开发并维护,当前版本 v1.0.0。

💬 留言讨论