
Data Scientist

Name: Data Scientist
Author: runkecheng

Author: runkecheng · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security check passed


Install in OpenClaw

/install data-scientist

Description

Expertise in statistical analysis, predictive modeling, causal inference, and data-driven storytelling to generate validated business insights and support de...

Usage Instructions (SKILL.md)

Data Scientist

Purpose

Provides statistical analysis and predictive modeling expertise, specializing in machine learning, experimental design, and causal inference. Builds rigorous models and translates complex statistical findings into actionable business insights, with proper validation and uncertainty quantification.

Core Capabilities

Statistical Modeling

Building predictive models using regression, classification, and clustering
Implementing time series forecasting and causal inference
Designing and analyzing A/B tests and experiments
Performing feature engineering and selection
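Designing an A/B test usually starts with estimating how many users each arm needs. The following is a minimal sketch that uses only the Python standard library. It assumes a two-sided two-proportion z-test, equal group sizes, and the unpooled normal-approximation formula. `sample_size_per_group` and the 10% vs 12% conversion rates are illustrative and are not part of the skill.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    Normal approximation with unpooled variances:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    if p_control == p_treatment:
        raise ValueError("the two rates must differ to define a detectable effect")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_treatment - p_control) ** 2
    return math.ceil(n)  # round up: fractional users do not exist


n = sample_size_per_group(0.10, 0.12)
print(n)  # 3839 users per arm
```

statsmodels offers a related calculation through `NormalIndPower().solve_power` combined with `proportion_effectsize`. That approach uses Cohen's h, an arcsine-transformed effect size, so it may give slightly different numbers.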

Machine Learning

Training and evaluating supervised and unsupervised learning models
Implementing deep learning models for complex patterns
Performing hyperparameter tuning and model optimization
Validating models with cross-validation and holdout sets
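The tuning and cross-validation bullets above can be combined in one leak-free loop. The following is a minimal scikit-learn sketch that assumes a binary target. `make_classification` generates synthetic stand-in data, and the grid of `C` values is arbitrary. Because the scaler sits inside the `Pipeline`, each fold fits it only on that fold's training portion. The untouched holdout set then gives a final, unbiased check.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real feature matrix X and labels y.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Split off a holdout set first; it is never used during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Preprocessing inside the pipeline is refit per CV fold, so no test-fold statistics leak in.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

search = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},  # regularization strengths to try
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV ROC-AUC:", round(search.best_score_, 3))
# score() uses the refit best estimator and the same ROC-AUC scorer.
print("holdout ROC-AUC:", round(search.score(X_test, y_test), 3))
```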

Data Exploration

Conducting exploratory data analysis (EDA) to discover patterns
Identifying anomalies and outliers in datasets
Creating advanced visualizations for insight discovery
Generating hypotheses from data exploration
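A common starting point for identifying outliers is the 1.5 × IQR rule, which matplotlib and seaborn box-plot whiskers also use by default. The following is a minimal NumPy sketch. `iqr_outliers` is a hypothetical helper and the sample values are made up. The flagged points are candidates for inspection rather than automatic deletions.

```python
import numpy as np


def iqr_outliers(values, k: float = 1.5) -> np.ndarray:
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR].

    NaNs are ignored when computing the quartiles and are never flagged.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons with NaN evaluate to False, so missing values stay unflagged.
    return (x < lower) | (x > upper)


data = [10, 12, 11, 13, 12, 11, 95]
mask = iqr_outliers(data)
print(mask)  # flags only the last value, 95
```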

Communication and Storytelling

Translating statistical findings into business language
Creating compelling data narratives for stakeholders
Building interactive notebooks and reports
Presenting findings with uncertainty quantification
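One way to report uncertainty without distributional assumptions is a percentile bootstrap confidence interval. The following is a minimal NumPy sketch. `bootstrap_ci` is a hypothetical helper, and the exponential "order values" are synthetic data chosen to mimic a skewed business metric.

```python
import numpy as np


def bootstrap_ci(sample, stat=np.mean, n_resamples: int = 10_000,
                 confidence: float = 0.95, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for a statistic of one sample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    # Each row of idx is one resample of the data, drawn with replacement.
    idx = rng.integers(0, len(x), size=(n_resamples, len(x)))
    replicates = np.array([stat(x[row]) for row in idx])
    alpha = 1 - confidence
    lo, hi = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


rng = np.random.default_rng(42)
order_values = rng.exponential(scale=50.0, size=200)  # synthetic, right-skewed data
lo, hi = bootstrap_ci(order_values)
print(f"mean = {order_values.mean():.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```

SciPy 1.7 and later also provides `scipy.stats.bootstrap`, which defaults to the bias-corrected (BCa) interval.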

Core Workflows

Workflow 1: EDA & Data Cleaning

Goal: Understand data distribution, quality, and relationships before modeling.

# Load and profile
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.read_csv("data.csv")
df.info()  # prints dtypes and non-null counts itself; wrapping it in print() adds a stray "None"
print(df.describe())
missing = df.isnull().mean()  # fraction of missing values per column
print(missing[missing > 0].sort_values(ascending=False))

# Univariate analysis: histogram + box plot for each numeric column
num_cols = df.select_dtypes(include=[np.number]).columns
for col in num_cols:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    sns.histplot(df[col], kde=True, ax=ax1)
    sns.boxplot(x=df[col], ax=ax2)
    plt.show()

# Correlation: restrict to numeric columns (pandas >= 2.0 raises on text columns)
sns.heatmap(df[num_cols].corr(), annot=True, cmap='coolwarm')
plt.show()

# Cleaning: median-impute 'age'; cap 'income' at its 99th percentile
df['age'] = df['age'].fillna(df['age'].median())  # reassign instead of chained inplace=True
cap = df['income'].quantile(0.99)
df['income'] = df['income'].clip(upper=cap)

Workflow 2: A/B Test Analysis (Proportions Z-test)

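import numpy as np
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# df: one row per user with 'group' ('A' = control, 'B' = treatment) and 'converted' (0/1)
results = df.groupby('group')['converted'].agg(['count', 'sum', 'mean'])
control, treatment = results.loc['A'], results.loc['B']
count = np.array([treatment['sum'], control['sum']])     # conversions, treatment first
nobs = np.array([treatment['count'], control['count']])  # users per group, same order
# One-sided test, H1: treatment conversion rate > control conversion rate
stat, p_value = proportions_ztest(count, nobs, alternative='larger')
# 95% CI for each group's rate; both returned arrays follow the order of `count`
(lo_t, lo_c), (hi_t, hi_c) = proportion_confint(count, nobs, alpha=0.05)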

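If p < 0.05, reject H0: the treatment conversion rate is significantly higher than the control rate at the 5% level. Then check practical significance, that is, whether the lift is large enough to matter for the business.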
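A small p-value says nothing about effect size, so it helps to report the lift together with its uncertainty. The following is a minimal standard-library sketch that computes absolute lift, relative lift, and a 95% Wald confidence interval on the difference in rates. `lift_summary` is a hypothetical helper and the counts are made up. statsmodels' `confint_proportions_2indep` provides more robust interval methods.

```python
import math
from statistics import NormalDist


def lift_summary(conv_t: int, n_t: int, conv_c: int, n_c: int,
                 alpha: float = 0.05) -> dict:
    """Absolute and relative lift of treatment over control, with a Wald CI on the difference."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    diff = p_t - p_c
    # Unpooled standard error of the difference between two independent proportions.
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "p_treatment": p_t,
        "p_control": p_c,
        "abs_lift": diff,
        "abs_lift_ci": (diff - z * se, diff + z * se),
        # Relative lift is undefined when the control rate is zero.
        "rel_lift": diff / p_c if p_c > 0 else float("nan"),
    }


summary = lift_summary(conv_t=620, n_t=5000, conv_c=550, n_c=5000)
print(summary)
```

In this made-up example, the rate rises from 11.0% to 12.4%. That is a 1.4-point absolute lift, or roughly 13% relative. Whether such a gain justifies a rollout is a business decision that the p-value alone cannot make.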

Workflow 3: Causal Inference (Propensity Score Matching)

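from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Propensity scores: P(is_premium = 1 | confounders); is_premium is coded 0/1
confounders = ['age', 'income', 'tenure']
logit = LogisticRegression(max_iter=1000).fit(df[confounders], df['is_premium'])
df['pscore'] = logit.predict_proba(df[confounders])[:, 1]

# Split users into treated (premium) and untreated groups after scoring
treatment = df[df['is_premium'] == 1]
control = df[df['is_premium'] == 0]

# 1-nearest-neighbor matching on the propensity score (with replacement)
nn = NearestNeighbors(n_neighbors=1).fit(control[['pscore']])
_, indices = nn.kneighbors(treatment[['pscore']])
matched_control = control.iloc[indices.flatten()]

# Matching each treated user to a control estimates the effect on the treated (ATT), not the ATE
att = treatment['spend'].mean() - matched_control['spend'].mean()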
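Matching is only credible if it actually balances the confounders. A common diagnostic, which is not part of the original workflow, is the standardized mean difference (SMD) for each covariate between treated units and their matched controls. A widely used rule of thumb treats |SMD| below about 0.1 as balanced. The following is a minimal pandas sketch. The toy frames are hypothetical stand-ins for `treatment` and `matched_control`.

```python
import numpy as np
import pandas as pd


def standardized_mean_diff(treated: pd.DataFrame, control: pd.DataFrame,
                           columns: list[str]) -> pd.Series:
    """SMD per covariate: (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    mean_diff = treated[columns].mean() - control[columns].mean()
    pooled_sd = np.sqrt((treated[columns].var() + control[columns].var()) / 2)
    return mean_diff / pooled_sd


# Hypothetical toy data standing in for treated users and their matched controls.
treated_demo = pd.DataFrame({"age": [30, 35, 40, 45], "income": [50, 60, 70, 80]})
matched_demo = pd.DataFrame({"age": [31, 34, 41, 44], "income": [52, 58, 71, 79]})

smd = standardized_mean_diff(treated_demo, matched_demo, ["age", "income"])
print(smd.round(3))
print("balanced:", bool((smd.abs() < 0.1).all()))
```

With real data, you would call `standardized_mean_diff(treatment, matched_control, confounders)` and compare the result with the SMD computed before matching.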

Anti-Patterns

Anti-Pattern	Problem	Fix
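Data Leakage	Fitting scalers/encoders on the full data before the train/test split	Wrap preprocessing in a Pipeline; fit it on the training split only
P-Hacking	Testing 50 hypotheses and reporting only those with p < 0.05	Bonferroni or FDR (e.g. Benjamini-Hochberg) correction; pre-register hypotheses
Imbalanced Classes	99.9% accuracy when fraud is 0.1% of cases (a model that always predicts "no fraud" scores this)	Use PR-AUC, F1; resample with SMOTE; set class_weight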
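The FDR fix in the table can be illustrated with the Benjamini-Hochberg step-up procedure. The following is a minimal NumPy sketch that makes the mechanics visible. `benjamini_hochberg` is a hypothetical helper and the p-values are made up. In practice, `statsmodels.stats.multitest.multipletests(p_vals, alpha=0.05, method="fdr_bh")` performs the same correction.

```python
import numpy as np


def benjamini_hochberg(p_values, q: float = 0.05) -> np.ndarray:
    """Boolean mask (in input order) of hypotheses rejected by Benjamini-Hochberg.

    Controls the false discovery rate at level q for independent or
    positively dependent tests.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value is compared with its threshold (k / m) * q.
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.max(np.nonzero(below)[0])  # largest rank that passes its threshold
        reject[order[: k_max + 1]] = True       # reject it and every smaller p-value
    return reject


p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
result = benjamini_hochberg(p_vals)
print(result)  # only the two smallest p-values are rejected
```

In this example, a naive p < 0.05 cutoff would report five "significant" results. BH keeps two, and Bonferroni (threshold 0.05 / 10) would keep only one.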

Quality Checklist

Hypothesis defined before analysis
Train/Test split correct (no leakage)
Imbalanced classes handled properly
Confidence intervals provided
Results interpreted in business terms
Caveats and limitations stated
Random seeds set for reproducibility
Model explained with SHAP/LIME if black-box
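For the reproducibility item, one approach is to fix a seed and pass an explicitly seeded NumPy `Generator` into every randomized step instead of relying on global state. The following is a minimal sketch. `SEED` and `noisy_estimate` are hypothetical names, and the function is a toy stand-in for sampling, bootstrapping, or initialization.

```python
import random

import numpy as np

SEED = 42  # arbitrary value; record it alongside the results

random.seed(SEED)  # seeds Python's built-in `random` module


def noisy_estimate(rng: np.random.Generator) -> float:
    """Toy stand-in for any randomized step in an analysis."""
    return float(rng.normal(loc=0.0, scale=1.0, size=1000).mean())


# Two Generators built from the same seed produce identical draws.
first = noisy_estimate(np.random.default_rng(SEED))
second = noisy_estimate(np.random.default_rng(SEED))
print(first == second)  # True
```

For scikit-learn, pass the same value as `random_state=` to estimators and splitters, for example `train_test_split(..., random_state=SEED)`.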

Safe Use Recommendations

Install this if you want reusable data science analysis guidance. Review generated analysis code before running it on private or regulated data, especially where examples load local datasets or drive modeling decisions that may affect business outcomes.

Capability Assessment

✓ Purpose & Capability

The stated purpose is statistical analysis, predictive modeling, experiment design, causal inference, and data storytelling; the artifact content matches that purpose.

✓ Instruction Scope

Instructions are limited to analytical workflows, examples, anti-patterns, and a quality checklist; there are no role overrides, hidden commands, autonomous actions, or prompt-injection patterns.

✓ Install Mechanism

The inspected artifact is a markdown SKILL.md file only, with no executable scripts, declared dependencies, package installation steps, or runtime hooks.

ℹ Credentials

Example code reads a local placeholder file named data.csv and uses common data science libraries, which is expected for this skill but should be adapted carefully for sensitive datasets.

✓ Persistence & Privilege

No persistence, background execution, privilege escalation, credential/session access, network calls, or mutation of system/account state was found.

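How to Use

Make sure OpenClaw is installed (local or Docker deployment)
Enter the install command in the chat box: /install data-scientist
After installation, call the skill by name or trigger it with /data-scientist
Provide the inputs described in the skill's parameter notes to get structured output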

Version History

v1.0.0

Initial release of the data-scientist skill, enabling advanced statistical and machine learning analysis.
- Provides workflows for EDA, A/B test analysis, and causal inference (propensity score matching)
- Supports predictive modeling, statistical inference, experiment design, and feature engineering
- Includes guidance on communication, data storytelling, and common anti-patterns to avoid
- Offers a robust quality checklist to ensure rigorous and reproducible analysis

Metadata

Slug data-scientist

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Version count 1

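FAQ

What is Data Scientist?

Expertise in statistical analysis, predictive modeling, causal inference, and data-driven storytelling to generate validated business insights and support de... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 50 total downloads so far.

How do I install Data Scientist?

Run the command "/install data-scientist" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Data Scientist free?

Yes. Data Scientist is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Data Scientist support?

Data Scientist is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Data Scientist?

It is developed and maintained by runkecheng (@runkecheng); the current version is v1.0.0.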
