
Data Scientist

by runkecheng · GitHub ↗ · v1.0.0 · MIT-0
cross-platform · ✓ Security Clean · 50 downloads · 0 stars · 0 active installs · 1 version
Install in OpenClaw
/install data-scientist
Description
Expertise in statistical analysis, predictive modeling, causal inference, and data-driven storytelling to generate validated business insights and support de...
README (SKILL.md)

Data Scientist

Purpose

Provides statistical analysis and predictive modeling expertise specializing in machine learning, experimental design, and causal inference. Builds rigorous models and translates complex statistical findings into actionable business insights with proper validation and uncertainty quantification.

Core Capabilities

Statistical Modeling

  • Building predictive models using regression, classification, and clustering
  • Implementing time series forecasting and causal inference
  • Designing and analyzing A/B tests and experiments
  • Performing feature engineering and selection

Machine Learning

  • Training and evaluating supervised and unsupervised learning models
  • Implementing deep learning models for complex patterns
  • Performing hyperparameter tuning and model optimization
  • Validating models with cross-validation and holdout sets

Data Exploration

  • Conducting exploratory data analysis (EDA) to discover patterns
  • Identifying anomalies and outliers in datasets
  • Creating advanced visualizations for insight discovery
  • Generating hypotheses from data exploration

Communication and Storytelling

  • Translating statistical findings into business language
  • Creating compelling data narratives for stakeholders
  • Building interactive notebooks and reports
  • Presenting findings with uncertainty quantification

Core Workflows

Workflow 1: EDA & Data Cleaning

Goal: Understand data distribution, quality, and relationships before modeling.

# Load and profile
import pandas as pd, numpy as np, seaborn as sns, matplotlib.pyplot as plt
df = pd.read_csv("data.csv")
df.info()  # prints its summary directly and returns None, so no print() is needed
print(df.describe())
missing = df.isnull().sum() / len(df)
print(missing[missing > 0].sort_values(ascending=False))

# Univariate analysis
num_cols = df.select_dtypes(include=[np.number]).columns
for col in num_cols:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,4))
    sns.histplot(df[col], kde=True, ax=ax1)
    sns.boxplot(x=df[col], ax=ax2)
    plt.show()

# Correlation
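# Restrict to numeric columns; df.corr() raises on text columns in pandas >= 2.0
sns.heatmap(df[num_cols].corr(), annot=True, cmap='coolwarm')
plt.show()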

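# Cleaning
# Assign the result back; inplace=True on a column selection may not update df in recent pandas
df['age'] = df['age'].fillna(df['age'].median())
# Cap income at the 99th percentile to limit the influence of extreme values
cap = df['income'].quantile(0.99)
df['income'] = df['income'].clip(upper=cap)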

Workflow 2: A/B Test Analysis (Proportions Z-test)

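import numpy as np
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Per-group sample size, number of conversions, and conversion rate
results = df.groupby('group')['converted'].agg(['count','sum','mean'])
control, treatment = results.loc['A'], results.loc['B']
# Treatment goes first so that alternative='larger' tests treatment > control
count = np.array([treatment['sum'], control['sum']])
nobs  = np.array([treatment['count'], control['count']])
stat, p_value = proportions_ztest(count, nobs, alternative='larger')
# Returns (lower, upper) arrays in the same order as count: [treatment, control]
(lt, lc), (ut, uc) = proportion_confint(count, nobs, alpha=0.05)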

If p < 0.05: reject H0 (statistically significant at the 5% level). Then check practical significance: is the lift large enough to matter to the business?
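
The practical-significance check can be sketched as a lift calculation with an interval around it. This is a minimal illustration, not part of the skill itself: the function name `lift_with_ci` and the counts are hypothetical, and the interval is a simple Wald (normal-approximation) interval, which can be inaccurate for small samples or rates near 0 or 1.

```python
import math

def lift_with_ci(conv_t, n_t, conv_c, n_c, z=1.96):
    """Absolute and relative lift of treatment over control, with a Wald
    interval for the absolute difference (z=1.96 is roughly 95%)."""
    p_t = conv_t / n_t
    p_c = conv_c / n_c
    diff = p_t - p_c
    # Unpooled standard error of the difference between two proportions
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return {
        "abs_lift": diff,
        "rel_lift": diff / p_c if p_c > 0 else float("nan"),
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }

# Made-up counts, for illustration only
summary = lift_with_ci(conv_t=120, n_t=1000, conv_c=100, n_c=1000)
print(summary)
```

With these made-up counts the interval for the absolute lift spans zero, so a 2-point lift on 1,000 users per arm would not yet be distinguishable from no effect.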

Workflow 3: Causal Inference (Propensity Score Matching)

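from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Propensity scores: estimated probability of treatment given the confounders
confounders = ['age','income','tenure']
logit = LogisticRegression().fit(df[confounders], df['is_premium'])
df['pscore'] = logit.predict_proba(df[confounders])[:, 1]

# Split into treated and control groups (is_premium assumed to be coded 0/1)
treatment = df[df['is_premium'] == 1]
control = df[df['is_premium'] == 0]

# Nearest neighbor matching on the propensity score (1:1, with replacement)
nn = NearestNeighbors(n_neighbors=1).fit(control[['pscore']])
_, indices = nn.kneighbors(treatment[['pscore']])
matched_control = control.iloc[indices.flatten()]
# Each treated unit is matched to a control, so this estimates the effect on the treated (ATT)
att = treatment['spend'].mean() - matched_control['spend'].mean()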
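
A matched comparison like the one above is only credible if the matched groups look alike on the confounders. One common diagnostic is the standardized mean difference (SMD) per covariate. The sketch below is an illustration under assumptions: the helper name and the age values are hypothetical, it expects at least two values per group with some variation, and the often-quoted 0.1 threshold is a rule of thumb rather than a hard cutoff.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2).

    Assumes each group has at least two values and non-zero pooled variance.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical ages for treated units and their matched controls
treated_age = [34, 41, 29, 50, 38]
matched_age = [33, 43, 30, 48, 37]

smd = standardized_mean_difference(treated_age, matched_age)
print(f"SMD for age: {smd:.3f}")
```

In practice this would be computed for every confounder, before and after matching, to confirm that matching actually reduced the imbalance.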

Anti-Patterns

| Anti-Pattern | Problem | Fix |
|---|---|---|
| Data Leakage | Scaling/encoding before split | Pipeline; fit only on train |
| P-Hacking | Testing 50 hypotheses, reporting p<0.05 | Bonferroni/FDR correction; pre-register |
| Imbalanced Classes | 99.9% accuracy on 0.1% fraud | Use PR-AUC, F1; SMOTE; class_weights |
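
To make the P-Hacking fix concrete, here is a small dependency-free sketch of the two corrections named above. The function names and p-values are hypothetical, and FDR control is shown via the Benjamini-Hochberg procedure, which is one common choice; a maintained library implementation would normally be preferable.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i only if p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with
    p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

# Made-up p-values from five hypothetical tests
p_values = [0.001, 0.012, 0.025, 0.041, 0.20]
naive = [p < 0.05 for p in p_values]
print("uncorrected:", naive)
print("Bonferroni: ", bonferroni(p_values))
print("BH (FDR):   ", benjamini_hochberg(p_values))
```

On these values, four tests pass the uncorrected 0.05 threshold, three survive Benjamini-Hochberg, and only one survives Bonferroni, which shows how the choice of correction trades false positives against power.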

Quality Checklist

  • Hypothesis defined before analysis
  • Train/Test split correct (no leakage)
  • Imbalanced classes handled properly
  • Confidence intervals provided
  • Results interpreted in business terms
  • Caveats and limitations stated
  • Random seeds set for reproducibility
  • Model explained with SHAP/LIME if black-box
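
For the "Imbalanced classes handled properly" item, the following sketch shows why accuracy alone misleads on rare-event data. The labels are synthetic (1 positive in 1,000, mirroring the 0.1% fraud case) and the helper name is hypothetical; it computes the standard precision, recall, and F1 definitions by hand.

```python
def classification_summary(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Synthetic data: 0.1% positives, and a "model" that always predicts the majority class
y_true = [1] + [0] * 999
y_pred = [0] * 1000

metrics = classification_summary(y_true, y_pred)
print(metrics)
```

The always-negative classifier scores 99.9% accuracy while catching no positives at all, which is why recall, F1, or PR-AUC belong in the evaluation.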
Usage Guidance
Install this if you want reusable data science analysis guidance. Review generated analysis code before running it on private or regulated data, especially where examples load local datasets or perform modeling decisions that may affect business outcomes.
Capability Assessment
Purpose & Capability
The stated purpose is statistical analysis, predictive modeling, experiment design, causal inference, and data storytelling; the artifact content matches that purpose.
Instruction Scope
Instructions are limited to analytical workflows, examples, anti-patterns, and a quality checklist; there are no role overrides, hidden commands, autonomous actions, or prompt-injection patterns.
Install Mechanism
The inspected artifact is a markdown SKILL.md file only, with no executable scripts, declared dependencies, package installation steps, or runtime hooks.
Credentials
Example code reads a local placeholder file named data.csv and uses common data science libraries, which is expected for this skill but should be adapted carefully for sensitive datasets.
Persistence & Privilege
No persistence, background execution, privilege escalation, credential/session access, network calls, or mutation of system/account state was found.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install data-scientist
  3. After installation, invoke the skill by name or use /data-scientist
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of the data-scientist skill, enabling advanced statistical and machine learning analysis.
  • Provides workflows for EDA, A/B test analysis, and causal inference (propensity score matching)
  • Supports predictive modeling, statistical inference, experiment design, and feature engineering
  • Includes guidance on communication, data storytelling, and common anti-patterns to avoid
  • Offers a robust quality checklist to ensure rigorous and reproducible analysis
Metadata
Slug data-scientist
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Data Scientist?

Expertise in statistical analysis, predictive modeling, causal inference, and data-driven storytelling to generate validated business insights and support de... It is an AI Agent Skill for Claude Code / OpenClaw, with 50 downloads so far.

How do I install Data Scientist?

Run "/install data-scientist" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Data Scientist free?

Yes, Data Scientist is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Data Scientist support?

Data Scientist is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Data Scientist?

It is built and maintained by runkecheng (@runkecheng); the current version is v1.0.0.
