senior-data-scientist · listed

World-class senior data scientist skill specialising in statistical modeling, experiment design, causal inference, and predictive analytics. Covers A/B testing (sample sizing, two-proportion z-tests, Bonferroni correction), difference-in-differences, feature engineering pipelines (Scikit-learn, XGBoost), cross-validated model evaluation (AUC-ROC, AUC-PR, SHAP), and MLflow experiment tracking — using Python (NumPy, Pandas, Scikit-learn), R, and SQL. Use when designing or analysing controlled experiments, building and evaluating classification or regression models, performing causal analysis on observational data, engineering features for structured tabular datasets, or translating statistical findings into data-driven business decisions.
mdnaimul22/human-skills · ★ 2 · AI & Automation · score 73
Install: claude install-skill mdnaimul22/human-skills
# Senior Data Scientist

World-class senior data scientist skill for production-grade AI/ML/Data systems.

## Core Workflows

### 1. Design an A/B Test

```python
import numpy as np
from scipy import stats


def calculate_sample_size(baseline_rate, mde, alpha=0.05, power=0.8):
    """
    Calculate required sample size per variant.

    baseline_rate: current conversion rate (e.g. 0.10)
    mde: minimum detectable effect (relative, e.g. 0.05 = 5% lift)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde)
    # Standardized effect: absolute difference over the average per-arm standard deviation
    effect_size = abs(p2 - p1) / np.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    z_alpha = stats.norm.ppf(1 - alpha / 2)  # two-sided test
    z_beta = stats.norm.ppf(power)
    # Factor of 2: the variance of the difference is the sum of both arms' variances.
    # Without it the result is half the required per-variant sample size.
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return int(np.ceil(n))


def analyze_experiment(control, treatment, alpha=0.05):
    """
    Run two-proportion z-test and return structured results.

    control/treatment: dicts with 'conversions' and 'visitors'.
    """
    p_c = control["conversions"] / control["visitors"]
    p_t = treatment["conversions"] / treatment["visitors"]
    # Pooled rate under the null hypothesis of equal conversion rates
    pooled = (control["conversions"] + treatment["conversions"]) / (
        control["visitors"] + treatment["visitors"]
    )
    se = np.sqrt(pooled * (1 - pooled) * (1 / control["visitors"] + 1 / treatment["visitors"]))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))  # two-sided p-value
    # Note: the interval reuses the pooled SE; an unpooled SE is the more common choice for a CI
    ci_low = (p_t - p_c) - stats.norm.ppf(1 - alpha / 2) * se
    ci_high = (p_t - p_c) + stats.norm.ppf(1 - alpha / 2) * se
    return {
        "lift":