missing-data-imputation

Use this skill when the user wants to fill missing values in a tabular dataset and obtain a reusable, fitted scikit-learn imputer plus an auditable report of what was imputed. Triggers include "impute missing values", "fill the NaNs", "handle missing data", "KNN imputation", "iterative / MICE imputation", "remplir les valeurs manquantes", "impute mes données", "gérer les données manquantes", "imputation KNN", "imputation itérative". Supports numeric strategies (mean, median, KNN, iterative/MICE) and categorical strategies (mode, constant), exports a pickled fitted imputer for reuse on new data, and produces a JSON report mapping each column to its strategy, fill value, and missing counts. Aligned to the DAMA-DMBOK2 Completeness dimension.
RAFCERAY/claude-skills-data-tasks · ★ 0 · Data & Documents · score 60

Install: claude install-skill RAFCERAY/claude-skills-data-tasks
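
The description above names three outputs: imputed data, a reusable fitted imputer, and a JSON report mapping each column to its strategy, fill value, and missing count. The following is a minimal sketch of how those pieces could fit together with scikit-learn, not the skill's actual implementation. The toy DataFrame, the column names, the choice of median and mode strategies, and the report layout are all assumptions made for illustration.

```python
import json
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy data invented for this example (hypothetical columns).
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50.0, 60.0, np.nan, 80.0],
    "city": pd.Series(["Paris", np.nan, "Lyon", "Paris"], dtype=object),
})
numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# One fitted object covering both column groups:
# median for numeric columns, mode for categorical columns.
imputer = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    ("cat", SimpleImputer(strategy="most_frequent"), categorical_cols),
])
imputer.fit(df)

# Build the per-column report from the fitted statistics.
report = {}
for name, cols in [("num", numeric_cols), ("cat", categorical_cols)]:
    fitted = imputer.named_transformers_[name]
    for col, fill in zip(cols, fitted.statistics_):
        report[col] = {
            "strategy": fitted.strategy,
            # NumPy scalars are converted so json.dumps can serialize them.
            "fill_value": fill.item() if hasattr(fill, "item") else fill,
            "missing_count": int(df[col].isna().sum()),
        }
report_json = json.dumps(report, indent=2)

# Round-trip through pickle to show the fitted imputer is reusable.
# Writing to disk would use pickle.dump with an open file instead.
restored = pickle.loads(pickle.dumps(imputer))

# ColumnTransformer emits columns in transformer order: numeric, then categorical.
filled = pd.DataFrame(
    restored.transform(df), columns=numeric_cols + categorical_cols
)
```

KNN or iterative imputation would swap `KNNImputer` or `IterativeImputer` in for the numeric `SimpleImputer`; those classes do not expose `statistics_`, so the report's fill-value field would need different handling. Only unpickle imputer files from sources you trust, since loading a pickle can execute arbitrary code.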

# Missing Data Imputation

A completeness-remediation skill. Takes a dataset with missing values and produces a **fitted, pickled scikit-learn imputer** that can be reused on new data, plus a **machine-readable JSON report** documenting every imputation decision. Aligned to the DAMA-DMBOK2 **Completeness** dimension.

## When to use this skill

Activate when the user has a dataset with missing values and wants to **fill them in a principled, reproducible way**, not just a one-off `fillna`. Typical signals:

- "Impute the missing values in this dataset"
- "Fill the NaNs so I can train a model"
- "Use KNN / iterative / MICE imputation on the numeric columns"
- "Remplis les valeurs manquantes de ce dataset"
- "Gère les données manquantes avant le feature engineering"

**Pre-conditions:**

- The dataset is loaded and tabular (CSV, Excel, Parquet).
- The user has decided to impute rather than drop. If missingness is very high (>50% on many columns), warn that imputation may inject noise and suggest dropping those columns instead, but let the user decide.

**Do NOT activate this skill for:**

- Auditing quality without fixing it → use `data-quality-report`
- General exploration → use `eda-explorer`
- Time-series gap filling that needs forward/backward fill or interpolation along the time axis → use `time-series-features` (lag/rolling logic); this skill is for cross-sectional imputation.

**Position in the pipeline:** this skill typically runs **after** `eda-explorer` / `data-qualit