feature-engineer

Use this skill when the user wants to prepare a dataset for a machine learning model — encoding categorical variables, scaling numeric features, decomposing datetime columns, creating interactions, or building a reusable feature pipeline. Triggers include "prepare features", "encode my data", "feature engineering", "build a pipeline", "make this ML-ready", "fais du feature engineering", "encode mes variables", "prépare mes features pour un modèle". Outputs a fitted scikit-learn pipeline (pickled), a feature dictionary (JSON), and the transformed dataset.
RAFCERAY/claude-skills-data-tasks · ★ 0 · Data & Documents · score 60

Install: claude install-skill RAFCERAY/claude-skills-data-tasks

# Feature Engineer

A standardized feature engineering skill. Same logic, every dataset, fully traceable.

## When to use this skill

Activate when the user has a **clean-ish tabular dataset** and wants to make it model-ready. Typical signals:

- "Prepare these features for a model"
- "Encode my categorical variables"
- "Build a feature pipeline"
- "I want to train a model on this — what should I do first?"
- "Fais du feature engineering" / "encode mes variables"

**Pre-condition:** the data should already be reasonably clean. If you see > 30% missing in many columns or obvious quality issues, **call `eda-explorer` first** and tell the user to clean before feature engineering.

**Do NOT activate this skill for:**

- Initial exploration (use `eda-explorer`)
- Time-series-specific features (lags, rolling windows) — wait for `time-series-features` skill
- Text NLP feature extraction (out of scope)

## Workflow

For every dataset, follow these 5 phases in order:

### Phase 1 — Type detection

Auto-classify each column into one of:

- `numeric` (int, float, bool)
- `categorical_low_card` (object, < 10 unique values)
- `categorical_high_card` (object, 10–50 unique values)
- `categorical_very_high_card` (object, > 50 unique values)
- `datetime`
- `id_or_constant` (drop these — `n_unique == n_rows` or `n_unique == 1`)
- `text` (object with average string length > 30 chars — out of scope, drop with warning)

Print the classification table to the user **before** proceeding.

### Phase 2 —