tidymodels-review-patterns

Review patterns for tidymodels workflows, including leakage, resampling, tuning, metrics, and reproducibility.
choxos/BiostatAgent

Install: claude install-skill choxos/BiostatAgent

# Tidymodels Code Review Patterns

## Overview

Anti-pattern detection and best practices for tidymodels workflows, based on the principles of "Tidy Modeling with R" (TMwR). This skill supports systematic code review for data leakage, resampling violations, workflow issues, evaluation problems, and reproducibility concerns.

## Data Leakage Patterns (CRITICAL)

### DL-001: Recipe Fitted on Test Data

**Severity**: CRITICAL

**Anti-Pattern**:

```r
# WRONG: Recipe defined and prepped on test data
rec <- recipe(outcome ~ ., data = test_data) |>
  prep()

# WRONG: prep() estimates preprocessing parameters from test data
rec <- recipe(outcome ~ ., data = train_data) |>
  prep(training = test_data)
```

**Correct Pattern**:

```r
# CORRECT: Recipe always prepped on training data only
rec_prepped <- recipe(outcome ~ ., data = train_data) |>
  prep(training = train_data)

# BEST: Use a workflow, which preps the recipe on the data passed to fit().
# Pass the unprepped recipe; workflows expects an untrained recipe.
rec <- recipe(outcome ~ ., data = train_data)

wf <- workflow() |>
  add_recipe(rec) |>
  add_model(model_spec)

wf_fit <- fit(wf, data = train_data)
```

**Detection**: Look for `prep()` calls that receive test data, or recipes defined on test sets.

---

### DL-002: Preprocessing Before Split

**Severity**: CRITICAL

**Anti-Pattern**:

```r
# WRONG: Normalizing before splitting
df_normalized <- df |>
  mutate(across(where(is.numeric), scale))
split <- initial_split(df_normalized)

# WRONG: Feature selection before split
important_vars <- df |>
  select(where(~cor(.x, df$outcome) > 0.3))
split <- initial_split(important_vars)
```

**Correct Pattern**:

```r
# CORRECT: Split first, then prepro