Install: claude install-skill choxos/BiostatAgent
# Tidymodels Code Review Patterns
## Overview
This skill covers anti-pattern detection and best practices for tidymodels workflows, based on the principles of "Tidy Modeling with R" (TMwR). It supports systematic code review for data leakage, resampling violations, workflow issues, evaluation problems, and reproducibility concerns.
## Data Leakage Patterns (CRITICAL)
### DL-001: Recipe Fitted on Test Data
**Severity**: CRITICAL
**Anti-Pattern**:
```r
# WRONG: Fitting recipe on test data
rec <- recipe(outcome ~ ., data = test_data) |>
  prep()
# WRONG: prep() using test data
rec <- recipe(outcome ~ ., data = train_data) |>
  prep(training = test_data)
```
**Correct Pattern**:
```r
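# CORRECT: Recipe always prepped on training data only
rec_prepped <- recipe(outcome ~ ., data = train_data) |>
  prep(training = train_data)
# BEST: Use a workflow, which preps the recipe on the data passed to fit().
# add_recipe() expects an untrained recipe, so do not call prep() first.
rec <- recipe(outcome ~ ., data = train_data)
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(model_spec)
# Store the result under a name that does not shadow fit()
wf_fit <- fit(wf, data = train_data)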
```
**Detection**: Look for `prep()` calls with test data or recipes defined on test sets.
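The detection rule above can be approximated with a line-based text scan. The following is a minimal base R sketch, not a definitive implementation: the helper name `flag_test_data_recipes()` and the assumption that test sets have "test" somewhere in their variable name are both hypothetical, and calls that span several lines will not be matched.

```r
# Hypothetical helper: flag lines where a recipe is defined on, or prepped
# with, an object whose name looks like a test set. The naming convention
# (names containing "test") is an assumption, not a tidymodels rule.
flag_test_data_recipes <- function(code_lines, test_pattern = "test") {
  # recipe(..., data = <name containing test_pattern>)
  recipe_rx <- paste0("recipe\\s*\\([^)]*data\\s*=\\s*\\w*", test_pattern)
  # prep(<anything mentioning test_pattern>)
  prep_rx <- paste0("prep\\s*\\([^)]*", test_pattern)

  hits <- grepl(recipe_rx, code_lines, perl = TRUE, ignore.case = TRUE) |
    grepl(prep_rx, code_lines, perl = TRUE, ignore.case = TRUE)

  data.frame(
    line = which(hits),
    code = code_lines[hits],
    stringsAsFactors = FALSE
  )
}

# Usage: scan a small snippet; bake() on test data is legitimate and
# should not be flagged.
example_code <- c(
  "rec <- recipe(outcome ~ ., data = test_data) |>",
  "  prep()",
  "rec <- recipe(outcome ~ ., data = train_data) |>",
  "  prep(training = test_data)",
  "baked <- bake(rec, new_data = test_data)"
)
flags <- flag_test_data_recipes(example_code)
print(flags)
```

A scan like this only narrows down candidates for manual review; names such as `holdout` or `validation_df` would need their own pattern.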
---
### DL-002: Preprocessing Before Split
**Severity**: CRITICAL
**Anti-Pattern**:
```r
# WRONG: Normalizing before splitting
df_normalized <- df |>
  mutate(across(where(is.numeric), \(x) as.numeric(scale(x))))
split <- initial_split(df_normalized)
# WRONG: Feature selection before split
important_vars <- df |>
  select(where(~ is.numeric(.x) && cor(.x, df$outcome) > 0.3))
split <- initial_split(important_vars)
```
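To see why normalizing before the split leaks information, consider a minimal base R sketch. The vector `x` and the row indices are made-up illustration values, not data from any real project. The centering statistic computed on all rows is pulled toward the test rows, so the training data ends up encoding information about the test set.

```r
# Hypothetical data: the last two rows play the role of the test set
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100, 200)
train_idx <- 1:8
test_idx <- 9:10

# Leaky: the mean is computed from all rows, including the test rows
leaky_center <- mean(x)

# Leak-free: statistics come from the training rows only
train_center <- mean(x[train_idx])
train_scale <- sd(x[train_idx])

# The test rows are transformed with the training statistics
test_scaled <- (x[test_idx] - train_center) / train_scale

print(c(leaky = leaky_center, train_only = train_center))
```

Here the leaky mean is 33.6 while the training-only mean is 4.5, so every scaled training value would shift depending on what happens to be in the test set.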
**Correct Pattern**:
```r
# CORRECT: Split first, then prepro