
resampling-strategies

Resampling strategies in tidymodels, including validation splits, cross-validation, bootstrap, nested resampling, and grouped data.
choxos/BiostatAgent · ★ 4 · Data & Documents · score 75
Install: claude install-skill choxos/BiostatAgent
# Resampling Strategies

## Overview

Comprehensive guide to resampling methods for model validation using the rsample package. Covers cross-validation, bootstrapping, and specialized resampling for time series and grouped data.

## Data Splitting

### Basic Train/Test Split

```r
library(rsample)

set.seed(123)  # make the random split reproducible

# Simple split (75% training, 25% testing)
split <- initial_split(data, prop = 0.75)

# Stratified split (maintain outcome proportions in both sets)
split <- initial_split(data, prop = 0.75, strata = outcome)

# Stratified split on a continuous outcome, binned into 4 groups
split <- initial_split(data, prop = 0.75, strata = outcome, breaks = 4)

# Extract sets
train <- training(split)
test <- testing(split)
```

### Three-Way Split (Train/Validation/Test)

```r
# Single validation set: 60% training, 20% validation, remaining 20% testing
split <- initial_validation_split(data, prop = c(0.6, 0.2))

train <- training(split)
val <- validation(split)
test <- testing(split)

# Wrap the three-way split as a resampling object (rset) for tuning
val_set <- validation_set(split)
```

## Cross-Validation

### V-Fold Cross-Validation

```r
# Basic 10-fold CV
folds <- vfold_cv(train_data, v = 10)

# Stratified CV
folds <- vfold_cv(train_data, v = 10, strata = outcome)

# Repeated CV (5 repeats of 10 folds = 50 resamples)
folds <- vfold_cv(train_data, v = 10, repeats = 5, strata = outcome)

# Access individual folds
folds$splits[[1]]
analysis(folds$splits[[1]])    # analysis set (used to fit the model)
assessment(folds$splits[[1]])  # assessment set (held out for evaluation)
```

### Leave-One-Out CV

```r
# LOO CV (useful for small datasets)
loo_folds <- loo_cv(train_data)
```

### Mont