← ClaudeAtlas

ml-antipattern-validatorlisted

Prevents 30+ critical AI/ML mistakes including data leakage, evaluation errors, training pitfalls, and deployment issues. Use when working with ML training, testing, model evaluation, or deployment.
aiskillstore/marketplace · ★ 329 · AI & Automation · score 79
Install: claude install-skill aiskillstore/marketplace
# ML Antipattern Validator ## Overview AI/ML 개발에서 30+ 안티패턴을 감지하고 방지하는 스킬입니다. **Key Principle**: Honest evaluation > Impressive metrics. ## When to Activate **Automatic Triggers**: - ML training code (`train*.py`, model training) - Dataset preparation or splitting - Model evaluation or testing - Production deployment planning **Manual Triggers**: - `@validate-ml` - Full validation - `@check-leakage` - Data leakage detection - `@verify-eval` - Evaluation methodology --- ## Pre-Implementation Checklist ```python ✅ Requirements: □ Problem clearly defined with success metrics □ Train/test split strategy defined □ Evaluation methodology matches business objective ✅ Data Integrity: □ No temporal leakage (future → past) □ No target leakage (answer in features) □ No preprocessing leakage (fit on all data) □ No group leakage (related samples split) ✅ Evaluation Setup: □ Test set completely held out □ Metrics aligned with business objective □ Baseline models defined ``` --- ## Critical Antipatterns ### Category 1: Data Leakage 🚨 #### 1.1 Target Leakage ```python ❌ WRONG: Using "refund_issued" to predict "purchase_fraud" ✅ CORRECT: Only use features available at purchase time ``` #### 1.2 Temporal Leakage ```python ❌ WRONG: train = df[df['date'] > '2024-06-01'] # Future data ✅ CORRECT: train = df[df['date'] < '2024-06-01'] # Past for training ``` #### 1.3 Preprocessing Leakage ```python ❌ WRONG: X_scaled = scaler.fit_transform(X); train_test_split(X_scaled) ✅ CORRECT: