Install: `claude install-skill jiachengwang-punch/small-sample-analysis`
# Small Sample Analysis
A complete methodology for building defensible predictive models on small datasets (typically n < 200, often n < 50).
## When this skill applies
Small-sample analysis differs fundamentally from standard ML workflows. The defaults that work on 100k+ rows actively harm models on small data:
- **XGBoost/LightGBM** — overfit catastrophically; CV-R² often negative
- **Single train/test split** — variance too high to draw conclusions
- **Stepwise feature selection** — picks noise as signal
- **Headline metric reporting (just R²)** — hides systematic bias
This skill captures a methodology that handles these pitfalls explicitly.
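As a rough illustration of the single train/test split pitfall, the sketch below refits a one-feature least-squares line on many random 75/25 splits of the same 30-row dataset and reports how far the held-out R² swings. The data are synthetic and the helper names (`fit_line`, `r2`, `split_r2_scores`) are hypothetical, chosen for this example only; it uses the standard library so it runs anywhere, and is not the skill's own implementation.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2(ys, preds):
    """Held-out R^2; negative when predictions are worse than the test mean."""
    my = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def split_r2_scores(xs, ys, n_splits=200, test_frac=0.25, seed=0):
    """Score the same model on many random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = max(2, round(len(xs) * test_frac))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        scores.append(r2([ys[i] for i in test], preds))
    return scores


# Synthetic data (an assumption for illustration): n = 30, true slope 1, noise sd 1.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 3) for _ in range(30)]
ys = [x + data_rng.gauss(0, 1.0) for x in xs]

scores = split_r2_scores(xs, ys)
print(f"min={min(scores):.2f}  "
      f"median={statistics.median(scores):.2f}  "
      f"max={max(scores):.2f}")
```

The spread between the minimum and maximum is the point: any one of these splits could have been the "single split" a standard workflow reports, which is why averaging over repeated splits or folds gives a more defensible estimate on small data.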
**Triggers:**
- Sample size mentioned as small (< 200, especially < 50)
- Feature-to-sample ratio is concerning (p/n > 0.1)
- User asks "why not XGBoost" or shows confusion about model choice
- User needs to justify decisions to non-technical stakeholders
- User uses words like "stores", "patients", "experiments", "cohorts" with limited counts
- Any predictive modeling task where the user needs interpretability + rigor
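The two numeric triggers in the list above can be checked mechanically. The following is a minimal sketch, assuming the thresholds stated in this section (n < 200, n < 50, p/n > 0.1); the function name `small_sample_flags` is hypothetical, and the qualitative triggers (stakeholder needs, domain vocabulary) still require judgment.

```python
def small_sample_flags(n_samples, n_features):
    """Return the numeric small-sample triggers that apply to a dataset."""
    flags = []
    if n_samples < 50:
        flags.append("very small sample (n < 50)")
    elif n_samples < 200:
        flags.append("small sample (n < 200)")
    # Feature-to-sample ratio; guard against an empty dataset.
    if n_samples > 0 and n_features / n_samples > 0.1:
        ratio = n_features / n_samples
        flags.append(f"high feature-to-sample ratio (p/n = {ratio:.2f})")
    return flags


# Example: 40 stores described by 8 features trips both numeric triggers.
print(small_sample_flags(40, 8))
# Example: 1000 rows and 20 features trips neither.
print(small_sample_flags(1000, 20))
```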
## Output language
Match the user's natural language for all deliverables (Notebook markdown, Word body, chart labels, slides). Code, math notation, and standard ML abbreviations (Ridge, SHAP, R², MAPE) stay in English regardless. For CJK text, set a CJK-capable font in matplotlib (for example `Noto Sans CJK JP`) and in docx output (for example `Microsoft YaHei`); otherwise missing glyphs render as empty boxes (`□□□`).
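One possible way to apply the matplotlib font setting only when it is needed is sketched below. Both helper names (`needs_cjk_font`, `configure_matplotlib_cjk`) are hypothetical, the detection rule is a simple heuristic based on Unicode character names, and the chosen font is assumed to be installed on the machine; matplotlib is imported lazily so the detection helper works without it.

```python
import unicodedata

# Character-name prefixes treated as "needs a CJK-capable font" (a heuristic).
_CJK_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL")


def needs_cjk_font(text):
    """True if the text contains a CJK ideograph, kana, or Hangul character."""
    return any(
        unicodedata.name(ch, "").startswith(_CJK_PREFIXES) for ch in text
    )


def configure_matplotlib_cjk(font_name="Noto Sans CJK JP"):
    """Put a CJK-capable font first in matplotlib's sans-serif fallback list."""
    import matplotlib  # lazy import: only required when plotting

    current = list(matplotlib.rcParams["font.sans-serif"])
    matplotlib.rcParams["font.sans-serif"] = [font_name, *current]
    # Many CJK fonts lack the Unicode minus sign; fall back to ASCII hyphen.
    matplotlib.rcParams["axes.unicode_minus"] = False


chart_title = "各门店销售额"
if needs_cjk_font(chart_title):
    print("CJK text detected: call configure_matplotlib_cjk() before plotting")
```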
## Core principles
Drill these into every