ml-experiment-reproducibility
Install: claude install-skill authenticfake/clike
# Skill: ML Experiment Reproducibility
## Intent
Ensure ML, data science, model evaluation, and experiment requirements are reproducible, measurable, and traceable.
This skill separates product code from experiments and prevents unverifiable model-quality claims.
## Use when
Use this skill when a REQ touches ML models, datasets, training, fine-tuning, feature engineering, evaluation metrics, model comparison, notebooks, pipelines, data quality, experiment tracking, or batch inference.
## Do not use when
Do not use this skill for generic LLM prompt/RAG work unless the REQ includes ML datasets, model metrics, training, offline evaluation, or experiment comparison.
## Signals
- The REQ mentions ML, model training, fine-tuning, dataset, feature, label, metric, accuracy, precision, recall, F1, ROC, drift, experiment, notebook, inference, pipeline, validation split, baseline, or model registry.
- Acceptance criteria include measurable model quality.
- Generated files include notebooks, data loaders, evaluation scripts, model wrappers, or dataset fixtures.
## Required behavior
- Define datasets, fixtures, or sample data boundaries explicitly.
- Define metrics and thresholds before implementation.
- Keep training/experiment code separate from production inference code when practical.
- Make evaluation commands reproducible.
- Record assumptions about data availability, privacy, sampling, and labels.
- Include deterministic smoke tests for data loading and metric computation.
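
As a rough illustration of the behaviors listed above, the following is a minimal sketch of a deterministic smoke test, not a prescribed implementation. It assumes a tiny in-repo fixture of (label, prediction) pairs and uses only the Python standard library. The names `THRESHOLDS`, `FIXTURE`, `load_fixture`, `compute_metrics`, `check_thresholds`, and `smoke_test` are hypothetical, and the threshold values are placeholders that a real REQ would set before implementation.

```python
import random

# Hypothetical thresholds, declared before any model code exists.
# The values are illustrative placeholders, not recommendations.
THRESHOLDS = {"precision": 0.70, "recall": 0.70, "f1": 0.70}

# Hypothetical fixture: (true_label, predicted_label) pairs kept in the repo,
# so the test never depends on external or private data.
FIXTURE = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (1, 1), (0, 0)]


def load_fixture(seed=0):
    """Return the fixture rows in a seeded, repeatable order."""
    rows = list(FIXTURE)
    # A local Random instance avoids touching global random state.
    random.Random(seed).shuffle(rows)
    return rows


def compute_metrics(rows):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for y, p in rows if y == 1 and p == 1)
    fp = sum(1 for y, p in rows if y == 0 and p == 1)
    fn = sum(1 for y, p in rows if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def check_thresholds(metrics, thresholds):
    """Return the names of metrics that fall below their minimum."""
    return [name for name, minimum in thresholds.items() if metrics[name] < minimum]


def smoke_test():
    """Check that loading is repeatable and metrics match known values."""
    first = load_fixture(seed=0)
    second = load_fixture(seed=0)
    assert first == second, "same seed must give the same row order"

    metrics = compute_metrics(first)
    # On this fixture: 3 true positives, 1 false positive, 1 false negative.
    assert metrics == {"precision": 0.75, "recall": 0.75, "f1": 0.75}
    assert check_thresholds(metrics, THRESHOLDS) == []
    return metrics


if __name__ == "__main__":
    print(smoke_test())
```

Running the file directly gives one reproducible evaluation command. Because the expected values are derived by hand from the fixture, a change to the metric code or to the fixture surfaces as a failed assertion rather than as a silently shifted score.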