run-pipeline

Run the full data science pipeline: validate raw data, preprocess, engineer features, train model, and evaluate. Use this when you want to execute the end-to-end ML pipeline or re-run it after data or code changes.
morganmuli/metaskill · ★ 1 · AI & Automation · score 71

Install: claude install-skill morganmuli/metaskill

You are executing the full data science pipeline for this project. Run each stage sequentially, verifying success before proceeding to the next stage. Stop immediately if any stage fails and report the error clearly.

## Dynamic Context

Current branch: !`git branch --show-current`

Data directory contents: !`ls data/ 2>/dev/null || echo "No data/ directory found"`

Available configs: !`ls configs/*.yaml 2>/dev/null || ls configs/*.toml 2>/dev/null || echo "No config files found"`

Python environment: !`which python3 && python3 --version 2>/dev/null || echo "Python not found"`

Recent changes: !`git diff --stat HEAD~3 2>/dev/null || echo "No recent commits"`

## Configuration

If the user provided a config file as an argument, use it: `$ARGUMENTS`

Otherwise, look for the default config at `configs/experiment.yaml` or `configs/experiment.toml`.

## Pipeline Stages

Execute each stage in order. After each stage, check for errors and verify outputs exist before proceeding.

### Stage 1: Environment Check

Verify the Python environment is ready:

```bash
python3 -c "import torch; import pandas; import numpy; print(f'PyTorch {torch.__version__}, pandas {pandas.__version__}, NumPy {numpy.__version__}')"
```

If imports fail, report which packages are missing and suggest `pip install -r requirements.txt`.

### Stage 2: Data Validation

Run data validation on the raw data:

```bash
python3 -m src.data.validate --data-dir data/raw/
```

If the validation script does not exist, look for al