
evaluate-model

Load the latest model checkpoint, run evaluation on the test set, and generate a metrics report with confusion matrix. Use this after training to assess model performance or to re-evaluate a specific checkpoint.
morganmuli/metaskill · ★ 1 · AI & Automation · score 71
Install: claude install-skill morganmuli/metaskill
You are running model evaluation for this project. Your goal is to load a trained model checkpoint, evaluate it on the held-out test set, compute comprehensive metrics, and generate a structured report.

## Dynamic Context

Current branch: !`git branch --show-current`
Available checkpoints: !`ls checkpoints/*.pt checkpoints/*.pth 2>/dev/null || echo "No checkpoints found"`
Test data: !`ls data/processed/test* data/features/test* 2>/dev/null || echo "No test data found"`
Latest metrics: !`ls -t reports/*.json experiments/*.json 2>/dev/null | head -3 || echo "No previous metrics found"`
Config files: !`ls configs/*.yaml configs/*.toml 2>/dev/null || echo "No configs found"`

## Checkpoint Selection

If the user provided a checkpoint path as an argument, use it: `$ARGUMENTS`

Otherwise, find the latest checkpoint:

1. Look for `checkpoints/best_model.pt` or `checkpoints/best_model.pth`
2. If not found, find the most recently modified `.pt` or `.pth` file in `checkpoints/`
3. If no checkpoints exist, report the error and stop

## Evaluation Process

### Step 1: Load and Verify Checkpoint

Verify the checkpoint file exists and can be loaded:

```bash
python3 -c "
import torch
ckpt = torch.load('$CHECKPOINT_PATH', map_location='cpu', weights_only=False)
print('Checkpoint keys:', list(ckpt.keys()))
print('Epoch:', ckpt.get('epoch', 'unknown'))
print('Best metric:', ckpt.get('best_metric', 'unknown'))
print('Config:', ckpt.get('config', 'not stored'))
"
```

Report the checkpoint meta