evaluate-model
Install: claude install-skill morganmuli/metaskill
You are running model evaluation for this project. Your goal is to load a trained model checkpoint, evaluate it on the held-out test set, compute comprehensive metrics, and generate a structured report.
## Dynamic Context
Current branch: !`git branch --show-current`
Available checkpoints: !`ls checkpoints/*.pt checkpoints/*.pth 2>/dev/null | grep . || echo "No checkpoints found"`
Test data: !`ls -d data/processed/test* data/features/test* 2>/dev/null | grep . || echo "No test data found"`
Latest metrics: !`ls -t reports/*.json experiments/*.json 2>/dev/null | head -3 | grep . || echo "No previous metrics found"`
Config files: !`ls configs/*.yaml configs/*.toml 2>/dev/null | grep . || echo "No configs found"`
## Checkpoint Selection
If the user provided a checkpoint path as an argument, use it: `$ARGUMENTS`
Otherwise, select a checkpoint in this order:
1. Look for `checkpoints/best_model.pt` or `checkpoints/best_model.pth`
2. If not found, find the most recently modified `.pt` or `.pth` file in `checkpoints/`
3. If no checkpoints exist, report the error and stop
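The selection rules above can be sketched in Python as follows. This is a minimal, stdlib-only sketch rather than part of the skill. `select_checkpoint` and `CHECKPOINT_SUFFIXES` are hypothetical names. Raising `FileNotFoundError` when a user-supplied path does not exist is an assumption about how the error should be reported.

```python
from pathlib import Path
from typing import Optional

# Hypothetical constant: the checkpoint extensions the skill looks for.
CHECKPOINT_SUFFIXES = (".pt", ".pth")


def select_checkpoint(argument: Optional[str] = None,
                      checkpoint_dir: str = "checkpoints") -> Path:
    """Hypothetical helper: pick a checkpoint using the selection rules."""
    # An explicit path from the user always wins (assumed: it must exist).
    if argument:
        path = Path(argument)
        if not path.is_file():
            raise FileNotFoundError(f"Checkpoint not found: {path}")
        return path

    directory = Path(checkpoint_dir)
    # Rule 1: prefer best_model.pt, then best_model.pth.
    for suffix in CHECKPOINT_SUFFIXES:
        best = directory / f"best_model{suffix}"
        if best.is_file():
            return best

    # Rule 2: otherwise take the most recently modified .pt/.pth file.
    # glob() on a missing directory simply yields nothing.
    candidates = [p for p in directory.glob("*")
                  if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES]
    if not candidates:
        # Rule 3: nothing to evaluate, so report the error and stop.
        raise FileNotFoundError(f"No .pt or .pth checkpoints in {directory}/")
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling `select_checkpoint()` with no argument applies rules 1 to 3 against `checkpoints/`. A `best_model` file is returned even when a newer epoch checkpoint exists, which matches the priority order in the list. If two files share the same modification time, `max` returns whichever one `glob` yielded first.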
## Evaluation Process
### Step 1: Load and Verify Checkpoint
Verify the checkpoint file exists and can be loaded:
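```bash
# Pass the path as an argument instead of splicing it into the Python source,
# so paths containing quotes or spaces load correctly.
python3 - "$CHECKPOINT_PATH" <<'EOF'
import sys
import torch

# weights_only=False unpickles arbitrary objects (e.g. a stored config); only use it on trusted checkpoints
ckpt = torch.load(sys.argv[1], map_location='cpu', weights_only=False)
if not isinstance(ckpt, dict):
    # A whole pickled object (e.g. an nn.Module) rather than a dict of state
    print('Checkpoint type:', type(ckpt).__name__)
else:
    print('Checkpoint keys:', list(ckpt.keys()))
    print('Epoch:', ckpt.get('epoch', 'unknown'))
    print('Best metric:', ckpt.get('best_metric', 'unknown'))
    print('Config:', ckpt.get('config', 'not stored'))
EOF
```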
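The metadata may also need to go into the structured report rather than only being printed. In that case, a small helper can turn the loaded object into JSON-serializable fields. This is a hedged sketch: `summarize_checkpoint` is a hypothetical name. It assumes the training script stored the `epoch`, `best_metric`, and `config` keys that the loading script reads. Values that `json` cannot encode, such as a config object or a tensor, are stored as their `repr()`.

```python
import json
from typing import Any


def _jsonable(value: Any) -> Any:
    # Keep values json can encode; fall back to repr() for anything else.
    try:
        json.dumps(value)
        return value
    except (TypeError, ValueError):
        return repr(value)


def summarize_checkpoint(ckpt: Any) -> dict:
    """Hypothetical helper: collect checkpoint metadata for a JSON report."""
    if not isinstance(ckpt, dict):
        # e.g. a whole pickled nn.Module instead of a dict of state
        return {"type": type(ckpt).__name__}
    return {
        # Preserve the stored key order, as the loading script prints it.
        "keys": [str(k) for k in ckpt.keys()],
        "epoch": _jsonable(ckpt.get("epoch", "unknown")),
        "best_metric": _jsonable(ckpt.get("best_metric", "unknown")),
        "config": _jsonable(ckpt.get("config", "not stored")),
    }
```

In use, the result of `torch.load(path, map_location="cpu", weights_only=False)` would be passed to `summarize_checkpoint` and written out with `json.dumps(summary, indent=2)`.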
Report the checkpoint meta