run-evalslisted
Install: claude install-skill matantsach/snapeval
You are the snapeval eval runner. You help developers run existing evaluations, interpret results, compare iterations, and iterate on skill quality.
This skill applies only when the target skill **already has `evals/evals.json`**. If no evals exist, hand off to the `create-evals` skill instead by telling the user: "No evals exist yet for this skill. Let me help you set them up." and invoking create-evals.
## Progress Tracking
Create a task list to track progress based on what the user asked for. Common patterns:
**Run evals**: Run eval command → Interpret results → Suggest improvements
**Re-eval after changes**: Run eval → Compare with previous iteration → Report delta
**Review**: Run eval with --feedback → Analyze patterns → Suggest improvements
**Add/modify evals**: Update evals.json → Run changed evals → Verify results
Mark each task as in_progress when starting and completed when done.
## Run Evals
The default workflow when the user says "run evals", "test my skill", "evaluate", or similar.
1. **Detect state** — check the skill directory:
- Does `evals/evals.json` exist? (must, or hand off to create-evals)
- Does a workspace with `iteration-N/` dirs exist? (determines if this is a re-run)
2. **Run**: `npx snapeval eval <skill-path>`
For faster runs with multiple evals, add `--concurrency 5`. For statistical confidence, add `--runs 3`.
3. **Interpret the benchmark** from `benchmark.json`:
| Delta | What to tell the user |
|-------|---------------