
eval-run

Evaluate any output file against a structured evals.yaml assertions file and produce a score report with per-assertion pass/fail results. Activate when the Discovery Agent runs the Skill Optimize protocol to measure output quality or detect regressions after skill instruction changes.
Fr-e-d/GAAI-framework · ★ 147 · Data & Documents · score 82
Install: claude install-skill Fr-e-d/GAAI-framework
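Step 1 of the process below is mechanical enough to sketch in Python. This is a minimal sketch, not the skill's own implementation. It assumes PyYAML for parsing, which is imported only when `load_evals` is called. The names `EvalRunError`, `check_output_file`, `validate_evals`, and `load_evals` are hypothetical. The "not found" and "evals.yaml validation error" messages follow Step 1. The messages for an empty output file and for unparseable YAML are not specified there, so they are placeholders.

```python
import os
from typing import Any


class EvalRunError(Exception):
    """Hypothetical exception signalling an immediate Step 1 FAIL."""


# Required fields, as listed in Step 1.3.
REQUIRED_TOP_LEVEL = ("skill", "version", "description", "assertions")
REQUIRED_ASSERTION = ("id", "type", "description")


def check_output_file(path: str) -> None:
    """Step 1.1: the output file must exist and be non-empty."""
    if not os.path.isfile(path):
        raise EvalRunError(f"output_file not found: {path}")
    if os.path.getsize(path) == 0:
        # Placeholder message: Step 1 does not specify one for empty files.
        raise EvalRunError(f"output_file is empty: {path}")


def validate_evals(data: Any) -> list:
    """Step 1.3: check required fields and return the assertions list."""
    if not isinstance(data, dict):
        raise EvalRunError("evals.yaml validation error: top level is not a mapping")
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in data]
    if missing:
        raise EvalRunError(f"evals.yaml validation error: missing fields {missing}")
    assertions = data["assertions"]
    if not isinstance(assertions, list) or not assertions:
        raise EvalRunError("evals.yaml validation error: assertions must be a non-empty list")
    for index, assertion in enumerate(assertions):
        if not isinstance(assertion, dict):
            raise EvalRunError(f"evals.yaml validation error: assertion {index} is not a mapping")
        missing = [key for key in REQUIRED_ASSERTION if key not in assertion]
        if missing:
            raise EvalRunError(
                f"evals.yaml validation error: assertion {index} missing fields {missing}"
            )
    return assertions


def load_evals(path: str) -> list:
    """Steps 1.2 and 1.3: the evals file must exist, parse as YAML, and validate."""
    if not os.path.isfile(path):
        raise EvalRunError(f"evals_file not found: {path}")
    import yaml  # PyYAML (assumed dependency), imported only when needed

    with open(path, encoding="utf-8") as handle:
        try:
            data = yaml.safe_load(handle)
        except yaml.YAMLError as exc:
            # Placeholder: Step 1 names no message for invalid YAML.
            raise EvalRunError(f"evals.yaml validation error: {exc}") from exc
    return validate_evals(data)
```

A caller would run `check_output_file(output_path)` and then `load_evals(evals_path)` before moving on to Step 2. Any `EvalRunError` corresponds to an immediate FAIL. Keeping `validate_evals` separate from file I/O lets the structural checks be reused on an already-parsed mapping.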
# Eval Run

## Purpose / When to Activate

Activate when:

- The Discovery Agent runs the Skill Optimize protocol and needs to score a skill output
- A skill's instructions have been modified and a before/after quality comparison is needed
- A baseline score is being established for a skill that has never been evaluated

This skill is generic: it accepts any output file and any evals.yaml, regardless of skill domain. It follows the GAAI principle "skills never chain": it evaluates the output it receives and does not invoke the skill that produced the output.

---

## Process

### Step 1 — Load inputs

1. Read the `output_file` path. Confirm the file exists and is non-empty. If missing: FAIL immediately with error "output_file not found: {path}".
2. Read the `evals_file` path. Confirm the file exists and is valid YAML. If missing: FAIL immediately with error "evals_file not found: {path}".
3. Parse the `evals.yaml` structure. Validate:
   - `skill`, `version`, `description`, and `assertions` fields are present
   - `assertions` list is non-empty
   - Each assertion has `id`, `type`, and `description` fields
   - If any required field is missing: FAIL with error "evals.yaml validation error: {details}"

For the full `evals.yaml` format spec, see `references/evals-format.md`.

### Step 2 — Run `code` assertions

For each assertion where `type: code`:

1. Read the `check` field. Execute the corresponding mechanical verification:

| `check` | Verification method |
|---|---|