eval-result-interpreter

Analyzes AI agent evaluation results using Microsoft's Triage & Improvement Playbook. Copilot Studio is the primary source and the worked example here, through its CSV export. Results from custom harnesses, or from any evaluator that produces per-case pass/fail rows, also work. Returns a SHIP / ITERATE / BLOCK verdict with root cause classification, diagnostic triage, prioritized remediation, and pattern analysis.
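The input/output contract described above (per-case pass/fail rows in, a SHIP / ITERATE / BLOCK verdict out) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the skill's implementation. The column names `case_id` and `result`, the helpers `summarize` and `verdict`, and the 0.90 / 0.70 pass-rate cut-offs are all hypothetical. The real skill also classifies root causes and triages failures by layer, which a bare pass rate cannot capture.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical column names; a real Copilot Studio export may label them differently.
CASE_ID_COL = "case_id"
RESULT_COL = "result"

# Hypothetical thresholds, chosen only to illustrate the three-way verdict.
SHIP_AT = 0.90
BLOCK_BELOW = 0.70


@dataclass
class Summary:
    total: int
    passed: int
    failed_ids: list[str]

    @property
    def pass_rate(self) -> float:
        # An empty result set counts as 0.0, so it can never earn SHIP.
        return self.passed / self.total if self.total else 0.0


def summarize(csv_text: str) -> Summary:
    """Count passes and collect failing case IDs from per-case CSV rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = passed = 0
    failed: list[str] = []
    for row in reader:
        total += 1
        if row[RESULT_COL].strip().lower() == "pass":
            passed += 1
        else:
            failed.append(row[CASE_ID_COL])
    return Summary(total, passed, failed)


def verdict(summary: Summary) -> str:
    """Map the pass rate onto SHIP / ITERATE / BLOCK using the hypothetical cut-offs."""
    rate = summary.pass_rate
    if rate >= SHIP_AT:
        return "SHIP"
    if rate < BLOCK_BELOW:
        return "BLOCK"
    return "ITERATE"


sample = """case_id,result
c1,Pass
c2,Pass
c3,Fail
c4,Pass
"""
s = summarize(sample)
print(s.pass_rate, s.failed_ids, verdict(s))  # 0.75 ['c3'] ITERATE
```

Keeping parsing (`summarize`) separate from the decision rule (`verdict`) mirrors the platform-agnostic framing: only the parser would need to change to read a different evaluator's export.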

Install: claude install-skill varunk130/AI-Eval-Skills

## Purpose

This skill takes eval results in any of these forms: a Copilot Studio evaluation CSV file (the primary worked example), an export from your own evaluator or harness, a pasted summary, or a plain-English description of results. It produces a structured triage report. It is the final step in the eval lifecycle: plan → generate → run → **interpret**. The output tells you whether to ship, what broke, why it broke, and what to fix first.

> **Platform context.** All the operational examples below use Microsoft Copilot Studio because its evaluation CSV format is well-documented and concrete. The triage framework itself (4 layers, 3 root cause types, SHIP/ITERATE/BLOCK verdict) is platform-agnostic: point it at any evaluator output with per-case results and the same analysis applies.

This skill serves **Stages 2-4** of the [MS Learn 4-stage evaluation framework](https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/evaluation-checklist):

- **Stage 2 (Set Baseline & Iterate):** it interprets your first eval results and guides fixes.
- **Stage 3 (Systematic Expansion):** it identifies coverage gaps worth expanding into.
- **Stage 4 (Operationalize):** it triages regression failures after agent updates.

Use the [evaluation checklist template](https://github.com/microsoft/PowerPnPGuidanceHub/tree/main/guidance/agentevalguidancekit) to track which stage you are in and what to interpret next.

**Knowledge source:** This skill's analysis framework is grounded in **Microsoft's Triage & Improve