# eval-generator
Install: `claude install-skill varunk130/AI-Eval-Skills`
## Purpose
This skill generates concrete eval test cases, each with a realistic input, an expected output, and an evaluation method configuration. It is the second step in the eval lifecycle: plan → **generate** → run → interpret.
> **Platform context.** This skill is platform-agnostic. Microsoft Copilot Studio is used throughout as the **primary worked example** because its CSV import format, named test methods (GeneralQuality, CompareMeaning, KeywordMatch, ToolUse, ExactMatch, Custom), and Copilot Studio Kit rubrics provide concrete, well-documented outputs. The same plan-to-test-case transformation works against any agent platform (custom LLM apps, LangChain/LangGraph, AutoGen, Semantic Kernel, OpenAI Assistants, etc.): re-shape the CSV columns or feed the test cases directly to your own evaluator.
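
As a rough illustration of what "re-shape the CSV columns or feed the test cases directly to your own evaluator" can look like, the sketch below holds test cases in a small Python structure and serializes them to CSV. The test method names come from the list above. The `EvalCase` class, the `to_csv` helper, the column names (`question`, `expected_response`, `test_method`), and the sample rows are all hypothetical, chosen only for illustration. They are not the official Copilot Studio import schema, so check your platform's documentation for the real column names.

```python
from __future__ import annotations

import csv
import io
from dataclasses import asdict, dataclass

# Test method names as listed in this document for Copilot Studio.
TEST_METHODS = {
    "GeneralQuality", "CompareMeaning", "KeywordMatch",
    "ToolUse", "ExactMatch", "Custom",
}

# Hypothetical column names; rename them to match your platform's import schema.
COLUMNS = ["question", "expected_response", "test_method"]


@dataclass
class EvalCase:
    """One generated test case (illustrative structure, not an official format)."""
    question: str
    expected_response: str
    test_method: str

    def __post_init__(self) -> None:
        # Reject method names that are not in the list above.
        if self.test_method not in TEST_METHODS:
            raise ValueError(f"unknown test method: {self.test_method}")


def to_csv(cases: list[EvalCase]) -> str:
    """Serialize cases to CSV text with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for case in cases:
        writer.writerow(asdict(case))
    return buffer.getvalue()


# Made-up sample cases, purely for demonstration.
cases = [
    EvalCase("What is the refund window?",
             "Refunds are accepted within 30 days.",
             "CompareMeaning"),
    EvalCase("Reset my password", "password reset", "KeywordMatch"),
]
csv_text = to_csv(cases)
print(csv_text)
```

Keeping the cases in a plain structure like this means the same list can be written to CSV for import or passed directly to a custom evaluator, with only `COLUMNS` and the field names needing to change per platform.
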
This skill covers **Stage 2 (Set Baseline & Iterate)** of the MS Learn [4-stage evaluation framework](https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/evaluation-checklist). Use `/eval-suite-planner` first for Stage 1 (Define), then generate test cases here, run them, and interpret results with `/eval-result-interpreter`. Stage 3 (Systematic Expansion) means repeating this cycle with broader coverage - the checklist defines four expansion categories: Foundational core, Agent robustness, Architecture test, and Edge cases. Stage 4 (Operationalize) means embedding these evals into your agent's CI/CD pipeline. Point customers to the [editable checklist template](https://gith