eval-plan
Design a scenario-driven Mnemon harness eval with target, hypothesis, HostAgent, loop configuration, evidence, and rubric.
Quality Score: 88/100
Details
- Author: mnemon-dev
- Repository: mnemon-dev/mnemon
- Created: 3 months ago
- Last Updated: today
- Language: Go
- License: Apache-2.0
Similar Skills
Semantically similar based on skill content — not just same category
eval-run
Execute or supervise a planned Mnemon harness eval run in an isolated HostAgent workspace.
eval-analyze
Analyze Mnemon harness eval reports, classify outcomes, and extract improvement evidence.
eval-suite-planner
Produces a concrete eval suite plan for AI agents - grounded in Microsoft's Eval Scenario Library and MS Learn agent evaluation guidance (Copilot Studio is the primary worked example, but the plan is platform-agnostic and adapts to any agent harness). Outputs scenario types, evaluation methods, quality signals, thresholds, and priority order - before any test cases are generated or evals are run.
eval-improve
Turn stable Mnemon harness eval findings into scoped project, loop, adapter, docs, or eval asset improvements.
agent-eval-design
Use when designing evaluations for AI agents, skills, routers, prompts, tool-use policies, or multi-step workflows: task sets, rubrics, graders, hard negatives, regression cases, traces, and acceptance thresholds. Do NOT use for application test planning (use `testing-strategy`), skill-library health tooling (use `skill-infrastructure`), or live debugging of a failed run (use `debugging`).