eval-workflow

Quality Score: 90/100

Stars (20%): 72
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100
Skill Content

# Workflow Evaluation

Run automated evaluation tests against a multi-agent workflow.

## Research Foundation

- **REF-001**: BP-9 - Continuous evaluation of agent performance
- **REF-002**: KAMI benchmark methodology for real agentic task evaluation

## Usage

```bash
/eval-workflow flow-security-review-cycle
/eval-workflow flow-inception-to-elaboration --scenario distractor-test
/eval-workflow flow-deploy-to-production --verbose --strict
```

## Arguments

| Argument | Required | Description |
|----------|----------|-------------|
| workflow-name | Yes | Workflow (flow command) to evaluate |

## Options

| Option | Default | Description |
|--------|---------|-------------|
| --scenario | all | Specific scenario to run |
| --verbose | false | Show detailed test output |
| --output | stdout | Output file for results |
| --strict | false | Fail on any test failure |
| --timeout | 300 | Maximum seconds per scenario |

## What Gets Evaluated

### Orchestration Quality

- **Agent coordination**: Parallel agents launched correctly in single message
- **Handoff fidelity**: Artifacts pass correctly between phases
- **Gate enforcement**: Phase gates checked before transition

### Archetype Resistance

- `grounding-test`: Archetype 1, premature action without reading state
- `distractor-test`: Archetype 3, context pollution from irrelevant artifacts
- `recovery-test`: Archetype 4, fragile execution when subagent fails

### Output Validation

- Required artifacts created in correct ...
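
As a rough illustration of how the arguments and option defaults listed above might be resolved, here is a minimal TypeScript sketch. It is not taken from the aiwg repository: `EvalOptions` and `parseEvalArgs` are hypothetical names, and the error handling is an assumption. Only the flag names and default values come from the Options table.

```typescript
// Hypothetical sketch: these names are illustrative, not the skill's real API.
interface EvalOptions {
  workflowName: string; // required positional argument
  scenario: string;     // default "all"
  verbose: boolean;     // default false
  output: string;       // default "stdout"
  strict: boolean;      // default false
  timeout: number;      // default 300 (seconds per scenario)
}

function parseEvalArgs(argv: string[]): EvalOptions {
  // Start from the defaults documented in the Options table.
  const opts: EvalOptions = {
    workflowName: "",
    scenario: "all",
    verbose: false,
    output: "stdout",
    strict: false,
    timeout: 300,
  };

  const rest = [...argv]; // copy so the caller's array is not mutated
  while (rest.length > 0) {
    const arg = rest.shift() as string;

    if (arg === "--verbose") {
      opts.verbose = true;
    } else if (arg === "--strict") {
      opts.strict = true;
    } else if (arg === "--scenario" || arg === "--output" || arg === "--timeout") {
      // These flags consume the next token as their value.
      const value = rest.shift();
      if (value === undefined) {
        throw new Error(`${arg} requires a value`);
      }
      if (arg === "--scenario") {
        opts.scenario = value;
      } else if (arg === "--output") {
        opts.output = value;
      } else {
        const seconds = Number(value);
        if (!Number.isFinite(seconds) || seconds <= 0) {
          throw new Error(`--timeout expects a positive number, got "${value}"`);
        }
        opts.timeout = seconds;
      }
    } else if (arg.startsWith("--")) {
      throw new Error(`unknown option ${arg}`);
    } else if (opts.workflowName === "") {
      opts.workflowName = arg; // first bare token is the workflow name
    } else {
      throw new Error(`unexpected argument ${arg}`);
    }
  }

  if (opts.workflowName === "") {
    throw new Error("workflow-name is required");
  }
  return opts;
}
```

Under these assumptions, `parseEvalArgs(["flow-deploy-to-production", "--verbose", "--strict"])` would yield an object with `verbose` and `strict` set to `true` and the remaining fields left at their documented defaults.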

Details

Author: jmagly
Repository: jmagly/aiwg
Created: 10 months ago
Last Updated: today
Language: TypeScript
License: MIT

Similar Skills

eval-agent

workflow-optimizer

dynamic-workflows