eval

Quality and performance evaluation with baseline comparison. Sub-modes: correctness, performance, quality, regression. Outputs PASS/WARN/FAIL per dimension. Use for pre-ship evaluation or regression checks.
epicsagas/epic-harness · ★ 8 · AI & Automation · score 78

Install: claude install-skill epicsagas/epic-harness

# Eval — Quality & Regression Gate

**CRITICAL**: Run `HARNESS_DIR=$(epic path)` first. Never use `.harness/` in the project directory.

## When to Trigger

- Before `/ship` creates a PR (automatic if eval.yaml exists)
- After `/go` completes a feature
- On explicit `/eval` command
- When user mentions "regression", "baseline", "eval suite", "quality check"
- CI: `make eval` or `epic eval --json`

## Execution Modes

4 dimensions run in parallel where possible:

1. **eval:correctness** — Test pass rate, mutation score, assertion density
2. **eval:performance** — Throughput, latency, memory (opt-in)
3. **eval:quality** — Lint, code quality, LLM-as-judge
4. **eval:regression** — Baseline comparison, score deltas

---

## Process

### Step 0: Prerequisites

```bash
HARNESS_DIR=$(epic path)
```

If `$HARNESS_DIR/eval/eval.yaml` does not exist, run scaffold:

```bash
epic eval --init
```

Read the config:

```bash
cat $HARNESS_DIR/eval/eval.yaml
```

### Step 0.5: Scaffold benchmarks (when no benchmark infrastructure exists)

If `eval.yaml` has `benchmarks: []` and no benchmark files are found in the project:

1. **Generate stub files** using the CLI:

   ```bash
   epic eval --scaffold
   ```

   Supported stacks (auto-detected from project markers):

   | Stack | Detected by | Generated file | Output format |
   |-------|-------------|----------------|---------------|
   | Rust | `Cargo.toml` | `benches/eval_harness.rs` | criterion (exit code) |
   | Python | `pyproject.toml` / `setu