Install: `claude install-skill Luis247911/universal-ai-workspace-foundation`
# eval-loop-builder
Builds the load-bearing feedback loop of the harness: **dataset + typed assertions + runner + threshold gate**. Without an eval you are guessing; with one, every prompt or model change is measured and a regression blocks the merge.
## When to use
- Starting any agent feature — write the eval *first* (evals-first), then make it pass.
- "Is the new prompt/model actually better?" → encode the answer as a scored suite.
- Wiring a CI gate that must fail when quality drops below a threshold.
- Adding cases for a bug you just fixed so it never regresses silently.
## Run it
```bash
# scaffold a runnable starter suite, then run it
python -m harness.eval scaffold --out my.suite.json
python -m harness.eval run --suite my.suite.json
# from a fresh clone (no install): use the bundled shim
python .claude/skills/eval-loop-builder/scripts/run.py run --suite my.suite.json --threshold 0.9
```
`run` exits non-zero when the weighted score is below the threshold — that exit code is what makes it a CI gate.
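The gate logic can be sketched in a few lines. This is a minimal illustration, not the harness's actual implementation: it assumes the weighted score is the sum of the weights of passing assertions divided by the sum of all weights, and the names `weighted_score` and `gate` are hypothetical.

```python
def weighted_score(results):
    """Assumed formula: passing weight / total weight.

    `results` is a list of (passed, weight) pairs, one per assertion.
    """
    total = sum(weight for _, weight in results)
    if total == 0:
        return 0.0
    return sum(weight for passed, weight in results if passed) / total


def gate(score, threshold):
    """Map a score to a process exit code: 0 = pass, 1 = below threshold."""
    return 0 if score >= threshold else 1


# Three assertions: two pass (weights 2.0 and 1.0), one fails (weight 1.0).
results = [(True, 2.0), (False, 1.0), (True, 1.0)]
score = weighted_score(results)           # 3.0 / 4.0
exit_code = gate(score, threshold=0.9)    # below 0.9, so non-zero
print(f"score={score:.2f} exit_code={exit_code}")
# A real CLI entry point would finish with sys.exit(exit_code),
# which is what lets a CI job fail on a regression.
```

Returning the exit code from a plain function, rather than calling `sys.exit` inside the scoring code, keeps the scoring logic testable on its own.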
## Suite format (JSON; YAML works with the `[yaml]` extra)
```json
{
  "suite": "name",
  "threshold": 0.8,
  "cases": [
    {
      "id": "case-1",
      "input": { "kind": "inline", "output": "the text under test" },
      "assertions": [
        { "type": "contains", "value": "hello", "weight": 2.0 }
      ]
    }
  ]
}
```
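To show how a suite of this shape might be consumed, here is a small sketch that parses the JSON and scores `inline` cases against `contains` assertions. It is an illustration under assumptions, not the harness's runner: the helpers `check` and `score_suite` are hypothetical, the default weight of 1.0 for a missing `weight` is assumed, and only the one input kind and one assertion type shown above are handled.

```python
import json

SUITE_JSON = """
{
  "suite": "name",
  "threshold": 0.8,
  "cases": [
    {
      "id": "case-1",
      "input": { "kind": "inline", "output": "hello world" },
      "assertions": [
        { "type": "contains", "value": "hello", "weight": 2.0 },
        { "type": "contains", "value": "goodbye", "weight": 1.0 }
      ]
    }
  ]
}
"""


def check(assertion, output):
    """Evaluate one typed assertion against the text under test."""
    if assertion["type"] == "contains":
        return assertion["value"] in output
    raise ValueError(f"unsupported assertion type: {assertion['type']}")


def score_suite(suite):
    """Return passing weight / total weight across all cases (assumed formula)."""
    earned = total = 0.0
    for case in suite["cases"]:
        if case["input"]["kind"] != "inline":
            raise ValueError("this sketch only handles inline inputs")
        output = case["input"]["output"]
        for assertion in case["assertions"]:
            weight = assertion.get("weight", 1.0)  # assumed default
            total += weight
            if check(assertion, output):
                earned += weight
    return earned / total if total else 0.0


suite = json.loads(SUITE_JSON)
score = score_suite(suite)
passed = score >= suite["threshold"]
print(f"{suite['suite']}: score={score:.3f} passed={passed}")
```

In this example the heavier `hello` assertion passes and the lighter `goodbye` assertion fails, so the score is 2/3 and the suite falls short of its 0.8 threshold.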
- **input kinds**: `inline` (output given directly), `file` (read a file relative to the suite's `base`), `cmd` (stdout of a subprocess; args as a list, never a shell str