Install: `claude install-skill project-nomos/nomos`
# Run Evals
The eval suite verifies three things: that the memory system recalls what it stored, that per-user data never leaks across tenants, and that every feature is actually wired up and produces the durable DB state it promises. All commands run from the repo root and need a real `DATABASE_URL` (PostgreSQL with pgvector) and a model provider (`ANTHROPIC_API_KEY`, Vertex via `CLAUDE_CODE_USE_VERTEX=1`, or `NOMOS_USE_SUBSCRIPTION=true`).
> On macOS, prefix commands with `PGGSSENCMODE=disable` if you see GSSAPI connection noise.
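The prerequisites above can be checked before a long eval run starts. What follows is a minimal sketch, assuming you want to fail fast in a small TypeScript script before invoking `pnpm eval:audit`. The `preflight` function and the `PreflightResult` type are hypothetical illustrations, not part of the nomos repo. The sketch only checks that variables are set. It does not confirm that the database has pgvector or that the credentials are valid. It also treats `CLAUDE_CODE_USE_VERTEX` and `NOMOS_USE_SUBSCRIPTION` as enabled only for the exact values `1` and `true` shown above, which is an assumption about how the tooling parses them.

```typescript
// Hypothetical preflight helper for the eval prerequisites (not part of the nomos repo).
type Env = Record<string, string | undefined>;

interface PreflightResult {
  ok: boolean;        // true when no problems were found
  problems: string[]; // human-readable reasons the run would fail
  env: Env;           // environment to pass to the eval process
}

function preflight(env: Env, platform: string): PreflightResult {
  const problems: string[] = [];

  // A real PostgreSQL + pgvector database is required; this only checks that the URL is set.
  if (!env.DATABASE_URL) {
    problems.push("DATABASE_URL is not set");
  }

  // Any one of the three provider configurations is enough (exact values are an assumption).
  const hasProvider =
    Boolean(env.ANTHROPIC_API_KEY) ||
    env.CLAUDE_CODE_USE_VERTEX === "1" ||
    env.NOMOS_USE_SUBSCRIPTION === "true";
  if (!hasProvider) {
    problems.push(
      "no model provider: set ANTHROPIC_API_KEY, CLAUDE_CODE_USE_VERTEX=1, or NOMOS_USE_SUBSCRIPTION=true",
    );
  }

  // Copy the environment so the caller's object is never mutated.
  const out: Env = { ...env };
  // On macOS, disable GSSAPI encryption negotiation unless the caller already chose a value.
  if (platform === "darwin" && out.PGGSSENCMODE === undefined) {
    out.PGGSSENCMODE = "disable";
  }

  return { ok: problems.length === 0, problems, env: out };
}
```

For example, a wrapper script could call `preflight(process.env, process.platform)`, print any `problems`, and pass the returned `env` to the child process that runs `pnpm eval:audit`. Returning a copy instead of editing `process.env` keeps the macOS tweak scoped to the eval run.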
## Quick reference
| Command | What it does |
| ------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `pnpm eval:audit` | **The full gate.** Agent eval against a throwaway `nomos_eval`, then the Opus-4.8 label audit + the spec-driven manifest audit, then drops the DB. One process, end-to-end. |
| `pnpm eval:agent` | Agent eval only (no LLM audit), throwaway DB. |
| `pnpm eval:agent --keep` | Run and keep `nomos_eval` for inspection; also writes a results file. |