write-eval
Install: claude install-skill dzhng/duet-agent
# Write an Eval
An eval is only trustworthy if you have seen it both **fail for the right reason** and **pass for the right reason**. Writing the assertions, watching them go green once, and moving on is how you ship an eval that passes whether or not the feature works. The standard operating procedure is: pick the outermost entry point, design the assertion so it can only hold when the behavior is present, watch it go green, then **falsify** — break the production code, confirm the eval goes red with a diagnostic that points at the real path, and restore.
This is the flow used to land `evals/state-machine-slash-skill-expansion.eval.ts`; read it as the reference implementation for a **deterministic wiring** eval (the feature either injects the right context or it doesn't).
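The falsify step can be sketched in a few lines. This is a minimal, self-contained illustration, not the code in `evals/state-machine-slash-skill-expansion.eval.ts`. `buildPrompt`, `buildPromptBroken`, `runEval`, and `falsify` are hypothetical names, and the broken variant stands in for editing the production code in place and restoring it afterwards. The assertion keys on a sentinel string that exists only in the skill body, so it can hold only when expansion actually happened.

```typescript
type Skills = Record<string, string>;

// Hypothetical production path: "/name ..." gets that skill's body prepended.
function buildPrompt(userMessage: string, skills: Skills): string {
  const match = /^\/([\w-]+)/.exec(userMessage);
  if (match && skills[match[1]] !== undefined) {
    return `${skills[match[1]]}\n\n${userMessage}`;
  }
  return userMessage;
}

// Deliberately broken variant: simulates "break the production code" for the red run.
function buildPromptBroken(userMessage: string, _skills: Skills): string {
  return userMessage;
}

type EvalResult = { pass: boolean; diagnostic: string };

// The assertion looks for a sentinel that only the skill body contains,
// so it cannot pass unless the skill was really injected.
function runEval(build: (m: string, s: Skills) => string): EvalResult {
  const sentinel = "SKILL-SENTINEL-7f3a";
  const skills: Skills = { deploy: `Deploy checklist ${sentinel}` };
  const prompt = build("/deploy staging", skills);
  const pass = prompt.includes(sentinel);
  return {
    pass,
    diagnostic: pass
      ? "skill body injected"
      : `sentinel missing from prompt: ${JSON.stringify(prompt)}`,
  };
}

// Trust the eval only if it is green on the real path AND red on the broken one.
function falsify(): { green: EvalResult; red: EvalResult } {
  const green = runEval(buildPrompt);
  const red = runEval(buildPromptBroken);
  if (!green.pass) throw new Error(`eval never went green: ${green.diagnostic}`);
  if (red.pass) throw new Error("eval still passes with the feature broken");
  return { green, red };
}

const result = falsify();
console.log(result.red.diagnostic); // prints: sentinel missing from prompt: "/deploy staging"
```

In a real eval the red run is a manual step (break, run, read the diagnostic, restore) rather than a second function; keeping a broken variant around is an assumption made here only so the sketch runs on its own.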
When the behavior under test is a **model tendency** rather than deterministic wiring — "the planner doesn't over-reach into implementation", "the sub-agent doesn't drift into chat mode", anything a prompt layer nudges but cannot guarantee — the single-run flow is not enough, because one run is a coin flip. Read `evals/state-machine-agent-stays-in-state-scope.eval.ts` as the reference for that shape, and follow §6 below in addition to §1–5.
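For the tendency shape, the key change is that the assertion is over a pass rate across repeated runs rather than a single outcome. The sketch below assumes 20 runs and a 0.8 threshold purely for illustration; §6 may set different numbers. `evalTendency` and `stubModel` are hypothetical, and `stubModel` fails on a fixed schedule so the example is deterministic, whereas a real trial would call the model and would usually be async.

```typescript
// One sampled run: true if the agent stayed within the intended behavior.
type Trial = () => boolean;

interface TendencyResult {
  runs: number;
  passes: number;
  rate: number;
  pass: boolean;
}

// Sample the behavior `runs` times and require the pass rate to clear `threshold`.
function evalTendency(trial: Trial, runs: number, threshold: number): TendencyResult {
  let passes = 0;
  for (let i = 0; i < runs; i++) {
    if (trial()) passes++;
  }
  const rate = passes / runs;
  return { runs, passes, rate, pass: rate >= threshold };
}

// Hypothetical stand-in for a model: fails on every `failEvery`-th call.
function stubModel(failEvery: number): Trial {
  let calls = 0;
  return () => {
    calls++;
    return calls % failEvery !== 0;
  };
}

// Green: 18 of 20 runs stay in scope (rate 0.9).
const healthy = evalTendency(stubModel(10), 20, 0.8);
// Falsified: a regressed prompt layer drifts half the time (rate 0.5), so the eval goes red.
const regressed = evalTendency(stubModel(2), 20, 0.8);
```

The falsify step still applies: weaken the prompt layer and confirm the rate drops below the threshold, not just that one run happens to fail.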
## 1. Drive the outermost entry point
Per AGENTS.md and the review skill (§13): test behavior through the surface a user actually hits, not internal helpers.
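To see why the entry point matters, consider a wiring bug where a correct helper is simply never called. In this hypothetical sketch (`resolveSkill` and `handleUserMessage` are illustrative names, not the repository's API), the helper-level check passes while the entry-point check fails, which is exactly the bug a helper-only test would miss.

```typescript
type SkillMap = Record<string, string>;

// Hypothetical pure helper: correct in isolation.
function resolveSkill(message: string, skills: SkillMap): string | undefined {
  const m = /^\/([\w-]+)/.exec(message);
  return m ? skills[m[1]] : undefined;
}

// Hypothetical entry point with a wiring bug: it never calls resolveSkill.
function handleUserMessage(message: string, _skills: SkillMap): string {
  return `system: you are an agent\nuser: ${message}`;
}

const skills: SkillMap = { review: "REVIEW-SKILL-BODY" };

// Helper-level check: true, because the helper itself works.
const helperOk = resolveSkill("/review this diff", skills) === "REVIEW-SKILL-BODY";
// Entry-point check: false, because the skill body never reaches the prompt.
const entryOk = handleUserMessage("/review this diff", skills).includes("REVIEW-SKILL-BODY");
```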
- A unit test on the pure function (e.g. `test/skill-context-resolve.test.ts` for `resolveSlashSkillPrompt`