design-memory-eval

Use when adding or changing Pattern Oracle, stale-memory detection, verification-gated memory, or write-verification. Forces an eval-first, fixture-driven test with a false-positive guard, so trust-critical behavior is proven, not asserted in prose.
Sisuthros/claude-amplifier · ★ 1 · AI & Automation · score 72

Install: claude install-skill Sisuthros/claude-amplifier

# Design a Memory Eval

The trust-critical paths, namely Pattern Oracle risk scoring, stale-memory detection, verification-gated promotion, and write-verification, are exactly the places where a silent regression does the most damage: a wrong risk score, a missed stale day, a hallucinated-success that slips through. Changes here must be proven by a deterministic test, not described in a commit message.

## When to use

You are adding or modifying any of:

- the **Pattern Oracle** (pre-task risk scan / scoring),
- **stale-memory detection** (`amplify_audit_freshness`, promote-from-memory),
- **verification-gated memory** (claim → evidence → confirmed, 5× weighting),
- **write-verification** (read-back, `AmplifierWriteError`).

## Procedure

1. **Write the failing scenario first.** Before the implementation, add a test that encodes the behavior you want and currently fails (red). This proves the test actually exercises the new behavior rather than passing vacuously.
2. **Use deterministic fixture data.** No wall-clock, no randomness, no network. Seed an in-memory or temp SQLite store with fixed rows; pin dates as literal strings. The same input must always produce the same score/verdict so the test can't flake. (The existing `oracle.test.js`, `freshness.test.js`, and `write_verification.test.js` are the templates; match their style.)
3. **Assert both layers.** Where the feature returns both human-readable text and structured data, assert **both**: the structured f