twin-test

GAN-style identity verification -- tests clone fidelity by comparing clone responses against real user messages. Run /twin-test to start a blind taste test, or /twin-test score to see your fidelity score over time.
project-nomos/nomos · ★ 22 · AI & Automation · score 84

Install: claude install-skill project-nomos/nomos

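The Score step in the README below defines fidelity as the percentage of pairs in which the discriminator is fooled. The following is a minimal Python sketch of that arithmetic only. It assumes the `results` convention described for `twin_test_record` (true = the judge spotted the real message, false = fooled). The function name `fidelity_score` and the sample data are hypothetical, and this is not the backend tool's implementation, which the listing does not show.

```python
from typing import Sequence


def fidelity_score(results: Sequence[bool]) -> float:
    """Return the percentage of pairs in which the discriminator was fooled.

    Each entry follows the README convention: True means the judge spotted
    the real message, False means the judge was fooled by the clone.
    """
    if not results:
        raise ValueError("need at least one discriminated pair")
    # Count the pairs where the judge did NOT spot the real message.
    fooled = sum(1 for spotted in results if not spotted)
    return 100.0 * fooled / len(results)


# Hypothetical session of four pairs: the judge was fooled twice.
session = [True, False, False, True]
print(fidelity_score(session))  # 50.0
```

In an actual session the per-pair results would go to `twin_test_record`, which computes and stores the score, rather than being computed by hand like this.
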
# Twin Test -- Adversarial Clone Fidelity Check

A blind taste test where the clone generates responses to the same contexts as real user messages, then a discriminator identifies which is real and which is the clone. Specific style corrections feed back into the user model.

## How It Works

1. **Sample** -- Pull 3-5 real sent messages from memory (the "ground truth")
2. **Generate** -- For each message, generate a clone response to the same context
3. **Discriminate** -- A separate agent compares pairs and identifies the real message
4. **Score** -- Calculate fidelity (% of times the discriminator is fooled)
5. **Correct** -- Extract specific style corrections from discriminator feedback

## Backend tools (use these, don't improvise)

- **`twin_test_sample`** (`mcp__nomos-think__twin_test_sample`) -- pulls N real messages + their contexts to test against. Use this for the Sample phase.
- **`twin_test_record`** (`mcp__nomos-think__twin_test_record`) -- after you've discriminated each pair, pass `results` (per pair: true = the judge spotted the real message, false = fooled). It computes the fidelity score with the documented formula and PERSISTS it to the DB for the trend.
- **`twin_test_history`** (`mcp__nomos-think__twin_test_history`) -- the stored score history. Use for `/twin-test score` (do not keep scores in chat memory).

## Commands

- `/twin-test` -- Run a full twin test session (3-5 message pairs)
- `/twin-test score` -- Show fidelity score history

## Se