replay-shadowlisted
Install: claude install-skill a-canary/arc-agents
# replay-shadow — Capture-Replay-Diff for Confidence Before Promotion
A general dev practice for A/B-testing a candidate build against a live baseline *without* mirroring the full system. Pick one unit of work (a worker turn, a request, a job), freeze its execution, replay against the candidate, diff. Repeat across a corpus until variance stabilizes.
Replay-shadow is the unit-test analogue of live shadowing. Cheap, deterministic, runnable on every prompt change. Live shadow is the integration-test analogue — expensive, run before major cutovers. This skill is the cheap one.
## When to use
- Prompt iteration (system prompt, frame templates, skill set).
- Model swap (cost/quality tradeoff between two models for the same workload).
- API/SDK upgrade (does new client behave the same on real captured inputs?).
- Config change with non-obvious downstream effects (timeouts, retry policy, tool allowlist).
## When NOT to use
- Concurrency bugs, race conditions, factory backpressure — those need live load, not replay.
- UX integration regressions — those need the real transport.
- Anything where the input distribution itself is what changed.
Replay tests *behavior on past inputs*. It does not test the future.
## The three verbs
Three steps, three artifacts. The skill prescribes the contract; the agent wires it to the system at hand.
### 1. capture — freeze a real execution into a fixture
A **fixture** is the minimum reproducible context for one unit of work. Four parts:
| P