sequential-monitoringlisted

Always-valid sequential inference for honest peeking at running A/B tests. Applies mSPRT (mixture Sequential Probability Ratio Test) or confidence sequences so the false-positive rate stays at the nominal alpha even when the test is checked daily. Use this skill when the user asks if it's safe to call an A/B test early, or to peek-check a running test. Pairs with experiment-result-reader and bayesian-experiment-reader, and with analytics-diagnostic-method for the framing discipline. Triggers when Clamp MCP returns mid-experiment exposure and conversion counts, or when a user references peeking, early stopping, sequential testing, alpha-spending, mSPRT, or confidence sequences. Vendor-neutral methodology; works with any analytics source, with Clamp MCP as the canonical integration. Use whenever interpreting an in-flight experiment where the planned horizon has not been reached but the user wants a stop/continue decision.
clamp-sh/analytics-skills · ★ 6 · AI & Automation · score 81

Install: claude install-skill clamp-sh/analytics-skills

# Sequential monitoring A fixed-horizon A/B test promises a 5% false-positive rate at one planned read. The moment you check the result daily and stop "when it looks good", the actual false-positive rate climbs to 20-30%. Sequential testing fixes this: it lets you check as often as you like and stop the moment the evidence is strong enough, with the type-I error still controlled at the nominal alpha. This skill encodes when to apply mSPRT versus confidence sequences, how to read the boundaries, and when sequential math will not rescue an underpowered test. ## When NOT to use this - The test has fewer than ~400 exposed users per variant. Sequential methods do not manufacture power. At n<400 the boundaries are nowhere near being crossed and the honest answer is "wait, do not peek". - The conversion metric has strong seasonality (B2B day-of-week, retail weekday/weekend, SaaS payday cycles) and the test has not run a full cycle. Sequential boundaries can cross on a Tuesday and uncross by Sunday; the math is valid but the decision is fragile. - The user wants to *design* the test (sample size, MDE, variant logic) rather than read a running one. Different skill. - The experiment was already declared with a fixed analysis plan and the team agreed to read it only at the end. Switching to sequential mid-flight is a governance decision, not a stats one; flag it and ask. ## The peeking problem A fixed-horizon test computes a p-value under the assumption you look once, at the planne