data-warehouse-experimentation

Featured

Running experiments out of the data warehouse instead of via dedicated experiment platforms. SQL-based assignment, exposure logging discipline, metric definitions in dbt models, statistical analysis in SQL or Python, variance reduction with CUPED, sequential testing, and the operational tradeoffs vs platforms like Statsig and Optimizely. Triggers on warehouse-native experimentation, run experiments in BigQuery, run experiments in Snowflake, dbt experiments, SQL t-test, CUPED variance reduction, exposure log, sample ratio mismatch, sequential testing, mSPRT, doubly robust estimation, build vs buy experimentation. Also triggers when the team is choosing between platform and warehouse, building warehouse-native experiment infrastructure, auditing one, or running an experiment with a custom metric the platform cannot handle.

AI & Automation 501 stars 64 forks Updated 6 days ago MIT

Install

View on GitHub

Quality Score: 97/100

Stars 20%

Recency 20%

100

Frontmatter 20%

Documentation 15%

100

Issue Health 10%

License 10%

100

Description 5%

100

Skill Content

# Data Warehouse Experimentation A senior data scientist's playbook for running experiments natively out of BigQuery, Snowflake, or any modern data warehouse, with metric definitions in dbt and statistical analysis in SQL or Python. Most companies that run experiments at scale use a dedicated platform. Statsig, Optimizely, LaunchDarkly with experimentation, PostHog, Amplitude Experiment. The platforms are good. They handle assignment, instrumentation, and analysis in one product, and the SQL-savvy data team does not have to reinvent the variance reduction wheel. There is a different operational model that mature data teams increasingly choose: warehouse-native experimentation. Assignment happens in code or via feature flags. Exposure events fire to the warehouse like any other event. Metrics are defined as dbt models. Statistical analysis runs as SQL or in a Python notebook against warehouse data. The "experiment platform" is just your existing data stack. This skill covers when warehouse-native is the right call, the architecture, and the specific techniques that make it work: assignment patterns, exposure logging discipline, metric definitions in dbt, t-tests and CUPED in SQL, sequential testing, and the pitfalls that take down homegrown setups. When to use this skill: deciding between platform vs warehouse-native, building a warehouse-native experiment infrastructure, auditing an existing one, or running a specific experiment when the platform of record cannot handle ...

Details

Author: rampstackco
Repository: rampstackco/claude-skills
Created: 3 months ago
Last Updated: 6 days ago
Language: Python
License: MIT

Integrates with

Anthropic · AI Google Cloud · Cloud PostHog · Analytics

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

experiment-design

A discipline for designing experiments (A/B tests, multivariate, holdouts) so the results actually answer the question you asked. Hypothesis writing, sample size, duration, segment analysis, interpretation, decision-making, and the common failure modes that produce confidently wrong shipping decisions.

501 Updated 6 days ago

rampstackco

AI & Automation Featured

experimentation-platform-orchestrator

A platform decision framework for experimentation. When to use Statsig vs PostHog vs GrowthBook vs Optimizely vs Amplitude vs Eppo vs Kameleoon. How to migrate between them. How to coordinate when multi-platform is genuinely warranted. The decisions that compound for years and the ones you can defer. Triggers on which experimentation platform, choose Statsig vs PostHog, evaluate experimentation tools, switch experimentation platform, migrate from Optimizely, consolidate experimentation tools, multi-platform experimentation, experimentation platform decision, ab test platform selection, feature flag platform vs experiment platform, warehouse-native experiments, vendor lock-in experimentation. Also triggers when a team is asking about cost, governance, or migration cost across experimentation tools, or when an evaluation is starting.

501 Updated 6 days ago

rampstackco

AI & Automation Featured

experimentation-analytics

How to read experiment results without fooling yourself. Confidence intervals, p-values, multiple testing, sequential testing, CUPED, heterogeneous treatment effects, ratio metrics, network effects, dashboard reconciliation, and the interpretation failures that produce confidently wrong shipping decisions.

501 Updated 6 days ago

rampstackco