Install: claude install-skill Methasit-Pun/data_engineer_claude_skills
# Data Migration Patterns
## The Core Risk
Migrations fail in two ways: data loss (records that existed in the old system don't appear in the new one) and data corruption (records appear but with wrong values). Both are subtle and can go undetected for weeks if you don't build explicit validation into the migration plan.
A separate risk is business disruption: users or downstream systems depend on the old system and can't tolerate a gap in availability.
Treat a migration like a deployment: plan for rollback from day one, validate at every stage, and don't cut over until you've proven the new system matches the old one.
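As a rough illustration of what "explicit validation" can look like, the sketch below compares two sets of rows and reports both failure modes: keys that went missing (data loss) and keys whose values changed (data corruption). The names `row_fingerprint` and `compare_tables` are hypothetical, and the sketch assumes each row is a tuple whose first element is the primary key and that both systems return values in the same Python types.

```python
import hashlib
from typing import Dict, Iterable, List


def row_fingerprint(row: tuple) -> str:
    """Hash every column so that any changed value changes the fingerprint.

    Assumption: both systems yield the same Python types for a column.
    repr(1) and repr(1.0) differ, so normalize types before comparing.
    """
    payload = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_tables(
    old_rows: Iterable[tuple], new_rows: Iterable[tuple]
) -> Dict[str, List]:
    """Compare rows keyed by their first element (assumed primary key)."""
    old = {row[0]: row_fingerprint(row) for row in old_rows}
    new = {row[0]: row_fingerprint(row) for row in new_rows}
    return {
        # Data loss: present in the old system, absent from the new one.
        "missing": sorted(k for k in old if k not in new),
        # Data corruption: present in both, but the values differ.
        "corrupted": sorted(k for k in old if k in new and old[k] != new[k]),
        # Rows the new system has that the old one never had.
        "unexpected": sorted(k for k in new if k not in old),
    }


if __name__ == "__main__":
    old_rows = [(1, "alice", 10), (2, "bob", 20), (3, "carol", 30)]
    new_rows = [(1, "alice", 10), (2, "bob", 99), (4, "dave", 40)]
    print(compare_tables(old_rows, new_rows))
```

For large tables, loading every row into memory is unlikely to be practical; the same idea can be applied per chunk or per key range, comparing counts first and fingerprints only where counts agree.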
---
## Migration Strategies
### Big bang vs. incremental
| Approach | When to use | Risk |
|---|---|---|
| **Big bang** | Small datasets (<1M rows), can tolerate downtime, low-stakes system | If something is wrong, you find out post-cutover with users affected |
| **Incremental** | Large datasets, live production systems, zero-downtime requirement | More complex, but problems are caught before cutover |
| **Dual-write** | Can't afford any data loss, need rollback capability | Double writes for a period; reconciliation required |
For anything that users or downstream pipelines depend on, default to incremental + dual-write.
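A minimal sketch of the dual-write idea, under stated assumptions: the old system stays the source of truth, every write is also attempted against the new system, and a failed write to the new system is recorded for later reconciliation instead of failing the caller's request. `DualWriter` and its two write callables are hypothetical names for illustration, not part of any library.

```python
from typing import Any, Callable, List, Tuple

WriteFn = Callable[[str, Any], None]


class DualWriter:
    """Write to the old system first, then mirror the write to the new one."""

    def __init__(self, write_old: WriteFn, write_new: WriteFn) -> None:
        self.write_old = write_old
        self.write_new = write_new
        # Writes that reached the old system but not the new one.
        self.failed_new_writes: List[Tuple[str, Any, str]] = []

    def write(self, key: str, value: Any) -> None:
        # The old system is still authoritative: let its errors propagate.
        self.write_old(key, value)
        try:
            self.write_new(key, value)
        except Exception as exc:
            # Assumption: a new-system failure must not break the request.
            # Record it so a reconciliation job can replay it later.
            self.failed_new_writes.append((key, value, repr(exc)))


if __name__ == "__main__":
    old_store: dict = {}
    new_store: dict = {}

    def flaky_new_write(key: str, value: Any) -> None:
        # Simulate the new system being unavailable for one write.
        if key == "order-2":
            raise ConnectionError("new system unavailable")
        new_store[key] = value

    writer = DualWriter(old_store.__setitem__, flaky_new_write)
    writer.write("order-1", {"total": 10})
    writer.write("order-2", {"total": 20})
    print(sorted(old_store), sorted(new_store), len(writer.failed_new_writes))
```

Keeping the old system authoritative during this period is what preserves the rollback path: if the new system turns out to be wrong, traffic can stay on the old one with no data missing from it.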
---
## Phase 1: Backfill Historical Data
Load historical data into the new system before switching live traffic. This is the bulk of the work — do it offline, under no time pressure.
```python
def backfill_in_chunks(source_conn,