data-migration

Moving data between systems safely — cutover planning, backfill strategies, dual-write patterns, validation, rollback procedures, and zero-downtime migration techniques. Use this skill whenever the team is migrating from one database or warehouse to another (MySQL → Snowflake, Redshift → BigQuery, on-prem → cloud), replacing a legacy pipeline, doing a major schema change on a live table, or planning a cutover that cannot have downtime. Also trigger when the user asks about dual-write, shadow reads, data validation across systems, incremental vs. full migration, or how to safely retire an old system. If any of the phrases "migrate", "move data", "cutover", "legacy system", or "replace the old pipeline" appear, this skill should be active.
Methasit-Pun/data_engineer_claude_skills · ★ 1 · API & Backend · score 62

Install: claude install-skill Methasit-Pun/data_engineer_claude_skills

# Data Migration Patterns

## The Core Risk

Migrations fail in two ways: data loss (records that existed in the old system don't appear in the new one) and data corruption (records appear, but with wrong values). Both are subtle and can go undetected for weeks if you don't build explicit validation into the migration plan.

The other major risk is business disruption: users or downstream systems depend on the old system and can't tolerate a gap in availability.

Treat a migration like a deployment: plan for rollback from day one, validate at every stage, and don't cut over until you've proven the new system matches the old one.

---

## Migration Strategies

### Big bang vs. incremental

| Approach | When to use | Risk |
|---|---|---|
| **Big bang** | Small datasets (<1M rows), can tolerate downtime, low-stakes system | If something is wrong, you find out after cutover, with users already affected |
| **Incremental** | Large datasets, live production systems, zero-downtime requirement | More complex, but problems are caught before cutover |
| **Dual-write** | Can't afford any data loss, need rollback capability | Every write goes to both systems for a period; reconciliation required |

For anything that users or downstream pipelines depend on, default to incremental + dual-write.

---

## Phase 1: Backfill Historical Data

Load historical data into the new system before switching live traffic. This is the bulk of the work, so run it as a background batch job well ahead of the cutover, with no time pressure.

```python
def backfill_in_chunks(source_conn,