databricks-migration-deep-dive


Execute comprehensive platform migrations to Databricks from legacy systems. Use when migrating from on-premises Hadoop, other cloud platforms, or legacy data warehouses to Databricks. Trigger with phrases like "migrate to databricks", "hadoop migration", "snowflake to databricks", "legacy migration", "data warehouse migration".

AI & Automation 2,266 stars 315 forks Updated today MIT



Skill Content

# Databricks Migration Deep Dive

## Overview

Comprehensive migration strategies for moving to Databricks from Hadoop, Snowflake, Redshift, Synapse, or legacy data warehouses. Covers discovery and assessment, schema conversion, data migration with batching and validation, ETL/pipeline conversion, and cutover planning with rollback procedures.

## Prerequisites

- Access to source and target systems
- Databricks workspace with Unity Catalog enabled
- Understanding of current data architecture and dependencies
- Stakeholder alignment on migration timeline

## Migration Patterns

| Source | Pattern | Complexity | Timeline |
|--------|---------|------------|----------|
| Hive Metastore (same workspace) | SYNC / CTAS / DEEP CLONE | Low | Days |
| On-prem Hadoop/HDFS | Lift-and-shift to cloud storage + UC | High | 6-12 months |
| Snowflake | Parallel run + cutover | Medium | 3-6 months |
| AWS Redshift | Unload to S3 + Auto Loader | Medium | 3-6 months |
| Legacy DW (Oracle/Teradata) | Full rebuild with JDBC extraction | High | 12-18 months |

## Instructions

### Step 1: Discovery and Assessment

Inventory all source tables with metadata for migration planning.

```python
from pyspark.sql import SparkSession
from dataclasses import dataclass

spark = SparkSession.builder.getOrCreate()


@dataclass
class TableInventory:
    database: str
    table: str
    table_type: str
    format: str
    row_count: int
    size_mb: float
    columns: int
    partitions: list[str]


def assess_hive_...
```
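Once an inventory of `TableInventory` records exists, one possible next step is grouping tables into migration waves so that small, low-risk tables move first. The following is a minimal sketch in plain Python (no Spark session needed), not the skill's own method: the `plan_waves` helper, the `max_wave_mb` budget, and the sample tables are all hypothetical, and the dataclass is repeated only so the snippet runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class TableInventory:
    # Same fields as the Step 1 dataclass, repeated so this sketch is self-contained.
    database: str
    table: str
    table_type: str
    format: str
    row_count: int
    size_mb: float
    columns: int
    partitions: list[str] = field(default_factory=list)


def plan_waves(
    inventory: list[TableInventory], max_wave_mb: float
) -> list[list[TableInventory]]:
    """Hypothetical helper: pack tables, smallest first, into size-capped waves.

    A table larger than max_wave_mb still gets a wave of its own.
    """
    waves: list[list[TableInventory]] = []
    current: list[TableInventory] = []
    current_mb = 0.0
    for t in sorted(inventory, key=lambda t: t.size_mb):
        # Close the current wave when adding this table would exceed the budget.
        if current and current_mb + t.size_mb > max_wave_mb:
            waves.append(current)
            current, current_mb = [], 0.0
        current.append(t)
        current_mb += t.size_mb
    if current:
        waves.append(current)
    return waves


# Illustrative inventory; names and sizes are made up for the example.
inventory = [
    TableInventory("sales", "orders", "MANAGED", "parquet", 1_000_000, 800.0, 12, ["order_date"]),
    TableInventory("sales", "customers", "MANAGED", "parquet", 50_000, 40.0, 8),
    TableInventory("ref", "countries", "EXTERNAL", "orc", 250, 0.5, 3),
    TableInventory("logs", "events", "EXTERNAL", "parquet", 90_000_000, 5000.0, 20, ["dt"]),
]

waves = plan_waves(inventory, max_wave_mb=1000.0)
for i, wave in enumerate(waves, start=1):
    names = ", ".join(f"{t.database}.{t.table}" for t in wave)
    print(f"wave {i}: {names}")
```

Sorting by size is only one ordering heuristic; in practice dependency order between tables and downstream pipelines would likely need to feed into the plan as well.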

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT


Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

snowflake-migration-deep-dive

Execute migration to Snowflake from Redshift, BigQuery, or on-prem databases with data transfer, schema conversion, and validation strategies. Use when migrating to Snowflake from another platform, planning data transfers, or re-platforming existing data warehouses to Snowflake. Trigger with phrases like "migrate to snowflake", "snowflake migration", "redshift to snowflake", "bigquery to snowflake", "snowflake replatform".

2,266 Updated today
jeremylongshore
AI & Automation Featured

databricks-upgrade-migration

Upgrade Databricks runtime versions and migrate between features. Use when upgrading DBR versions, migrating to Unity Catalog, or updating deprecated APIs and features. Trigger with phrases like "databricks upgrade", "DBR upgrade", "databricks migration", "unity catalog migration", "hive to unity".

2,266 Updated today
jeremylongshore
API & Backend Listed

data-migration

Moving data between systems safely — cutover planning, backfill strategies, dual-write patterns, validation, rollback procedures, and zero-downtime migration techniques. Use this skill whenever the team is migrating from one database or warehouse to another (MySQL → Snowflake, Redshift → BigQuery, on-prem → cloud), replacing a legacy pipeline, doing a major schema change on a live table, or planning a cutover that cannot have downtime. Also trigger when the user asks about dual-write, shadow reads, data validation across systems, incremental vs. full migration, or how to safely retire an old system. If the phrase "migrate", "move data", "cutover", "legacy system", or "replace the old pipeline" appears, this skill should be active.

0 Updated 4 days ago
Methasit-Pun
AI & Automation Featured

databricks-core-workflow-a

Execute Databricks primary workflow: Delta Lake ETL pipelines. Use when building data ingestion pipelines, implementing medallion architecture, or creating Delta Lake transformations. Trigger with phrases like "databricks ETL", "delta lake pipeline", "medallion architecture", "databricks data pipeline", "bronze silver gold".

2,266 Updated today
jeremylongshore
AI & Automation Featured

clickhouse-migration-deep-dive

Execute ClickHouse schema migrations — ALTER TABLE operations, data migration between engines, versioned migration runners, and zero-downtime schema changes. Use when modifying ClickHouse schemas, migrating data between tables, or implementing versioned migration workflows. Trigger: "clickhouse migration", "clickhouse ALTER TABLE", "clickhouse schema change", "migrate clickhouse", "clickhouse add column", "clickhouse schema migration".

2,266 Updated today
jeremylongshore