databricks-core-workflow-a

Featured

Execute Databricks primary workflow: Delta Lake ETL pipelines. Use when building data ingestion pipelines, implementing medallion architecture, or creating Delta Lake transformations. Trigger with phrases like "databricks ETL", "delta lake pipeline", "medallion architecture", "databricks data pipeline", "bronze silver gold".

AI & Automation 2,266 stars 315 forks Updated today MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Databricks Core Workflow A: Delta Lake ETL

## Overview

Build production Delta Lake ETL pipelines using the medallion architecture (Bronze > Silver > Gold). Uses Auto Loader (`cloudFiles`) for incremental ingestion, `MERGE INTO` for upserts, and Delta Live Tables for declarative pipelines.

## Prerequisites

- Completed `databricks-install-auth` setup
- Unity Catalog enabled with catalogs/schemas created
- Access to cloud storage for raw data (S3, ADLS, GCS)

## Architecture

```
Raw Sources (S3/ADLS/GCS)
    │ Auto Loader (cloudFiles)
    ▼
Bronze (raw + metadata)
    │ Cleanse, deduplicate, type-cast
    ▼
Silver (conformed)
    │ Aggregate, join, feature engineer
    ▼
Gold (analytics-ready)
```

## Instructions

### Step 1: Bronze Layer - Raw Ingestion with Auto Loader

Auto Loader (`cloudFiles` format) incrementally processes new files as they arrive. It handles schema inference, evolution, and scales to millions of files.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, input_file_name, lit

spark = SparkSession.builder.getOrCreate()

# Streaming ingestion with Auto Loader
bronze_stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/checkpoints/bronze/orders/schema")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("s3://data-lake/raw/orders/")
)

# Add ing...
```
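
The Overview names `MERGE INTO` as the upsert mechanism, but the excerpt above is cut off before it reaches that step. As a rough illustration only, the sketch below assembles a Delta Lake `MERGE INTO` statement from a target table, a source view, and a list of key columns. The helper `build_merge_sql`, the table name `main.silver.orders`, the view name `orders_updates`, and the key `order_id` are all hypothetical and do not come from the original skill; the skill's actual silver-layer code may look quite different.

```python
from typing import Sequence


def build_merge_sql(target: str, source: str, keys: Sequence[str]) -> str:
    """Return a MERGE INTO statement that upserts `source` rows into `target`.

    Hypothetical helper for illustration; not part of the original skill.
    """
    if not keys:
        raise ValueError("at least one key column is required")
    # Rows match when every key column is equal in target (t) and source (s).
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Assumed names, chosen only to make the example concrete.
merge_sql = build_merge_sql("main.silver.orders", "orders_updates", ["order_id"])
print(merge_sql)
```

On a Databricks cluster, the resulting string could be passed to `spark.sql(merge_sql)`, provided the source view is registered and its columns line up with the target table, since `UPDATE SET *` and `INSERT *` copy columns by name.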

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

databricks-core-workflow-b

Execute Databricks secondary workflow: MLflow model training and deployment. Use when building ML pipelines, training models, or deploying to production. Trigger with phrases like "databricks ML", "mlflow training", "databricks model", "feature store", "model registry".

2,266 Updated today
jeremylongshore
AI & Automation Featured

databricks-data-handling

Implement Delta Lake data management patterns including GDPR, PII handling, and data lifecycle. Use when implementing data retention, handling GDPR requests, or managing data lifecycle in Delta Lake. Trigger with phrases like "databricks GDPR", "databricks PII", "databricks data retention", "databricks data lifecycle", "delete user data".

2,266 Updated today
jeremylongshore
AI & Automation Solid

senior-data-engineer

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.

16,642 Updated yesterday
alirezarezvani
AI & Automation Featured

databricks-reference-architecture

Implement Databricks reference architecture with best-practice project layout. Use when designing new Databricks projects, reviewing architecture, or establishing standards for Databricks applications. Trigger with phrases like "databricks architecture", "databricks best practices", "databricks project structure", "how to organize databricks", "databricks layout".

2,266 Updated today
jeremylongshore
AI & Automation Featured

snowflake-core-workflow-a

Execute Snowflake primary workflow: data loading via stages and COPY INTO. Use when loading data from S3/GCS/Azure into Snowflake tables, setting up Snowpipe for continuous ingestion, or bulk loading files. Trigger with phrases like "snowflake load data", "snowflake COPY INTO", "snowflake stage", "snowflake ingest", "snowflake S3 load", "snowpipe".

2,266 Updated today
jeremylongshore