transforming-data

Transform raw data into analytical assets using ETL/ELT patterns, SQL (dbt), Python (pandas/polars/PySpark), and orchestration (Airflow). Use when building data pipelines, implementing incremental models, migrating from pandas to polars, or orchestrating multi-step transformations with testing and quality checks.
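The incremental-model idea mentioned above can be illustrated outside dbt. The following is a minimal Python sketch under stated assumptions: rows are plain dicts, the target fits in memory, and `incremental_merge` and its parameters are hypothetical names that are not part of dbt or any library. It mimics the two ideas behind an incremental model with a `unique_key`. First, a high-water-mark filter on a timestamp column. Second, an upsert that replaces rows whose key already exists.

```python
from datetime import datetime


def incremental_merge(target: list[dict], source: list[dict],
                      unique_key: str, cursor: str) -> list[dict]:
    """Hypothetical helper: merge new source rows into target.

    A sketch of high-water-mark filtering plus upsert-by-key,
    not dbt's actual implementation.
    """
    # High-water mark: latest cursor value already loaded (None on the first run).
    watermark = max((r[cursor] for r in target), default=None)
    # Only rows strictly newer than the watermark are processed.
    new_rows = [r for r in source if watermark is None or r[cursor] > watermark]
    # Upsert: an existing key is overwritten in place; new keys are appended.
    merged = {r[unique_key]: r for r in target}
    for r in new_rows:
        merged[r[unique_key]] = r
    return list(merged.values())


target = [
    {"order_id": 1, "order_created_at": datetime(2024, 1, 1), "total_revenue": 10.0},
    {"order_id": 2, "order_created_at": datetime(2024, 1, 2), "total_revenue": 20.0},
]
source = [
    # Not newer than the watermark (2024-01-02), so it is skipped.
    {"order_id": 1, "order_created_at": datetime(2024, 1, 1), "total_revenue": 10.0},
    # Newer, and the key exists: replaces the old order 2.
    {"order_id": 2, "order_created_at": datetime(2024, 1, 3), "total_revenue": 25.0},
    # Newer, and the key is new: appended.
    {"order_id": 3, "order_created_at": datetime(2024, 1, 4), "total_revenue": 30.0},
]

result = incremental_merge(target, source, "order_id", "order_created_at")
print([r["order_id"] for r in result])  # [1, 2, 3]
```

The filter uses a strict `>` comparison, so late-arriving rows whose timestamp is not newer than the current maximum are skipped. That is a trade-off of the high-water-mark approach itself, not something the upsert step compensates for.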
ancoleman/ai-design-components · ★ 368 · Data & Documents · score 80
Install: claude install-skill ancoleman/ai-design-components
# Data Transformation

Transform raw data into analytical assets using modern transformation patterns, frameworks, and orchestration tools.

## Purpose

Select and implement data transformation patterns across the modern data stack. Transform raw data into clean, tested, and documented analytical datasets using SQL (dbt), Python DataFrames (pandas, polars, PySpark), and pipeline orchestration (Airflow, Dagster, Prefect).

## When to Use

Invoke this skill when:

- Choosing between ETL and ELT transformation patterns
- Building dbt models (staging, intermediate, marts)
- Implementing incremental data loads and merge strategies
- Migrating pandas code to polars for performance improvements
- Orchestrating data pipelines with dependencies and retries
- Adding data quality tests and validation
- Processing large datasets with PySpark
- Creating production-ready transformation workflows

## Quick Start: Common Patterns

### dbt Incremental Model

```sql
{{
  config(
    materialized='incremental',
    unique_key='order_id'
  )
}}

select
  order_id,
  customer_id,
  order_created_at,
  sum(revenue) as total_revenue
from {{ ref('int_order_items_joined') }}
-- On incremental runs, only process rows newer than those already loaded.
-- The filter must come before GROUP BY to produce valid SQL.
{% if is_incremental() %}
where order_created_at > (select max(order_created_at) from {{ this }})
{% endif %}
group by 1, 2, 3
```

### polars High-Performance Transformation

```python
import polars as pl

result = (
    pl.scan_csv('large_dataset.csv')
    .filter(pl.col('year') == 2024)
    .with_columns([(pl.col('quantity') * pl