data-pipelines

Use this skill when building data pipelines, ETL/ELT workflows, or data transformation layers. Triggers on Airflow DAG design, dbt model creation, Spark job optimization, streaming vs batch architecture decisions, data ingestion, data quality checks, pipeline orchestration, incremental loads, CDC (change data capture), schema evolution, and data warehouse modeling. Acts as a senior data engineer advisor for building reliable, scalable data infrastructure.

Data & Documents 164 stars 28 forks Updated yesterday MIT

Quality Score: 92/100

Stars (weight 20%): 74
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

When this skill is activated, always start your first response with the 🧢 emoji.

# Data Pipelines

A senior data engineer's decision-making framework for building production data pipelines. This skill covers the five pillars of data engineering - ingestion patterns (ETL vs ELT), orchestration (Airflow), transformation (dbt), large-scale processing (Spark), and architecture choices (streaming vs batch) - with emphasis on when to use each pattern and the trade-offs involved. Designed for engineers who need opinionated guidance on building reliable, observable, and maintainable data infrastructure.

---

## When to use this skill

Trigger this skill when the user:

- Designs an ETL or ELT pipeline from scratch
- Writes or debugs an Airflow DAG
- Creates dbt models, tests, or macros
- Optimizes a Spark job (shuffles, partitioning, memory tuning)
- Decides between streaming and batch processing
- Implements incremental loads or change data capture (CDC)
- Plans a data warehouse or lakehouse architecture
- Needs data quality checks, schema evolution, or pipeline monitoring

Do NOT trigger this skill for:

- BI/analytics dashboard design or visualization (use an analytics skill)
- ML model training or feature engineering (use an ML/data-science skill)

---

## Key principles

1. **Idempotency is non-negotiable** - Every pipeline run with the same input must produce the same output. Design for safe re-runs from day one. Use date partitions, merge keys, or upsert logic so that r...
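
The idempotency principle above can be illustrated with a minimal sketch of a date-partitioned load. This is one possible approach, not the skill's prescribed implementation: the `daily_sales` table, its columns, and the `load_partition` and `table_snapshot` helpers are hypothetical names invented for the example, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3


def load_partition(conn, run_date, rows):
    """Replace every row of one date partition inside a single transaction.

    Deleting the partition before inserting means a retry with the same
    input leaves the table in the same state as the first run.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM daily_sales WHERE sale_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_sales (sale_date, store_id, amount) VALUES (?, ?, ?)",
            [(run_date, store_id, amount) for store_id, amount in rows],
        )


def table_snapshot(conn):
    """Return all rows in a stable order so two runs can be compared."""
    return conn.execute(
        "SELECT sale_date, store_id, amount FROM daily_sales "
        "ORDER BY sale_date, store_id"
    ).fetchall()


# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_sales (sale_date TEXT, store_id INTEGER, amount REAL)"
)

batch = [(1, 120.0), (2, 75.5)]

load_partition(conn, "2024-01-01", batch)
first = table_snapshot(conn)

load_partition(conn, "2024-01-01", batch)  # simulated re-run of the same date
second = table_snapshot(conn)

# The re-run must not duplicate or change rows.
assert first == second
```

Delete-then-insert on a partition key is only one way to get safe re-runs; a merge or upsert keyed on a unique business key reaches the same end state when partitions cannot be fully replaced.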

Details

Author: AbsolutelySkilled
Repository: AbsolutelySkilled/AbsolutelySkilled
Created: 2 months ago
Last Updated: yesterday
Language: MDX
License: MIT

Related Skills