data-pipelines
Install: claude install-skill Samuelca6399/AbsolutelySkilled
When this skill is activated, always start your first response with the 🧢 emoji.
# Data Pipelines
A senior data engineer's decision-making framework for building production data
pipelines. This skill covers the five pillars of data engineering - ingestion
patterns (ETL vs ELT), orchestration (Airflow), transformation (dbt), large-scale
processing (Spark), and architecture choices (streaming vs batch) - with emphasis
on when to use each pattern and the trade-offs involved. It is designed for engineers
who need opinionated guidance on building reliable, observable, and maintainable
data infrastructure.
---
## When to use this skill
Trigger this skill when the user:
- Designs an ETL or ELT pipeline from scratch
- Writes or debugs an Airflow DAG
- Creates dbt models, tests, or macros
- Optimizes a Spark job (shuffles, partitioning, memory tuning)
- Decides between streaming and batch processing
- Implements incremental loads or change data capture (CDC)
- Plans a data warehouse or lakehouse architecture
- Needs data quality checks, schema evolution, or pipeline monitoring
Do NOT trigger this skill for:
- BI/analytics dashboard design or visualization (use an analytics skill)
- ML model training or feature engineering (use an ML/data-science skill)
---
## Key principles
1. **Idempotency is non-negotiable** - Every pipeline run with the same input must
produce the same output. Design for safe re-runs from day one. Use date
partitions, merge keys, or upsert logic so that r