data-pipelines

Quality Score: 92/100

Stars (weight 20%): 74
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

When this skill is activated, always start your first response with the 🧢 emoji.

# Data Pipelines

A senior data engineer's decision-making framework for building production data pipelines. This skill covers the five pillars of data engineering - ingestion patterns (ETL vs ELT), orchestration (Airflow), transformation (dbt), large-scale processing (Spark), and architecture choices (streaming vs batch) - with emphasis on when to use each pattern and the trade-offs involved. Designed for engineers who need opinionated guidance on building reliable, observable, and maintainable data infrastructure.

---

## When to use this skill

Trigger this skill when the user:

- Designs an ETL or ELT pipeline from scratch
- Writes or debugs an Airflow DAG
- Creates dbt models, tests, or macros
- Optimizes a Spark job (shuffles, partitioning, memory tuning)
- Decides between streaming and batch processing
- Implements incremental loads or change data capture (CDC)
- Plans a data warehouse or lakehouse architecture
- Needs data quality checks, schema evolution, or pipeline monitoring

Do NOT trigger this skill for:

- BI/analytics dashboard design or visualization (use an analytics skill)
- ML model training or feature engineering (use an ML/data-science skill)

---

## Key principles

1. **Idempotency is non-negotiable** - Every pipeline run with the same input must produce the same output. Design for safe re-runs from day one. Use date partitions, merge keys, or upsert logic so that r...
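The idempotency principle above can be illustrated with a date-partition overwrite. This is a minimal sketch and not taken from the skill itself: the `daily_orders` table, its columns, and the `load_partition` helper are hypothetical names, and SQLite stands in for a real warehouse. Each run deletes and rewrites its own date partition inside one transaction, so running the same load twice leaves the table unchanged.

```python
import sqlite3


def load_partition(conn, run_date, rows):
    """Overwrite one date partition atomically (hypothetical helper)."""
    # The connection context manager commits on success and rolls back
    # on an exception, so a failed run never leaves a half-written partition.
    with conn:
        conn.execute("DELETE FROM daily_orders WHERE order_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_orders (order_date, order_id, amount) VALUES (?, ?, ?)",
            [(run_date, order_id, amount) for order_id, amount in rows],
        )


# Hypothetical table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_orders (order_date TEXT, order_id INTEGER, amount REAL)"
)

jan_first = [(1, 9.5), (2, 20.0)]
load_partition(conn, "2024-01-01", jan_first)
load_partition(conn, "2024-01-02", [(3, 5.0)])
load_partition(conn, "2024-01-01", jan_first)  # re-run of an earlier date

row_count = conn.execute("SELECT COUNT(*) FROM daily_orders").fetchone()[0]
jan_first_count = conn.execute(
    "SELECT COUNT(*) FROM daily_orders WHERE order_date = ?", ("2024-01-01",)
).fetchone()[0]
```

Re-running the load for 2024-01-01 replaces that partition instead of appending to it, so the row counts stay the same as after the first run. A merge key or upsert would give the same guarantee at row level rather than partition level.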

Details

Author: AbsolutelySkilled
Repository: AbsolutelySkilled/AbsolutelySkilled
Created: 2 months ago
Last Updated: yesterday
Language: MDX
License: MIT
