data-engineering-data-pipeline

Featured

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

Data & Documents · 39,227 stars · 6,374 forks · Updated today · MIT

Quality Score: 99/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Data Pipeline Architecture

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

## Use this skill when

- Working on data pipeline architecture tasks or workflows
- Needing guidance, best practices, or checklists for data pipeline architecture

## Do not use this skill when

- The task is unrelated to data pipeline architecture
- You need a different domain or tool outside this scope

## Requirements

$ARGUMENTS

## Core Capabilities

- Design ETL/ELT, Lambda, Kappa, and Lakehouse architectures
- Implement batch and streaming data ingestion
- Build workflow orchestration with Airflow/Prefect
- Transform data using dbt and Spark
- Manage Delta Lake/Iceberg storage with ACID transactions
- Implement data quality frameworks (Great Expectations, dbt tests)
- Monitor pipelines with CloudWatch/Prometheus/Grafana
- Optimize costs through partitioning, lifecycle policies, and compute optimization

## Instructions

### 1. Architecture Design

- Assess: sources, volume, latency requirements, targets
- Select pattern: ETL (transform before load), ELT (load then transform), Lambda (batch + speed layers), Kappa (stream-only), Lakehouse (unified)
- Design flow: sources → ingestion → processing → storage → serving
- Add observability touchpoints

### 2. Ingestion Implementation

**Batch**

- Incremental loading with watermark columns
- Retry logic with exponential backoff
- Schema validati...
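The first two batch ingestion points listed above (incremental loading with a watermark column, and retry with exponential backoff) can be sketched in plain Python. This is a minimal illustration and is not taken from the skill itself: the function names `with_backoff` and `incremental_load`, the `fetch_rows` callback, and the `updated_at` column are all hypothetical, and the source is assumed to return rows as dictionaries.

```python
import time


def with_backoff(fn, retries=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); after a failure wait base_delay * 2**attempt, then retry.

    Hypothetical helper: the last failure is re-raised once `retries`
    attempts have been used up. `sleep` is injectable so tests can
    record delays instead of actually waiting.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def incremental_load(fetch_rows, watermark, column="updated_at"):
    """Fetch only rows newer than `watermark` and return the new watermark.

    `fetch_rows(watermark)` is an assumed callback that returns an iterable
    of dict rows whose `column` value is strictly greater than `watermark`.
    """
    rows = with_backoff(lambda: list(fetch_rows(watermark)))
    # If nothing new arrived, keep the old watermark unchanged.
    new_watermark = max((row[column] for row in rows), default=watermark)
    return rows, new_watermark


if __name__ == "__main__":
    # In-memory stand-in for a source table (illustrative data only).
    source = [
        {"id": 1, "updated_at": 10},
        {"id": 2, "updated_at": 20},
        {"id": 3, "updated_at": 30},
    ]

    def fetch(watermark):
        return [row for row in source if row["updated_at"] > watermark]

    rows, watermark = incremental_load(fetch, watermark=10)
    print(rows, watermark)
```

In a real pipeline the returned watermark would be persisted only after the batch has been written successfully, so that a failed run picks up the same rows again on the next attempt.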

Details

Author: sickn33
Repository: sickn33/antigravity-awesome-skills
Created: 4 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Listed

data-engineering-data-pipeline

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

335 Updated today
aiskillstore
Data & Documents Listed

pipeline-architect

Designs and implements data pipelines: ETL/ELT, streaming, batch processing, schema migrations, and data warehouse architecture. Covers Kafka, Airflow, dbt, Spark, ClickHouse, BigQuery, Snowflake, Redis Streams, and more. Use this skill when the user asks about data pipelines, ETL jobs, data transformation, streaming setup, data warehouse design, CDC, schema migrations, data quality checks, or anything involving moving data from source to target. Also triggers on "build a pipeline," "migrate data from X to Y," "set up streaming," "design my data warehouse," or "data quality is bad, help me fix it."

1 Updated 2 days ago
mturac
Data & Documents Featured

data-engineer

Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms.

39,227 Updated today
sickn33
Data & Documents Featured

data-engineer

Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms.

27,681 Updated today
davila7
Data & Documents Listed

data-engineer

Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms. Use PROACTIVELY for data pipeline design, analytics infrastructure, or modern data stack implementation.

335 Updated today
aiskillstore