pipeline-design

Use when designing data pipelines for moving, transforming, and delivering data. Covers ETL vs ELT pattern selection, orchestration tool choice, batch vs streaming trade-offs, idempotency guarantees, data quality checkpoints, and lineage tracking. Do not use for schema modeling (use schema-evaluation) or ML workflows (use ml-workflow).
dtsong/my-claude-setup · ★ 5 · Data & Documents · score 76

Install: claude install-skill dtsong/my-claude-setup

# Pipeline Design

## Purpose

Design data pipelines that reliably move, transform, and deliver data from source systems to consumption layers. Covers ETL vs ELT pattern selection, orchestration tool choice, batch vs streaming trade-offs, idempotency guarantees, data quality checkpoints, and lineage tracking.

## Scope Constraints

Reads pipeline configurations, DAG definitions, orchestration manifests, and infrastructure specs for analysis. Does not execute pipelines, deploy infrastructure, or modify production configurations.

## Inputs

- Source systems and their data formats (databases, APIs, event streams, files)
- Destination systems (warehouse, lake, feature store, BI tool)
- Data volume and velocity (rows/day, events/second, payload size)
- Freshness requirements (real-time, near-real-time, hourly, daily)
- Existing infrastructure (cloud provider, orchestration tools, current pipelines)
- Team size and expertise (SQL-heavy? Python-heavy? Platform team available?)

## Input Sanitization

No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.

## Procedure

### Progress Checklist

- [ ] Step 1: Map source-to-destination flows
- [ ] Step 2: Choose ETL vs ELT pattern
- [ ] Step 3: Select batch vs streaming
- [ ] Step 4: Design for idempotency
- [ ] Step 5: Define data quality checkpoints
- [ ] Step 6: Plan lineage and observability
- [ ] Step 7: Select orchestration tool

### Step 1: Map Source-to-Destination Flows
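As a starting point for this step, the flows gathered from the Inputs can be recorded in a small inventory. The sketch below is a minimal illustration only, assuming Python and hypothetical names (`Flow`, `summarize_by_destination`, `FRESHNESS_TIERS`, and the example source names are not part of the skill). It groups flows by destination so that combined daily volume and the strictest freshness requirement per destination are visible before the later ETL vs ELT and batch vs streaming decisions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ordering of the freshness tiers listed under Inputs,
# from strictest (index 0) to loosest.
FRESHNESS_TIERS = ("real-time", "near-real-time", "hourly", "daily")


@dataclass(frozen=True)
class Flow:
    """One source-to-destination flow plus the intake facts that shape it."""
    source: str         # hypothetical source name, e.g. "orders_db"
    source_format: str  # database, API, event stream, or file
    destination: str    # warehouse, lake, feature store, or BI tool
    rows_per_day: int   # volume estimate
    freshness: str      # one of FRESHNESS_TIERS


def summarize_by_destination(flows: list[Flow]) -> dict[str, dict]:
    """Group flows by destination, summing volume and keeping the strictest freshness."""
    summary: dict[str, dict] = {}
    for flow in flows:
        if flow.freshness not in FRESHNESS_TIERS:
            raise ValueError(f"{flow.source}: unknown freshness {flow.freshness!r}")
        # Start each destination at the loosest tier; stricter flows override it.
        entry = summary.setdefault(
            flow.destination,
            {"sources": [], "rows_per_day": 0, "strictest_freshness": "daily"},
        )
        entry["sources"].append(flow.source)
        entry["rows_per_day"] += flow.rows_per_day
        # A lower index in FRESHNESS_TIERS means a stricter requirement.
        if FRESHNESS_TIERS.index(flow.freshness) < FRESHNESS_TIERS.index(
            entry["strictest_freshness"]
        ):
            entry["strictest_freshness"] = flow.freshness
    return summary
```

For example, an hourly `orders_db` flow and a near-real-time `clickstream` flow that both land in the warehouse would summarize to a single warehouse entry whose strictest freshness is near-real-time, which is the requirement that destination's pipeline design has to meet.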