Install: `claude install-skill dtsong/my-claude-setup`
# Pipeline Design
## Purpose
Design data pipelines that reliably move, transform, and deliver data from source systems to consumption layers. Covers ETL vs ELT pattern selection, orchestration tool choice, batch vs streaming trade-offs, idempotency guarantees, data quality checkpoints, and lineage tracking.
## Scope Constraints
Reads pipeline configurations, DAG definitions, orchestration manifests, and infrastructure specs for analysis. Does not execute pipelines, deploy infrastructure, or modify production configurations.
## Inputs
- Source systems and their data formats (databases, APIs, event streams, files)
- Destination systems (warehouse, lake, feature store, BI tool)
- Data volume and velocity (rows/day, events/second, payload size)
- Freshness requirements (real-time, near-real-time, hourly, daily)
- Existing infrastructure (cloud provider, orchestration tools, current pipelines)
- Team size and expertise (SQL-heavy? Python-heavy? Platform team available?)
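One way to keep these inputs together during analysis is a small record type. The sketch below is illustrative only: `PipelineInputs`, `Freshness`, and every field name are hypothetical and not part of this skill; they mirror the bullet list above so that gaps in the gathered information are easy to spot before Step 1.

```python
from dataclasses import dataclass, field
from enum import Enum


class Freshness(Enum):
    """Freshness tiers taken from the inputs list (names are illustrative)."""
    REAL_TIME = "real-time"
    NEAR_REAL_TIME = "near-real-time"
    HOURLY = "hourly"
    DAILY = "daily"


@dataclass
class PipelineInputs:
    """Hypothetical container for the design inputs listed above."""
    sources: list[str]                      # e.g. databases, APIs, event streams, files
    destinations: list[str]                 # e.g. warehouse, lake, feature store, BI tool
    freshness: Freshness
    rows_per_day: int = 0                   # volume; 0 means "not yet known"
    existing_infrastructure: list[str] = field(default_factory=list)
    team_skills: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Return the names of core inputs that have not been filled in."""
        gaps = []
        if not self.sources:
            gaps.append("sources")
        if not self.destinations:
            gaps.append("destinations")
        if self.rows_per_day <= 0:
            gaps.append("volume")
        return gaps
```

A call such as `PipelineInputs(sources=["postgres"], destinations=[], freshness=Freshness.DAILY).missing()` reports which inputs still need to be collected. Treating volume as required is an assumption of this sketch, not a rule stated by the skill.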
## Input Sanitization
No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.
## Procedure
### Progress Checklist
- [ ] Step 1: Map source-to-destination flows
- [ ] Step 2: Choose ETL vs ELT pattern
- [ ] Step 3: Select batch vs streaming
- [ ] Step 4: Design for idempotency
- [ ] Step 5: Define data quality checkpoints
- [ ] Step 6: Plan lineage and observability
- [ ] Step 7: Select orchestration tool
### Step 1: Map Source-to-Destination Flows