Install: claude install-skill Methasit-Pun/data_engineer_claude_skills
# Orchestration Patterns for Data Pipelines
## When this skill applies
Orchestration is the layer that turns a collection of scripts into a reliable, observable pipeline. Reach for these patterns any time you're wiring up tasks that depend on each other, need to run on a schedule, or must recover gracefully from failures. Getting the design right up front saves a lot of debugging time later.
---
## Choosing an Orchestrator
| Orchestrator | Best fit | Watch out for |
|---|---|---|
| **Airflow** | Large teams, mature ecosystem, lots of operators | Heavy setup; scheduler can be a bottleneck |
| **Prefect** | Python-native, quick start, hybrid deployments | Smaller operator ecosystem than Airflow |
| **Dagster** | Asset-centric thinking, strong type system, great UI | Steeper learning curve for teams new to assets |
The biggest split is **task-centric** (Airflow, Prefect) vs **asset-centric** (Dagster). Asset-centric thinking asks "what data does this job produce?" rather than "what does this job do?", which makes lineage and freshness checks natural. If you're starting fresh and the team has time to learn the asset model, Dagster is worth the investment.
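
To make the asset-centric idea concrete, here is a minimal standard-library sketch, not Dagster's actual API. The `asset` decorator, `ASSETS` registry, `materialize`, and `lineage` are hypothetical names invented for illustration. The sketch borrows one convention, loosely modeled on how Dagster infers dependencies: a function's parameter names identify the upstream assets it consumes.

```python
import inspect

# Hypothetical in-memory registry: asset name -> function that produces it.
ASSETS = {}


def asset(fn):
    """Register fn as the producer of the data asset named after it."""
    ASSETS[fn.__name__] = fn
    return fn


@asset
def raw_orders():
    # Illustrative data only; a real asset would read from a source system.
    return [{"id": 1, "amount": 40}, {"id": 2, "amount": 60}]


@asset
def order_total(raw_orders):
    # The parameter name declares the upstream asset this one depends on.
    return sum(row["amount"] for row in raw_orders)


def materialize(name, cache=None):
    """Build the named asset, building its upstream assets first."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn = ASSETS[name]
        upstream = inspect.signature(fn).parameters
        cache[name] = fn(**{dep: materialize(dep, cache) for dep in upstream})
    return cache[name]


def lineage(name):
    """Return the direct upstream asset names, read from the signature."""
    return list(inspect.signature(ASSETS[name]).parameters)
```

Because each function is named for the data it produces, lineage falls out of the declarations: `lineage("order_total")` yields `["raw_orders"]` without any separately maintained dependency list. In a task-centric design the same information usually has to be wired explicitly between tasks.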
---
## DAG Structure Principles
### One task, one responsibility
Each task should do exactly one thing that can be individually retried, skipped, or monitored. Avoid "god tasks" that extract, transform, and load in one function — a failure anywhere forces the whole thing to rerun.
```python
# Airflow — split extract/transform/load into separate operators
e