orchestration-patterns

Airflow/Prefect/Dagster DAG design — task dependencies, retries, SLAs, backfill strategies, sensors, and failure recovery. Use this skill whenever the user is building or debugging a scheduled pipeline with multiple steps, asking how to handle task failures, setting up retries or alerts, designing a DAG structure, choosing between orchestrators, or dealing with backfill/reprocessing of historical data. Also trigger when the user mentions Airflow operators, Prefect flows, Dagster assets, task queues, or pipeline scheduling — even if they don't say "orchestration" explicitly. If a pipeline has more than two steps and needs to run on a schedule, this skill should be active.
Methasit-Pun/data_engineer_claude_skills · Data & Documents

Install: claude install-skill Methasit-Pun/data_engineer_claude_skills

# Orchestration Patterns for Data Pipelines

## When this skill applies

Orchestration is the layer that turns a collection of scripts into a reliable, observable pipeline. Reach for these patterns whenever you're wiring up tasks that depend on each other, need to run on a schedule, or must recover gracefully from failures. Getting the design right here saves a great deal of debugging time downstream.

---

## Choosing an Orchestrator

| Orchestrator | Best fit | Watch out for |
|---|---|---|
| **Airflow** | Large teams, mature ecosystem, lots of operators | Heavy setup; scheduler can be a bottleneck |
| **Prefect** | Python-native, quick start, hybrid deployments | Smaller operator ecosystem than Airflow |
| **Dagster** | Asset-centric thinking, strong type system, great UI | Steeper learning curve for teams new to assets |

The biggest split is **task-centric** (Airflow, Prefect) vs **asset-centric** (Dagster). Asset-centric thinking asks "what data does this job produce?" rather than "what does this job do?", which makes lineage and freshness checks natural. If you're starting fresh and the team can learn, Dagster is worth the investment.

---

## DAG Structure Principles

### One task, one responsibility

Each task should do exactly one thing that can be individually retried, skipped, or monitored. Avoid "god tasks" that extract, transform, and load in one function: a failure anywhere forces the whole thing to rerun.

```python
# Airflow: split extract/transform/load into separate operators e