creating-bauplan-pipelines

Creates bauplan data pipeline projects with SQL and Python models. Use when starting a new pipeline, defining DAG transformations, writing models, or setting up bauplan project structure from scratch.
aiskillstore/marketplace · ★ 329 · Data & Documents · score 82

Install: claude install-skill aiskillstore/marketplace

# Creating a New Bauplan Data Pipeline

This skill guides you through creating a new bauplan data pipeline project from scratch, including the project configuration and SQL/Python transformation models.

## CRITICAL: Branch Safety

> **NEVER run pipelines on `main` branch.** Always use a development branch.

Branch naming convention: `<username>.<branch_name>` (e.g., `john.feature-pipeline`). Get your username with `bauplan info`. See [Workflow Checklist](#workflow-checklist) for exact commands.

## Prerequisites

Before creating the pipeline, verify that:

1. **You have a development branch** (not `main`)
2. Source tables exist in the bauplan lakehouse (the default namespace is `bauplan`)
3. You understand the schema of the source tables

## Pipeline as a DAG

A bauplan pipeline is a DAG of functions (models). Key rules:

1. **Models**: SQL or Python functions that transform data
2. **Source Tables**: Existing lakehouse tables - entry points to your DAG
3. **Inputs**: Each model can take **multiple tables** via `bauplan.Model()` references
4. **Outputs**: Each model produces **exactly one table**:
   - SQL: output name = filename (`trips.sql` → `trips`)
   - Python: output name = function name (`def clean_trips()` → `clean_trips`)
5. **Topology**: Implicitly defined by input references - bauplan determines execution order

**Expectations**: Data quality functions that take tables as input and return a **boolean**.

### Example DAG

```
[lakehouse: taxi_fhvhv] ──→ [trips.sql]