Install: `claude install-skill aiskillstore/marketplace`
# Creating a New Bauplan Data Pipeline
This skill guides you through creating a new bauplan data pipeline project from scratch, including the project configuration and SQL/Python transformation models.
## CRITICAL: Branch Safety
> **NEVER run pipelines on the `main` branch.** Always use a development branch.
Branch naming convention: `<username>.<branch_name>` (e.g., `john.feature-pipeline`). Get your username with `bauplan info`. See [Workflow Checklist](#workflow-checklist) for exact commands.
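The naming convention above can be checked before any run. The following is a minimal sketch in plain Python. `is_safe_dev_branch` and `BRANCH_PATTERN` are hypothetical names chosen for illustration and are not part of the bauplan SDK or CLI. It assumes the username contains no dots or whitespace.

```python
import re

# Hypothetical helper, not part of bauplan: checks the
# <username>.<branch_name> convention described above.
BRANCH_PATTERN = re.compile(r"^(?P<user>[^.\s]+)\.(?P<name>\S+)$")


def is_safe_dev_branch(branch: str, username: str) -> bool:
    """Return True if `branch` is a development branch owned by `username`."""
    # `main` is never a valid target for a pipeline run.
    if branch == "main":
        return False
    match = BRANCH_PATTERN.match(branch)
    # The prefix before the first dot must be the caller's username.
    return bool(match) and match.group("user") == username
```

For example, `is_safe_dev_branch("john.feature-pipeline", "john")` returns `True`, while `is_safe_dev_branch("main", "john")` returns `False`.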
## Prerequisites
Before creating the pipeline, verify that:
1. **You have a development branch** (not `main`)
2. Source tables exist in the bauplan lakehouse (the default namespace is `bauplan`)
3. You understand the schema of the source tables
## Pipeline as a DAG
A bauplan pipeline is a DAG of functions (models). Key rules:
1. **Models**: SQL or Python functions that transform data
2. **Source Tables**: Existing lakehouse tables - entry points to your DAG
3. **Inputs**: Each model can take **multiple tables** via `bauplan.Model()` references
4. **Outputs**: Each model produces **exactly one table**:
   - SQL: output name = filename (`trips.sql` → `trips`)
   - Python: output name = function name (`def clean_trips()` → `clean_trips`)
5. **Topology**: Defined implicitly by input references; bauplan derives the execution order from them
**Expectations**: Data quality functions that take tables as input and return a **boolean**.
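To illustrate how an execution order can follow from input references alone, here is a minimal sketch using Python's standard-library `graphlib`. It does not show how bauplan resolves the DAG internally. The `model_inputs` mapping and the `execution_order` function are hypothetical, and the dependency of `clean_trips` on `trips` is an assumption made for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each model name maps to the tables it reads.
# `taxi_fhvhv` is a source table, so it has no entry of its own.
model_inputs = {
    "trips": {"taxi_fhvhv"},   # trips.sql reads the source table
    "clean_trips": {"trips"},  # assumed: clean_trips() reads `trips`
}


def execution_order(inputs: dict[str, set[str]]) -> list[str]:
    """Order models so that every input exists before it is consumed."""
    order = TopologicalSorter(inputs).static_order()
    # Keep only models; source tables already exist in the lakehouse.
    return [name for name in order if name in inputs]


print(execution_order(model_inputs))  # ['trips', 'clean_trips']
```

Because each model produces exactly one table named after its file or function, the keys of the mapping double as output table names, which is what makes the references unambiguous.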
### Example DAG
```
[lakehouse: taxi_fhvhv] ──→ [trips.sql]