developing-incremental-models

Develops and troubleshoots dbt incremental models. Use when working with incremental materialization for: (1) creating new incremental models (choosing strategy, unique_key, partition); (2) tasks that mention "incremental", "append", "merge", "upsert", or "late arriving data"; (3) troubleshooting incremental failures (merge errors, partition pruning, schema drift); (4) optimizing incremental performance or deciding between table and incremental. Guides through strategy selection and handles common incremental gotchas.
AltimateAI/data-engineering-skills · ★ 102 · AI & Automation · score 86

Install: claude install-skill AltimateAI/data-engineering-skills

# dbt Incremental Model Development

**Choose the right strategy. Design the unique_key carefully. Handle edge cases.**

## When to Use Incremental

| Scenario | Recommendation |
|----------|----------------|
| Source data < 10M rows | Use `table` (simpler, full refresh is fast) |
| Source data > 10M rows | Consider `incremental` |
| Source data updated in place | Use `incremental` with `merge` strategy |
| Append-only source (logs, events) | Use `incremental` with `append` strategy |
| Partitioned warehouse data | Use `insert_overwrite` if supported |

**Default to `table` unless you have a clear performance reason for incremental.**

## Critical Rules

1. **ALWAYS test with `--full-refresh` first** before relying on incremental logic
2. **ALWAYS verify unique_key is truly unique** in both source and target
3. **If merge fails 3+ times**, check unique_key for duplicates
4. **Run full refresh periodically** to prevent data drift

## Workflow

### 1. Confirm Incremental is Needed

```bash
# Check source table size
dbt show --inline "select count(*) from {{ source('schema', 'table') }}"
```

If count < 10 million, consider using `table` instead. Incremental adds complexity.

### 2. Understand the Source Data Pattern

Before choosing a strategy, answer:

- **Is data append-only?** (new rows added, never updated)
- **Are existing rows updated?** (need merge/upsert)
- **Is there a reliable timestamp?** (for filtering new data)
- **What's the unique identifier?** (for merge matching