data-engineer

Data Engineer (/data) — designs and builds data pipelines: ETL/ELT, dbt transformations, warehouses/lakehouses (BigQuery, Snowflake, DuckDB, Postgres), streaming (Kafka, Flink), orchestration (Airflow, Dagster), and data quality. Use when building ingestion/transformation pipelines, modeling analytics tables, wiring streaming or batch jobs, or setting up a warehouse. Invoke alongside /arch for data architecture and /dba for OLTP schema/query tuning. NOT for application CRUD/business logic (that's /be) and NOT for ML feature serving (that's mlops-engineer).
olehsvyrydov/AI-development-team · ★ 10 · AI & Automation · score 77
Install: claude install-skill olehsvyrydov/AI-development-team
# Data Engineer (/data)

**Command:** `/data` · **Category:** Development

## Gate Check (workflow)

Consult the **`workflow-engine`** skill first.

- **Before implementing:** every upstream gate that the workflow-engine determines applies must be `passed`: `ARCH_APPROVED` when adding a new pipeline, warehouse, or streaming dependency or crossing a data boundary; `SECOPS_APPROVED` when handling PII or external data sources; and `APPROVAL_GATE` on the `full` track.
- **On completion:** pipelines ship with **data-quality tests** (freshness, volume, schema, null/uniqueness) and an idempotent, backfillable design before `/rev`.

## When to use (and when not)

- **Use for:** ETL/ELT, dbt models & tests, warehouse/lakehouse modeling (star/snowflake, SCD), streaming pipelines, orchestration DAGs, CDC, data contracts & quality.
- **Hand off instead when:** OLTP schema/index/query tuning → **/dba**; app endpoints/business logic → **/be**; embeddings/RAG indexing → **/ai**; cloud infra/IaC for the platform → **devops-engineer**.

## Core expertise

- **Transformation:** dbt (models, tests, snapshots, exposures), SQL modeling, incremental & SCD patterns.
- **Storage:** BigQuery, Snowflake, DuckDB, Postgres, object stores; partitioning, clustering, cost control.
- **Movement:** batch (Airbyte/custom) + streaming (Kafka, Flink, Spark Structured Streaming), CDC, exactly-once concerns.
- **Orchestration:** Airflow / Dagster / Prefect: idempotent, retriable, backfillable tasks; lineage.
- **Quality &
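The completion gate requires freshness, volume, schema, and null/uniqueness tests before `/rev`. Here is a minimal sketch of those four checks in plain Python. It assumes the batch arrives as a list of dicts, and the `run_quality_checks` helper and its parameters are hypothetical, not part of this skill or of any library. In a dbt project the same intent is usually declared with the built-in `not_null` and `unique` tests plus source freshness.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, Mapping

Row = Mapping[str, Any]  # one record of the batch under test


def run_quality_checks(
    rows: list[Row],
    *,
    expected_columns: set[str],
    key_column: str,
    not_null_columns: Iterable[str],
    loaded_at_column: str,
    max_staleness: timedelta,
    min_rows: int,
    now: datetime,
) -> list[str]:
    """Return human-readable failures; an empty list means every check passed."""
    failures: list[str] = []

    # Volume: an empty or unusually small batch usually signals an upstream problem.
    if len(rows) < min_rows:
        failures.append(f"volume: expected >= {min_rows} rows, got {len(rows)}")

    # Schema: each row must carry exactly the expected columns (catches drift).
    for i, row in enumerate(rows):
        cols = set(row)
        if cols != expected_columns:
            missing = sorted(expected_columns - cols)
            extra = sorted(cols - expected_columns)
            failures.append(f"schema: row {i} missing={missing} extra={extra}")
            break  # report the first drifting row only

    # Not-null: the listed columns must be present and non-null in every row.
    for col in not_null_columns:
        nulls = sum(1 for row in rows if row.get(col) is None)
        if nulls:
            failures.append(f"not_null: {col} has {nulls} null(s)")

    # Uniqueness: the key column must not repeat within the batch.
    seen: set[Any] = set()
    dupes: set[Any] = set()
    for row in rows:
        key = row.get(key_column)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    if dupes:
        failures.append(f"unique: {key_column} repeated for {sorted(dupes)}")

    # Freshness: the newest load timestamp must fall within max_staleness of now.
    stamps = [row[loaded_at_column] for row in rows if row.get(loaded_at_column) is not None]
    if not stamps:
        failures.append(f"freshness: no {loaded_at_column} values")
    elif now - max(stamps) > max_staleness:
        failures.append(f"freshness: newest {loaded_at_column} is {now - max(stamps)} old")

    return failures


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    batch = [
        {"order_id": 1, "amount": 10.0, "loaded_at": now - timedelta(hours=1)},
        {"order_id": 2, "amount": None, "loaded_at": now - timedelta(hours=2)},
        {"order_id": 2, "amount": 5.0, "loaded_at": now - timedelta(hours=3)},
    ]
    problems = run_quality_checks(
        batch,
        expected_columns={"order_id", "amount", "loaded_at"},
        key_column="order_id",
        not_null_columns=["order_id", "amount"],
        loaded_at_column="loaded_at",
        max_staleness=timedelta(hours=6),
        min_rows=1,
        now=now,
    )
    for problem in problems:
        print(problem)
    # A real pipeline would fail the run here if `problems` is non-empty.
```

The sample batch fails two checks: the not-null check on `amount` and the uniqueness check on `order_id`. It passes volume, schema, and freshness. Returning a list rather than raising on the first failure lets a single run report every broken expectation at once.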
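The "idempotent, retriable, backfillable" requirement usually comes down to making each run overwrite a whole partition instead of appending to it. Here is a minimal sketch of that delete-then-insert pattern. It assumes a daily partition key `ds` and uses SQLite from the Python standard library as a stand-in warehouse. The table `daily_orders`, the helpers `load_partition` and `backfill`, and the source `fake_extract` are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta
from typing import Callable, Iterable

# Hypothetical target table, partitioned by the ISO date string `ds`.
DDL = """
CREATE TABLE IF NOT EXISTS daily_orders (
    ds TEXT NOT NULL,
    order_id INTEGER NOT NULL,
    amount REAL,
    PRIMARY KEY (ds, order_id)
)
"""


def load_partition(conn: sqlite3.Connection, ds: date, rows: list[tuple[int, float]]) -> None:
    """Overwrite partition `ds` with `rows`; re-running it yields the same table state."""
    with conn:  # one transaction: commit on success, roll back if any statement fails
        conn.execute("DELETE FROM daily_orders WHERE ds = ?", (ds.isoformat(),))
        conn.executemany(
            "INSERT INTO daily_orders (ds, order_id, amount) VALUES (?, ?, ?)",
            [(ds.isoformat(), order_id, amount) for order_id, amount in rows],
        )


def backfill(
    conn: sqlite3.Connection,
    days: Iterable[date],
    extract: Callable[[date], list[tuple[int, float]]],
) -> None:
    """Reload each day independently, so a backfill is just the normal load in a loop."""
    for ds in days:
        load_partition(conn, ds, extract(ds))


def fake_extract(ds: date) -> list[tuple[int, float]]:
    """Hypothetical deterministic source: two orders per day."""
    base = ds.toordinal() * 10
    return [(base + 1, 10.0), (base + 2, 5.5)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    start = date(2024, 1, 1)
    days = [start + timedelta(days=n) for n in range(3)]
    backfill(conn, days, fake_extract)
    backfill(conn, days, fake_extract)  # re-run: no duplicates, same rows
    print(conn.execute("SELECT COUNT(*) FROM daily_orders").fetchone()[0])  # 6
```

The delete and the inserts share one transaction, so a failed run leaves the previous contents of that partition untouched. Re-running any day, or a whole backfill range, produces the same rows. Warehouses offer native equivalents such as `MERGE` or partition overwrite, with engine-specific syntax.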