speckit.datalisted
Install: claude install-skill wedabro/bro-skills
## 🎯 Mission
Build production data pipelines: reliable ingest → transform → load, guaranteed data quality, and reproducible ML workflows.
## 📥 Input
- `.agent/specs/[feature]/spec.md`
- `.agent/knowledge_base/data_schema.md`
- `.agent/memory/constitution.md` (Docker-First, ENV)
## 📋 Protocol
### 1. Pipeline Architecture
- Choose the processing model: batch (ETL/ELT) vs. streaming. Choose an orchestrator (Airflow/Dagster/Prefect).
- Make steps idempotent and re-runnable; keep checkpoints/state explicit.
- Prefer partitioned, incremental loads over full reloads whenever possible.
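A minimal sketch of an idempotent, incremental load step, assuming the source table has an `updated_at` column and the checkpoint is stored as a watermark. The standard-library `sqlite3` module stands in for a real warehouse, and the table names `source_events`, `target_events`, and `pipeline_state` are hypothetical. Re-running the step leaves the target unchanged because rows are upserted by primary key and the watermark advances in the same transaction as the data.

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    """Create hypothetical source, target, and checkpoint tables."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS source_events (
            id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT);
        CREATE TABLE IF NOT EXISTS target_events (
            id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT);
        CREATE TABLE IF NOT EXISTS pipeline_state (
            pipeline TEXT PRIMARY KEY, watermark TEXT);
        """
    )


def get_watermark(conn: sqlite3.Connection, pipeline: str) -> str:
    """Return the last committed watermark, or '' on the first run."""
    row = conn.execute(
        "SELECT watermark FROM pipeline_state WHERE pipeline = ?", (pipeline,)
    ).fetchone()
    # '' sorts before any ISO-8601 timestamp, so the first run loads everything.
    return row[0] if row else ""


def incremental_load(conn: sqlite3.Connection, pipeline: str = "events") -> int:
    """Copy only rows newer than the checkpoint; safe to re-run."""
    watermark = get_watermark(conn, pipeline)
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM source_events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # One transaction: the data and the new watermark commit together or not at all.
    with conn:
        # Upsert by primary key so replaying the same rows never duplicates them.
        conn.executemany(
            "INSERT INTO target_events (id, payload, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET payload = excluded.payload, "
            "updated_at = excluded.updated_at",
            rows,
        )
        if rows:
            conn.execute(
                "INSERT INTO pipeline_state (pipeline, watermark) VALUES (?, ?) "
                "ON CONFLICT(pipeline) DO UPDATE SET watermark = excluded.watermark",
                (pipeline, rows[-1][2]),
            )
    return len(rows)
```

Timestamps are compared as ISO-8601 strings, which sort chronologically only when they share one format and time zone. A strict `>` against the watermark can also skip late-arriving rows that share the last timestamp; a lookback window or a monotonically increasing sequence column avoids that.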
### 2. Data Quality
- Validate schemas at the ingest boundary; reject or quarantine bad records.
- Data contract: type, null, range, uniqueness checks.
- Lineage + freshness monitoring.
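The checks above can be sketched as a small, dependency-free data contract applied at the ingest boundary. The `CONTRACT` columns (`order_id`, `amount`, `currency`), the range, and the one-hour freshness threshold are hypothetical; a production pipeline would more likely use a dedicated validation library. Valid records pass through, while invalid ones are quarantined together with the reasons they failed instead of being silently dropped. Lineage tracking is not covered here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

# Hypothetical contract: column -> (expected type, nullable, optional inclusive range).
CONTRACT: dict[str, tuple[type, bool, Optional[tuple[float, float]]]] = {
    "order_id": (int, False, None),
    "amount": (float, False, (0.0, 1_000_000.0)),
    "currency": (str, True, None),
}
UNIQUE_KEY = "order_id"


@dataclass
class ValidationResult:
    valid: list[dict[str, Any]] = field(default_factory=list)
    # Each quarantined entry keeps the original record plus the reasons it failed.
    quarantined: list[tuple[dict[str, Any], list[str]]] = field(default_factory=list)


def check_record(rec: dict[str, Any]) -> list[str]:
    """Return contract violations for one record (an empty list means valid)."""
    errors: list[str] = []
    for col, (typ, nullable, rng) in CONTRACT.items():
        value = rec.get(col)
        if value is None:
            if not nullable:
                errors.append(f"{col}: null not allowed")
            continue
        if not isinstance(value, typ):  # strict: an int amount is rejected
            errors.append(f"{col}: expected {typ.__name__}, got {type(value).__name__}")
            continue
        if rng is not None and not (rng[0] <= value <= rng[1]):
            errors.append(f"{col}: {value} outside {rng}")
    return errors


def validate_batch(records: list[dict[str, Any]]) -> ValidationResult:
    """Split a batch into valid records and quarantined (record, errors) pairs."""
    result = ValidationResult()
    seen: set[Any] = set()  # keys of records already accepted in this batch
    for rec in records:
        errors = check_record(rec)
        key = rec.get(UNIQUE_KEY)
        if key is not None and key in seen:
            errors.append(f"{UNIQUE_KEY}: duplicate {key}")
        if errors:
            result.quarantined.append((rec, errors))
        else:
            seen.add(key)
            result.valid.append(rec)
    return result


def is_fresh(last_loaded_at: datetime, max_lag: timedelta = timedelta(hours=1),
             now: Optional[datetime] = None) -> bool:
    """Freshness check: was the dataset loaded within max_lag?"""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_lag
```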
### 3. Storage & Modeling
- Keep raw / staging / curated layers separate.
- Model data according to `data_schema.md`; choose sensible partition keys and indexes.
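One way to keep the layers separate and partitioned is a deterministic path convention plus overwrite-per-partition writes. The sketch below assumes Hive-style `key=value` directories partitioned by day, JSON Lines files, and a hypothetical `DATA_ROOT` environment variable (consistent with the ENV convention listed under Input); the real partition key should come from `data_schema.md`. Overwriting a whole partition keeps re-runs idempotent.

```python
from __future__ import annotations

import json
import os
import shutil
import tempfile
from datetime import date
from pathlib import Path
from typing import Any, Iterable, Optional

LAYERS = ("raw", "staging", "curated")


def partition_path(layer: str, dataset: str, day: date, root: Optional[str] = None) -> Path:
    """Build a Hive-style partition path such as <root>/curated/orders/dt=2024-01-31."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}; expected one of {LAYERS}")
    # DATA_ROOT is a hypothetical env var; "/data" is only an example default.
    base = root or os.environ.get("DATA_ROOT", "/data")
    return Path(base) / layer / dataset / f"dt={day.isoformat()}"


def write_partition(records: Iterable[dict[str, Any]], layer: str, dataset: str,
                    day: date, root: Optional[str] = None) -> Path:
    """Overwrite one partition so a re-run replaces its data instead of appending."""
    target = partition_path(layer, dataset, day, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stage into a temp dir next to the target (same filesystem), then swap it in.
    tmp = Path(tempfile.mkdtemp(dir=target.parent, prefix=".tmp-"))
    with open(tmp / "part-0000.jsonl", "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    # Note: rmtree followed by rename is not a single atomic operation.
    if target.exists():
        shutil.rmtree(target)
    return tmp.rename(target)
```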
### 4. ML Workflow (if applicable)
- Separate feature engineering, training, and inference.
- Reproducibility: fix random seeds, version both data and models, track experiments.
- Avoid data leakage (split train/test correctly).
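The reproducibility and leakage rules above can be sketched with the standard library alone. This assumes each record has a hypothetical `event_time` field holding an ISO date string: splitting by time (train on the past, test on the future) avoids look-ahead leakage, normalization statistics are fitted on the training split only, a fixed seed makes shuffling repeatable, and a content hash of the data can be logged next to the model as a lightweight data version. Which experiment tracker records these values is left open.

```python
from __future__ import annotations

import hashlib
import json
import random
import statistics
from typing import Any

SEED = 42  # hypothetical fixed seed; log it with the experiment metadata


def time_split(records: list[dict[str, Any]], cutoff: str
               ) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Train on events before `cutoff`, test on events at/after it (no look-ahead)."""
    train = [r for r in records if r["event_time"] < cutoff]
    test = [r for r in records if r["event_time"] >= cutoff]
    return train, test


def data_fingerprint(records: list[dict[str, Any]]) -> str:
    """Order-independent SHA-256 of the records, usable as a lightweight data version."""
    lines = sorted(json.dumps(r, sort_keys=True) for r in records)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def fit_scaler(train: list[dict[str, Any]], col: str) -> tuple[float, float]:
    """Fit mean/std on the training split only; reuse them unchanged on test data."""
    values = [float(r[col]) for r in train]
    # Fall back to 1.0 when the column is constant, to avoid dividing by zero later.
    return statistics.fmean(values), statistics.pstdev(values) or 1.0


def shuffled(records: list[dict[str, Any]], seed: int = SEED) -> list[dict[str, Any]]:
    """Deterministic shuffle: a local RNG keeps results repeatable and isolated."""
    rng = random.Random(seed)
    out = list(records)
    rng.shuffle(out)
    return out
```

Fitting the scaler on the full dataset before splitting would let test-set statistics influence training, which is one of the most common forms of leakage.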
### 5. Reliability
- Retries + dead-letter handling for failed steps; alerting.
- A safe backfill strategy.
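The reliability rules above can be sketched as a few small helpers: a step is retried with exponential backoff, records that still fail are routed to a dead-letter list instead of aborting the run, and a backfill replays an idempotent per-day step one partition at a time. The names `run_with_retry`, `process_batch`, and `backfill` and the in-memory dead-letter list are hypothetical; in practice the orchestrator's own retry settings and a durable dead-letter table or queue would take their place, and the final error log is where alerting would hook in.

```python
from __future__ import annotations

import logging
import time
from datetime import date, timedelta
from typing import Any, Callable

log = logging.getLogger("pipeline")


def run_with_retry(fn: Callable[[], Any], attempts: int = 3, base_delay: float = 1.0) -> Any:
    """Call fn, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                log.error("step failed after %d attempts", attempts)  # alerting hook
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("attempt %d failed; retrying in %.1fs", attempt, delay)
            time.sleep(delay)


def process_batch(records: list[Any], handler: Callable[[Any], None],
                  dead_letter: list[dict[str, Any]]) -> int:
    """Apply handler per record; failures go to dead_letter with the error text."""
    ok = 0
    for rec in records:
        try:
            handler(rec)
            ok += 1
        except Exception as exc:
            dead_letter.append({"record": rec, "error": repr(exc)})
    return ok


def backfill(start: date, end: date, run_day: Callable[[date], Any]) -> list[date]:
    """Re-run an idempotent per-day step for each day in [start, end], oldest first."""
    done: list[date] = []
    day = start
    while day <= end:
        run_with_retry(lambda d=day: run_day(d))
        done.append(day)
        day += timedelta(days=1)
    return done
```

Backfilling day by day only stays safe if each per-day step overwrites its own partition, so replays cannot double-count.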
## 📤 Output
- Pipeline code + orchestration DAG/config.
- Update `knowledge_base/data_schema.md`.
## 🚫 Guard Rails
- Do NOT hard-code connection strings/credentials → use ENV (`DB_*`).
- Do NOT run a full reload when an incremental load is feasible.
- Do NOT skip data validation → avoid silent corruption.
- Do NOT let PI