etl-pipeline-builder

Solid

Build and manage ETL pipelines for data migration with transformation, CDC, and monitoring

Data & Documents · 814 stars · 53 forks · Updated today · MIT

Quality Score: 95/100

Stars (20% weight): 97
Recency (20% weight): 100
Frontmatter (20% weight): 70
Documentation (15% weight): 100
Issue Health (10% weight): 50
License (10% weight): 100
Description (5% weight): 100

Skill Content

# ETL Pipeline Builder Skill

Builds and manages ETL (Extract, Transform, Load) pipelines for data migration, supporting incremental loads, CDC, and comprehensive monitoring.

## Purpose

Enable data pipeline creation for:

- Source-to-target mapping
- Transformation definition
- Incremental load setup
- CDC configuration
- Pipeline monitoring

## Capabilities

### 1. Source-to-Target Mapping

- Define column mappings
- Handle schema differences
- Configure data type conversions
- Manage derived columns

### 2. Transformation Definition

- Data type transformations
- Value mappings
- Aggregations
- Lookups and enrichments

### 3. Incremental Load Setup

- Define watermarks
- Configure incremental columns
- Handle deletes
- Manage merge logic

### 4. CDC Configuration

- Log-based CDC
- Trigger-based CDC
- Timestamp-based CDC
- Full load comparison

### 5. Error Handling

- Define retry policies
- Configure dead letter queues
- Handle data quality issues
- Implement alerting

### 6. Pipeline Monitoring

- Track pipeline metrics
- Monitor data volumes
- Alert on failures
- Generate SLA reports

## Tool Integrations

| Tool | Type | Integration Method |
|------|------|-------------------|
| Apache Airflow | Orchestration | Python |
| dbt | Transformation | CLI |
| Airbyte | Data integration | API |
| Fivetran | SaaS ETL | API |
| AWS DMS | Cloud migration | CLI |
| Debezium | CDC | Config |

## Output Schema

```json
{
  "pipelineId": "string",
  "timestamp": "ISO8601",
  "pipeline": {...
```
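
## Example: Incremental Load Sketch

The incremental load capability listed above (watermarks, incremental columns, delete handling, merge logic) can be illustrated with a minimal in-memory sketch. This is not the skill's own implementation: the `IncrementalLoader` class, the `updated_at` incremental column, and the `is_deleted` soft-delete flag are all hypothetical names chosen for illustration, and a real pipeline would read from and write to actual data stores.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IncrementalLoader:
    """Hypothetical helper: watermark-based incremental load with upsert merge."""

    key_column: str = "id"                    # assumed primary key column
    incremental_column: str = "updated_at"    # assumed incremental column
    watermark: datetime = datetime.min.replace(tzinfo=timezone.utc)
    target: dict = field(default_factory=dict)  # primary key -> row

    def extract(self, source_rows):
        # Keep only rows changed strictly after the last committed watermark.
        return [r for r in source_rows if r[self.incremental_column] > self.watermark]

    def merge(self, changed_rows):
        # Upsert by key; rows flagged as soft-deleted are removed from the target.
        for row in changed_rows:
            key = row[self.key_column]
            if row.get("is_deleted", False):
                self.target.pop(key, None)
            else:
                self.target[key] = {k: v for k, v in row.items() if k != "is_deleted"}

    def run(self, source_rows):
        changed = self.extract(source_rows)
        self.merge(changed)
        if changed:
            # Advance the watermark only after the merge has completed.
            self.watermark = max(r[self.incremental_column] for r in changed)
        return len(changed)


def ts(day):
    # Small helper for readable UTC timestamps in the example data.
    return datetime(2024, 1, day, tzinfo=timezone.utc)


source = [
    {"id": 1, "name": "a", "updated_at": ts(1)},
    {"id": 2, "name": "b", "updated_at": ts(2)},
]
loader = IncrementalLoader()
first = loader.run(source)  # initial load picks up both rows

# Later changes in the source: one update and one soft delete.
source.append({"id": 1, "name": "a2", "updated_at": ts(3)})
source.append({"id": 2, "name": "b", "updated_at": ts(4), "is_deleted": True})
second = loader.run(source)  # only rows newer than the watermark are processed
```

Advancing the watermark only after the merge means a failed run can be retried from the same point without skipping rows. Because the merge is an upsert keyed on `id`, replaying the same changes is idempotent.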

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT
