data-lineage-mapper

Featured

Extracts and maps data lineage from various sources including SQL, dbt, Airflow, and Spark, generating comprehensive lineage graphs for impact analysis.

Data & Documents · 809 stars · 52 forks · Updated today · License: MIT

Quality Score: 98/100

Stars (20%): 97
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Data Lineage Mapper

Extracts and maps data lineage from various sources to provide comprehensive data flow visibility.

## Overview

This skill parses and extracts data lineage information from SQL queries, dbt projects, Airflow DAGs, and Spark jobs. It generates comprehensive lineage graphs showing data flow from source to destination, enabling impact analysis and data governance.

## Capabilities

- **SQL parsing for lineage extraction** - Parse SELECT, INSERT, and MERGE statements
- **dbt lineage integration** - Extract lineage from manifest.json
- **Airflow task lineage mapping** - Map data flows across DAG tasks
- **Spark job lineage extraction** - Parse Spark SQL and DataFrame operations
- **Cross-system lineage connection** - Connect lineage across different tools
- **Column-level lineage tracing** - Track individual column transformations
- **Impact analysis** - Downstream/upstream impact assessment
- **Lineage graph generation** - Visual and machine-readable lineage
- **Integration with data catalogs** - Export to DataHub, Amundsen, Alation

## Input Schema

```json
{
  "sources": {
    "type": "array",
    "required": true,
    "items": {
      "type": { "type": "string", "enum": ["sql", "dbt", "airflow", "spark", "file"] },
      "content": { "type": "string|object", "description": "SQL string, file path, or manifest object" },
      "metadata": {
        "type": "object",
        "properties": {
          "database": "str...
```
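The input schema above is cut off in this listing, so the following is only a sketch of a pre-flight check for the fields that are visible: a required `sources` array whose items carry a `type` from the listed enum, a `content` that is a string or an object, and an optional `metadata` object. The function name `validateInput` is hypothetical and not part of the skill's published interface.

```javascript
// Source types copied from the (truncated) input schema in this listing.
const SOURCE_TYPES = ["sql", "dbt", "airflow", "spark", "file"];

// Hypothetical pre-flight check covering only the visible schema fields.
// Returns a list of error messages; an empty list means the input passed.
function validateInput(input) {
  const errors = [];
  if (!input || !Array.isArray(input.sources)) {
    errors.push("sources must be an array");
    return errors;
  }
  input.sources.forEach((src, i) => {
    if (!src || !SOURCE_TYPES.includes(src.type)) {
      errors.push(`sources[${i}].type must be one of: ${SOURCE_TYPES.join(", ")}`);
    }
    // content may be a SQL string, a file path, or a parsed manifest object.
    const content = src ? src.content : undefined;
    const contentOk =
      typeof content === "string" || (content !== null && typeof content === "object");
    if (!contentOk) {
      errors.push(`sources[${i}].content must be a string or an object`);
    }
    // metadata is optional, but must be an object when present.
    const meta = src ? src.metadata : undefined;
    if (meta !== undefined && (meta === null || typeof meta !== "object")) {
      errors.push(`sources[${i}].metadata must be an object`);
    }
  });
  return errors;
}

console.log(validateInput({ sources: [{ type: "sql", content: "SELECT 1" }] })); // no errors
console.log(validateInput({ sources: [{ type: "kafka", content: 42 }] })); // two errors
```

Returning a list of messages rather than throwing on the first problem lets a caller report every malformed source in one pass.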
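As an illustration of table-level SQL lineage extraction, the sketch below takes write targets from `INSERT INTO`/`MERGE INTO` and read sources from `FROM`/`JOIN`/`USING`. It is a deliberately naive, regex-based assumption rather than the skill's actual parser: it cannot tell CTE names from real tables, ignores quoted identifiers and subquery structure, and would misreport the column in an expression like `EXTRACT(YEAR FROM order_date)` as a table. `extractSqlLineage` is a hypothetical name.

```javascript
// Naive table-level lineage for one SQL statement (hypothetical helper, not the skill's parser).
// Targets come from INSERT INTO / MERGE INTO; sources from FROM / JOIN / USING.
function extractSqlLineage(sql) {
  // Drop "--" line comments and collapse whitespace so the patterns stay simple.
  const text = sql.replace(/--[^\n]*/g, " ").replace(/\s+/g, " ");
  // Unquoted identifier, optionally schema-qualified (e.g. raw.orders).
  const ident = "([A-Za-z_][\\w.]*)";
  const targetRe = new RegExp(`\\b(?:INSERT|MERGE)\\s+INTO\\s+${ident}`, "gi");
  const sourceRe = new RegExp(`\\b(?:FROM|JOIN|USING)\\s+${ident}`, "gi");

  const targets = new Set();
  const sources = new Set();
  for (const m of text.matchAll(targetRe)) targets.add(m[1].toLowerCase());
  for (const m of text.matchAll(sourceRe)) sources.add(m[1].toLowerCase());
  return { targets: [...targets], sources: [...sources] };
}

const sql = `
  INSERT INTO analytics.daily_revenue
  SELECT o.order_date, SUM(p.amount)
  FROM raw.orders o
  JOIN raw.payments p ON p.order_id = o.id
  GROUP BY o.order_date`;

// targets: analytics.daily_revenue; sources: raw.orders, raw.payments
console.log(extractSqlLineage(sql));
```

A production implementation would walk a SQL AST instead, which is also what makes column-level lineage possible.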
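dbt records each node's parents in `manifest.json` under `nodes[<unique_id>].depends_on.nodes`, so model-level edges can be read straight from the manifest. This sketch assumes the manifest is already parsed into an object and keeps only `model` and `snapshot` children so that tests do not show up as lineage; `dbtLineageEdges` and the trimmed sample manifest are hypothetical.

```javascript
// Hypothetical helper: model-level lineage edges from a parsed dbt manifest.json.
// Each node lists its parents' unique IDs in depends_on.nodes.
function dbtLineageEdges(manifest, keepTypes = ["model", "snapshot"]) {
  const edges = [];
  for (const [childId, node] of Object.entries(manifest.nodes || {})) {
    // Skip tests and other node types so they do not appear as lineage.
    if (!keepTypes.includes(node.resource_type)) continue;
    const parents = (node.depends_on && node.depends_on.nodes) || [];
    for (const parentId of parents) edges.push({ from: parentId, to: childId });
  }
  return edges;
}

// A trimmed, made-up manifest fragment; real manifests carry many more fields.
const sampleManifest = {
  nodes: {
    "model.shop.stg_orders": {
      resource_type: "model",
      depends_on: { nodes: ["source.shop.raw.orders"] },
    },
    "model.shop.fct_revenue": {
      resource_type: "model",
      depends_on: { nodes: ["model.shop.stg_orders"] },
    },
    "test.shop.not_null_fct_revenue_id": {
      resource_type: "test",
      depends_on: { nodes: ["model.shop.fct_revenue"] },
    },
  },
};

console.log(dbtLineageEdges(sampleManifest));
```

In practice the object would come from `JSON.parse` on `target/manifest.json`, the default location where dbt writes its artifacts.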
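Airflow operators can declare `inlets` and `outlets`, and chaining tasks through the datasets they share is one way to map data flow across a DAG. The sketch below works on plain objects shaped like `{ taskId, inlets, outlets }`, a hypothetical representation rather than Airflow's Python API, and emits one dataset-to-dataset edge per inlet/outlet pair, tagged with the task that performs the move. `airflowDatasetEdges` and the dataset URIs are made up for illustration.

```javascript
// Hypothetical task shape: { taskId, inlets: [uri...], outlets: [uri...] }.
// Each task contributes one edge per (inlet, outlet) pair; tasks that share a
// dataset URI are connected implicitly through that dataset.
function airflowDatasetEdges(tasks) {
  const edges = [];
  for (const task of tasks) {
    for (const input of task.inlets || []) {
      for (const output of task.outlets || []) {
        edges.push({ from: input, to: output, via: task.taskId });
      }
    }
  }
  return edges;
}

const dagTasks = [
  { taskId: "extract_orders", inlets: ["postgres://app/orders"], outlets: ["s3://lake/raw/orders"] },
  { taskId: "load_warehouse", inlets: ["s3://lake/raw/orders"], outlets: ["snowflake://dw/raw.orders"] },
];

// Two edges: postgres -> s3 (via extract_orders), then s3 -> snowflake (via load_warehouse)
console.log(airflowDatasetEdges(dagTasks));
```

Keying on datasets rather than task dependencies records what data moved, not merely which task ran first.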
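Connecting lineage across tools mostly comes down to agreeing on one name per dataset. A minimal sketch, assuming a hand-maintained alias map from system-specific identifiers (such as a dbt `source.*` unique ID) to a canonical lower-case table name; `mergeLineage` and the alias format are hypothetical.

```javascript
// Hypothetical helper: merge edge lists from several tools into one deduplicated list.
// aliases maps lower-cased, system-specific IDs to a canonical dataset name.
function mergeLineage(edgeLists, aliases = {}) {
  const canon = (id) => {
    const key = String(id).toLowerCase();
    return Object.prototype.hasOwnProperty.call(aliases, key) ? aliases[key] : key;
  };
  const seen = new Set();
  const merged = [];
  for (const edges of edgeLists) {
    for (const edge of edges) {
      const from = canon(edge.from);
      const to = canon(edge.to);
      const key = JSON.stringify([from, to]); // unambiguous key for the pair
      if (!seen.has(key)) {
        seen.add(key);
        merged.push({ from, to });
      }
    }
  }
  return merged;
}

const sqlEdges = [{ from: "RAW.ORDERS", to: "analytics.daily_revenue" }];
const dbtEdges = [{ from: "source.shop.raw.orders", to: "model.shop.stg_orders" }];
const aliases = { "source.shop.raw.orders": "raw.orders" };

// Both edges now start from the same canonical node, raw.orders.
console.log(mergeLineage([sqlEdges, dbtEdges], aliases));
```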
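Column-level tracing can be modeled as a dependency map from each `table.column` to the columns it is computed from, walked backwards until it reaches columns with no recorded parents (the original sources). The map format and `traceColumn` are assumptions for illustration; building the map is the hard part and depends on the SQL or DataFrame parsing step.

```javascript
// columnDeps maps "table.column" to the columns it is computed from (hypothetical format).
// Returns the root columns (those with no recorded parents) that feed `column`.
function traceColumn(columnDeps, column, seen = new Set([column])) {
  const parents = Object.prototype.hasOwnProperty.call(columnDeps, column)
    ? columnDeps[column]
    : [];
  if (parents.length === 0) return new Set([column]);
  const roots = new Set();
  for (const parent of parents) {
    if (seen.has(parent)) continue; // already visited: avoids cycles and repeated work
    seen.add(parent);
    for (const root of traceColumn(columnDeps, parent, seen)) roots.add(root);
  }
  return roots;
}

const columnDeps = {
  "mart.revenue.total_usd": ["stg.orders.amount", "stg.fx.rate"],
  "stg.orders.amount": ["raw.orders.amount_cents"],
  "stg.fx.rate": ["raw.fx_rates.rate"],
};

// Roots: raw.orders.amount_cents and raw.fx_rates.rate
console.log([...traceColumn(columnDeps, "mart.revenue.total_usd")]);
```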
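Downstream and upstream impact analysis reduces to graph reachability. This sketch stores adjacency in both directions and runs a breadth-first search; `buildGraph` and `impactOf` are hypothetical names, and edges are assumed to be `{ from, to }` objects.

```javascript
// Adjacency in both directions, so one graph answers "what breaks if X changes?"
// (downstream) and "where does X come from?" (upstream).
function buildGraph(edges) {
  const down = new Map();
  const up = new Map();
  const add = (map, key, value) => {
    if (!map.has(key)) map.set(key, new Set());
    map.get(key).add(value);
  };
  for (const { from, to } of edges) {
    add(down, from, to);
    add(up, to, from);
  }
  return { down, up };
}

// Breadth-first search; returns every node reachable from start, excluding start itself.
function impactOf(graph, start, direction = "downstream") {
  if (direction !== "downstream" && direction !== "upstream") {
    throw new Error(`unknown direction: ${direction}`);
  }
  const adj = direction === "downstream" ? graph.down : graph.up;
  const seen = new Set([start]);
  const queue = [start];
  const result = [];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const next of adj.get(node) || []) {
      if (!seen.has(next)) {
        seen.add(next);
        result.push(next);
        queue.push(next);
      }
    }
  }
  return result;
}

const graph = buildGraph([
  { from: "raw.orders", to: "stg.orders" },
  { from: "stg.orders", to: "mart.revenue" },
  { from: "raw.orders", to: "mart.order_counts" },
  { from: "raw.fx_rates", to: "mart.revenue" },
]);

console.log(impactOf(graph, "raw.orders", "downstream")); // stg.orders, mart.order_counts, mart.revenue
console.log(impactOf(graph, "mart.revenue", "upstream")); // stg.orders, raw.fx_rates, raw.orders
```

The `seen` set keeps the traversal finite even if the lineage graph contains a cycle, and BFS order lists the nearest dependents first.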
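For a combined visual and machine-readable output, one common choice is Graphviz DOT, which the `dot` tool can render to SVG or PNG. This sketch assumes `{ from, to }` edge objects and is not the skill's actual export format; `toDot` is a hypothetical helper.

```javascript
// Renders lineage edges as a Graphviz DOT digraph (left-to-right layout).
function toDot(edges, name = "lineage") {
  // Quote every ID and escape embedded double quotes, as DOT quoted strings require.
  const q = (s) => `"${String(s).replace(/"/g, '\\"')}"`;
  const lines = [`digraph ${q(name)} {`, "  rankdir=LR;"];
  for (const { from, to } of edges) lines.push(`  ${q(from)} -> ${q(to)};`);
  lines.push("}");
  return lines.join("\n");
}

console.log(
  toDot([
    { from: "raw.orders", to: "stg.orders" },
    { from: "stg.orders", to: "mart.revenue" },
  ])
);
```

Saving the output to a file and running `dot -Tsvg lineage.dot -o lineage.svg` produces a diagram, while the same text stays diffable in version control.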

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

airflow-dag-analyzer (AI & Automation, Featured)

Analyzes, validates, and optimizes Apache Airflow DAGs for reliability, performance, and adherence to best practices.

809 stars · Updated today · a5c-ai

ai-data-engineering (AI & Automation, Solid)

Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations. Covers feature stores (Feast, Tecton), embedding pipelines, chunking strategies, orchestration (Dagster, Prefect, Airflow), dbt transformations, data versioning (LakeFS), and experiment tracking (MLflow, W&B).

367 stars · Updated 5 months ago · ancoleman

sf-datacloud-connect (Data & Documents, Featured)

Salesforce Data Cloud Connect phase. TRIGGER when: user manages Data Cloud connections, connectors, connector metadata, tests a connection, browses source objects or databases, or sets up a new source system. DO NOT TRIGGER when: the task is about data streams or DLOs (use sf-datacloud-prepare), DMOs or identity resolution (use sf-datacloud-harmonize), retrieval/search (use sf-datacloud-retrieve), or STDM telemetry (use sf-ai-agentforce-observability).

417 stars · Updated 4 weeks ago · Jaganpro

messydata (Data & Documents, Solid)

Use this skill when working with MessyData — a synthetic dirty data generator. Covers writing and validating YAML configs, using the CLI, and the Python API. Trigger on: "generate synthetic data", "messydata config", "create a dataset schema", "add anomalies", "fake data", "dirty data".

34 stars · Updated 2 months ago · sodadata

ai-tools (Web & Frontend, Solid)

Provides guidance for integrating AI tools and components into the Family Tree App, including knowledge graphs, computer vision, and natural language processing. Invoke when working on AI-related features or when needing AI integration advice.

620 stars · Updated today · bage2014