ai-data-engineering

Solid

Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, serving ML features, or writing data transformations. Covers feature stores (Feast, Tecton), embedding pipelines, chunking strategies, orchestration (Dagster, Prefect, Airflow), dbt transformations, data versioning (LakeFS), and experiment tracking (MLflow, W&B).

AI & Automation · 367 stars · 55 forks · Updated 5 months ago · MIT


Quality Score: 80/100

Stars (20%): 85
Recency (20%): 50
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# AI Data Engineering

## Purpose

Build data infrastructure for AI/ML systems including RAG pipelines, feature stores, and embedding generation. Provides architecture patterns, orchestration workflows, and evaluation metrics for production AI applications.

## When to Use

**Use this skill when:**

- Building RAG (Retrieval-Augmented Generation) pipelines
- Implementing semantic search or vector databases
- Setting up ML feature stores for real-time serving
- Creating embedding generation pipelines
- Evaluating RAG quality with RAGAS metrics
- Orchestrating data workflows for AI systems
- Integrating with frontend skills (ai-chat, search-filter)

**Skip this skill if:**

- Building traditional CRUD applications (use databases-relational)
- Simple key-value storage (use databases-nosql)
- No AI/ML components in the application

## RAG Pipeline Architecture

RAG pipelines have 5 distinct stages. Understanding this architecture is critical for production implementations.

```
┌─────────────────────────────────────────────────────────────┐
│                   RAG Pipeline (5 Stages)                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. INGESTION  → Load documents (PDF, DOCX, Markdown)       │
│  2. INDEXING   → Chunk (512 tokens) + Embed + Store         │
│  3. RETRIEVAL  → Query embedding + Vector search + Filters  │
│  4. GENERATION → Context injection + LLM streaming          │
...
```
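
The indexing and retrieval stages named in the preview (chunk, embed, store, then query embedding and vector search) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the skill's own implementation: `chunk_tokens`, `toy_embed`, `build_index`, and `retrieve` are hypothetical names, whitespace-split words stand in for model tokens, and a hashed bag-of-words vector held in a Python list stands in for a real embedding model and vector database.

```python
import hashlib
import math
from collections import Counter


def chunk_tokens(text, chunk_size=512, overlap=0):
    """Split text into chunks of at most `chunk_size` whitespace tokens.

    Assumption: whitespace tokens approximate model tokens; a real pipeline
    would count tokens with the embedding model's own tokenizer.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
        start += step
    return chunks


def toy_embed(text, dim=64):
    """Hashed bag-of-words vector, L2-normalised (placeholder for a real model)."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(chunks, dim=64):
    """'Store' step: keep (chunk, vector) pairs in memory."""
    return [(chunk, toy_embed(chunk, dim)) for chunk in chunks]


def retrieve(query, index, top_k=3, dim=64):
    """Embed the query and return the top_k (score, chunk) pairs by cosine similarity."""
    q = toy_embed(query, dim)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For example, `retrieve("vector search", build_index(chunk_tokens(document_text)))` returns up to three `(score, chunk)` pairs, highest score first. Metadata filters and the generation stage are left out; swapping `toy_embed` for a real embedding call and the list for a vector store keeps the same overall shape.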

Details

Author: ancoleman
Repository: ancoleman/ai-design-components
Created: 6 months ago
Last Updated: 5 months ago
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

ml-ops-engineer

Expert MLOps engineering covering model deployment, ML pipelines, model monitoring, feature stores, and infrastructure automation. Use when deploying models to production, building training pipelines, setting up drift detection, configuring feature stores, or automating ML CI/CD workflows.

183 Updated 3 days ago
borghei

AI & Automation Featured

airflow-dag-analyzer

Analyzes, validates, and optimizes Apache Airflow DAGs for reliability, performance, and best practices adherence.

809 Updated today
a5c-ai

Web & Frontend Solid

ai-tools

Provides guidance for integrating AI tools and components into the Family Tree App, including knowledge graphs, computer vision, and natural language processing. Invoke when working on AI-related features or when needing AI integration advice.

620 Updated today
bage2014

AI & Automation Featured

chunking-strategy

Provides chunking strategies for RAG systems. Generates chunk size recommendations (256-1024 tokens), overlap percentages (10-20%), and semantic boundary detection methods. Validates semantic coherence and evaluates retrieval precision/recall metrics. Use when building retrieval-augmented generation systems, vector databases, or processing large documents.

253 Updated 3 days ago
giuseppe-trisciuoglio

AI & Automation Featured

aws-agentic-ai

AWS Bedrock AgentCore comprehensive expert for deploying and managing all AgentCore services. Use when working with Gateway, Runtime, Memory, Identity, or any AgentCore component. Covers MCP target deployment, credential management, schema optimization, runtime configuration, memory management, and identity services.

290 Updated 1 month ago
zxkane