
rag-pipeline-design · listed

Use when designing or auditing a retrieval-augmented generation pipeline. Requires data audit and query audit before any design decision. Blocks "I'll use the standard setup" completions.
RBraga01/builder-ai · ★ 2 · AI & Automation · score 68
Install: claude install-skill RBraga01/builder-ai
# RAG Pipeline Design

## The Law

```
YOU CANNOT DESIGN A RAG PIPELINE WITHOUT FIRST AUDITING THE DATA AND THE QUERIES.
"Standard chunking" fails on structured documents.
"The embedding model worked for someone else" is not a validation.
A data audit + query audit + stage-by-stage decision log IS a design.
```

## When to Use

Trigger when:

- Starting a new RAG feature from scratch
- Debugging retrieval quality issues (hallucination, missed context, low recall)
- Upgrading an embedding model or retrieval strategy
- Adding or removing a reranker
- Changing chunk size, overlap, or ingestion strategy

## The Process

A RAG pipeline has five stages. Design each explicitly; do not accept defaults.

### Step 0: Audit Before Designing

Answer both audits before making any pipeline decision:

**Data Audit:**

- Source format: PDF / HTML / JSON / code / mixed?
- Average document length (tokens)?
- Is document structure (headings, sections, tables) load-bearing for meaning?
- How frequently does content change?
- Any formatting that must survive chunking (tables, numbered lists, code blocks)?

**Query Audit:**

- Dominant query type: lookup / comparison / synthesis / aggregation?
- Expected answer length: short fact / paragraph / multi-section?
- Does the user need source attribution?
- Is multi-hop reasoning required (answer requires combining facts across documents)?

Every design decision below flows from these two audits.

### Step 1: Chunking

| Document Type | Strategy | Chunk