Install: claude install-skill RBraga01/builder-ai
# RAG Pipeline Design
## The Law
```
YOU CANNOT DESIGN A RAG PIPELINE WITHOUT FIRST AUDITING THE DATA AND THE QUERIES.
"Standard chunking" fails on structured documents.
"The embedding model worked for someone else" is not a validation.
A data audit + query audit + stage-by-stage decision log IS a design.
```
## When to Use
Trigger when:
- Starting a new RAG feature from scratch
- Debugging retrieval quality issues (hallucination, missed context, low recall)
- Upgrading an embedding model or retrieval strategy
- Adding or removing a reranker
- Changing chunk size, overlap, or ingestion strategy
## The Process
A RAG pipeline has five stages. Design each explicitly — do not accept defaults.
### Step 0 — Audit Before Designing
Answer both audits before making any pipeline decision:
**Data Audit:**
- Source format: PDF / HTML / JSON / code / mixed?
- Average document length (tokens)?
- Is document structure (headings, sections, tables) load-bearing for meaning?
- How frequently does content change?
- Does any formatting need to survive chunking intact (tables, numbered lists, code blocks)?
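Part of the Data Audit can be scripted. This is a minimal sketch, not part of the skill. It assumes Markdown-formatted sources and approximates token counts by splitting on whitespace. The regex patterns, the `DataAudit` fields, and the helper names are illustrative. It reports average length and how many documents contain headings, tables, numbered lists, or code blocks, which are the structures the chunking decision has to respect.

```python
import re
from dataclasses import dataclass
from statistics import mean

# Markdown-style markers for structure that naive chunking tends to break.
# These patterns assume Markdown input; PDF or HTML sources need a parser first.
HEADING = re.compile(r"^#{1,6}\s", re.MULTILINE)
TABLE_ROW = re.compile(r"^\s*\|.*\|\s*$", re.MULTILINE)
NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s", re.MULTILINE)
CODE_FENCE = re.compile(r"^`{3}", re.MULTILINE)


@dataclass
class DataAudit:
    doc_count: int
    avg_tokens: float
    docs_with_headings: int
    docs_with_tables: int
    docs_with_numbered_lists: int
    docs_with_code_blocks: int


def approx_tokens(text: str) -> int:
    # Whitespace split is a rough stand-in; use the embedding model's tokenizer for real numbers.
    return len(text.split())


def count_matching(docs: list[str], pattern: re.Pattern) -> int:
    # Number of documents containing at least one match.
    return sum(1 for d in docs if pattern.search(d))


def audit_documents(docs: list[str]) -> DataAudit:
    return DataAudit(
        doc_count=len(docs),
        avg_tokens=float(mean(approx_tokens(d) for d in docs)) if docs else 0.0,
        docs_with_headings=count_matching(docs, HEADING),
        docs_with_tables=count_matching(docs, TABLE_ROW),
        docs_with_numbered_lists=count_matching(docs, NUMBERED_ITEM),
        docs_with_code_blocks=count_matching(docs, CODE_FENCE),
    )


sample = [
    "# Pricing\n| Plan | Price |\n|---|---|\n| Pro | $20 |",
    "Plain paragraph with no structure at all.",
]
report = audit_documents(sample)
print(report)
```

Source format and change frequency still have to be answered by hand. The script covers only the questions that can be read directly from the text.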
**Query Audit:**
- Dominant query type: lookup / comparison / synthesis / aggregation?
- Expected answer length: short fact / paragraph / multi-section?
- Does the user need source attribution?
- Is multi-hop reasoning required (answer requires combining facts across documents)?
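For the Query Audit, a quick tally over a sample of real or expected user queries shows which query type dominates. The sketch below uses a crude keyword heuristic. The cue lists are hypothetical and will misclassify some queries, so treat the output as a starting point for hand-labeling rather than as the audit itself. It does not address answer length, attribution, or multi-hop needs.

```python
from collections import Counter

# Hypothetical cue phrases per query type. They are checked in insertion order,
# so a query matching several lists gets the first type that matches.
CUES: dict[str, tuple[str, ...]] = {
    "comparison": ("compare", " vs ", "versus", "difference between"),
    "aggregation": ("how many", "total", "list all", "count of"),
    "synthesis": ("summarize", "overview", "explain why", "implications"),
}


def classify_query(query: str) -> str:
    # Pad with spaces so " vs " also matches at the start or end of the query.
    q = f" {query.lower()} "
    for qtype, cues in CUES.items():
        if any(cue in q for cue in cues):
            return qtype
    # No cue found: assume a single-fact lookup.
    return "lookup"


def query_mix(queries: list[str]) -> Counter:
    return Counter(classify_query(q) for q in queries)


sample_queries = [
    "What is the refund window for annual plans?",
    "Compare the Pro and Team plans",
    "How many seats does the Team plan include?",
    "Summarize the changes to the security policy",
]
mix = query_mix(sample_queries)
print(mix.most_common())
```

Comparison cues are checked first on the assumption that they are the most specific; reorder `CUES` if your query log suggests otherwise.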
Every design decision below flows from these two audits.
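One way to enforce this is to keep the stage-by-stage decision log as a small data structure. The log rejects any entry that does not cite an audit finding and reports which stages are still running on defaults. This is a hypothetical sketch. `Decision` and `DecisionLog` are illustrative names, and every stage name except chunking is a placeholder, because this section only names the chunking stage explicitly.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Decision:
    stage: str          # which pipeline stage this decision covers
    choice: str         # what was chosen
    audit_finding: str  # the data- or query-audit answer that justifies it


@dataclass
class DecisionLog:
    stages: tuple[str, ...]
    decisions: dict[str, Decision] = field(default_factory=dict)

    def record(self, decision: Decision) -> None:
        if decision.stage not in self.stages:
            raise ValueError(f"unknown stage: {decision.stage}")
        # A decision with no audit finding behind it is just an accepted default.
        if not decision.audit_finding.strip():
            raise ValueError(f"{decision.stage}: decision must cite an audit finding")
        self.decisions[decision.stage] = decision

    def undecided(self) -> list[str]:
        # Stages with no recorded decision are still on defaults.
        return [s for s in self.stages if s not in self.decisions]


# Placeholder stage names (only "chunking" comes from this section).
log = DecisionLog(stages=("chunking", "embedding", "retrieval", "reranking", "generation"))
log.record(Decision(
    "chunking",
    "split on headings, keep tables whole",
    "data audit: headings and tables are load-bearing",
))
print(log.undecided())  # → ['embedding', 'retrieval', 'reranking', 'generation']
```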
### Step 1 — Chunking
| Document Type | Strategy | Chunk