
rag-poisoning

Expert methodology for attacking Retrieval-Augmented Generation (RAG) pipelines through document poisoning, index corruption, adversarial queries, and retrieval manipulation. For authorized red team assessments of AI search and Q&A systems.
sunilgentyala/OmniRed · ★ 0 · AI & Automation · score 63
Install: claude install-skill sunilgentyala/OmniRed
# RAG Poisoning

## Attack Surface

Retrieval-Augmented Generation (RAG) pipelines retrieve documents from a vector database and inject them into the LLM's context before generation. This creates two exploitable surfaces:

1. **The index**: documents stored in the vector database (write access or upload path = direct poisoning)
2. **The retrieval mechanism**: the embedding and similarity search that determines what gets retrieved

## References

```
references/
  vector-db-targets.md    Common vector DBs (Chroma, Pinecone, Weaviate, Qdrant) and their APIs
```

## Attack Variants

| Attack | Target | Required access |
|---|---|---|
| Document poisoning | Index content | Write to index / document upload |
| Query manipulation | Retrieval ranking | User input |
| Adversarial embedding | Vector similarity | Index write or query |
| Cross-encoder exploitation | Re-ranking stage | User input |
| Chunk boundary injection | Document chunking | Index write |

## Methodology

### Phase 1: Pipeline reconnaissance

Map the RAG pipeline components:

```
1. Identify the embedding model (OpenAI ada-002, BGE, E5, etc.)
2. Identify the vector database (Chroma, Pinecone, Weaviate, Qdrant, pgvector)
3. Determine chunk size and overlap settings
4. Identify if there is a re-ranking stage (cross-encoder)
5. Find the document upload/ingestion path
6. Determine if metadata filtering is applied at retrieval time
```

**Fingerprinting queries:**

```
What is your knowledge base?
Where does your info