embedding-attacks

Adversarial embedding manipulation techniques for attacking vector search, semantic similarity systems, and embedding-based security controls. Covers nearest-neighbour poisoning, semantic collision, and bypass of embedding-based filters.
sunilgentyala/OmniRed · ★ 0 · AI & Automation · score 63
Install: claude install-skill sunilgentyala/OmniRed
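# Embedding Attacks

## Attack Surface

Embedding models convert text to dense vectors for semantic search, similarity comparison, and classification. Attacks against embedding models affect:

- RAG retrieval ranking (what gets retrieved)
- Embedding-based input filters (safety classifiers, topic filters)
- Semantic deduplication (bypass dedup to inject duplicate malicious content)
- User identity / session binding based on semantic similarity

## Attack Variants

### 1. Nearest-Neighbour Poisoning

Craft text whose embedding lies close to a target document's vector even though a human reader would not judge the two semantically similar.

```python
# Adversarial suffix method (GCG-style)
# Append a learned suffix to arbitrary text to move its embedding toward the target.
# Sketch only: this is a continuous relaxation (GCG proper searches over discrete
# token swaps) and it assumes a mean-pooled encoder running on CPU.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def find_adversarial_suffix(model, tokenizer, source_text, target_embedding, steps=500, suffix_len=20):
    """Find a suffix that moves source_text's embedding toward target_embedding."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)  # only the suffix is optimised
    embed_layer = model.get_input_embeddings()
    ids = tokenizer(source_text, return_tensors="pt")["input_ids"]
    source_embeds = embed_layer(ids)  # (1, seq_len, dim)
    # The suffix lives in input-embedding space so that gradients can reach it
    suffix = torch.randn(1, suffix_len, source_embeds.shape[-1], requires_grad=True)
    optimizer = torch.optim.Adam([suffix], lr=0.01)
    for step in range(steps):
        inputs = torch.cat([source_embeds, suffix], dim=1)
        source_emb = model(inputs_embeds=inputs).last_hidden_state.mean(dim=1)
        loss = 1 - F.cosine_similarity(source_emb, target_embedding.view(1, -1)).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Snap each suffix vector to its nearest vocabulary token (lossy projection)
    suffix_ids = torch.cdist(suffix[0].detach(), embed_layer.weight).argmin(dim=1)
    return tokenizer.decode(suffix_ids.tolist())
```

**Use cases:**

- Make a malicious document be retrieved instead of a legitimate one
- Cause benign queries to retriev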