Install: claude install-skill sunilgentyala/OmniRed
# Embedding Attacks
## Attack Surface
Embedding models convert text to dense vectors for semantic search, similarity comparison, and classification. Attacks against embedding models affect:
- RAG retrieval ranking (what gets retrieved)
- Embedding-based input filters (safety classifiers, topic filters)
- Semantic deduplication (bypass dedup to inject duplicate malicious content)
- User identity / session binding based on semantic similarity
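All four surfaces rest on the same primitive: ranking or thresholding by vector similarity. The sketch below is a toy illustration in pure Python, assuming cosine similarity as the metric. The vectors and the names `rank_documents`, `corpus`, and `poisoned` are hypothetical and not taken from any particular retriever. It shows how one inserted vector that sits closer to the query displaces the legitimate top result.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by descending similarity to the query."""
    scores = {doc_id: cosine_similarity(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical 3-dimensional embeddings; real models use hundreds of dimensions.
query = [1.0, 0.0, 0.0]
corpus = {
    "legitimate": [0.9, 0.1, 0.0],
    "unrelated": [0.0, 1.0, 0.0],
}
baseline = rank_documents(query, corpus)

# An attacker-controlled vector placed slightly closer to the query direction.
corpus["poisoned"] = [0.99, 0.01, 0.0]
poisoned = rank_documents(query, corpus)

print(baseline[0], poisoned[0])
```

Because the ranking depends only on vector geometry, the text behind the inserted vector does not need to look related to the query for a human reader.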
## Attack Variants
### 1. Nearest-Neighbour Poisoning
Craft text whose embedding lies close to a target document's vector even though the text does not appear semantically similar to a human reader.
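```python
# Adversarial suffix method (GCG-inspired; this sketch uses a continuous relaxation)
# Append a learned suffix to arbitrary text to move its embedding toward the target
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def find_adversarial_suffix(model, tokenizer, source_text, target_embedding, steps=500):
    """Find a suffix that moves source_text's embedding toward target_embedding."""
    embed_layer = model.get_input_embeddings()
    # Fixed input embeddings of the source text (no special tokens, no gradient)
    ids = tokenizer(source_text, return_tensors="pt", add_special_tokens=False)["input_ids"]
    source_embeds = embed_layer(ids).detach()
    # 20 trainable suffix positions in the input-embedding space
    suffix = torch.randn(1, 20, model.config.hidden_size, requires_grad=True)
    optimizer = torch.optim.Adam([suffix], lr=0.01)
    for step in range(steps):
        hidden = model(inputs_embeds=torch.cat([source_embeds, suffix], dim=1)).last_hidden_state
        source_emb = hidden.mean(dim=1)  # mean pooling (assumed; match the target model's pooling)
        loss = 1 - F.cosine_similarity(source_emb, target_embedding.reshape(1, -1)).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Project each optimised vector onto its nearest vocabulary embedding.
    # The projection is lossy, so re-embed the decoded text to check the real distance.
    token_ids = torch.cdist(suffix.detach()[0], embed_layer.weight.detach()).argmin(dim=-1)
    return tokenizer.decode(token_ids)
```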
**Use cases:**
- Make a malicious document be retrieved instead of a legitimate one
- Cause benign queries to retriev