Install: claude install-skill sunilgentyala/OmniRed
# Model Extraction
## Attack Surface
Model extraction (also called model stealing) reconstructs a victim model's behaviour, weights, or training data through systematic querying. It is relevant when assessing:
- Proprietary fine-tuned models deployed via API
- RAG-augmented models that encode confidential knowledge
- Classifier models whose decision boundaries represent competitive IP
- Models trained on sensitive or regulated data (potential membership inference)
## Attack Variants
| Attack | Goal | Query budget |
|---|---|---|
| Functional extraction | Clone input/output behaviour | High (10k–1M queries) |
| Architecture inference | Identify model family/size | Low (50–200 queries) |
| Training data reconstruction | Recover memorised training examples | Medium (1k–10k queries) |
| Membership inference | Determine if a record was in training data | Low per record |
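When scoping an authorised assessment, it can help to translate the query budgets in the table into wall-clock time and spend before committing to a variant. The sketch below does that arithmetic; the rate limit (`queries_per_minute`) and price (`cost_per_query`) are placeholder assumptions, not figures from any real provider, and `estimate_budget` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class BudgetEstimate:
    queries: int
    hours: float  # wall-clock time at the assumed rate limit
    cost: float   # total spend at the assumed per-query price


def estimate_budget(queries: int, queries_per_minute: float,
                    cost_per_query: float) -> BudgetEstimate:
    """Estimate time and spend for a fixed number of sequential queries."""
    if queries_per_minute <= 0:
        raise ValueError("queries_per_minute must be positive")
    hours = queries / queries_per_minute / 60
    return BudgetEstimate(queries, hours, queries * cost_per_query)


# Upper bounds of the query budgets listed in the table.
UPPER_BOUNDS = {
    "functional extraction": 1_000_000,
    "architecture inference": 200,
    "training data reconstruction": 10_000,
}

for attack, n in UPPER_BOUNDS.items():
    # 60 queries/min and $0.002 per query are assumed values for illustration.
    est = estimate_budget(n, queries_per_minute=60, cost_per_query=0.002)
    print(f"{attack}: {est.hours:.1f} h, ${est.cost:.2f}")
```

Replace the assumed rate and price with the values measured in Phase 1 to check whether a variant fits the engagement's time and cost limits.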
## Methodology
### Phase 1 — Target profiling
```
1. Determine API rate limits and cost per query
2. Fingerprint model family: test known quirks of the GPT-4, Claude, Gemini, and Llama families
3. Measure latency distribution (rough proxy for model size; network and server load also contribute)
4. Test context window length limits
5. Probe output format constraints (JSON mode, max tokens, stop sequences)
```
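Step 3 of the list above can be sketched as a small timing harness. This is a minimal illustration, not a definitive implementation: `send_query` is a hypothetical callable standing in for whatever client wraps the target API, and the summary keys are chosen here for convenience.

```python
import statistics
import time
from typing import Callable


def measure_latency(send_query: Callable[[str], str], prompt: str,
                    samples: int = 20) -> dict:
    """Time repeated calls and summarise the latency distribution (seconds)."""
    if samples < 2:
        raise ValueError("need at least 2 samples to summarise a distribution")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        send_query(prompt)  # hypothetical client call; response is discarded
        timings.append(time.perf_counter() - start)
    # n=20 yields 19 cut points; the last approximates the 95th percentile.
    cuts = statistics.quantiles(timings, n=20)
    return {
        "samples": samples,
        "median": statistics.median(timings),
        "p95": cuts[-1],
        "stdev": statistics.stdev(timings),
    }
```

Because the measurement includes network round-trip time, it is worth repeating it at different times of day and with prompts of different lengths before drawing any conclusion about model size. With few samples the p95 value is only a rough estimate.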
**Fingerprinting queries:**
```
What is the exact date of your training cutoff?
Repeat "hello" exactly 500 times.
What is 2+2? Answer only with the number.
Translate "apple" into Swahili.
```
Compare the responses against the documented behaviour of known model families to narrow down the likely base model.
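One simple way to make that comparison repeatable is to score observed responses against reference responses collected separately for each candidate family. The sketch below uses plain string similarity from the standard library; the function names are hypothetical, and the reference strings are invented placeholders, not real model behaviour.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def rank_families(observed: list[str],
                  references: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank candidate families by mean similarity over the same prompt order."""
    scores = {}
    for family, expected in references.items():
        pairs = list(zip(observed, expected))
        if not pairs:
            continue  # no reference responses recorded for this family
        scores[family] = sum(similarity(o, e) for o, e in pairs) / len(pairs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder references: one response per fingerprinting prompt, per family.
references = {
    "family_a": ["4"],
    "family_b": ["2 + 2 = 4"],
}
print(rank_families(["4"], references))
```

String similarity is a crude signal, so a high score suggests a candidate to investigate rather than a confirmed identification. Sampling each prompt several times helps separate stable quirks from sampling noise.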
### Phase