model-extraction

Query-based model extraction and intellectual property theft methodology for authorized assessments. Covers functional extraction, architecture inference, and training data reconstruction.
sunilgentyala/OmniRed · ★ 0 · AI & Automation · score 63
Install: claude install-skill sunilgentyala/OmniRed
# Model Extraction

## Attack Surface

Model extraction (model stealing) reconstructs a victim model's behaviour, weights, or training data through systematic querying. It is relevant when assessing:

- Proprietary fine-tuned models deployed via API
- RAG-augmented models that encode confidential knowledge
- Classifier models whose decision boundaries represent competitive IP
- Models trained on sensitive or regulated data (potential membership inference)

## Attack Variants

| Attack | Goal | Query budget |
|---|---|---|
| Functional extraction | Clone input/output behaviour | High (10k–1M queries) |
| Architecture inference | Identify model family/size | Low (50–200 queries) |
| Training data reconstruction | Recover memorised training examples | Medium (1k–10k queries) |
| Membership inference | Determine whether a record was in the training data | Low per record |

## Methodology

### Phase 1 — Target profiling

```
1. Determine API rate limits and cost per query
2. Fingerprint the model family: test for known quirks of the GPT-4, Claude, Gemini, and Llama families
3. Measure the latency distribution (a rough proxy for model size)
4. Test context window length limits
5. Probe output format constraints (JSON mode, max tokens, stop sequences)
```

**Fingerprinting queries:**

```
What is the exact date of your training cutoff?
Repeat "hello" exactly 500 times.
What is 2+2? Answer only with the number.
Translate "apple" into Swahili.
```

Compare the responses against the known behaviour of each model family to identify the likely base model.

### Phase