knowledge-distillation

Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performance, transferring GPT-4 capabilities to open-source models, or reducing inference costs. Covers temperature scaling, soft targets, reverse KLD, logit distillation, and MiniLLM training strategies.

AI & Automation | 27,705 stars | 2,858 forks | Updated today | MIT

Skill Content

# Knowledge Distillation: Compressing LLMs

## When to Use This Skill

Use Knowledge Distillation when you need to:

- **Compress models** from 70B → 7B while retaining 90%+ performance
- **Transfer capabilities** from proprietary models (GPT-4) to open-source (LLaMA, Mistral)
- **Reduce inference costs** by deploying smaller student models
- **Create specialized models** by distilling domain-specific knowledge
- **Improve small models** using synthetic data from large teachers

**Key Techniques**: Temperature scaling, soft targets, reverse KLD (MiniLLM), logit distillation, response distillation

**Papers**: Hinton et al. 2015 (arXiv 1503.02531), MiniLLM (arXiv 2306.08543), KD Survey (arXiv 2402.13116)

## Installation

```bash
# Standard transformers
pip install transformers datasets accelerate

# For training
pip install torch deepspeed wandb

# Optional: MiniLLM implementation
git clone https://github.com/microsoft/LMOps
cd LMOps/minillm
pip install -e .
```

## Quick Start

### Basic Knowledge Distillation

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# 1. Load teacher (large) and student (small) models
teacher = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # Large teacher
    torch_dtype=torch.float16,
    device_map="auto"
)

student = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # Small student
    torch_dtype=torch.fl...
```
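The temperature scaling, soft targets, and reverse KLD listed under Key Techniques can be illustrated with two small loss functions. This is a minimal sketch, not the skill's own implementation: the function names `distillation_loss` and `reverse_kl`, the `temperature` and `alpha` defaults, and the `(batch, vocab)` logit shapes are all assumptions made for illustration. The reverse-KL function shows only the divergence direction associated with MiniLLM, not that paper's full training procedure.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-target distillation loss in the style of Hinton et al. 2015.

    Hypothetical helper: logits are (batch, vocab), labels are (batch,).
    Returns alpha * soft-target KL + (1 - alpha) * hard-label cross-entropy.
    """
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)

    # Forward KL(teacher || student). The T^2 factor keeps gradient
    # magnitudes comparable when the temperature changes.
    soft_loss = F.kl_div(
        student_log_probs, teacher_probs, reduction="batchmean"
    ) * temperature ** 2

    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


def reverse_kl(student_logits, teacher_logits):
    """Reverse KL(student || teacher), averaged over the batch.

    Hypothetical helper showing the divergence direction only.
    """
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
    student_probs = student_log_probs.exp()
    per_example = (student_probs * (student_log_probs - teacher_log_probs)).sum(dim=-1)
    return per_example.mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    student_logits = torch.randn(4, 10)
    teacher_logits = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    print(distillation_loss(student_logits, teacher_logits, labels))
    print(reverse_kl(student_logits, teacher_logits))
```

For a causal language model, the same functions would be applied after flattening the `(batch, sequence, vocab)` logits to `(batch * sequence, vocab)`, with the teacher's logits computed under `torch.no_grad()`.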

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT
