hqq-quantization

Solid

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.

AI & Automation · 9,182 stars · 697 forks · Updated 1 month ago · MIT


Quality Score: 94/100

Stars (20%): 100
Recency (20%): 75
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# HQQ - Half-Quadratic Quantization

Fast, calibration-free weight quantization supporting 8/4/3/2/1-bit precision with multiple optimized backends.

## When to use HQQ

**Use HQQ when:**
- Quantizing models without calibration data (no dataset needed)
- Needing fast quantization (minutes, versus hours for GPTQ/AWQ)
- Deploying with vLLM or HuggingFace Transformers
- Fine-tuning quantized models with LoRA/PEFT
- Experimenting with extreme quantization (2-bit, 1-bit)

**Key advantages:**
- **No calibration**: Quantize any model from its weights alone, without sample data
- **Multiple backends**: PyTorch, ATEN, TorchAO, Marlin, BitBlas for optimized inference
- **Flexible precision**: 8/4/3/2/1-bit with configurable group sizes
- **Framework integration**: Native HuggingFace and vLLM support
- **PEFT compatible**: Fine-tune quantized models with LoRA

**Use alternatives instead:**
- **AWQ**: Calibration-based accuracy for production serving
- **GPTQ**: Maximum accuracy when calibration data is available
- **bitsandbytes**: Simple 8-bit/4-bit quantization without custom backends
- **llama.cpp/GGUF**: CPU inference, Apple Silicon deployment

## Quick start

### Installation

```bash
pip install hqq

# With specific backend
pip install hqq[torch]    # PyTorch backend
pip install hqq[torchao]  # TorchAO int4 backend
pip install hqq[bitblas]  # BitBlas backend
pip install hqq[marlin]   # Marlin backend
```

### Basic quantization

```python
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear
import tor...
```
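To illustrate what "calibration-free" means in practice, here is a minimal pure-Python sketch of group-wise weight quantization with a half-quadratic-style zero-point refinement. It is not the `hqq` library's API: the names `quantize_group`, `quantize_groups`, `shrink_lp`, `dequantize`, and `mean_abs_error` are hypothetical, the scale is held fixed while only the zero-point is refined (a simplifying assumption), and the hyperparameters (`lp_norm=0.7`, `beta=10.0`, `kappa=1.01`, `iters=20`) are assumed values chosen for illustration.

```python
import math
from typing import List, Tuple


def shrink_lp(x: float, beta: float, p: float) -> float:
    """Generalized soft-thresholding: proximal step for an l_p penalty (p < 1)."""
    if x == 0.0:
        return 0.0  # avoid 0 ** (p - 1), which is undefined for p < 1
    ax = abs(x)
    return math.copysign(max(ax - ax ** (p - 1.0) / beta, 0.0), x)


def dequantize(q: List[int], scale: float, zero: float) -> List[float]:
    """Map integer codes back to floats: w_r = (q - zero) / scale."""
    return [(qi - zero) / scale for qi in q]


def mean_abs_error(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def quantize_group(
    w: List[float],
    nbits: int = 4,
    # Assumed hyperparameters for this sketch, not taken from the skill text.
    lp_norm: float = 0.7,
    beta: float = 10.0,
    kappa: float = 1.01,
    iters: int = 20,
) -> Tuple[List[int], float, float]:
    """Quantize one weight group using only the weights (no calibration data)."""
    qmax = 2 ** nbits - 1
    w_min, w_max = min(w), max(w)
    # Min-max initialization of scale and zero-point.
    scale = qmax / (w_max - w_min) if w_max > w_min else 1.0
    zero = -w_min * scale

    def quant(z: float) -> List[int]:
        return [min(max(round(v * scale + z), 0), qmax) for v in w]

    best_zero, best_err = zero, float("inf")
    for _ in range(iters):
        q = quant(zero)
        w_r = dequantize(q, scale, zero)
        err = mean_abs_error(w, w_r)
        if err >= best_err:
            break  # stop as soon as the reconstruction error stops improving
        best_zero, best_err = zero, err
        # Half-quadratic split: model the residual as a sparse term w_e ...
        w_e = [shrink_lp(a - b, beta, lp_norm) for a, b in zip(w, w_r)]
        # ... then solve the quadratic sub-problem for the zero-point in closed form.
        zero = sum(qi - (a - e) * scale for qi, a, e in zip(q, w, w_e)) / len(w)
        beta *= kappa  # gradually tighten the coupling between the two terms
    return quant(best_zero), scale, best_zero


def quantize_groups(w: List[float], group_size: int, nbits: int = 4):
    """Split a flat weight vector into groups and quantize each independently."""
    return [
        quantize_group(w[i:i + group_size], nbits)
        for i in range(0, len(w), group_size)
    ]
```

Only the weights enter the computation, which is why no dataset is needed. Because the loop keeps the best zero-point it has evaluated, the result is never worse (in mean absolute error) than the plain min-max starting point; a smaller `group_size` gives each group its own scale and zero-point at the cost of more metadata per weight.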

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

hqq-quantization

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.

27,705 stars · Updated today
davila7
AI & Automation Featured

gptq

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.

27,705 stars · Updated today
davila7
AI & Automation Solid

gptq

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.

9,182 stars · Updated 1 month ago
Orchestra-Research
AI & Automation Featured

awq-quantization

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

27,705 stars · Updated today
davila7
AI & Automation Solid

awq-quantization

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

9,182 stars · Updated 1 month ago
Orchestra-Research