hqq-quantization

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.

AI & Automation · 9,182 stars · 697 forks · Updated 1 month ago · MIT

Quality Score: 94/100

Skill Content

# HQQ - Half-Quadratic Quantization

Fast, calibration-free weight quantization supporting 8/4/3/2/1-bit precision with multiple optimized backends.

## When to use HQQ

**Use HQQ when:**
- Quantizing models without calibration data (no dataset needed)
- Need fast quantization (minutes vs hours for GPTQ/AWQ)
- Deploying with vLLM or HuggingFace Transformers
- Fine-tuning quantized models with LoRA/PEFT
- Experimenting with extreme quantization (2-bit, 1-bit)

**Key advantages:**
- **No calibration**: Quantize any model instantly without sample data
- **Multiple backends**: PyTorch, ATEN, TorchAO, Marlin, BitBlas for optimized inference
- **Flexible precision**: 8/4/3/2/1-bit with configurable group sizes
- **Framework integration**: Native HuggingFace and vLLM support
- **PEFT compatible**: Fine-tune quantized models with LoRA

**Use alternatives instead:**
- **AWQ**: Need calibration-based accuracy, production serving
- **GPTQ**: Maximum accuracy with calibration data available
- **bitsandbytes**: Simple 8-bit/4-bit without custom backends
- **llama.cpp/GGUF**: CPU inference, Apple Silicon deployment

## Quick start

### Installation

```bash
pip install hqq

# With specific backend
pip install hqq[torch]    # PyTorch backend
pip install hqq[torchao]  # TorchAO int4 backend
pip install hqq[bitblas]  # BitBlas backend
pip install hqq[marlin]   # Marlin backend
```

### Basic quantization

```python
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear
import tor...
```
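To make the "bit width" and "group size" settings mentioned above concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration only: it uses plain round-to-nearest with a min/max range, whereas HQQ itself refines the quantization parameters with a half-quadratic solver, which is not reproduced here. The function names (`quantize_group`, `quantize`, `dequantize`, `bits_per_weight`) are hypothetical and not part of the `hqq` package, and the 16-bit scale and zero-point storage is an assumption.

```python
import math


def quantize_group(weights, nbits):
    """Round-to-nearest affine quantization of one group (illustrative, not HQQ's solver)."""
    qmax = 2 ** nbits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax
    if scale == 0.0:
        scale = 1.0  # constant group: any scale works, avoid dividing by zero
    zero = w_min
    # Map each weight to an integer code in [0, qmax].
    codes = [min(qmax, max(0, round((w - zero) / scale))) for w in weights]
    return codes, scale, zero


def quantize(weights, nbits=4, group_size=64):
    """Split a flat weight list into groups, each with its own scale and zero-point."""
    return [
        quantize_group(weights[start:start + group_size], nbits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate weights from (codes, scale, zero) groups."""
    restored = []
    for codes, scale, zero in groups:
        restored.extend(c * scale + zero for c in codes)
    return restored


def bits_per_weight(nbits, group_size, meta_bits=16):
    """Effective storage cost, assuming one scale and one zero-point of meta_bits each per group."""
    return nbits + 2 * meta_bits / group_size


# Deterministic toy "weights" so the sketch needs no external data.
weights = [math.sin(i) for i in range(128)]
groups = quantize(weights, nbits=4, group_size=64)
restored = dequantize(groups)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

print(f"groups: {len(groups)}, max abs error: {max_err:.4f}")
print(f"effective bits per weight: {bits_per_weight(4, 64)}")  # 4 + 32/64 = 4.5
```

Smaller groups track the local weight range more closely and so reduce reconstruction error, but each group carries its own scale and zero-point, which raises the effective bits per weight. That trade-off is the reason the group size is worth tuning alongside the bit width.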

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT

Integrates with

Hugging Face · AI
