gguf-quantization

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware or Apple Silicon, or when you need flexible 2- to 8-bit quantization without requiring a GPU.

AI & Automation | 27,984 stars | 2,901 forks | Updated today | MIT license

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# GGUF - Quantization Format for llama.cpp

GGUF (GPT-Generated Unified Format) is the standard file format for llama.cpp. It enables efficient inference on CPUs, Apple Silicon, and GPUs with flexible quantization options.

## When to use GGUF

**Use GGUF when:**
- Deploying on consumer hardware (laptops, desktops)
- Running on Apple Silicon (M1/M2/M3) with Metal acceleration
- Needing CPU inference without a GPU
- Wanting flexible quantization levels (Q2_K to Q8_0)
- Using local AI tools (LM Studio, Ollama, text-generation-webui)

**Key advantages:**
- **Broad hardware support**: CPUs, Apple Silicon, and NVIDIA and AMD GPUs
- **No Python runtime**: Pure C/C++ inference
- **Flexible quantization**: 2- to 8-bit formats, including the K-quant family
- **Ecosystem support**: LM Studio, Ollama, koboldcpp, and more
- **imatrix**: An importance matrix, computed from calibration data, improves quality at low bit widths

**Use alternatives instead:**
- **AWQ/GPTQ**: Maximum accuracy with calibration on NVIDIA GPUs
- **HQQ**: Fast, calibration-free quantization for Hugging Face models
- **bitsandbytes**: Simple integration with the transformers library
- **TensorRT-LLM**: Production NVIDIA deployment with maximum speed

## Quick start

### Installation

```bash
# Clone llama.cpp
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Configure: pick ONE of the following (current llama.cpp builds with CMake, not make)
cmake -B build                    # CPU only
cmake -B build -DGGML_CUDA=ON     # NVIDIA (CUDA)
cmake -B build -DGGML_METAL=ON    # Apple Silicon (Metal is already on by default on macOS)

# Compile; binaries such as llama-cli and llama-quantize are placed in build/bin/
cmake --build build --config Release

# Install Python bindings (optional)
pip install llama-cpp-python
```

### Convert model to GGUF

...
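To make the 2- to 8-bit quantization described above more concrete, here is a minimal sketch of a Q8_0-style scheme, assuming ggml's Q8_0 block layout: weights are split into blocks of 32, and each block stores one scale (an fp16 value in a real GGUF file, a plain Python float here) plus 32 signed 8-bit integers. That is 34 bytes per 32 weights, or 8.5 bits per weight. The function names are hypothetical, and the output is for intuition only; it is not byte-compatible with llama.cpp.

```python
"""Q8_0-style block quantization sketch (pure Python, not byte-compatible with ggml)."""
import math
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed to match ggml's Q8_0 block size


def _round_half_away(v: float) -> int:
    """Round half away from zero, like C's roundf (Python's round() is half-to-even)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_block_q8_0(block: List[float]) -> Tuple[float, List[int]]:
    """Quantize one block to (scale, int8 values) with a symmetric absmax scale."""
    amax = max(abs(x) for x in block)
    d = amax / 127.0                 # the largest |x| maps to +/-127
    inv_d = 1.0 / d if d else 0.0    # an all-zero block quantizes to all zeros
    return d, [_round_half_away(x * inv_d) for x in block]


def dequantize_block_q8_0(d: float, q: List[int]) -> List[float]:
    """Reconstruct approximate weights as scale * int8 value."""
    return [d * v for v in q]


def quantize_q8_0(weights: List[float]) -> List[Tuple[float, List[int]]]:
    """Split a flat weight list into 32-element blocks and quantize each one."""
    if len(weights) % BLOCK_SIZE:
        raise ValueError(f"length must be a multiple of {BLOCK_SIZE}")
    return [
        quantize_block_q8_0(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]
```

Because every block has its own scale, an outlier only degrades the precision of the 31 weights that share its block, and the per-weight reconstruction error stays within half that block's scale.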
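The quantization types listed above (Q2_K through Q8_0) translate directly into file size, which is what decides whether a model fits on consumer hardware. The sketch below derives bits per weight from per-block byte layouts, assuming the ggml block structures for these types: Q4_0 and Q8_0 store an fp16 scale per 32 weights, while Q4_K and Q6_K use 256-weight super-blocks with packed sub-block scales. `LAYOUTS` and `estimate_size_gb` are hypothetical names. Real mixed files (for example the `_M`/`_S` variants) combine several tensor types and add metadata, so treat the result as a rough estimate for a single uniform type.

```python
"""Rough GGUF size estimate from assumed ggml block layouts (illustrative only)."""
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BlockLayout:
    weights_per_block: int
    bytes_per_block: int

    @property
    def bits_per_weight(self) -> float:
        return self.bytes_per_block * 8 / self.weights_per_block


# Byte counts per block, assumed from ggml's block structs.
LAYOUTS: Dict[str, BlockLayout] = {
    "F16": BlockLayout(1, 2),
    "Q8_0": BlockLayout(32, 2 + 32),              # fp16 scale + 32 int8 values
    "Q4_0": BlockLayout(32, 2 + 16),              # fp16 scale + 32 packed 4-bit values
    "Q4_K": BlockLayout(256, 2 + 2 + 12 + 128),   # d, dmin, packed scales, 256 x 4-bit
    "Q6_K": BlockLayout(256, 128 + 64 + 16 + 2),  # low 4 bits, high 2 bits, int8 scales, d
}


def estimate_size_gb(n_params: float, qtype: str) -> float:
    """Approximate weight storage in GB (1e9 bytes) if every weight used `qtype`."""
    return n_params * LAYOUTS[qtype].bits_per_weight / 8 / 1e9
```

For example, `estimate_size_gb(7e9, "Q4_0")` gives 3.9375 (about 3.9 GB) for a 7B-parameter model, compared with 14.0 for F16.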
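The imatrix advantage mentioned above rests on one idea: when picking quantization scales, errors on weights that matter more (as measured by activation statistics from calibration text) should count more. The sketch below shows only that idea for a signed 4-bit block. It tries a small grid of candidate scales and keeps the one that minimizes the importance-weighted squared error. The scale grid, the [-8, 7] range, and all function names are assumptions for illustration; this is not llama.cpp's actual search routine.

```python
"""Importance-weighted scale search: a simplified illustration of the imatrix idea."""
from typing import List, Sequence, Tuple

QMIN, QMAX = -8, 7  # signed 4-bit range assumed for this sketch


def quantize_with_scale(x: Sequence[float], d: float) -> List[int]:
    """Round x / d to the nearest integer and clamp it to the 4-bit range."""
    if d == 0:
        return [0] * len(x)
    return [max(QMIN, min(QMAX, round(v / d))) for v in x]


def weighted_error(x: Sequence[float], w: Sequence[float], d: float, q: Sequence[int]) -> float:
    """Sum of importance-weighted squared reconstruction errors."""
    return sum(wi * (xi - d * qi) ** 2 for xi, wi, qi in zip(x, w, q))


def search_scale(x: Sequence[float], importance: Sequence[float]) -> Tuple[float, List[int]]:
    """Try scales amax / k for k = 6.0 ... 8.0 and keep the lowest weighted error."""
    amax = max(abs(v) for v in x)
    if amax == 0:
        return 0.0, [0] * len(x)
    best_err, best_d, best_q = float("inf"), 0.0, [0] * len(x)
    for k10 in range(60, 81):  # k = k10 / 10; includes the naive k = 7.0
        d = amax / (k10 / 10)
        q = quantize_with_scale(x, d)
        err = weighted_error(x, importance, d, q)
        if err < best_err:
            best_err, best_d, best_q = err, d, q
    return best_d, best_q
```

Since the naive absmax scale (`amax / 7`) is one of the candidates, the weighted error can only match or improve on it. Raising the importance of a few weights pulls the chosen scale toward representing those weights accurately.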

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT
