optimizing-attention-flash

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training or running transformers on long sequences (>512 tokens), when attention causes GPU memory issues, or when you need faster inference. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.

AI & Automation 27,984 stars 2901 forks Updated today MIT

Skill Content

# Flash Attention - Fast Memory-Efficient Attention

## Quick start

Flash Attention provides 2-4x speedup and 10-20x memory reduction for transformer attention through IO-aware tiling and recomputation. Because it never materializes the full N×N attention matrix, its memory use grows linearly rather than quadratically with sequence length, so the savings are largest on long sequences.

**PyTorch native (easiest, PyTorch 2.2+)**:

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)  # [batch, heads, seq, dim]
k = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)
v = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)

# Dispatches to the Flash Attention kernel when the inputs are eligible
# (e.g. CUDA tensors in fp16/bf16); otherwise falls back to another SDPA backend
out = F.scaled_dot_product_attention(q, k, v)
```

**flash-attn library (more features)**:

```bash
pip install flash-attn --no-build-isolation
```

```python
import torch
from flash_attn import flash_attn_func

# q, k, v: [batch, seqlen, nheads, headdim] (seq and heads are swapped vs. SDPA); fp16/bf16 on CUDA
q = torch.randn(2, 512, 8, 64, device='cuda', dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
```

## Common workflows

### Workflow 1: Enable in existing PyTorch model

Copy this checklist:

```
Flash Attention Integration:
- [ ] Step 1: Check PyTorch version (≥2.2)
- [ ] Step 2: Enable Flash Attention backend
- [ ] Step 3: Verify speedup with profiling
- [ ] Step 4: Test accuracy matches baseline
```

**Step 1: Check PyTorch version**

```bash
python -c "import torch; print(torch.__version__)"  # Should be ≥2.2.0
```

If <2.2, upgrade:

```bash
pip install --upgrade torch
```

**Step 2: Enable Flash Attention backend**

Replace standard attention:

```python
# Before (standard attention)
attn_weights...
```
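
The "IO-aware tiling" in the quick start rests on an online-softmax recurrence. Keys and values are consumed in blocks, and each query keeps a running maximum, a running softmax denominator, and an unnormalized output, so no full row of attention scores is ever stored. The pure-Python sketch below illustrates only that recurrence, under simplifying assumptions: a single head, one query row at a time, and no GPU. The names `naive_attention`, `tiled_attention`, and `block_size` are hypothetical and are not part of PyTorch or flash-attn. The real kernels also tile the queries, fuse the work in on-chip SRAM, and recompute attention in the backward pass.

```python
import math
import random


def naive_attention(q, k, v, causal=False):
    """Reference softmax(q @ k^T / sqrt(d)) @ v, building each full score row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Masked (future) positions get -inf so exp() turns them into 0.
        scores = [
            float("-inf") if causal and j > i
            else scale * sum(a * b for a, b in zip(qi, kj))
            for j, kj in enumerate(k)
        ]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def tiled_attention(q, k, v, block_size=2, causal=False):
    """Same result, but K/V are visited in blocks of `block_size` using an
    online softmax: a running max `m`, a running denominator `l`, and an
    unnormalized accumulator `acc` per query row (hypothetical sketch)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        m, l = float("-inf"), 0.0
        acc = [0.0] * len(v[0])
        for start in range(0, len(k), block_size):
            if causal and start > i:
                break  # every key from here on is in the future
            stop = min(start + block_size, len(k))
            scores = [
                float("-inf") if causal and j > i
                else scale * sum(a * b for a, b in zip(qi, k[j]))
                for j in range(start, stop)
            ]
            m_new = max(m, max(scores))
            # Rescale what was accumulated under the old max to the new max.
            correction = math.exp(m - m_new)
            p = [math.exp(s - m_new) for s in scores]
            l = l * correction + sum(p)
            acc = [
                a * correction + sum(pj * v[start + t][c] for t, pj in enumerate(p))
                for c, a in enumerate(acc)
            ]
            m = m_new
        out.append([a / l for a in acc])  # normalize once at the end
    return out


if __name__ == "__main__":
    random.seed(0)

    def rand_matrix(rows, cols):
        return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

    n, d = 7, 4
    q, k, v = rand_matrix(n, d), rand_matrix(n, d), rand_matrix(n, d)
    for causal in (False, True):
        ref = naive_attention(q, k, v, causal=causal)
        tiled = tiled_attention(q, k, v, block_size=3, causal=causal)
        diff = max(abs(a - b) for ra, rb in zip(ref, tiled) for a, b in zip(ra, rb))
        print(f"causal={causal}: max abs difference {diff:.2e}")
```

Both functions agree up to floating-point rounding for any `block_size`. The difference is in working memory: the naive version builds the whole score row for each query, while the tiled version only holds one block of scores plus the running statistics.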

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT
