long-context

Extend context windows of transformer models using RoPE, YaRN, ALiBi, and position interpolation techniques. Use when processing long documents (32k-128k+ tokens), extending pre-trained models beyond original context limits, or implementing efficient positional encodings. Covers rotary embeddings, attention biases, interpolation methods, and extrapolation strategies for LLMs.

AI & Automation · 27,984 stars · 2,901 forks · Updated today · MIT

Skill Content

# Long Context: Extending Transformer Context Windows

## When to Use This Skill

Use Long Context techniques when you need to:

- **Process long documents** (32k, 64k, 128k+ tokens) with transformer models
- **Extend context windows** of pre-trained models (LLaMA, Mistral, etc.)
- **Implement efficient positional encodings** (RoPE, ALiBi)
- **Train models** with length extrapolation capabilities
- **Deploy models** that handle variable-length inputs efficiently
- **Fine-tune** existing models for longer contexts with minimal compute

**Key Techniques**: RoPE (Rotary Position Embeddings), YaRN, ALiBi (Attention with Linear Biases), Position Interpolation

**Papers**: RoFormer (arXiv 2104.09864), YaRN (arXiv 2309.00071), ALiBi (arXiv 2108.12409), Position Interpolation (arXiv 2306.15595)

## Installation

```bash
# HuggingFace Transformers (includes RoPE, YaRN support)
pip install transformers torch

# For custom implementations
pip install einops                  # Tensor operations
pip install rotary-embedding-torch  # Standalone RoPE

# Optional: FlashAttention for efficiency
pip install flash-attn --no-build-isolation
```

## Quick Start

### RoPE (Rotary Position Embeddings)

```python
import torch
import torch.nn as nn

class RotaryEmbedding(nn.Module):
    """Rotary Position Embeddings (RoPE)."""

    def __init__(self, dim, max_seq_len=8192, base=10000):
        super().__init__()
        # Compute inverse frequencies
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() /...
```
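
The Quick Start listing is cut off in this preview, so the following is an independent, minimal sketch of the rotary arithmetic rather than the skill's own implementation. It assumes the standard RoFormer frequency formula and the interleaved channel pairing (channels `2i` and `2i+1` rotate together). The function names `rope_inv_freq`, `apply_rope`, and `dot`, and the `scale` parameter (linear position interpolation, which divides positions by a constant), are hypothetical names chosen for illustration. It is written in plain Python so the per-pair rotation is visible; a real model would vectorize this with tensors.

```python
import math


def rope_inv_freq(dim, base=10000.0):
    """Inverse frequencies base^(-i/dim) for i = 0, 2, 4, ... (one per channel pair)."""
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive channel pairs of `vec` by position-dependent angles.

    `scale` > 1 divides the position before rotating (linear position
    interpolation). This parameter is an illustrative assumption.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even number of channels")
    pos = position / scale
    out = []
    for i, freq in enumerate(rope_inv_freq(dim, base)):
        angle = pos * freq
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation of the pair (x1, x2) by `angle`.
        out.extend((x1 * c - x2 * s, x1 * s + x2 * c))
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query/key vectors (made-up values for illustration).
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# The score depends only on the relative offset (2 in both cases).
near = dot(apply_rope(q, 5), apply_rope(k, 3))
far = dot(apply_rope(q, 105), apply_rope(k, 103))
assert abs(near - far) < 1e-9

# With scale=2, position 8 is rotated exactly like position 4 without scaling.
assert apply_rope(q, 8, scale=2.0) == apply_rope(q, 4)
```

The first assertion shows why RoPE is a relative encoding: rotating the query and key by their absolute positions leaves a score that depends only on the distance between them. The second shows the idea behind position interpolation: positions beyond the trained range are compressed back into it instead of being extrapolated.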

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Anthropic (AI) · Hugging Face (AI)
