transformer-lens-interpretability

Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and activation caching. Use when reverse-engineering model algorithms, studying attention patterns, or performing activation patching experiments.

AI & Automation | 9,609 stars | 724 forks | Updated 1 month ago | MIT

Quality Score: 94/100

- Stars (20% weight): 100
- Recency (20% weight): 75
- Frontmatter (20% weight): 70
- Documentation (15% weight): 100
- Issue Health (10% weight): 50
- License (10% weight): 100
- Description (5% weight): 100

Skill Content

# TransformerLens: Mechanistic Interpretability for Transformers

TransformerLens is the de facto standard library for mechanistic interpretability research on GPT-style language models. Created by Neel Nanda and maintained by Bryce Meyer, it provides clean interfaces to inspect and manipulate model internals via HookPoints on every activation.

**GitHub**: [TransformerLensOrg/TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) (2,900+ stars)

## When to Use TransformerLens

**Use TransformerLens when you need to:**

- Reverse-engineer algorithms learned during training
- Perform activation patching / causal tracing experiments
- Study attention patterns and information flow
- Analyze circuits (e.g., induction heads, IOI circuit)
- Cache and inspect intermediate activations
- Apply direct logit attribution

**Consider alternatives when:**

- You need to work with non-transformer architectures → Use **nnsight** or **pyvene**
- You want to train/analyze Sparse Autoencoders → Use **SAELens**
- You need remote execution on massive models → Use **nnsight** with NDIF
- You want higher-level causal intervention abstractions → Use **pyvene**

## Installation

```bash
pip install transformer-lens
```

For development version:

```bash
pip install git+https://github.com/TransformerLensOrg/TransformerLens
```

## Core Concepts

### HookedTransformer

The main class that wraps transformer models with HookPoints on every activation:

```python
from transformer_lens import...
```

Details

- Author: Orchestra-Research
- Repository: Orchestra-Research/AI-Research-SKILLs
- Created: 7 months ago
- Last Updated: 1 month ago
- Language: TeX
- License: MIT
