Install: `claude install-skill BioTender-max/awesome-bio-agent-skills`
# ESM2 Protein Language Model
## Prerequisites
| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| Python | 3.8+ | 3.10 |
| PyTorch | 1.10+ | 2.0+ |
| CUDA | 11.0+ | 11.7+ |
| GPU VRAM | 8GB | 24GB (A10G) |
| RAM | 16GB | 32GB |
## How to run
> **First time?** See [Installation Guide](../../docs/installation.md) to set up Modal and biomodals.
### Option 1: Modal
```bash
cd biomodals
modal run modal_esm2_predict_masked.py \
  --input-faa sequences.fasta \
  --out-dir embeddings/
```
**GPU**: A10G (24GB) | **Timeout**: 300s default
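
Before submitting a run, it can help to check what the FASTA input actually contains. The following is a minimal sketch using only the standard library; `read_fasta` is a hypothetical helper introduced here, not part of biomodals or the `esm` package. It returns `(label, sequence)` tuples, the same shape that the Python API in Option 2 passes to `batch_converter`.

```python
from typing import Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (label, sequence) tuples.

    The label is the first whitespace-delimited word after '>'.
    Sequence lines are upper-cased and joined; blank lines are skipped.
    """
    records: List[Tuple[str, str]] = []
    label: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if label is not None:
                records.append((label, "".join(chunks)))
            parts = line[1:].split()
            label = parts[0] if parts else ""
            chunks = []
        elif label is not None:
            chunks.append(line.upper())
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


# Illustrative in-memory input; a real file would be passed as an open handle
example = [">seq1 demo protein", "MKTAYIAK", "QRQISFVK", ">seq2", "ACDEFGHIK"]
for name, seq in read_fasta(example):
    print(name, len(seq))
```

For a file on disk, the same function accepts an open handle, for example `read_fasta(open("sequences.fasta"))`.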
### Option 2: Python API (recommended)
```python
import torch
import esm

# Load the 650M-parameter ESM2 model and its tokenizer alphabet
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model = model.eval().cuda()  # inference mode, on GPU

# Tokenize a batch of (label, sequence) pairs
data = [("seq1", "MKTAYIAKQRQISFVK...")]
batch_labels, batch_strs, batch_tokens = batch_converter(data)

# Run the forward pass without tracking gradients
with torch.no_grad():
    results = model(batch_tokens.cuda(), repr_layers=[33])

# Per-token embeddings from layer 33: shape (batch, tokens, 1280)
embeddings = results["representations"][33]
```
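
The `embeddings` tensor above holds one vector per token. Index 0 of each row is a beginning-of-sequence token, the residues follow, and an end-of-sequence token plus any padding come last. A common way to get one fixed-size vector per sequence is to average only the residue positions. The following is a minimal sketch, assuming the tensor has first been converted to NumPy (for example with `embeddings.cpu().numpy()`); `mean_pool` is a hypothetical helper, not part of the `esm` package.

```python
import numpy as np


def mean_pool(token_reps: np.ndarray, seq_lengths: list) -> np.ndarray:
    """Average residue vectors per sequence.

    token_reps: (batch, tokens, dim) array. Index 0 of each row is the
    BOS token; residues occupy indices 1..length; EOS and padding follow.
    seq_lengths: number of residues in each sequence.
    """
    pooled = [
        token_reps[i, 1 : n + 1].mean(axis=0)  # skip BOS, stop before EOS
        for i, n in enumerate(seq_lengths)
    ]
    return np.stack(pooled)


# Toy stand-in for model output: 2 sequences, 6 token slots, 4-dim vectors
toy = np.arange(2 * 6 * 4, dtype=np.float32).reshape(2, 6, 4)
per_sequence = mean_pool(toy, seq_lengths=[4, 2])
print(per_sequence.shape)  # (2, 4)
```

With real model output, `seq_lengths` would typically be `[len(s) for s in batch_strs]`, and the result for the 650M model would have 1280 columns.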
## Key parameters
### ESM2 Models
| Model | Parameters | Speed | Relative quality |
|-------|------------|-------|------------------|
| esm2_t6_8M_UR50D | 8M | Fastest | Lowest; suited to fast screening |
| esm2_t12_35M_UR50D | 35M | Fast | Good |
| esm2_t33_650M_UR50D | 650M | Medium | Better |
| esm2_t36_3B_UR50D | 3B | Slow | Best |
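
Full checkpoint identifiers, such as `esm2_t33_650M_UR50D` in the Python API example, encode the number of transformer layers as `tNN`. That number is also the index to pass in `repr_layers` when the final-layer representation is wanted (33 for the 650M model). Below is a small sketch of that lookup; `final_layer` and `EMBED_DIM` are hypothetical names introduced here, and the embedding widths listed are those of the published ESM2 checkpoints.

```python
import re

# Per-residue embedding width for each checkpoint in the table
EMBED_DIM = {
    "esm2_t6_8M_UR50D": 320,
    "esm2_t12_35M_UR50D": 480,
    "esm2_t33_650M_UR50D": 1280,
    "esm2_t36_3B_UR50D": 2560,
}


def final_layer(model_name: str) -> int:
    """Return the layer count encoded as 't<N>' in an ESM2 model name."""
    match = re.search(r"_t(\d+)_", model_name)
    if match is None:
        raise ValueError(f"no layer count found in {model_name!r}")
    return int(match.group(1))


for name, dim in EMBED_DIM.items():
    print(f"{name}: repr_layers=[{final_layer(name)}], embedding dim {dim}")
```

Switching models therefore means changing both the loader function and the layer index, and downstream code should expect a different embedding width.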
## Output format
```
embeddings/
├── embeddings.npy # (N, 1280) array
├── pll_scores.csv # PLL for each sequence
└── metadata.json #