esm

ESM2 protein language model for embeddings and sequence scoring. Use this skill when: (1) Computing pseudo-log-likelihood (PLL) scores, (2) Getting protein embeddings for clustering, (3) Filtering designs by sequence plausibility, (4) Zero-shot variant effect prediction, (5) Analyzing sequence-function relationships. For structure prediction, use chai or boltz. For QC thresholds, use protein-qc.
BioTender-max/awesome-bio-agent-skills

Install: claude install-skill BioTender-max/awesome-bio-agent-skills

# ESM2 Protein Language Model

## Prerequisites

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| Python | 3.8+ | 3.10 |
| PyTorch | 1.10+ | 2.0+ |
| CUDA | 11.0+ | 11.7+ |
| GPU VRAM | 8GB | 24GB (A10G) |
| RAM | 16GB | 32GB |

## How to run

> **First time?** See [Installation Guide](../../docs/installation.md) to set up Modal and biomodals.

### Option 1: Modal

```bash
cd biomodals
modal run modal_esm2_predict_masked.py \
  --input-faa sequences.fasta \
  --out-dir embeddings/
```

**GPU**: A10G (24GB) | **Timeout**: 300s default

### Option 2: Python API (recommended)

```python
import torch
import esm

# Load model
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model = model.eval().cuda()

# Process sequences
data = [("seq1", "MKTAYIAKQRQISFVK...")]
batch_labels, batch_strs, batch_tokens = batch_converter(data)

with torch.no_grad():
    results = model(batch_tokens.cuda(), repr_layers=[33])

# Get embeddings
embeddings = results["representations"][33]
```

## Key parameters

### ESM2 Models

| Model | Parameters | Speed | Quality |
|-------|------------|-------|---------|
| esm2_t6_8M | 8M | Fastest | Fast screening |
| esm2_t12_35M | 35M | Fast | Good |
| esm2_t33_650M | 650M | Medium | Better |
| esm2_t36_3B | 3B | Slow | Best |

## Output format

```
embeddings/
├── embeddings.npy      # (N, 1280) array
├── pll_scores.csv      # PLL for each sequence
└── metadata.json       #
```
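
### Example: computing a PLL score (sketch)

The `pll_scores.csv` file holds one pseudo-log-likelihood per sequence. As a rough illustration of what such a score means, the sketch below masks each position in turn and sums the log-probability the model assigns to the true residue. This is not necessarily how `modal_esm2_predict_masked.py` computes it. The names `pseudo_log_likelihood`, `LogProbFn`, and `make_esm2_log_prob_fn` are hypothetical helpers introduced here. The ESM2 wrapper assumes the `fair-esm` package's `alphabet.mask_idx` and the `"logits"` entry of the model output, and it runs one forward pass per residue, so it is slow for long sequences.

```python
import math
from typing import Callable

# Hypothetical interface: given a sequence and a 0-based residue position,
# return log p(true residue at that position | all other residues).
LogProbFn = Callable[[str, int], float]


def pseudo_log_likelihood(sequence: str, log_prob_fn: LogProbFn) -> float:
    """Sum the masked log-probability of the true residue over all positions."""
    return sum(log_prob_fn(sequence, i) for i in range(len(sequence)))


def make_esm2_log_prob_fn(model, alphabet, device: str = "cpu") -> LogProbFn:
    """Wrap a loaded ESM2 model as a LogProbFn (assumes the fair-esm API)."""
    import torch  # imported lazily so the helper above runs without PyTorch

    batch_converter = alphabet.get_batch_converter()
    model = model.eval().to(device)

    def log_prob_fn(sequence: str, position: int) -> float:
        _, _, tokens = batch_converter([("seq", sequence)])
        tokens = tokens.to(device)
        tok_pos = position + 1  # ESM2 prepends a BOS token, so shift by one
        true_idx = tokens[0, tok_pos].item()
        tokens[0, tok_pos] = alphabet.mask_idx  # hide the true residue
        with torch.no_grad():
            logits = model(tokens)["logits"]
        log_probs = torch.log_softmax(logits[0, tok_pos], dim=-1)
        return log_probs[true_idx].item()

    return log_prob_fn


if __name__ == "__main__":
    # Toy stand-in for a model: every residue is equally likely (1 in 20).
    uniform = lambda seq, pos: math.log(1 / 20)
    print(pseudo_log_likelihood("MKTAYIAK", uniform))
```

With a real model, something like `pseudo_log_likelihood(seq, make_esm2_log_prob_fn(model, alphabet, "cuda"))` would give the score for one sequence. Because the sum grows with sequence length, dividing by `len(seq)` is a common way to compare sequences of different lengths.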