biopython-molecular-biology

Molecular biology toolkit: sequence manipulation, FASTA/GenBank/PDB I/O, NCBI Entrez, BLAST automation, pairwise/MSA alignment, Bio.PDB, phylogenetic trees. Use for batch processing, custom pipelines, format conversion, PubMed/GenBank queries. For quick gene lookups use gget; for multi-service REST APIs use bioservices.
jaechang-hits/SciAgent-Skills · ★ 183 · AI & Automation · score 81
Install: claude install-skill jaechang-hits/SciAgent-Skills
# Biopython: Computational Molecular Biology Toolkit

## Overview

Biopython is the standard open-source Python library for computational molecular biology, providing modular APIs for sequence handling, biological file parsing, NCBI database access, BLAST searches, protein structure analysis, and phylogenetics. It supports Python 3 and requires NumPy.

## When to Use

- Parse and convert biological file formats (FASTA, GenBank, FASTQ, PDB, mmCIF, PHYLIP)
- Fetch sequences or publications from NCBI databases (GenBank, PubMed, Protein) programmatically
- Run and parse BLAST searches (remote NCBI or local BLAST+)
- Perform pairwise or multiple sequence alignments with custom scoring
- Analyze 3D protein structures: distances, angles, DSSP, superimposition
- Build and visualize phylogenetic trees from sequence alignments
- Calculate sequence statistics (GC content, molecular weight, melting temperature)
- Batch-process thousands of sequences with custom filtering logic
- Use `pysam` instead for reading SAM/BAM/CRAM alignment files and working with mapped reads; use `scikit-bio` instead for advanced ecological diversity metrics

## Prerequisites

- **Python packages**: `biopython`, `numpy`, `matplotlib` (for tree visualization)
- **Data requirements**: Sequence files (FASTA, GenBank, FASTQ) or accession IDs for NCBI access
- **Environment**: Python 3.8+; NCBI Entrez requires email registration

```bash
pip install biopython numpy matplotlib
```

## Quick Start

```python
from Bio