biopython-sequence-analysis
Install: claude install-skill jaechang-hits/SciAgent-Skills
# Biopython: Sequence Analysis Toolkit
## Overview
Biopython provides modules for sequence-centric bioinformatics: reading and writing the major biological file formats (FASTA, FASTQ, GenBank, GFF), querying NCBI databases programmatically, running BLAST searches and parsing their results, aligning sequences pairwise or in multiple-sequence alignments, and building and visualizing phylogenetic trees. This skill focuses on analysis workflows, from NCBI data retrieval through alignment to phylogenetic inference.
For PCR primer design, restriction enzyme digestion, cloning simulation, protein structure analysis (Bio.PDB), and molecular weight/Tm calculations, see **biopython-molecular-biology**.
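As a rough illustration of the parse-then-align part of such a workflow, the sketch below reads two records from an in-memory FASTA string and scores a global alignment with BLOSUM62. The sequences, the record IDs, the gap penalties, and the helper `fraction_identical` are all hypothetical choices made for this example, not something prescribed by this skill. It assumes a reasonably recent Biopython release that provides `Bio.Align.PairwiseAligner`.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Align import PairwiseAligner, substitution_matrices

# Hypothetical two-record FASTA; in practice this would usually be a file path.
FASTA_TEXT = """>seq_a
MKTAYIAKQR
>seq_b
MKTAYLAKQR
"""


def fraction_identical(alignment, target, query):
    """Fraction of aligned (non-gap) columns where both residues match."""
    matches = 0
    aligned = 0
    # alignment.aligned holds matching coordinate blocks for target and query.
    for (t_start, t_end), (q_start, q_end) in zip(*alignment.aligned):
        for t_res, q_res in zip(target[t_start:t_end], query[q_start:q_end]):
            aligned += 1
            matches += int(t_res == q_res)
    return matches / aligned if aligned else 0.0


# Parse the FASTA text into SeqRecord objects.
records = list(SeqIO.parse(StringIO(FASTA_TEXT), "fasta"))
seq_a, seq_b = (str(rec.seq) for rec in records)

# Configure a global aligner; the gap penalties are illustrative assumptions.
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
aligner.open_gap_score = -10
aligner.extend_gap_score = -0.5

best = aligner.align(seq_a, seq_b)[0]
identity = fraction_identical(best, seq_a, seq_b)

print(f"{records[0].id} vs {records[1].id}: score={best.score}, identity={identity:.2f}")
```

Identity here is computed over aligned columns only, so gaps do not count against it. Other definitions (for example, dividing by the shorter sequence length) are equally common and may suit a given analysis better.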
## When to Use
- Download a gene family from NCBI Nucleotide/Protein, align sequences, and construct a phylogenetic tree
- Parse GenBank or GFF3 annotation files and extract CDS sequences for a set of features
- Run a BLAST search against NCBI `nt` or `nr`, filter significant hits, and fetch their full sequences
- Compute pairwise sequence identities or score alignments with BLOSUM62/PAM250 matrices
- Index a large multi-FASTA or FASTQ file with `SeqIO.index()` for random-access retrieval without loading all sequences into RAM
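- Convert between sequence formats (FASTA ↔ GenBank ↔ FASTQ ↔ PHYLIP) in a single `SeqIO.convert()` call, provided the source carries the data the target format requires (for example, quality scores for FASTQ)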
- Traverse, root, prune, and annotate a Newick or Nexus phylogenetic tree programmatically
- Use **pysam** instead when working with SAM/BAM/CRAM alignm