Install: claude install-skill jaechang-hits/SciAgent-Skills
# Pysam — Genomic File Toolkit
## Overview
Pysam provides a Pythonic interface to htslib for reading, manipulating, and writing genomic data files. It handles SAM/BAM/CRAM alignments, VCF/BCF variants, and FASTA/FASTQ sequences with efficient region-based random access. It also exposes samtools and bcftools commands as callable Python functions.
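As a minimal sketch of the samtools wrappers mentioned above, the following sorts and indexes a BAM file through `pysam.sort` and `pysam.index`, which accept the same arguments as the command-line tools. The helper names `sort_args` and `sort_and_index`, the `extra_threads` parameter, and the file paths are hypothetical; pysam is imported inside the function so the argument builder can be inspected on its own.

```python
def sort_args(in_bam, out_bam, extra_threads=0):
    """Build the argument list for `samtools sort` (hypothetical helper)."""
    args = ["-o", out_bam]
    if extra_threads > 0:
        # samtools' -@ option sets the number of additional threads
        args += ["-@", str(extra_threads)]
    args.append(in_bam)
    return args


def sort_and_index(in_bam, out_bam, extra_threads=0):
    """Coordinate-sort a BAM file, then index the sorted output."""
    import pysam  # imported lazily; only needed when the tools actually run

    pysam.sort(*sort_args(in_bam, out_bam, extra_threads))
    pysam.index(out_bam)  # writes out_bam + ".bai"
    return out_bam
```

A call such as `sort_and_index("input.bam", "sorted.bam")` would be roughly equivalent to running `samtools sort -o sorted.bam input.bam` followed by `samtools index sorted.bam`.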
## When to Use
- Reading and querying BAM/CRAM alignment files (region extraction, read filtering)
- Analyzing VCF/BCF variant files (genotype access, variant filtering, annotation)
- Extracting reference sequences from indexed FASTA files
- Calculating per-base coverage and pileup statistics
- Building custom bioinformatics pipelines that combine alignment + variant + sequence data
- Quality control of NGS data (mapping quality, flag filtering, coverage)
- For **alignment from FASTQ** (read mapping), use STAR, BWA, or minimap2 instead
- For **variant calling from BAM**, use GATK or DeepVariant instead
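
The read-filtering and QC use cases above often reduce to a per-read predicate. The sketch below is one possible version, not a recommended standard: the function name `passes_qc` and the default threshold of 30 are assumptions, while the attributes it reads (`is_unmapped`, `is_secondary`, `is_supplementary`, `is_duplicate`, `mapping_quality`) are those of pysam's `AlignedSegment`. A `SimpleNamespace` stands in for a real read so the logic can be tried without a BAM file.

```python
from types import SimpleNamespace


def passes_qc(read, min_mapq=30):
    """Return True for primary, mapped, non-duplicate reads at or above min_mapq.

    The threshold of 30 is an illustrative assumption, not a pysam default.
    """
    if read.is_unmapped or read.is_secondary or read.is_supplementary:
        return False
    if read.is_duplicate:
        return False
    return read.mapping_quality >= min_mapq


# Stand-in object with the same attribute names as pysam.AlignedSegment
demo_read = SimpleNamespace(
    is_unmapped=False,
    is_secondary=False,
    is_supplementary=False,
    is_duplicate=False,
    mapping_quality=60,
)
demo_passes = passes_qc(demo_read)
```

With a real file, the predicate could be applied as `[r for r in bam.fetch("chr1", 1000, 2000) if passes_qc(r)]`.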
## Prerequisites
```bash
pip install pysam
```
**Note**: Pysam requires the htslib C library, which `pip install` bundles on most platforms. On some Linux systems you may need to install `libhts-dev` or an equivalent package. Random access requires index files (`.bai`, `.tbi`, `.fai`); create them with `pysam.index()`, `pysam.tabix_index()`, or `pysam.faidx()`.
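
The index functions named in the note can be wrapped in a small helper that builds an index only when it is missing. This is a sketch under stated assumptions: the names `expected_index` and `ensure_index` are hypothetical, the suffix table reflects the conventional index file names, `.gz` inputs are assumed to be bgzip-compressed VCF files, and BAM/CRAM inputs are assumed to be coordinate-sorted.

```python
from pathlib import Path

# Conventional index suffix for each data-file suffix (assumed mapping)
INDEX_SUFFIX = {
    ".bam": ".bai",
    ".cram": ".crai",
    ".gz": ".tbi",   # assumes a bgzip-compressed VCF
    ".fa": ".fai",
    ".fasta": ".fai",
}


def expected_index(path):
    """Return the index path conventionally written next to `path`."""
    p = Path(path)
    suffix = INDEX_SUFFIX.get(p.suffix)
    if suffix is None:
        raise ValueError(f"no known index type for {p.name}")
    return Path(str(p) + suffix)


def ensure_index(path):
    """Create the index for `path` if it does not exist yet."""
    import pysam  # imported lazily; only needed when an index must be built

    index = expected_index(path)
    if index.exists():
        return index
    ext = Path(path).suffix
    if ext in (".bam", ".cram"):
        pysam.index(str(path))
    elif ext == ".gz":
        pysam.tabix_index(str(path), preset="vcf", force=True)
    else:
        pysam.faidx(str(path))
    return index
```

Calling `ensure_index("sample.bam")` before `bam.fetch(...)` avoids the error pysam raises when a region query is made against an unindexed file.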
## Quick Start
```python
import pysam
# Read BAM file, fetch reads in a region
with pysam.AlignmentFile("sample.bam", "rb") as bam:
    for read in bam.fetch("chr1", 1000, 2000):
        print(f"{read.query_name}: pos