Install: claude install-skill jaechang-hits/SciAgent-Skills
# Bakta Genome Annotation
## Overview
Bakta is a command-line pipeline for rapid, standardized annotation of bacterial and archaeal genomes and plasmids. It combines Prodigal for CDS prediction, tRNAscan-SE, Aragorn, and Infernal for non-coding RNAs, PILER-CR for CRISPR detection, and a tiered DIAMOND/HMM search against a curated UniRef100 + IPS/UPS database to assign gene names, EC numbers, GO terms, and COG categories. Bakta writes NCBI-compatible outputs (GFF3, GenBank, EMBL, and INSDC-formatted FASTA) plus a JSON summary and a circular Circos plot. A typical 5 Mb genome takes roughly 5–15 minutes on 8 CPUs.
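A basic run can be sketched as a small Python helper that only assembles the command line. This is a minimal sketch, not a definitive invocation: the helper name `build_bakta_command` and the file and directory names (`assembly.fasta`, `db`, `bakta_out`, `sample1`) are hypothetical, and the `--db`, `--output`, `--prefix`, and `--threads` options should be checked against `bakta --help` for your installed version.

```python
import shlex


def build_bakta_command(fasta, db_dir, out_dir, prefix, threads=8):
    """Return the argv list for a basic Bakta run (nothing is executed here)."""
    return [
        "bakta",
        "--db", str(db_dir),        # path to a downloaded Bakta database (assumed)
        "--output", str(out_dir),   # directory that receives all output files
        "--prefix", prefix,         # basename shared by the output files
        "--threads", str(threads),  # 8 matches the CPU count quoted above
        str(fasta),                 # input assembly in FASTA format
    ]


# Hypothetical paths, used only to show the resulting command line.
cmd = build_bakta_command("assembly.fasta", "db", "bakta_out", "sample1")
print(shlex.join(cmd))
```

Once Bakta and its database are installed, the returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell-quoting problems with unusual file names.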
## When to Use
- Annotating bacterial or archaeal genome assemblies (Illumina, PacBio, Nanopore) with NCBI-compatible locus tags and product names
- Annotating plasmids and other circular replicons separately with `--plasmid` and `--complete` flags
- Producing JSON-structured annotation outputs that can be parsed without GenBank or GFF3 detours
- Generating a publication-ready circular genome plot via the bundled `bakta_plot` command
- Annotating MAGs (metagenome-assembled genomes) with `--meta`, which runs CDS prediction in metagenome mode and skips per-genome Prodigal training
- Use **Prokka** instead when you need viral/mitochondrial kingdoms or when you must reproduce a legacy Prokka pipeline exactly
- Use **PGAP** instead when submitting to NCBI GenBank with full standards compliance
- Use **Bakta** when you want faster runs, regularly updated UniRef-derived databases, AMRFinderPlus integration, and a JSON summary
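Two of the cases above can be illustrated with a short Python sketch: choosing the mode flags for plasmids and MAGs, and tallying features straight from the JSON output. The helper names and the plasmid name `pDemo` are hypothetical. The sketch also assumes that `--plasmid` takes a plasmid name as its value and that the JSON file has a top-level `features` list whose entries carry a `type` field; verify both against your Bakta version.

```python
import json
from collections import Counter


def mode_flags(complete=False, plasmid_name=None, meta=False):
    """Return extra Bakta flags for the use cases listed above."""
    flags = []
    if complete:
        flags.append("--complete")            # sequences are complete replicons
    if plasmid_name is not None:
        flags += ["--plasmid", plasmid_name]  # assumed to take a plasmid name
    if meta:
        flags.append("--meta")                # metagenome mode for MAGs
    return flags


def count_feature_types(json_path):
    """Count annotated features per type in a Bakta JSON file.

    Assumes a top-level "features" list with a "type" key per entry;
    entries without that key are counted as "unknown".
    """
    with open(json_path, encoding="utf-8") as handle:
        summary = json.load(handle)
    return Counter(f.get("type", "unknown") for f in summary.get("features", []))
```

For example, `mode_flags(complete=True, plasmid_name="pDemo")` returns `["--complete", "--plasmid", "pDemo"]`, which can be appended to an existing `bakta` argument list before the input FASTA path.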