bakta-genome-annotation

Solid

Annotate bacterial and archaeal genomes and plasmids with Bakta's Prodigal/HMM/DIAMOND pipeline. Identifies CDS, ncRNA, tRNA, rRNA, tmRNA, sORFs, CRISPR arrays, oriC/oriV/oriT, and gaps against a curated UniRef-derived database. Produces NCBI-compatible GFF3, GenBank, EMBL, JSON, FASTA, TSV, and a circular genome plot. Use Prokka for legacy pipelines or non-bacterial kingdoms; use PGAP for NCBI GenBank submission.

Data & Documents 286 stars 26 forks Updated 4 days ago NOASSERTION

Quality Score: 82/100

Skill Content

# Bakta Genome Annotation

## Overview

Bakta is a command-line pipeline for rapid, standardized annotation of bacterial and archaeal genomes and plasmids. It combines Prodigal for CDS prediction, tRNAscan-SE/Aragorn/Barrnap/Infernal for non-coding RNA, PILER-CR for CRISPR detection, and a tiered DIAMOND/HMM search against a curated UniRef100 + IPS/UPS database to assign gene names, EC numbers, GO terms, and COG categories. Bakta produces NCBI-compatible outputs (GFF3, GenBank, EMBL, INSDC-formatted FASTA, plus a JSON summary and a circular Circos plot) for a typical 5 Mb genome in 5–15 minutes on 8 CPUs.

## When to Use

- Annotating bacterial or archaeal genome assemblies (Illumina, PacBio, Nanopore) with NCBI-compatible locus tags and product names
- Annotating plasmids and other circular replicons separately with the `--plasmid` and `--complete` flags
- Producing JSON-structured annotation outputs that can be parsed directly, without going through GenBank or GFF3
- Generating a publication-ready circular genome plot via the bundled `bakta_plot` command
- Annotating MAGs (metagenome-assembled genomes) with `--meta` to disable Prodigal training
- Use **Prokka** instead when you need viral/mitochondrial kingdoms or when you must reproduce a legacy Prokka pipeline exactly
- Use **PGAP** instead when submitting to NCBI GenBank with full standards compliance
- Use **Bakta** when you want faster runs, regularly updated UniRef-derived databases, AMRFinderPlus integration, and a JSON summary

...
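The flags named under "When to Use" (`--complete`, `--plasmid`, `--meta`) can be combined into a single invocation. The following is a minimal sketch, not the skill's own code: the helper `build_bakta_command` is hypothetical, and the `--db`, `--output`, and `--threads` options, the convention that `--plasmid` takes a replicon name, and the file paths are assumptions that do not appear in the text above. Confirm them against `bakta --help` for your installed version. The function only assembles an argument list and does not run Bakta.

```python
import shlex
from typing import List, Optional


def build_bakta_command(
    assembly: str,
    db_dir: str,
    out_dir: str,
    threads: int = 8,
    complete: bool = False,
    meta: bool = False,
    plasmid: Optional[str] = None,
) -> List[str]:
    """Assemble a Bakta argument list (hypothetical helper; nothing is executed)."""
    # Assumed options: --db (database directory), --output (result directory),
    # --threads (CPU count). Verify with `bakta --help`.
    cmd = ["bakta", "--db", db_dir, "--output", out_dir, "--threads", str(threads)]
    if complete:
        # Mark all input sequences as complete replicons.
        cmd.append("--complete")
    if meta:
        # Metagenome mode for MAGs.
        cmd.append("--meta")
    if plasmid is not None:
        # Assumption: --plasmid takes the plasmid name as its value.
        cmd += ["--plasmid", plasmid]
    # The assembly FASTA is passed as the positional argument.
    cmd.append(assembly)
    return cmd


# Example: a closed plasmid assembly (all paths and names are placeholders).
example = build_bakta_command(
    "plasmid.fasta", "bakta_db", "bakta_out", complete=True, plasmid="pExample"
)
print(shlex.join(example))
```

Once Bakta and its database are installed, the returned list can be handed to `subprocess.run(cmd, check=True)`. Keeping the command as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.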

Details

Author: jaechang-hits
Repository: jaechang-hits/SciAgent-Skills
Created: 5 months ago
Last Updated: 4 days ago
Language: Python
License: NOASSERTION

Bundled in these plugins

sciagent-skills

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

prokka-genome-annotation

Annotate prokaryotic genomes (bacteria, archaea, viruses) via Prokka's BLAST/HMM pipeline. Identifies CDS, rRNA, tRNA, tmRNA, signal peptides against Pfam, TIGRFAMs, RefSeq. Outputs GFF3, GenBank, FASTA, TSV. Use PGAP for NCBI GenBank submission; Bakta for faster NCBI-compatible annotation.

286 Updated 4 days ago

jaechang-hits

AI & Automation Solid

roary-pangenome

Compute the bacterial pan-genome from Prokka/Bakta GFF3 annotations with Roary's CD-HIT + BLAST + MCL clustering pipeline. Builds gene presence/absence matrices, core/soft-core/shell/cloud partitions, multi-FASTA core gene alignments (with `-e`), and a pan-genome reference. Use Panaroo for higher-accuracy pan-genomes from highly fragmented assemblies, PIRATE for paralog-aware clustering, or PPanGGOLiN for graph-based partitioning.

286 Updated 4 days ago

jaechang-hits

AI & Automation Listed

vivarium-prep

Get genomes ready for comparative analysis: assess assembly quality (contigs, N50, GC, length, completeness), and annotate genes and function (Prokka, eggNOG, dbCAN/CAZy). Use whenever the user wants genome statistics or QC, to check how good an assembly is, to assemble long reads, to annotate a genome, to call genes, or to get COG/KEGG/CAZy function tables before comparing genomes. Triggers on phrases like "genome stats / QC", "what's the N50 / GC / contig count", "how good is this assembly", "annotate this genome", "run Prokka / eggNOG / dbCAN", "call genes", "assemble these reads", "基因组质控/统计", "N50/GC/contig 数", "组装质量怎么样", "注释这个基因组", "跑 Prokka/eggNOG/dbCAN", "CAZy/COG/KEGG 注释". Light QC runs locally in the bio_tools conda env; heavy steps (assembly, eggNOG, dbCAN) are scaffolded as ready-to-run commands. Part of the vivarium comparative-genomics skill set.

1 Updated 1 week ago

Jason-0409-G