Welcome to genbank_to’s documentation!

Edwards Lab DOI License: MIT PyPi

genbank_to is a Python command-line application and library that converts NCBI GenBank format files to other formats commonly used in bioinformatics workflows.

Overview

The tool reads a GenBank file and can write any of several output formats, including:

  • FASTA nucleotide sequences (genome, ORFs)

  • FASTA amino acid sequences (proteins)

  • GFF3 format

  • NCBI PTT format

  • Function tables

  • Bakta JSON format

  • AMRFinderPlus format

  • Phage Finder format

genbank_to provides both a command-line interface and a Python library, so it can be called from shell pipelines or imported into your own scripts.

Key Features

  • Multiple output formats: Write several output formats from a single command

  • Flexible input: Handles single and multi-record GenBank files

  • Python library: Import and use functions in your own scripts

  • Well-tested: Comprehensive test suite ensures reliability

  • Active development: Maintained by the Edwards Lab at Flinders University

Quick Example

Convert a GenBank file to multiple formats:

genbank_to -g genome.gbk \
    -n genome.fna \
    -a proteins.faa \
    -o orfs.fna \
    --gff3 genome.gff3
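
When the conversion is one step in a larger Python pipeline, the same command can be assembled and launched with the standard library's subprocess module. The sketch below is only an illustration and is not part of genbank_to: the helper names build_genbank_to_command and run_genbank_to are hypothetical, only the flags shown in the command above are used, and the genbank_to executable is assumed to be on PATH.

```python
import subprocess


def build_genbank_to_command(genbank, fna=None, faa=None, orfs=None, gff3=None):
    """Assemble the argument list for the command shown above.

    Hypothetical helper: only the flags from the example are used
    (-g, -n, -a, -o, --gff3). Outputs left as None are omitted.
    """
    cmd = ["genbank_to", "-g", str(genbank)]
    for flag, path in (("-n", fna), ("-a", faa), ("-o", orfs), ("--gff3", gff3)):
        if path is not None:
            cmd.extend([flag, str(path)])
    return cmd


def run_genbank_to(genbank, **outputs):
    """Run the conversion; raises CalledProcessError if the command fails.

    Assumes the genbank_to executable is installed and on PATH.
    """
    return subprocess.run(build_genbank_to_command(genbank, **outputs), check=True)
```

Calling run_genbank_to("genome.gbk", fna="genome.fna", faa="proteins.faa") would then request only the nucleotide and protein FASTA outputs. Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.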

Use as a Python library:

from GenBankToLib import genbank_to_faa, genbank_to_fna

# Extract protein sequences
for seqid, protid, sequence in genbank_to_faa('genome.gbk'):
    print(f">{protid}\n{sequence}")
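
To save the proteins rather than print them, the tuples yielded by genbank_to_faa can be written to a FASTA file. The following is a minimal sketch: write_protein_fasta is a hypothetical helper, not part of GenBankToLib, and it assumes only what the example above shows, namely that each item is a (seqid, protid, sequence) tuple. The 60-character line width is a common but arbitrary choice.

```python
def write_protein_fasta(records, handle, width=60):
    """Write (seqid, protid, sequence) tuples to an open text handle as FASTA.

    Hypothetical helper. Returns a dict mapping each seqid to the number
    of proteins written for it, which is useful for multi-record files.
    """
    counts = {}
    for seqid, protid, sequence in records:
        sequence = str(sequence)
        handle.write(f">{protid}\n")
        # Split the sequence into lines of at most `width` characters.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        counts[seqid] = counts.get(seqid, 0) + 1
    return counts


# Usage, assuming genbank_to is installed:
# from GenBankToLib import genbank_to_faa
# with open("proteins.faa", "w") as out:
#     per_record = write_protein_fasta(genbank_to_faa("genome.gbk"), out)
```

Because the helper accepts any iterable of tuples, it can be exercised with hand-written records before being pointed at a real GenBank file.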
