Welcome to genbank_to’s documentation!

genbank_to is a straightforward Python application and library for converting NCBI GenBank format files to a variety of other formats commonly used in bioinformatics workflows.

Overview

The tool reads NCBI GenBank format files and converts them to various output formats including:

FASTA nucleotide sequences (genome, ORFs)
FASTA amino acid sequences (proteins)
GFF3 format
NCBI PTT format
Function tables
Bakta JSON format
AMRFinderPlus format
Phage Finder format
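To give a sense of what one of these conversions involves, here is a minimal sketch of writing a single CDS as a GFF3 feature line. It is not genbank_to's implementation. The helpers `escape_gff3` and `cds_to_gff3`, the `product` attribute, and the default source label are hypothetical choices. The sketch follows the nine tab-separated columns of the GFF3 specification, uses 1-based inclusive coordinates, and percent-encodes characters that are reserved in the attributes column.

```python
# Characters that must be percent-encoded inside GFF3 column 9 (attributes).
_GFF3_ESCAPES = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
                 "\t": "%09", "\n": "%0A", "\r": "%0D"}


def escape_gff3(value: str) -> str:
    """Percent-encode reserved characters in a GFF3 attribute value."""
    return "".join(_GFF3_ESCAPES.get(ch, ch) for ch in value)


def cds_to_gff3(seqid: str, start: int, end: int, strand: int,
                feature_id: str, product: str, source: str = "genbank_to") -> str:
    """Format one CDS as a nine-column GFF3 line (hypothetical helper).

    start/end are 1-based inclusive; strand is +1 or -1.
    """
    if start > end:
        raise ValueError("GFF3 requires start <= end regardless of strand")
    strand_char = {1: "+", -1: "-"}.get(strand, ".")
    attributes = f"ID={escape_gff3(feature_id)};product={escape_gff3(product)}"
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
    # Phase "0" assumes a complete CDS that begins at its first codon.
    columns = [seqid, source, "CDS", str(start), str(end), ".", strand_char, "0", attributes]
    return "\t".join(columns)
```

A complete GFF3 file would begin with the `##gff-version 3` directive line before any feature lines like this one.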

Both command-line and library interfaces are provided, making it easy to integrate into scripts and pipelines.

Key Features

Multiple output formats: Write several output files from one GenBank input in a single command
Flexible input: Handles single and multi-record GenBank files
Python library: Import and use functions in your own scripts
Well-tested: Comprehensive test suite ensures reliability
Active development: Maintained by the Edwards Lab at Flinders University
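A multi-record GenBank file is simply several records concatenated, each terminated by a line containing only `//`. The standard-library sketch below splits such a file into per-record chunks and reads each record's `LOCUS` name. It illustrates the file layout only and is not how genbank_to parses its input. `split_genbank_records` and `locus_name` are hypothetical names, and for real work a full parser such as Biopython's `SeqIO.parse(handle, "genbank")` is the safer choice.

```python
import io
from typing import Iterator, List, TextIO


def split_genbank_records(handle: TextIO) -> Iterator[str]:
    """Yield the raw text of each record in a single- or multi-record GenBank file."""
    buffer: List[str] = []
    for line in handle:
        buffer.append(line)
        # Every GenBank record is terminated by a line containing only "//".
        if line.rstrip() == "//":
            yield "".join(buffer)
            buffer = []
    # Anything other than blank lines after the last "//" is a truncated record.
    if "".join(buffer).strip():
        raise ValueError("file ends with an unterminated GenBank record")


def locus_name(record_text: str) -> str:
    """Return the name field (second token) of a record's LOCUS line."""
    for line in record_text.splitlines():
        if line.startswith("LOCUS"):
            return line.split()[1]
    raise ValueError("record has no LOCUS line")


if __name__ == "__main__":
    sample = (
        "LOCUS       rec1     10 bp    DNA     linear   BCT 01-JAN-2000\n"
        "ORIGIN\n        1 acgtacgtac\n//\n"
        "LOCUS       rec2      5 bp    DNA     linear   BCT 01-JAN-2000\n"
        "ORIGIN\n        1 acgta\n//\n"
    )
    for record in split_genbank_records(io.StringIO(sample)):
        print(locus_name(record))
```

Treating a single-record file as the one-record case of the same loop is what lets one code path handle both kinds of input.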


Quick Example

Convert a GenBank file to multiple formats:

genbank_to -g genome.gbk \
    -n genome.fna \
    -a proteins.faa \
    -o orfs.fna \
    --gff3 genome.gff3
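When a pipeline step is already written in Python, one option is to call the command-line tool through `subprocess`. This sketch reuses only the flags from the command above (`-g`, `-n`, `-a`, `-o`, `--gff3`). The helpers `build_genbank_to_command` and `run_genbank_to` are hypothetical, and the sketch assumes the `genbank_to` executable is on your `PATH`.

```python
import shutil
import subprocess
from typing import List, Optional


def build_genbank_to_command(genbank: str, nucleotide: Optional[str] = None,
                             proteins: Optional[str] = None, orfs: Optional[str] = None,
                             gff3: Optional[str] = None) -> List[str]:
    """Assemble a genbank_to argument list from the flags in the quick example."""
    cmd = ["genbank_to", "-g", genbank]
    # Add each output flag only when that output was requested.
    for flag, path in (("-n", nucleotide), ("-a", proteins), ("-o", orfs), ("--gff3", gff3)):
        if path is not None:
            cmd.extend([flag, path])
    return cmd


def run_genbank_to(cmd: List[str]) -> None:
    """Run the command, raising CalledProcessError if genbank_to exits non-zero."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


# Example (requires genbank_to to be installed):
# run_genbank_to(build_genbank_to_command("genome.gbk", proteins="proteins.faa"))
```

Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.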

Use as a Python library:

from GenBankToLib import genbank_to_faa

# Extract protein sequences
for seqid, protid, sequence in genbank_to_faa('genome.gbk'):
    print(f">{protid}\n{sequence}")
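The loop above prints each protein on a single line. To write wrapped FASTA to a file instead, a small helper can consume the same iterator. This sketch assumes only what the example shows, namely that `genbank_to_faa` yields `(seqid, protid, sequence)` tuples. `write_fasta`, the header layout, and the 60-character line width are our own choices, not part of GenBankToLib.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str, str]], handle: TextIO, width: int = 60) -> int:
    """Write (seqid, protid, sequence) tuples as FASTA, wrapping sequences at `width`.

    Returns the number of records written.
    """
    count = 0
    for seqid, protid, sequence in records:
        # Protein ID as the identifier, source record ID as the description.
        handle.write(f">{protid} {seqid}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        count += 1
    return count


# Hypothetical usage with the library call shown above:
# with open("proteins.faa", "w") as out:
#     write_fasta(genbank_to_faa("genome.gbk"), out)

if __name__ == "__main__":
    buffer = io.StringIO()
    write_fasta([("contig_1", "prot_1", "M" + "A" * 70)], buffer)
    print(buffer.getvalue(), end="")
```

Because the helper accepts any iterable, records are written as the generator produces them rather than being collected into memory first.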
