Command-Line Usage
This page provides detailed documentation for all command-line options available in genbank_to.
Basic Syntax
genbank_to [OPTIONS]
Required Arguments
-g, --genbank FILENAME
Path to the input GenBank file (required).
genbank_to -g genome.gbk -n output.fna
Output Format Options
Nucleotide Outputs
-n, --nucleotide FILENAME
Output the complete nucleotide sequence(s) from the GenBank file (e.g., the genome sequence).
genbank_to -g genome.gbk -n genome.fna
-o, --orfs FILENAME
Output the DNA sequences of all open reading frames (ORFs/CDS features).
genbank_to -g genome.gbk -o orfs.fna
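Each record in the ORF output is expected to be a complete coding sequence, so a quick sanity check is that every sequence length is a multiple of three. This is a minimal sketch that assumes orfs.fna is ordinary FASTA (sequence lines may wrap); the sample records and IDs are hypothetical stand-ins for real genbank_to output. Partial CDS features or pseudogenes (see --pseudo) can legitimately fail this check.

```shell
workdir=$(mktemp -d)
# Hypothetical stand-in for the output of: genbank_to -g genome.gbk -o orfs.fna
printf '>orf_1\nATGAAA\nTAA\n>orf_2\nATGCC\n' > "$workdir/orfs.fna"

# Sum the sequence length of each record (joining wrapped lines) and
# print the IDs of records whose length is not a multiple of 3.
bad=$(awk '
  /^>/ { if (id != "" && len % 3 != 0) print id; id = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (id != "" && len % 3 != 0) print id }
' "$workdir/orfs.fna")

echo "Records with a length not divisible by 3: ${bad:-none}"
```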
Protein Outputs
-a, --aminoacids FILENAME
Output the amino acid (protein) sequences for all coding sequences.
genbank_to -g genome.gbk -a proteins.faa
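Because pseudogenes are skipped by default, the number of proteins written can be smaller than the number of CDS features in the input. The sketch below compares the two counts; it assumes standard GenBank layout, where feature keys such as CDS start in column 6 of the feature table, and the tiny GenBank and FASTA samples are hypothetical stand-ins rather than real genbank_to input or output.

```shell
workdir=$(mktemp -d)
# Hypothetical minimal GenBank feature table (feature keys start in column 6).
{
  printf 'FEATURES             Location/Qualifiers\n'
  printf '     source          1..300\n'
  printf '     CDS             1..90\n'
  printf '     CDS             complement(100..300)\n'
} > "$workdir/genome.gbk"
# Hypothetical stand-in for: genbank_to -g genome.gbk -a proteins.faa
printf '>prot_1\nMKV\n>prot_2\nMST\n' > "$workdir/proteins.faa"

# Count CDS feature lines in the GenBank file and header lines in the FASTA file.
n_cds=$(grep -c '^     CDS ' "$workdir/genome.gbk")
n_prot=$(grep -c '^>' "$workdir/proteins.faa")

# A mismatch is not necessarily an error: pseudogenes are skipped unless --pseudo is given.
echo "CDS features: $n_cds, proteins written: $n_prot"
```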
Complex Format Outputs
-p, --ptt FILENAME
Output in NCBI PTT (Protein Table) format. This is a somewhat deprecated format that NCBI provided with its genome downloads.
genbank_to -g genome.gbk -p genome.ptt
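The conventional NCBI PTT layout has three header lines (a title, a protein count, and a column header) followed by one tab-separated row per protein with the columns Location, Strand, Length, PID, Gene, Synonym, Code, COG and Product. This sketch assumes genbank_to follows that conventional layout, which is not verified here; the sample rows and PID values are hypothetical.

```shell
workdir=$(mktemp -d)
# Hypothetical PTT sample in the conventional NCBI layout.
{
  printf 'Example genome - 1..5000\n'
  printf '2 proteins\n'
  printf 'Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n'
  printf '190..255\t+\t21\tP0001\tthrL\tb0001\t-\t-\tthr operon leader peptide\n'
  printf '337..2799\t+\t820\tP0002\tthrA\tb0002\t-\t-\taspartokinase I\n'
} > "$workdir/genome.ptt"

# Skip the three header lines, then print PID, strand and product for each protein.
awk -F'\t' 'NR > 3 { print $4 "\t" $2 "\t" $9 }' "$workdir/genome.ptt" > "$workdir/ptt_summary.tsv"
cat "$workdir/ptt_summary.tsv"
```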
-f, --functions FILENAME
Output a tab-separated table with protein ID and function (product) columns.
genbank_to -g genome.gbk -f functions.tsv
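A common next step is to summarise the function table, for example to see how many proteins are annotated only as "hypothetical protein". This is a minimal sketch assuming column 1 holds the protein ID and column 2 the function, with no header row (if the real file has one, skip it with tail -n +2 first); the sample rows are hypothetical.

```shell
workdir=$(mktemp -d)
# Hypothetical stand-in: column 1 = protein ID, column 2 = function (product).
printf 'prot_1\thypothetical protein\nprot_2\tDNA polymerase III subunit beta\nprot_3\thypothetical protein\n' > "$workdir/functions.tsv"

# Tally how many proteins carry each function, most frequent first.
cut -f2 "$workdir/functions.tsv" | sort | uniq -c | sort -rn > "$workdir/function_counts.txt"

# Count proteins whose function is exactly "hypothetical protein".
n_hypo=$(awk -F'\t' '$2 == "hypothetical protein" { n++ } END { print n + 0 }' "$workdir/functions.tsv")
echo "hypothetical proteins: $n_hypo"
```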
--gff3 FILENAME
Output in GFF3 (Generic Feature Format version 3) format.
genbank_to -g genome.gbk --gff3 genome.gff3
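GFF3 is a nine-column, tab-separated format in which column 3 is the feature type, lines starting with # are comments or directives, and an optional ##FASTA directive marks the start of embedded sequences. The sketch below uses those rules to count features by type; the sample file is hypothetical, and it does not assume that genbank_to writes a ##FASTA section, it only stops there if one exists. A feature split across several lines that share one ID is counted once per line.

```shell
workdir=$(mktemp -d)
# Hypothetical GFF3 stand-in for: genbank_to -g genome.gbk --gff3 genome.gff3
{
  printf '##gff-version 3\n'
  printf 'NC_000001\t.\tgene\t1\t90\t.\t+\t.\tID=gene1\n'
  printf 'NC_000001\t.\tCDS\t1\t90\t.\t+\t0\tID=cds1;Parent=gene1\n'
  printf 'NC_000001\t.\tCDS\t100\t300\t.\t-\t0\tID=cds2\n'
  printf '##FASTA\n>NC_000001\nACGT\n'
} > "$workdir/genome.gff3"

# Count features per type, skipping comment/directive lines and
# stopping at an embedded ##FASTA section if present.
awk -F'\t' '
  /^##FASTA/ { exit }
  /^#/       { next }
  NF == 9    { count[$3]++ }
  END        { for (t in count) print t "\t" count[t] }
' "$workdir/genome.gff3" | sort > "$workdir/feature_counts.tsv"
cat "$workdir/feature_counts.tsv"
```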
--amr BASENAME
Output files in the format required by NCBI AMRFinderPlus.
Creates three files:
- BASENAME.gff: GFF format annotation
- BASENAME.faa: amino acid sequences
- BASENAME.fna: nucleotide sequences
genbank_to -g genome.gbk --amr genome_amr
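The three --amr files are meant to be passed to AMRFinderPlus together. This is a sketch, assuming AMRFinderPlus's combined protein (-p), nucleotide (-n) and GFF (-g) inputs; check the AMRFinderPlus documentation for the options your version needs (for example organism or annotation-format settings). The empty files below are hypothetical stand-ins, and amrfinder is only invoked if it is installed.

```shell
# Hypothetical basename: the value that was passed to --amr.
base=genome_amr
workdir=$(mktemp -d)
# Empty stand-ins for the three files written by: genbank_to -g genome.gbk --amr genome_amr
touch "$workdir/$base.gff" "$workdir/$base.faa" "$workdir/$base.fna"

# Make sure all three AMRFinderPlus inputs are present.
missing=0
for ext in gff faa fna; do
  if [ ! -e "$workdir/$base.$ext" ]; then
    echo "missing: $base.$ext" >&2
    missing=1
  fi
done

# Run AMRFinderPlus on the protein, nucleotide and GFF files together,
# but only if the inputs exist and amrfinder is on the PATH.
if [ "$missing" -eq 0 ] && command -v amrfinder >/dev/null 2>&1; then
  amrfinder -p "$workdir/$base.faa" -n "$workdir/$base.fna" -g "$workdir/$base.gff"
fi
```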
--phage_finder FILENAME
Output in the format required by phage_finder.
genbank_to -g phage.gbk --phage_finder phage.pf
--bakta-json FILENAME
Output JSON format similar to that created by Bakta.
genbank_to -g genome.gbk --bakta-json genome.json
Bakta JSON Metadata Options
These options are only valid when --bakta-json is specified:
--bakta-version STRING
Bakta version string (default: NA). For recording which version of Bakta was used.
--db-version STRING
Database version string (default: NA). For recording the annotation database version.
--genus STRING
Genus name. Overrides the genus from the GenBank annotation.
--species STRING
Species name. Overrides the species from the GenBank annotation.
--strain STRING
Strain designation. Overrides the strain from the GenBank annotation.
--gram {+,-}
Gram stain result (+ for Gram-positive, - for Gram-negative). If not provided, the tool will attempt to determine this from the genus name.
--translation-table NUMBER
NCBI translation table number (default: 11). Specify this if you used a non-standard genetic code.
Example:
genbank_to -g genome.gbk --bakta-json genome.json \
--genus Escherichia \
--species coli \
--strain K-12 \
--gram - \
--translation-table 11
Output Modifiers
-c, --complex
Use complex/detailed identifier lines in the output. These include additional information such as organism name, location, and product description in the FASTA headers.
genbank_to -g genome.gbk -a proteins.faa --complex
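Most FASTA tools treat the text up to the first whitespace in a header as the sequence ID and the rest as a free-text description, so plain IDs can still be recovered from --complex output. This sketch assumes the complex header starts with the ID followed by a space; the header text shown is a hypothetical example, not the exact format genbank_to writes.

```shell
workdir=$(mktemp -d)
# Hypothetical headers: ID first, then descriptive text.
printf '>prot_1 [Escherichia coli] 190..255 thr operon leader peptide\nMKRISTTITTTITITTGNGAG\n>prot_2 [Escherichia coli] 337..2799 aspartokinase I\nMRVLKFGG\n' > "$workdir/proteins.faa"

# Print the first whitespace-delimited token of each header, minus the ">".
awk '/^>/ { print substr($1, 2) }' "$workdir/proteins.faa" > "$workdir/ids.txt"
cat "$workdir/ids.txt"
```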
--pseudo
Include pseudogenes in the output. By default, pseudogenes are skipped because they often cause Biopython errors. Use this flag to try to include them.
genbank_to -g genome.gbk -a proteins.faa --pseudo
-i, --seqid ID
Only output specific sequence ID(s). Can be specified multiple times to select multiple sequences. Automatically enables --separate.
# Extract a single sequence
genbank_to -g multi.gbk -i NC_001417 -n output.fna
# Extract multiple sequences
genbank_to -g multi.gbk -i NC_001417 -i NC_001418 -n output.fna
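When the IDs to extract are kept in a file, the repeated -i arguments can be built in a loop. This is a minimal POSIX sh sketch using the positional parameters (sh has no arrays); ids.txt is a hypothetical file with one ID per line, and the script only prints the arguments rather than running genbank_to.

```shell
workdir=$(mktemp -d)
# Hypothetical list of sequence IDs, one per line.
printf 'NC_001417\nNC_001418\n' > "$workdir/ids.txt"

# Rebuild "$@" as repeated "-i ID" pairs, skipping blank lines.
set --
while IFS= read -r id; do
  [ -n "$id" ] && set -- "$@" -i "$id"
done < "$workdir/ids.txt"

# To run it for real:  genbank_to -g multi.gbk "$@" -n output.fna
printf '%s\n' "$@"
```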
--separate
Separate multi-record GenBank files into individual output files. Each sequence gets its own file with the sequence ID in the filename.
# Creates output.NC_001417.fna, output.NC_001418.fna, etc.
genbank_to -g multi.gbk --separate -n output
# With no other options, outputs separate GenBank files
genbank_to -g multi.gbk --separate
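Because --separate puts the sequence ID between the basename and the extension (output.NC_001417.fna), the ID can be recovered from each filename. This sketch uses prefix and suffix stripping rather than splitting on ".", so an ID that itself contains a dot (such as a versioned accession) would survive intact; whether genbank_to includes version suffixes in filenames is not assumed. The files are empty stand-ins.

```shell
workdir=$(mktemp -d)
# Stand-ins for files written by: genbank_to -g multi.gbk --separate -n output
touch "$workdir/output.NC_001417.fna" "$workdir/output.NC_001418.fna"

# Print "ID<TAB>filename" for each separated output file.
for f in "$workdir"/output.*.fna; do
  name=${f##*/}        # drop the directory
  id=${name#output.}   # drop the "output." basename prefix
  id=${id%.fna}        # drop the ".fna" extension
  printf '%s\t%s\n' "$id" "$name"
done > "$workdir/separated.tsv"
cat "$workdir/separated.tsv"
```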
-z, --zip
Compress the output using gzip. This is an experimental feature that may not work with all output formats.
genbank_to -g genome.gbk -f functions.tsv --zip
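Since --zip is experimental, a downstream script may not know for certain whether a given output file is compressed. This sketch reads a file either way by testing it with gzip -t first; it does not assume anything about whether --zip changes the output filename (for example by adding .gz), and the compressed sample is a hypothetical stand-in.

```shell
workdir=$(mktemp -d)
# Hypothetical gzip-compressed stand-in for the --zip output.
printf 'prot_1\thypothetical protein\n' | gzip -c > "$workdir/functions.tsv"

# Print a file's contents whether or not it is gzip-compressed.
read_maybe_gz() {
  if gzip -t "$1" 2>/dev/null; then
    gzip -dc "$1"
  else
    cat "$1"
  fi
}

first_line=$(read_maybe_gz "$workdir/functions.tsv" | head -n 1)
echo "$first_line"
```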
Logging and Debugging
--log FILENAME
Specify the log file location (default: genbank_to.log).
genbank_to -g genome.gbk -n output.fna --log my_log.txt
-d, --debug
Enable debug-level logging for troubleshooting.
genbank_to -g genome.gbk -n output.fna --debug
Other Options
-v, --version
Show the version number and exit.
genbank_to --version
-h, --help
Show the help message and exit.
genbank_to --help
Complete Example
Here’s a comprehensive example using multiple options:
genbank_to \
--genbank genome.gbk \
--nucleotide genome.fna \
--aminoacids proteins.faa \
--orfs orfs.fna \
--functions functions.tsv \
--gff3 genome.gff3 \
--bakta-json genome.json \
--genus Escherichia \
--species coli \
--strain K-12 \
--gram - \
--complex \
--log conversion.log \
--debug
This command will:
- Read the GenBank file genome.gbk
- Output the genome sequence to genome.fna
- Output protein sequences to proteins.faa with complex headers
- Output ORF sequences to orfs.fna
- Output a function table to functions.tsv
- Output GFF3 format to genome.gff3
- Output Bakta JSON to genome.json with custom metadata
- Write debug logs to conversion.log
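After a multi-output run like the one above, it can help to confirm that every expected file was actually written and is not empty. This is a minimal sketch; the file list mirrors the complete example, and only two stand-in files are created here so that the check has something to report.

```shell
workdir=$(mktemp -d)
# Stand-ins: pretend only two of the expected outputs were written.
printf '>seq1\nACGT\n' > "$workdir/genome.fna"
printf '>prot_1\nMK\n' > "$workdir/proteins.faa"

# Collect expected outputs that are missing or empty.
missing=""
for f in genome.fna proteins.faa orfs.fna functions.tsv genome.gff3 genome.json; do
  [ -s "$workdir/$f" ] || missing="$missing $f"
done

if [ -n "$missing" ]; then
  echo "missing or empty:$missing" >&2
else
  echo "all outputs present"
fi
```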