Quick Start
This guide will get you up and running with genbank_to quickly.
Basic Usage
The most common use case is converting a GenBank file to FASTA format:
genbank_to -g input.gbk -n output.fna
This reads input.gbk and writes the nucleotide sequence to output.fna.
Common Conversions
Extract Genome Sequence
genbank_to -g genome.gbk -n genome.fna
Extract Protein Sequences
genbank_to -g genome.gbk -a proteins.faa
Extract ORF Sequences (DNA)
genbank_to -g genome.gbk -o orfs.fna
Generate GFF3 File
genbank_to -g genome.gbk --gff3 genome.gff3
Multiple Outputs at Once
You can request multiple output formats in a single command:
genbank_to -g genome.gbk \
    -n genome.fna \
    -a proteins.faa \
    -o orfs.fna \
    --gff3 genome.gff3
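If the same set of conversions has to run from a script, the command can be assembled programmatically. The following is a minimal sketch, not part of genbank_to itself: build_command and run_conversion are hypothetical helpers, the dictionary keys are names chosen for illustration, and only flags that appear on this page are used. It assumes the genbank_to executable is on your PATH.

```python
import subprocess

# Flags copied from the commands on this page; the keys are
# illustrative names invented for this sketch.
FLAGS = {
    "nucleotide": "-n",
    "protein": "-a",
    "orfs": "-o",
    "functions": "-f",
    "gff3": "--gff3",
}


def build_command(genbank_file, outputs):
    """Return the genbank_to argument list for the requested outputs."""
    cmd = ["genbank_to", "-g", genbank_file]
    for kind, path in outputs.items():
        cmd.extend([FLAGS[kind], path])
    return cmd


def run_conversion(genbank_file, outputs):
    """Run genbank_to; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(genbank_file, outputs), check=True)
```

For example, run_conversion("genome.gbk", {"nucleotide": "genome.fna", "protein": "proteins.faa"}) would run the equivalent of the first two output options above.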
Using as a Python Library
You can also use genbank_to as a library in your Python scripts:
Extract All Protein Sequences
from GenBankToLib import genbank_to_faa
for seqid, protid, sequence in genbank_to_faa('genome.gbk'):
    print(f">{protid}")
    print(sequence)
Extract Genome Sequences
from GenBankToLib import genbank_to_fna
for seqid, sequence in genbank_to_fna('genome.gbk'):
    print(f">{seqid}")
    print(sequence)
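The loop above prints each sequence on a single line. If you would rather write wrapped FASTA records to a file, something like the following sketch could work. format_fasta and write_fasta are hypothetical helpers, not part of GenBankToLib, and the sketch assumes the records are (seqid, sequence) pairs as in the loop above, with a sequence that converts cleanly via str().

```python
def format_fasta(seqid, sequence, width=60):
    """Return one FASTA record with the sequence wrapped at `width` columns."""
    sequence = str(sequence)
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{seqid}"] + chunks) + "\n"


def write_fasta(records, path, width=60):
    """Write (seqid, sequence) pairs to `path` and return the record count."""
    count = 0
    with open(path, "w") as handle:
        for seqid, sequence in records:
            handle.write(format_fasta(seqid, sequence, width))
            count += 1
    return count
```

With the library installed, write_fasta(genbank_to_fna('genome.gbk'), 'genome.fna') would then write every sequence in the file.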
Extract Functions
from GenBankToLib import genbank_to_functions
for protid, function in genbank_to_functions('genome.gbk'):
    print(f"{protid}\t{function}")
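Once the functions are available as (protid, function) pairs, a quick tally shows which annotations dominate a genome. This is a small sketch using only the standard library; summarize_functions is a hypothetical helper, and it assumes the pairs have the shape used in the loop above.

```python
from collections import Counter


def summarize_functions(pairs):
    """Count how often each function annotation occurs."""
    return Counter(function for _protid, function in pairs)
```

Calling summarize_functions(genbank_to_functions('genome.gbk')).most_common(5) would list the five most frequent annotations.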
Convert to JSON Format
from GenBankToLib import genbank_to_json
import json
genome_info = {'gram': None, 'translation_table': 11}
json_data = genbank_to_json('genome.gbk', genome_info)
# Save to file
with open('genome.json', 'w') as f:
    json.dump(json_data, f, indent=2)
Example Workflow
Here’s a complete example workflow for analyzing a phage genome:
Download a phage genome (an NCBI RefSeq record, fetched here from the genbank_to repository):
# Example: Enterobacteria phage MS2 (accession NC_001417)
# (This is the test file in the repository)
wget https://raw.githubusercontent.com/linsalrob/genbank_to/main/test/NC_001417.gbk
Extract multiple formats:
genbank_to -g NC_001417.gbk \
    -n NC_001417.fna \
    -a NC_001417.faa \
    -o NC_001417_orfs.fna \
    -f NC_001417_functions.tsv \
    --gff3 NC_001417.gff3
View the results:
# View genome sequence
head NC_001417.fna
# Count proteins
grep -c "^>" NC_001417.faa
# View functions
head NC_001417_functions.tsv
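The same checks can be done from Python, which is handy inside a larger analysis script. The sketch below uses only the standard library. Both helpers are hypothetical, and it assumes the functions file is tab-separated (suggested by its .tsv extension) without assuming anything about its columns.

```python
import csv


def count_fasta_records(lines):
    """Count FASTA header lines in any iterable of lines (e.g. an open file)."""
    return sum(1 for line in lines if line.startswith(">"))


def read_tsv_rows(lines):
    """Parse tab-separated lines into lists of fields, skipping blank rows."""
    return [row for row in csv.reader(lines, delimiter="\t") if row]
```

Both functions accept an open file handle, for example: with open("NC_001417.faa") as fh: print(count_fasta_records(fh)).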
Working with Multi-GenBank Files
If your GenBank file contains multiple records (each terminated by a // line), you can write each one to its own output file:
genbank_to -g multi.gbk --separate -n output
This creates one file per sequence, using the -n value as a prefix: output.seqid1.fna, output.seqid2.fna, and so on.
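To pick the per-sequence files back up in a script, you can recover each sequence ID from its file name. This sketch assumes the prefix.seqid.fna naming pattern described here and has not been checked against every output type. seqid_from_filename and split_output_files are hypothetical helpers.

```python
from pathlib import Path


def seqid_from_filename(name, prefix="output", suffix="fna"):
    """Return the seqid embedded in '<prefix>.<seqid>.<suffix>', else None."""
    head, tail = prefix + ".", "." + suffix
    if (name.startswith(head) and name.endswith(tail)
            and len(name) > len(head) + len(tail)):
        return name[len(head):-len(tail)]
    return None


def split_output_files(directory=".", prefix="output", suffix="fna"):
    """Map each seqid to its file path for matching files in `directory`."""
    found = {}
    for path in sorted(Path(directory).iterdir()):
        seqid = seqid_from_filename(path.name, prefix, suffix)
        if seqid is not None:
            found[seqid] = path
    return found
```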
Filtering by Sequence ID
Extract only specific sequences:
genbank_to -g multi.gbk -i NC_001417 -n output.fna
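When working through the Python library instead of the command line, a similar filter can be applied to the records as they are yielded. This is a sketch only: filter_by_seqid is a hypothetical helper, it assumes (seqid, sequence) pairs like those from genbank_to_fna, and it uses exact string matching, which may differ from how the -i option matches IDs.

```python
def filter_by_seqid(records, wanted):
    """Yield only the (seqid, sequence) pairs whose seqid is in `wanted`."""
    wanted = set(wanted)
    for seqid, sequence in records:
        if seqid in wanted:
            yield seqid, sequence
```

For example, filter_by_seqid(genbank_to_fna('multi.gbk'), ['NC_001417']) would lazily yield only the matching record.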
Next Steps
Read the Command-Line Usage guide for the full list of options
Explore the Output Formats for all available formats
Check the Examples for more complex use cases
Review the API Reference documentation for library usage