Command-Line Usage
==================

This page provides detailed documentation for all command-line options available in genbank_to.

Basic Syntax
------------

.. code-block:: bash

    genbank_to [OPTIONS]

Required Arguments
------------------

``-g``, ``--genbank`` FILENAME
    Path to the input GenBank file (required).

    .. code-block:: bash

        genbank_to -g genome.gbk -n output.fna

Output Format Options
---------------------

Nucleotide Outputs
~~~~~~~~~~~~~~~~~~

``-n``, ``--nucleotide`` FILENAME
    Output the complete nucleotide sequence(s) from the GenBank file (e.g., the genome sequence).

    .. code-block:: bash

        genbank_to -g genome.gbk -n genome.fna

``-o``, ``--orfs`` FILENAME
    Output the DNA sequences of all open reading frames (ORFs/CDS features).

    .. code-block:: bash

        genbank_to -g genome.gbk -o orfs.fna

Protein Outputs
~~~~~~~~~~~~~~~

``-a``, ``--aminoacids`` FILENAME
    Output the amino acid (protein) sequences for all coding sequences.

    .. code-block:: bash

        genbank_to -g genome.gbk -a proteins.faa

Complex Format Outputs
~~~~~~~~~~~~~~~~~~~~~~~

``-p``, ``--ptt`` FILENAME
    Output in NCBI PTT (Protein Table) format. This is a somewhat deprecated NCBI format from their genome downloads.

    .. code-block:: bash

        genbank_to -g genome.gbk -p genome.ptt

``-f``, ``--functions`` FILENAME
    Output a tab-separated table with protein ID and function (product) columns.

    .. code-block:: bash

        genbank_to -g genome.gbk -f functions.tsv

``--gff3`` FILENAME
    Output in GFF3 (Generic Feature Format version 3) format.

    .. code-block:: bash

        genbank_to -g genome.gbk --gff3 genome.gff3

``--amr`` BASENAME
    Output files in the format required by NCBI AMRFinderPlus. Creates three files:

    - ``BASENAME.gff`` - GFF format annotation
    - ``BASENAME.faa`` - Amino acid sequences
    - ``BASENAME.fna`` - Nucleotide sequences

    .. code-block:: bash

        genbank_to -g genome.gbk --amr genome_amr

``--phage_finder`` FILENAME
    Output in the format required by phage_finder.

    .. code-block:: bash

        genbank_to -g phage.gbk --phage_finder phage.pf

``--bakta-json`` FILENAME
    Output JSON format similar to that created by Bakta.

    .. code-block:: bash

        genbank_to -g genome.gbk --bakta-json genome.json

Bakta JSON Metadata Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These options are only valid when ``--bakta-json`` is specified:

``--bakta-version`` STRING
    Bakta version string (default: NA). For recording which version of Bakta was used.

``--db-version`` STRING
    Database version string (default: NA). For recording the annotation database version.

``--genus`` STRING
    Genus name. Overrides the genus from GenBank annotation.

``--species`` STRING
    Species name. Overrides the species from GenBank annotation.

``--strain`` STRING
    Strain designation. Overrides the strain from GenBank annotation.

``--gram`` {+,-}
    Gram stain result (+ for Gram-positive, - for Gram-negative). If not provided, the tool will attempt to determine this from the genus name.

``--translation-table`` NUMBER
    NCBI translation table number (default: 11). Specify if you used a non-standard genetic code.

Example:

.. code-block:: bash

    genbank_to -g genome.gbk --bakta-json genome.json \
        --genus Escherichia \
        --species coli \
        --strain K-12 \
        --gram - \
        --translation-table 11

Output Modifiers
----------------

``-c``, ``--complex``
    Use complex/detailed identifier lines in the output. Includes additional information such as organism name, location, and product description in the FASTA headers.

    .. code-block:: bash

        genbank_to -g genome.gbk -a proteins.faa --complex

``--pseudo``
    Include pseudogenes in the output. By default, pseudogenes are skipped because they often cause BioPython errors. Use this flag to attempt including them.

    .. code-block:: bash

        genbank_to -g genome.gbk -a proteins.faa --pseudo

``-i``, ``--seqid`` ID
    Only output specific sequence ID(s). Can be specified multiple times to select multiple sequences. Automatically enables ``--separate``.

    .. code-block:: bash

        # Extract a single sequence
        genbank_to -g multi.gbk -i NC_001417 -n output.fna

        # Extract multiple sequences
        genbank_to -g multi.gbk -i NC_001417 -i NC_001418 -n output.fna

``--separate``
    Separate multi-record GenBank files into individual output files. Each sequence gets its own file with the sequence ID in the filename.

    .. code-block:: bash

        # Creates output.NC_001417.fna, output.NC_001418.fna, etc.
        genbank_to -g multi.gbk --separate -n output

        # With no other options, outputs separate GenBank files
        genbank_to -g multi.gbk --separate

``-z``, ``--zip``
    Compress the output using gzip. Experimental feature that may not work with all output formats.

    .. code-block:: bash

        genbank_to -g genome.gbk -f functions.tsv --zip

Logging and Debugging
---------------------

``--log`` FILENAME
    Specify the log file location (default: ``genbank_to.log``).

    .. code-block:: bash

        genbank_to -g genome.gbk -n output.fna --log my_log.txt

``-d``, ``--debug``
    Enable debug-level logging for troubleshooting.

    .. code-block:: bash

        genbank_to -g genome.gbk -n output.fna --debug

Other Options
-------------

``-v``, ``--version``
    Show the version number and exit.

    .. code-block:: bash

        genbank_to --version

``-h``, ``--help``
    Show help message and exit.

    .. code-block:: bash

        genbank_to --help

Complete Example
----------------

Here's a comprehensive example using multiple options:

.. code-block:: bash

    genbank_to \
        --genbank genome.gbk \
        --nucleotide genome.fna \
        --aminoacids proteins.faa \
        --orfs orfs.fna \
        --functions functions.tsv \
        --gff3 genome.gff3 \
        --bakta-json genome.json \
        --genus Escherichia \
        --species coli \
        --strain K-12 \
        --gram - \
        --complex \
        --log conversion.log \
        --debug

This command will:

1. Read the GenBank file ``genome.gbk``
2. Output the genome sequence to ``genome.fna``
3. Output protein sequences to ``proteins.faa`` with complex headers
4. Output ORF sequences to ``orfs.fna``
5. Output a function table to ``functions.tsv``
6. Output GFF3 format to ``genome.gff3``
7. Output Bakta JSON to ``genome.json`` with custom metadata
8. Write debug logs to ``conversion.log``
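
When several GenBank files need the same conversion, the documented options can be wrapped in a small shell loop. The following is a minimal sketch and not part of genbank_to itself: the function name ``convert_all`` and the ``DRY_RUN`` variable are hypothetical, and it assumes the input files end in ``.gbk``. It uses only the ``-g``, ``-n``, ``-a``, and ``--log`` options described on this page.

.. code-block:: bash

    # Hypothetical helper: convert every .gbk file in a directory.
    # Variables are global because POSIX sh has no "local" keyword.
    convert_all() {
        indir=$1
        outdir=$2
        mkdir -p "$outdir"
        for gbk in "$indir"/*.gbk; do
            # Skip the literal pattern when no .gbk files exist.
            [ -e "$gbk" ] || continue
            base=$(basename "$gbk" .gbk)
            # Store the command in the positional parameters so that
            # filenames containing spaces are passed through intact.
            set -- genbank_to -g "$gbk" \
                -n "$outdir/$base.fna" \
                -a "$outdir/$base.faa" \
                --log "$outdir/$base.log"
            if [ "${DRY_RUN:-0}" = "1" ]; then
                # Print the command instead of running it.
                echo "$@"
            else
                "$@"
            fi
        done
    }

Setting ``DRY_RUN=1`` before calling ``convert_all genomes converted`` prints each command without running it, which makes it easy to check the derived output filenames first.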