Quick Start =========== This guide will get you up and running with genbank_to quickly. Basic Usage ----------- The most common use case is converting a GenBank file to FASTA format: .. code-block:: bash genbank_to -g input.gbk -n output.fna This reads ``input.gbk`` and writes the nucleotide sequence to ``output.fna``. Common Conversions ------------------ Extract Genome Sequence ~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash genbank_to -g genome.gbk -n genome.fna Extract Protein Sequences ~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash genbank_to -g genome.gbk -a proteins.faa Extract ORF Sequences (DNA) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash genbank_to -g genome.gbk -o orfs.fna Generate GFF3 File ~~~~~~~~~~~~~~~~~~ .. code-block:: bash genbank_to -g genome.gbk --gff3 genome.gff3 Multiple Outputs at Once ~~~~~~~~~~~~~~~~~~~~~~~~~ You can request multiple output formats in a single command: .. code-block:: bash genbank_to -g genome.gbk \ -n genome.fna \ -a proteins.faa \ -o orfs.fna \ --gff3 genome.gff3 Using as a Python Library -------------------------- You can also use genbank_to as a library in your Python scripts: Extract All Protein Sequences ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from GenBankToLib import genbank_to_faa for seqid, protid, sequence in genbank_to_faa('genome.gbk'): print(f">{protid}") print(sequence) Extract Genome Sequences ~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from GenBankToLib import genbank_to_fna for seqid, sequence in genbank_to_fna('genome.gbk'): print(f">{seqid}") print(sequence) Extract Functions ~~~~~~~~~~~~~~~~~ .. code-block:: python from GenBankToLib import genbank_to_functions for protid, function in genbank_to_functions('genome.gbk'): print(f"{protid}\t{function}") Convert to JSON Format ~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from GenBankToLib import genbank_to_json import json genome_info = {'gram': None, 'translation_table': 11} json_data = genbank_to_json('genome.gbk', genome_info) # Save to file with open('genome.json', 'w') as f: json.dump(json_data, f, indent=2) Example Workflow ---------------- Here's a complete example workflow for analyzing a phage genome: 1. **Download a phage genome from NCBI**: .. code-block:: bash # Example: Enterobacteria phage phiX174 # (This is the test file in the repository) wget https://raw.githubusercontent.com/linsalrob/genbank_to/main/test/NC_001417.gbk 2. **Extract multiple formats**: .. code-block:: bash genbank_to -g NC_001417.gbk \ -n NC_001417.fna \ -a NC_001417.faa \ -o NC_001417_orfs.fna \ -f NC_001417_functions.tsv \ --gff3 NC_001417.gff3 3. **View the results**: .. code-block:: bash # View genome sequence head NC_001417.fna # Count proteins grep -c ">" NC_001417.faa # View functions head NC_001417_functions.tsv Working with Multi-GenBank Files --------------------------------- If your GenBank file contains multiple sequences (separated by ``//``), you can split them: .. code-block:: bash genbank_to -g multi.gbk --separate -n output This creates separate files: ``output.seqid1.fna``, ``output.seqid2.fna``, etc. Filtering by Sequence ID ~~~~~~~~~~~~~~~~~~~~~~~~~ Extract only specific sequences: .. code-block:: bash genbank_to -g multi.gbk -i NC_001417 -n output.fna Next Steps ---------- - Read the :doc:`usage` guide for detailed command-line options - Explore the :doc:`output_formats` for all available formats - Check the :doc:`examples` for more complex use cases - Review the :doc:`api` documentation for library usage