NCBI Genome Download: A Powerful Script for Bacterial Genome Retrieval
How to use ncbi-genome-download to download bacterial genomes from NCBI
If you are interested in downloading bacterial genomes from the National Center for Biotechnology Information (NCBI) FTP servers, you might find the ncbi-genome-download tool very useful. This tool is a Python script that allows you to download genomes from NCBI by various criteria, such as taxonomic name, assembly accession, assembly level, refseq category, and more. You can also choose the formats and files to download, such as GenBank, FASTA, protein, assembly report, etc. In this article, we will show you how to install and use ncbi-genome-download to download bacterial genomes from NCBI.
What is ncbi-genome-download and why use it?
A brief introduction to ncbi-genome-download
ncbi-genome-download is a Python script created by Kai Blin, a bioinformatician and software developer at the Novo Nordisk Foundation Center for Biosustainability. It was inspired by Mick Watson's Kraken downloader scripts, which are written in Perl and are specific to building a Kraken database. In contrast, ncbi-genome-download focuses on the genome download itself and supports different formats and selection criteria. The tool is open source and available on GitHub.
The benefits of using ncbi-genome-download
There are several benefits of using ncbi-genome-download over other methods of downloading genomes from NCBI. Some of them are:
It is easy to install and use. You can install it using pip or conda, and run it from the command line with simple options.
It is flexible and customizable. You can download genomes by different criteria, such as taxonomic name, assembly accession, assembly level, refseq category, genera, species, etc. You can also choose the formats and files to download, such as GenBank, FASTA, protein, assembly report, etc.
It is fast and efficient. You can run several downloads in parallel with the --parallel option, and retry downloads that fail because of connection problems with the --retries option.
It is actively maintained. The tool is updated to follow changes to the NCBI FTP servers and the genome data they provide, and you can report issues or suggest features on GitHub.
How to install ncbi-genome-download
Using pip
If you have Python installed on your system, you can use pip to install ncbi-genome-download. Pip is a package manager for Python that installs packages from PyPI, the Python Package Index. To install ncbi-genome-download using pip, run the following command:
pip install ncbi-genome-download
If this fails on older versions of Python, try updating your pip tool first:
pip install --upgrade pip
and then rerun the ncbi-genome-download install.
Using conda
If you prefer conda, a package manager for Python and other languages that installs packages from various channels, you can also install ncbi-genome-download with conda. Conda is part of Anaconda, a distribution of Python and other tools for data science and machine learning. To install ncbi-genome-download using conda, run the following command:
conda install -c bioconda ncbi-genome-download
This will install ncbi-genome-download from the bioconda channel, a community-driven channel that provides bioinformatics packages for conda.
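To check that either installation method worked, you can ask the installed command for its version with --version. The Python sketch below wraps that check so it can be reused at the start of a larger pipeline; the helper name installed_version is hypothetical and not part of ncbi-genome-download.

```python
import shutil
import subprocess
from typing import Optional


def installed_version(command: str = "ncbi-genome-download") -> Optional[str]:
    """Return the output of `<command> --version`, or None if it is not installed."""
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=False
    )
    # Version output normally goes to stdout; fall back to stderr just in case.
    output = (result.stdout or result.stderr).strip()
    return output or None


if __name__ == "__main__":
    version = installed_version()
    print(version or "ncbi-genome-download was not found on PATH")
```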
How to download bacterial genomes by different criteria
Using taxonomic group or taxonomy ID
The broadest way to select bacterial genomes is by taxonomic group. The group (for example bacteria, archaea, fungi, or viral) is given as a positional argument at the end of the command, and the --section option chooses between the refseq and genbank sections of the NCBI FTP servers. For example, to download all bacterial genomes from the refseq section, you can use the following command:
ncbi-genome-download --section refseq bacteria
This downloads every bacterial assembly in the refseq section, in GenBank format by default (use the --formats option to choose other file types, such as fasta). The refseq section contains assemblies that NCBI has curated and annotated. You can use --section genbank instead to download from the genbank section, which contains all the assemblies submitted to NCBI; it is larger and less curated, so it includes more redundant and draft-quality assemblies. Either way, a whole group is a very large download, so in practice you will usually combine it with the filters described below.
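ncbi-genome-download decides what to fetch by reading the assembly summary table that NCBI publishes for each section and group, a tab-separated file with one row per assembly. The Python sketch below shows that step in isolation. It assumes the file lives at genomes/<section>/<group>/assembly_summary.txt under https://ftp.ncbi.nlm.nih.gov and that the column names sit on the comment line beginning with assembly_accession; summary_url and parse_summary are hypothetical helpers, not the tool's own code.

```python
from typing import Dict, List

NCBI_GENOMES_URI = "https://ftp.ncbi.nlm.nih.gov/genomes"


def summary_url(section: str, group: str, uri: str = NCBI_GENOMES_URI) -> str:
    """Return the URL of the assembly summary for one section and group."""
    return f"{uri}/{section}/{group}/assembly_summary.txt"


def parse_summary(text: str) -> List[Dict[str, str]]:
    """Parse assembly_summary.txt content into one dict per assembly."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # Comment lines; the one starting with assembly_accession holds the column names.
            fields = line.lstrip("#").strip().split("\t")
            if fields[0] == "assembly_accession":
                header = fields
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


if __name__ == "__main__":
    print(summary_url("refseq", "bacteria"))
```

To try it on real data you could download summary_url("refseq", "bacteria") with urllib.request and pass the decoded text to parse_summary, but expect a large file.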
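You can also filter by NCBI taxonomy ID with the --taxids option, which takes one ID or a comma-separated list. Note that it matches the taxonomy ID recorded for each assembly, which is usually a species or strain ID, and does not include the IDs below it. For example, the taxonomic ID of Firmicutes is 1239, but the following command only selects assemblies annotated with exactly that ID, so it will typically match nothing:
ncbi-genome-download --section refseq --taxids 1239 bacteria
To download a whole phylum such as Firmicutes, first collect the taxonomy IDs of its species and strains, for example with the NCBI Taxonomy Browser, and pass them as a comma-separated list. If you need one species with all of its strains, the --species-taxids option matches the species-level ID of each assembly instead.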
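The sketch below illustrates why a phylum ID on its own selects nothing when the filter compares each assembly's own taxid against the list without walking down the taxonomy tree, and how expanding the ID first fixes that. It assumes rows shaped like those of NCBI's assembly summary table and a hypothetical parent-to-children map of taxonomy IDs; in practice you could build such a map from the nodes.dmp file in NCBI's taxonomy dump. Neither function is part of ncbi-genome-download.

```python
from typing import Dict, Iterable, List, Set


def filter_by_taxids(rows: Iterable[Dict[str, str]], taxids: Set[str]) -> List[Dict[str, str]]:
    """Keep assemblies whose own taxid is in the set (exact match, no descendants)."""
    return [row for row in rows if row["taxid"] in taxids]


def descendant_taxids(children: Dict[str, List[str]], root: str) -> Set[str]:
    """Return root plus every taxid below it in a parent -> children map."""
    found: Set[str] = set()
    stack = [root]
    while stack:
        taxid = stack.pop()
        if taxid in found:
            continue  # guard against revisiting a node
        found.add(taxid)
        stack.extend(children.get(taxid, []))
    return found


if __name__ == "__main__":
    # Simplified slice of the tree, intermediate ranks omitted:
    # Firmicutes 1239 -> Bacillus 1386 -> Bacillus subtilis 1423.
    children = {"1239": ["1386"], "1386": ["1423"]}
    rows = [{"accession": "example_1", "taxid": "1423"}]
    print(len(filter_by_taxids(rows, {"1239"})))  # 0: the phylum ID alone matches nothing
    expanded = descendant_taxids(children, "1239")
    print(len(filter_by_taxids(rows, expanded)))  # 1
    print(",".join(sorted(expanded)))  # a value you could pass to --taxids
```

For a real phylum the expanded list can be long, so checking the resulting command with --dry-run before downloading is a sensible precaution.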
Using assembly accession
If you want to download a specific genome or a set of genomes, you can pass their assembly accessions to the --assembly-accessions option, either as a comma-separated list or as the path to a file with one accession per line. For example, to download the genome of Escherichia coli K-12 MG1655, which has the RefSeq assembly accession GCF_000005845.2, you can use this command:
ncbi-genome-download --section refseq --assembly-accessions GCF_000005845.2 bacteria
This will download only the genome of E. coli K-12 MG1655 from the refseq section. Accessions starting with GCF_ belong to RefSeq and those starting with GCA_ to GenBank, so make sure the --section option matches. ncbi-genome-download has no option to filter by BioProject accession (this genome's BioProject is PRJNA57779), so to download the genomes of a BioProject, first look up their assembly accessions in the NCBI Assembly Database or the NCBI BioProject Database.
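When you have more than a handful of accessions, a file is easier to manage than a long comma-separated list. The sketch below writes one accession per line and builds the matching argument list for subprocess. It relies on --assembly-accessions accepting the path to such a file, which the tool's help text describes; the helper names and the file name accessions.txt are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_accession_file(accessions: Iterable[str], path: Path) -> Path:
    """Write one assembly accession per line and return the path."""
    path.write_text("".join(f"{acc.strip()}\n" for acc in accessions))
    return path


def accession_command(accession_file: Path, section: str = "refseq", group: str = "bacteria") -> List[str]:
    """Build an ncbi-genome-download call restricted to the listed accessions."""
    return [
        "ncbi-genome-download",
        "--section", section,
        "--assembly-accessions", str(accession_file),
        group,  # the taxonomic group is a positional argument
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        acc_file = write_accession_file(["GCF_000005845.2"], Path(tmp) / "accessions.txt")
        print(" ".join(accession_command(acc_file)))
```

To run the download, pass the list to subprocess.run(cmd, check=True) while the accession file still exists.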
Using assembly level or refseq category
If you want to filter the genomes by their assembly level or RefSeq category, you can use the --assembly-levels or --refseq-categories options. The assembly level indicates how complete and contiguous a genome assembly is, and it can be complete, chromosome, scaffold, or contig. The RefSeq category shows whether NCBI has designated an assembly as the reference or a representative genome for its species. NCBI records it as reference genome, representative genome, or na, and the option accepts the values reference and representative. For example, to download only the complete bacterial genomes from the refseq section that are reference genomes, you can use this command:
ncbi-genome-download --section refseq --assembly-levels complete --refseq-categories reference bacteria
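In NCBI's assembly summary table the same information appears in the assembly_level and refseq_category columns, with values such as Complete Genome and reference genome. The sketch below shows the kind of mapping and check involved when both filters are given; the dictionaries reflect those column values as we understand them, and passes_filters is a hypothetical helper, not ncbi-genome-download's implementation.

```python
from typing import Dict, Sequence

# Short option values -> values used in assembly_summary.txt (assumed mapping).
ASSEMBLY_LEVELS = {
    "complete": "Complete Genome",
    "chromosome": "Chromosome",
    "scaffold": "Scaffold",
    "contig": "Contig",
}
REFSEQ_CATEGORIES = {
    "reference": "reference genome",
    "representative": "representative genome",
}


def passes_filters(
    row: Dict[str, str],
    levels: Sequence[str] = (),
    categories: Sequence[str] = (),
) -> bool:
    """True if the row matches any requested level AND any requested category.

    An empty sequence means "do not filter on this column". Unknown option
    values raise KeyError.
    """
    wanted_levels = {ASSEMBLY_LEVELS[lvl] for lvl in levels}
    wanted_categories = {REFSEQ_CATEGORIES[cat] for cat in categories}
    level_ok = not levels or row["assembly_level"] in wanted_levels
    category_ok = not categories or row["refseq_category"] in wanted_categories
    return level_ok and category_ok


if __name__ == "__main__":
    row = {"assembly_level": "Complete Genome", "refseq_category": "reference genome"}
    print(passes_filters(row, ["complete"], ["reference"]))  # True
```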
This will download only the genomes that meet both criteria. You can also give several values to one option by separating them with commas. For example, to download all the bacterial genomes from the refseq section that are either complete or chromosome-level assemblies, you can use this command:
ncbi-genome-download --section refseq --assembly-levels complete,chromosome bacteria
Values given to the same option are combined with OR, while different options are combined with AND. The --section option accepts only one section per run, so to get the matching genomes from genbank as well, run the command a second time with --section genbank.
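Because --section takes a single value (refseq or genbank), covering both sections means running the tool once per section. The sketch below builds one argument list per section, joining several assembly levels with commas, and adds --dry-run by default so a first pass only reports what would be downloaded. commands_for_sections and run_all are hypothetical helpers.

```python
import subprocess
from typing import List, Sequence


def commands_for_sections(
    sections: Sequence[str],
    levels: Sequence[str],
    group: str = "bacteria",
    dry_run: bool = True,
) -> List[List[str]]:
    """Build one ncbi-genome-download command per NCBI section."""
    commands = []
    for section in sections:
        cmd = [
            "ncbi-genome-download",
            "--section", section,
            "--assembly-levels", ",".join(levels),  # values within one option are ORed
        ]
        if dry_run:
            cmd.append("--dry-run")  # only list what would be downloaded
        cmd.append(group)  # positional taxonomic group goes last
        commands.append(cmd)
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in commands_for_sections(["refseq", "genbank"], ["complete", "chromosome"]):
        print(" ".join(cmd))
```

Many assemblies exist in both sections (as GCA_ and GCF_ copies), so expect overlap between the two result sets.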
Using genus or species name
If you want to download genomes by genus or species name, you can use the --genera option. For example, to download all the bacterial genomes of the genus Bacillus from the refseq section, you can use this command:
ncbi-genome-download --section refseq --genera Bacillus bacteria
This will download the genomes of all Bacillus species from the refseq section. The --genera option matches the beginning of each assembly's organism name, so it also accepts a species name, and you can narrow the selection to a single strain with the --strains option. For example, to download the genome of Bacillus subtilis 168, which is a model organism for bacterial genetics and physiology, you can use this command:
ncbi-genome-download --section refseq --genera "Bacillus subtilis" --strains 168 bacteria
This will download only the B. subtilis assemblies in the refseq section whose strain name is recorded as 168.
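The sketch below illustrates a prefix match on the organism_name column combined with a strain check on infraspecific_name, which NCBI's assembly summary stores in the form strain=168. It is a hypothetical model of how genus and strain filters narrow a selection, written to make the behaviour easy to reason about; the rows are illustrative, and ncbi-genome-download's own matching may differ in details such as letter case.

```python
from typing import Dict, Iterable, List, Sequence


def strain_name(row: Dict[str, str]) -> str:
    """Extract the strain from an infraspecific_name like 'strain=168'."""
    value = row.get("infraspecific_name", "")
    return value.split("=", 1)[1] if "=" in value else value


def select(
    rows: Iterable[Dict[str, str]],
    genera: Sequence[str],
    strains: Sequence[str] = (),
) -> List[Dict[str, str]]:
    """Keep rows whose organism name starts with one of the given prefixes
    and, if strains are given, whose strain is in that list."""
    selected = []
    for row in rows:
        if not any(row["organism_name"].startswith(prefix) for prefix in genera):
            continue
        if strains and strain_name(row) not in strains:
            continue
        selected.append(row)
    return selected


if __name__ == "__main__":
    rows = [
        {"organism_name": "Bacillus subtilis subsp. subtilis str. 168", "infraspecific_name": "strain=168"},
        {"organism_name": "Bacillus cereus", "infraspecific_name": "strain=ATCC 14579"},
        {"organism_name": "Paenibacillus polymyxa", "infraspecific_name": ""},
    ]
    print(len(select(rows, ["Bacillus"])))  # 2: the prefix does not match Paenibacillus
    print(len(select(rows, ["Bacillus subtilis"], ["168"])))  # 1
```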