1. Accessing viral data that are organized in individual RefSeq assemblies
Assembly records aggregate all segments of a segmented viral genome into a single genome assembly. On the web, search the Assembly database for all viral entries or for a smaller taxonomic group:
- On the search results page, select Latest RefSeq within the Status facet on the left side of the screen.
- Use additional facets/filters to narrow your search results to the set that you want. (Tip: A statement above the records shows which filters are active and lets you select Clear all before starting a new search or selection.)
- Use the blue Download Assemblies button at the top of the page and select the format of your choice.
- Note the estimated (uncompressed) size of the data. The data download as a single tar archive.
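To unpack the archive in a script instead of by hand, you could use the following Python sketch. It relies on the standard-library `tarfile` module to extract only the nucleotide FASTA files. The file suffixes (`.fna`, `.fna.gz`) and the helper name `extract_fasta_members` are assumptions for illustration. Check the file names inside your own archive and adjust them.

```python
import tarfile
from pathlib import Path


def extract_fasta_members(archive_path, dest_dir, suffixes=(".fna", ".fna.gz")):
    """Extract only FASTA members from a downloaded tar archive (hypothetical helper).

    The suffixes are an assumption about how files inside the archive are named.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # "r:*" opens plain or compressed tar files transparently.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffixes):
                # Write each file flat into dest, dropping the archive's folders;
                # this also ignores any "../" components in member names.
                target = dest / Path(member.name).name
                src = tar.extractfile(member)
                with src, open(target, "wb") as out:
                    out.write(src.read())
                extracted.append(target.name)
    return sorted(extracted)
```

The sketch writes all files into one folder, so it assumes member file names are unique across assemblies. They usually are, because NCBI file names typically include the assembly accession.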
2. Accessing individual RefSeq genome records for viruses (not organized in individual assemblies)
NCBI creates an individual RefSeq sequence record for each viral segment. Use the links under the Explore Viral Genome Sequences section of the Viral Genomes page (part of the Genome resource) to access and select the data that you want:
- Select a browser, for example the Viral genome browser.
- Optionally, narrow your selection to a taxonomy node:
  - In the top part of the browser, click the node that you want.
  - In the resulting menu, select Complete genomes to reload the page accordingly.
- To obtain RefSeq nucleotide sequence records:
  - Use the Retrieve sequences menu in the top right corner of the browser's page and select RefSeq Nucleotides to display the records in the Nucleotide database.
  - Download the records in the format that you want (see downloading instructions).
- To obtain sequence records for proteins that are annotated on RefSeq genomes:
  - Use the Retrieve sequences menu in the top right corner of the page and select RefSeq Proteins to display the records in the Protein database.
  - Download the records in the format that you want.
- The Retrieve sequences menu also offers Neighbor Nucleotides, which retrieves GenBank (INSDC) records for complete viral genomes.
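You can also request comparable RefSeq nucleotide and protein sets programmatically through NCBI's E-utilities (ESearch followed by EFetch). The Python sketch below only builds request URLs, so you can inspect it without network access. Several parts are assumptions: the query term, which combines the Viruses taxonomy ID (txid10239) with `refseq[filter]`, the batch size, and the helper names. The result set may not match the browser's selection exactly.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def viral_refseq_term(taxid=10239):
    """Entrez query for RefSeq records under a taxonomy node (10239 = Viruses)."""
    # [Organism:exp] includes all descendants of the node; refseq[filter]
    # keeps only RefSeq records. The exact term is an assumption.
    return f"txid{taxid}[Organism:exp] AND refseq[filter]"


def esearch_url(db, term):
    # usehistory=y keeps the result set on NCBI's server, so EFetch can page
    # through it with WebEnv/query_key instead of a long list of IDs.
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db, webenv, query_key, retstart, retmax, rettype="fasta"):
    params = {"db": db, "WebEnv": webenv, "query_key": query_key,
              "retstart": retstart, "retmax": retmax,
              "rettype": rettype, "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def batch_fetch_urls(db, esearch_xml, batch=500, rettype="fasta"):
    """Turn an ESearch XML response into one EFetch URL per batch of records."""
    root = ET.fromstring(esearch_xml)
    count = int(root.findtext("Count"))
    webenv = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")
    return [efetch_url(db, webenv, query_key, start, batch, rettype)
            for start in range(0, count, batch)]
```

Use `db="nuccore"` for nucleotide records or `db="protein"` for proteins. Pass the ESearch XML response to `batch_fetch_urls`, then request each URL in turn while respecting NCBI's request-rate limits. To restrict the query to a smaller group, pass that group's taxonomy ID to `viral_refseq_term`.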
The Viral Genomes resource page also provides a direct link (under the Download Viral Genome Data section) to the complete RefSeq release of viral and viroid sequences:
- RefSeq collection releases occur every two months.
- There is no archive of previous releases.
- You can update the records between releases by processing the daily update files.
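To script the release download, one option is to read the directory listing of the viral release folder and keep only the file type you want. This Python sketch rests on two assumptions that you should verify against the live listing: the folder URL in `VIRAL_RELEASE_DIR`, and that nucleotide FASTA files end in `.genomic.fna.gz`. The helper name `release_file_urls` is hypothetical. The sketch only parses listing text, so fetching the page (for example with `urllib.request.urlopen`) is left to the caller.

```python
import re
from urllib.parse import urljoin

# Assumed location of the viral RefSeq release; verify before use.
VIRAL_RELEASE_DIR = "https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/"


def release_file_urls(listing_html, suffix=".genomic.fna.gz", base=VIRAL_RELEASE_DIR):
    """Return absolute URLs for files in an HTML directory listing that end with suffix."""
    # Match plain file names only; hrefs containing "/" or "?" (parent
    # directory links, column-sort links) are excluded by the character class.
    names = set(re.findall(r'href="([^"/?]+)"', listing_html))
    return [urljoin(base, name) for name in sorted(names) if name.endswith(suffix)]


if __name__ == "__main__":
    sample = ('<a href="../">Parent</a> <a href="viral.1.1.genomic.fna.gz">x</a> '
              '<a href="viral.1.protein.faa.gz">y</a>')
    print(release_file_urls(sample))
```

The names are sorted as plain strings, so a file such as `viral.10...` would come before `viral.2...`. This does not matter when you download every matching file.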