Assembly accession numbers are distinctly formatted sequence accession numbers that NCBI staff assign to individual genomic assemblies. Unlike other sequence accession numbers, an assembly accession does not represent a single sequence record; it represents the collection of sequence records that make up an individual genomic assembly.

The format for GenBank (primary) assembly accessions is: [GCA][_][nine digits][.][version number]
The format for RefSeq (NCBI-derived) assembly accessions is: [GCF][_][nine digits][.][version number]
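
The format described above can be checked programmatically. The following is a minimal Python sketch, not an NCBI tool: the regular expression is derived only from the format given here, and the names ACCESSION_RE, AssemblyAccession, and parse_assembly_accession are hypothetical, chosen for illustration.

```python
import re
from typing import NamedTuple

# Pattern assumed from the format described in this article:
# GCA or GCF, an underscore, nine digits, a dot, and a version number.
ACCESSION_RE = re.compile(r"(GCA|GCF)_(\d{9})\.(\d+)")


class AssemblyAccession(NamedTuple):
    prefix: str   # "GCA" (GenBank) or "GCF" (RefSeq)
    number: str   # the nine-digit part, kept as text to preserve leading zeros
    version: int  # the number after the dot


def parse_assembly_accession(text: str) -> AssemblyAccession:
    """Split an assembly accession into its parts, or raise ValueError."""
    match = ACCESSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an assembly accession: {text!r}")
    prefix, number, version = match.groups()
    return AssemblyAccession(prefix, number, int(version))


if __name__ == "__main__":
    print(parse_assembly_accession("GCA_000146795.3"))
```

The nine-digit part is stored as a string so that leading zeros are not lost when the accession is reassembled.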

You will find these accessions within the Assembly database records that NCBI staff generate for each genomic assembly. A single Assembly record represents both the GenBank assembly and its corresponding RefSeq assembly (if available). For example, the northern white-cheeked gibbon assembly (Nleu_3.0) has both accessions: GCA_000146795.3 (GenBank) and GCF_000146795.2 (RefSeq). The version numbers indicate that the GenBank assembly (made up of the underlying sequence records) was updated twice, while the RefSeq assembly was updated once.
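
The way version numbers are read in the gibbon example can be sketched in a few lines of Python. This assumes, as the example implies, that the first release of an assembly carries version 1 and that each update increases the version by one; the function name updates_since_first_release is hypothetical.

```python
def updates_since_first_release(accession: str) -> int:
    """Return how many times an assembly was updated, judged by its version.

    Assumption: the first release is version 1 and each update adds one.
    """
    _, _, version = accession.rpartition(".")
    if not version.isdecimal() or int(version) < 1:
        raise ValueError(f"no valid version number in {accession!r}")
    return int(version) - 1


if __name__ == "__main__":
    for acc in ("GCA_000146795.3", "GCF_000146795.2"):
        print(acc, updates_since_first_release(acc))
```

Under that assumption, GCA_000146795.3 yields 2 and GCF_000146795.2 yields 1, matching the description above.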

The Assembly record also reports the relationship between the GenBank and RefSeq assemblies. In general, the RefSeq assembly starts as a copy of the GenBank data, so the two assemblies are often identical, but they may diverge as RefSeq curation progresses. For Nleu_3.0, the RefSeq version lacks 8 unlocalized scaffolds that the RefSeq staff determined to belong to the mitochondrial genome.

The collection of sequence records in the GenBank gibbon assembly GCA_000146795.3 includes 26 assembled chromosome records, which you will find listed under the Assembly Definition tab of the table in the record. Additionally, there are many unlocalized sequences: we know which chromosome they belong to, but they are not part of the assembled chromosomes. Their counts are listed in the last column of the table. For example, there are 89 sequence records for unlocalized scaffolds of chromosome 1. Moreover, there are 15,567 unplaced sequences that resulted from the sequencing of this organism but have not been assigned to any chromosome. All of these sequences, 17,492 records in total, are part of the assembly and are collectively represented by the GCA_000146795.3 accession number.
