Contigs are the first level in the hierarchy of a genomic assembly. The next step is to build scaffolds (supercontigs). To build a scaffold, researchers place several contigs in the correct order and orientation. To make the scaffold a single sequence unit (a single sequence record), they represent the sequencing gaps between its contigs with runs of Ns instead of the DNA bases A, T, G, and C. Assemblies at the scaffold level generally have a number of scaffold records plus a number of contig records.
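As a rough illustration of the scaffolding step described above, the following Python sketch joins contigs in a given order and orientation and fills each gap with Ns. The function names (`build_scaffold`, `reverse_complement`), the input layout, and the gap lengths are assumptions made for this example; they do not correspond to any particular assembler or NCBI tool.

```python
# Translation table used to complement a contig placed on the minus strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_scaffold(placements, contigs, gap_lengths):
    """Join contigs into one scaffold sequence.

    placements  -- list of (contig_name, strand) in scaffold order,
                   where strand is '+' or '-' (hypothetical layout)
    contigs     -- dict mapping contig_name to its sequence
    gap_lengths -- list of gap sizes, one per junction between contigs
    """
    if len(gap_lengths) != max(len(placements) - 1, 0):
        raise ValueError("need exactly one gap length per contig junction")

    parts = []
    for i, (name, strand) in enumerate(placements):
        seq = contigs[name]
        if strand == "-":
            seq = reverse_complement(seq)
        elif strand != "+":
            raise ValueError(f"unknown strand {strand!r} for contig {name}")
        if i > 0:
            # Represent the sequencing gap before this contig as a run of Ns.
            parts.append("N" * gap_lengths[i - 1])
        parts.append(seq)
    return "".join(parts)


contigs = {"contig1": "ACGT", "contig2": "AAGG"}
scaffold = build_scaffold([("contig1", "+"), ("contig2", "-")], contigs, [3])
print(scaffold)  # ACGTNNNCCTT
```

In practice gap sizes are often estimates or fixed placeholder lengths, so the number of Ns in a real scaffold does not necessarily equal the true length of the missing sequence.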
The next step is to order, orient, and assemble the scaffolds that belong to the same chromosome into the chromosome sequence. Again, researchers represent any sequencing gaps in an assembled chromosome with runs of Ns. An assembly at the chromosome level generally has one record for each chromosome. Unlocalized* and unplaced** contig and scaffold records may accompany the chromosome records; together they constitute the primary assembly. In addition to the primary assembly, a genome assembly may contain other sequence records, such as patches (to correct regions of the primary assembly) and/or alternative loci (to offer alternative models for highly variable regions of chromosomes). Finally, a non-nuclear genome (such as the mitochondrial genome) may complete the collection for the assembly.
Complete assemblies have no sequencing gaps in the assembled chromosomes. While it is possible to generate complete assemblies for prokaryotes (whose genomes are typically single circular chromosomes) and lower eukaryotes (such as yeast), this level is currently unattainable for the complex genomes of higher eukaryotes (including human).
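Because gaps are written as runs of Ns, a simple scan can show whether a chromosome sequence still contains any. The sketch below is only an illustration: `find_gaps`, `is_gapless`, and the `min_len` threshold are names and choices assumed for this example. Note that a single N can also stand for an ambiguous base call rather than a gap, so the threshold that counts as a gap is a judgment call.

```python
import re


def find_gaps(seq: str, min_len: int = 1):
    """Return (start, end) pairs for each run of Ns at least min_len long.

    Coordinates are 0-based and end-exclusive, as in Python slicing.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def is_gapless(seq: str, min_len: int = 1) -> bool:
    """True if the sequence has no N run of at least min_len bases."""
    return not find_gaps(seq, min_len)


chromosome = "ACGTNNNCCTTNAC"
print(find_gaps(chromosome))             # [(4, 7), (11, 12)]
print(find_gaps(chromosome, min_len=2))  # [(4, 7)]
print(is_gapless("ACGTACGT"))            # True
```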
Note that researchers may halt their sequencing/assembling efforts once they gather the information they need. Therefore, many assemblies stay indefinitely as collections of contigs and/or scaffolds regardless of the organism's genomic complexity.
For more in-depth explanations see:
- A primer on genome assembly methods
- NCBI Genome Assembly Model
- The Nucleic Acids Research article about the NCBI Assembly database
- A series of NCBI blog posts on human genome-related topics
*chromosome is known, but the exact position or orientation on it is not determined
**chromosome location is not determined