A major strategy for generating an assembly involves (1) isolating genomic DNA from a biological sample and (2) fragmenting the DNA into small pieces that are then sequenced individually. Once the sequences of the small pieces (called reads) are obtained, researchers assemble them, like tiny pieces of a giant puzzle, into progressively longer contiguous sequences (called contigs). This approach is termed Whole Genome Shotgun (WGS) sequencing.
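
To make the puzzle analogy concrete, here is a minimal sketch of read assembly using a greedy suffix/prefix overlap merge. It is a toy illustration only: the function names (`overlap`, `greedy_assemble`) and the `min_len` threshold are hypothetical, and the sketch assumes error-free reads that all come from the same strand. Production assemblers handle sequencing errors, reverse complements, and repeats, and generally use more sophisticated graph-based methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, or 0 if the best overlap is shorter than `min_len`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the list of contigs left when no pair overlaps by at least
    `min_len` bases.
    """
    # Drop exact duplicates (keeping order), then reads contained in others.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three overlapping reads taken from the toy sequence ATGGCGTACGTTAGC
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

With these three reads, each pair of neighbors shares five bases, so the merge collapses them into a single contig. Reads that share no sufficiently long overlap with anything else simply remain as separate contigs.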

Contigs are the first level in the hierarchy of a genomic assembly. The next step is to build scaffolds (supercontigs). To build a scaffold, researchers place several contigs in the correct order and orientation. To make a scaffold a single sequence unit (a single sequence record), they represent the sequencing gaps between the contigs in the scaffold with runs of N's (in place of the DNA bases A, T, G, and C). Assemblies at the scaffold level will generally have a number of scaffold records plus a number of contig records.
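
The scaffold-building step can be sketched as follows. This is an illustrative example, not the method of any particular assembler: `build_scaffold`, its `(sequence, strand)` input format, and the explicit gap lengths are assumptions made for the sketch. Contigs placed on the minus strand are reverse-complemented, and each gap between neighboring contigs is written as a run of N's.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(placements, gap_lengths) -> str:
    """Join ordered, oriented contigs into one scaffold sequence.

    placements  -- list of (contig_sequence, strand), strand is '+' or '-'
    gap_lengths -- gap size (in bases) between each pair of neighboring
                   contigs; must have len(placements) - 1 entries
    """
    if len(gap_lengths) != len(placements) - 1:
        raise ValueError("need exactly one gap length between each contig pair")

    parts = []
    for idx, (seq, strand) in enumerate(placements):
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
        parts.append(seq if strand == "+" else reverse_complement(seq))
        if idx < len(gap_lengths):
            parts.append("N" * gap_lengths[idx])  # sequencing gap
    return "".join(parts)


scaffold = build_scaffold([("ATGC", "+"), ("GGTT", "-")], [5])
print(scaffold)  # ATGCNNNNNAACC
```

In practice the true size of a gap is often only estimated or unknown, so the number of N's in a real scaffold record may be a convention rather than a measurement.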

The next step is to order and orient the scaffolds that belong to the same chromosome and assemble them into the chromosome sequence. Again, researchers represent any sequencing gaps in an assembled chromosome with runs of N's. An assembly at the chromosome level will generally have one record for each chromosome. Unlocalized* and unplaced** contig and scaffold records may accompany the chromosome records; together they constitute the primary assembly. In addition to the primary assembly, a genome assembly may contain other sequence records, such as patches (to correct regions of the primary assembly) and/or alternative loci (to offer alternative models for highly variable regions of chromosomes). Finally, a non-nuclear genome (such as the mitochondrial genome) may complete the collection for the assembly.

Complete assemblies have no sequencing gaps in the assembled chromosomes. Complete assemblies are readily generated for prokaryotes, whose genomes are typically single circular chromosomes, and for lower eukaryotes (such as yeast). This level has historically been much harder to reach for the complex genomes of higher eukaryotes (including human).
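
Because gaps are written as runs of N's, one rough way to check whether a set of chromosome sequences is gap-free is to scan for those runs. The sketch below is an assumption-laden illustration: the helper names `find_gaps` and `is_gapless` are hypothetical, and it treats every N as a gap, although in real records an N can also mark a single ambiguous base.

```python
import re


def find_gaps(sequence: str):
    """Return (start, end) positions of every run of N's in the sequence.

    Positions are 0-based and end-exclusive, as in Python slicing.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def is_gapless(records) -> bool:
    """True if no sequence in the {name: sequence} mapping contains N's."""
    return all(not find_gaps(seq) for seq in records.values())


gaps = find_gaps("ATGCNNNNNAACC")
print(gaps)  # [(4, 9)]

print(is_gapless({"chr1": "ATGCAACC"}))       # True
print(is_gapless({"chr1": "ATGCNNNNNAACC"}))  # False
```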

Note that researchers may halt their sequencing/assembling efforts once they gather the information they need. Therefore, many assemblies stay indefinitely as collections of contigs and/or scaffolds regardless of the organism's genomic complexity.

For more in-depth explanations see:
*Unlocalized: contigs or scaffolds are known to belong to a certain chromosome, but they are not part of the assembled chromosome.
**Unplaced: the chromosome location is not determined.