Coding regions (CDS) can be on the plus strand or the minus (reverse complement) strand of a genomic sequence. Nucleotide BLAST (blastn) can help you determine the coding strand of your sequence by using the CDS feature display on the BLAST search results page. See the article on blastn and CDS feature setup.
To determine the correct coding strand:
- Perform a blastn search.
- On the search results page, click the Alignments tab to view pairwise alignments.
- Check the CDS feature box to display the CDS feature on the alignments.
- Select an alignment to view.
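- For the selected alignment, check the Strand report above the alignment. It will be one of these two:
  - Plus/Plus: the Query and Subject represent the same strand
  - Plus/Minus: the Query represents the reverse complement of the Subject strand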
- Click the GenBank link in the Range row above the alignment. The link will display only the aligned region of the Subject record.
- To learn more about the above steps, see the article on interpreting pairwise alignments.
- Check the strand for the coding region (CDS) in the Subject record. It will be one of these two:
  - Plus strand: the CDS location shows only the nucleotide interval (for example: 115..448 or <1..321)
  - Minus strand: the CDS location includes the term “complement” as part of the reported nucleotide interval (for example: complement(99..1160))
- Pair the strand report on the BLAST page with the CDS annotation in the record.
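Checking the CDS strand in the Subject record comes down to whether the CDS location is wrapped in “complement”. The Python sketch below illustrates that check; it assumes you have copied a plain location string from the CDS feature, the function name cds_strand is hypothetical, and mixed-strand locations are deliberately not handled.

```python
def cds_strand(location: str) -> str:
    """Return "minus" if a GenBank CDS location is on the complement strand, else "plus".

    Hypothetical helper. Assumes a single-strand location such as
    "115..448", "<1..321", "complement(99..1160)" or
    "complement(join(1..50,80..120))". Mixed-strand locations are rejected.
    """
    loc = location.strip()
    if loc.startswith("complement("):
        # The whole interval is read on the minus (reverse complement) strand.
        return "minus"
    if "complement(" in loc:
        # For example join(complement(...),...); not a single-strand location.
        raise ValueError(f"mixed or unsupported location: {location!r}")
    # No complement operator: the interval is on the plus strand.
    return "plus"


for loc in ["115..448", "<1..321", "complement(99..1160)"]:
    print(loc, "->", cds_strand(loc))
```

Partial-end markers such as “<” do not affect the strand, which is why “<1..321” is still reported as plus.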
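These four combinations can help you determine the coding strand for your sequence (Query):
- Plus/Plus and CDS on the plus strand: the Query represents the coding strand.
- Plus/Plus and CDS on the minus strand: the Query represents the reverse complement of the coding strand.
- Plus/Minus and CDS on the plus strand: the Query represents the reverse complement of the coding strand.
- Plus/Minus and CDS on the minus strand: the Query represents the coding strand.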
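Pairing the two reports follows one rule: the Query is on the coding strand when its orientation relative to the Subject (the BLAST Strand report) agrees with the orientation of the CDS in the Subject record. The Python sketch below expresses that rule; the function name query_strand and its input strings are assumptions chosen to mirror the wording in this article, not part of any BLAST interface.

```python
def query_strand(blast_strand: str, cds_strand: str) -> str:
    """Combine the BLAST Strand report with the Subject's CDS strand.

    Hypothetical helper.
    blast_strand: "Plus/Plus" or "Plus/Minus", as shown above the alignment.
    cds_strand:   "plus" or "minus" (minus if the CDS location uses complement).
    Returns "coding" or "reverse complement" for the Query.
    Unexpected input strings raise KeyError.
    """
    same_strand = {"Plus/Plus": True, "Plus/Minus": False}[blast_strand]
    cds_on_plus = {"plus": True, "minus": False}[cds_strand]
    # The Query matches the coding strand only when both orientations agree.
    return "coding" if same_strand == cds_on_plus else "reverse complement"


# Example from Figures 1 and 2: Plus/Plus alignment, CDS on the minus strand.
print(query_strand("Plus/Plus", "minus"))  # reverse complement
```

Treating each report as a yes/no orientation reduces the four combinations to a single equality test.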
See Figures 1 and 2 for an illustrated example. The Query sequence in the example represents the reverse complement of the coding strand.
Figure 1: A pairwise BLAST alignment of a 250-bp Query sequence to the MF398235.1 (Subject) sequence. Query and Subject represent the same strand, so BLAST reports the alignment as Plus/Plus (purple rectangle). The GenBank link in the Range row (yellow rectangle) displays the aligned region of the Subject record. The record (Figure 2) shows the CDS on the minus (reverse complement) strand. Because the Query and Subject represent the same strand, the Query also represents the reverse complement of the coding strand. Check the alignment for these minus-strand clues:
- The amino acid positions on the Query (blue boxes) decrease while the nucleotide positions increase.
- The codons read in the opposite direction. For example, the “CAT” triplet (red oval) is the reverse complement of “ATG”, the methionine (M) codon.
- The tilde symbols (~~~~) mark an intron. Its ending bases “AC” (orange oval) are the reverse complement of “GT”, which marks the 5’ splice site (the beginning, not the end) of an intron.
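The codon and splice-site clues both rely on reverse complementation: reverse the sequence, then swap A with T and C with G. The short Python sketch below (the helper name reverse_complement is illustrative) reproduces the two examples from the Figure 1 alignment.

```python
# Base-pairing table: A<->T, C<->G (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Characters other than A/C/G/T are left unchanged but still reversed.
    """
    return seq.translate(COMPLEMENT)[::-1]


# Clues from Figure 1: a minus-strand Query shows the reverse complement
# of coding-strand motifs.
print(reverse_complement("ATG"))  # CAT: methionine start codon read on the minus strand
print(reverse_complement("GT"))   # AC: 5' splice-site dinucleotide seen at the intron's end
```

Applying the function twice returns the original sequence, which is why reverse-complementing a minus-strand Query recovers the coding strand.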
Figure 2: The part of the MF398235.1 (Subject) record that aligns with the 250-bp Query sequence from Figure 1. The record shows the CDS on the minus (reverse complement) strand (yellow box).