If you are submitting sequences of protein-coding genes in Submission Portal GenBank (SP-GenBank), you will usually be required to annotate the coding region (CDS) on those sequences. GenBank curation staff can't verify your sequences without correct CDS annotation.
You may get warnings and/or errors on the SP-GenBank Features page if you:
- Entered an incorrect genome location on the earlier Sequences page (e.g. mitochondrion for a gene located in the eukaryotic nuclear genome)
- Entered incorrect CDS locations
- Selected the wrong coding strand (for example, plus where it should be minus)
- Selected the wrong reading frame (codon_start) for a 5’ partial CDS
- Submitted a sequence of poor quality
You can avoid annotation problems by analyzing your sequences with Nucleotide BLAST (blastn) before submission. The blastn results page offers a CDS feature display option, which shows protein translations of annotated coding regions alongside the alignments. It can help you find the correct locations, reading frame, and strand for your CDS. The same analysis can also reveal sequencing errors.
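blastn with the CDS feature display remains the recommended check. As a rough local complement, and not the blastn method itself, the hypothetical sketch below translates a sequence in all six frames (codon_start 1, 2 and 3 on each strand) and counts internal stop codons; a frame with no internal stops is a candidate for the coding strand and codon_start. It assumes the standard genetic code (NCBI translation table 1), so organelle sequences such as vertebrate mitochondria will translate differently, and it assumes a 5’ partial CDS begins at the first base of the sequence (or of its reverse complement, for the minus strand). All function names are illustrative.

```python
# Hypothetical six-frame translation sketch (not an NCBI tool).
# Assumes the standard genetic code, NCBI translation table 1.

BASES = "TCAG"
# Table 1 amino acids, with codon positions 1, 2, 3 each in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement (the minus strand read 5' to 3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str, codon_start: int = 1) -> str:
    """Translate from codon_start (1, 2 or 3), analogous to /codon_start.

    Ambiguous or unknown codons become 'X'; an incomplete final codon is skipped.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(codon_start - 1, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict:
    """Map (strand, codon_start) to the translated protein string."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for codon_start in (1, 2, 3):
            frames[(strand, codon_start)] = translate(s, codon_start)
    return frames


def internal_stops(protein: str) -> int:
    """Count stop codons ('*'), ignoring any at the very end."""
    return protein.rstrip("*").count("*")


if __name__ == "__main__":
    seq = "CATGAAACCCGGGTTTAA"  # hypothetical fragment
    for (strand, codon_start), protein in six_frames(seq).items():
        print(strand, codon_start, protein, internal_stops(protein))
```

With a short fragment several frames can be open: in this example, four of the six frames have no internal stops. That ambiguity is why an alignment against annotated protein-coding sequences is the more reliable guide.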
See these two articles on how to:
- Set up Nucleotide BLAST (blastn) and the CDS feature display
- Interpret pairwise alignments with the CDS feature display
See these articles for blastn methods to determine CDS properties:
- CDS locations: prokaryotic/intronless genes
- CDS locations: eukaryotic genes (intron/exon structure)
- The coding strand (plus or minus)
- The reading frame for translation: for 5’ partial CDS
See these articles for blastn methods to find sequencing errors, which you must fix before you can add correct CDS annotation:
- Sequence errors: frameshifts in CDS
- Sequence errors: wrong bases in CDS
- Sequence errors: poor quality ends
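A premature in-frame stop codon is one common sign of a frameshift or a wrong base in a CDS. The hypothetical helper below is a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and an already known codon_start. It lists the 1-based positions of internal stop codons so you can inspect those regions in your trace files or alignment; it does not replace the blastn or blastx checks.

```python
# Hypothetical helper: locate in-frame stop codons inside a CDS.
# Assumes the standard genetic code (NCBI translation table 1).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str, codon_start: int = 1) -> list[int]:
    """Return 1-based positions of the first base of each internal stop codon.

    A stop codon in the last complete codon is treated as the expected
    terminal stop and is not reported.
    """
    cds = cds.upper()
    codon_starts = list(range(codon_start - 1, len(cds) - 2, 3))
    hits = [i + 1 for i in codon_starts if cds[i:i + 3] in STOP_CODONS]
    if hits and hits[-1] == codon_starts[-1] + 1:
        hits.pop()  # terminal stop codon: expected, not an error
    return hits


if __name__ == "__main__":
    # Hypothetical CDS with a premature TGA at position 7 and a terminal TAA.
    print(internal_stop_positions("ATGAAATGACCCTAA"))  # prints [7]
```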
Try these other BLAST tools if your blastn results are not adequate:
- blastx to check for frameshifts in CDS
- Genomic BLAST for organisms with annotated genome assemblies
