Why would we be able to detect more genetic variation by blasting with nucleotide sequences?

We'll have a blast, I promise! But there's one little thing we need to discuss first...

I want to explain why I'm going to use nucleotide sequences for the blast search. (I used protein sequences the other day.) It's not just because someone told me to; there is a solid, rational reason for this.

The reason is the redundancy in the genetic code.

Okay, that probably didn't make any sense to those of you who didn't already know the answer. Here it is.

[Figure: the standard genetic code]

The picture above shows the standard genetic code, which is the one humans use (there are at least 16 variations on this, but that's another story). Each middle cell in the table lists codons, the groups of three bases in the leftmost column of the cell. Then, reading from left to right, we have the three-letter and one-letter abbreviations for the amino acid each codon encodes. In every case except methionine (M) and tryptophan (W), an amino acid can be encoded by multiple codons. (That's what we mean when we say the code is redundant. Oops, I said it again!)
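If you'd like to check the redundancy for yourself, here is a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) written as DNA codons; the variable names are my own, chosen for illustration. It counts how many codons encode each amino acid and lists any amino acid that has only one codon.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), written as DNA codons.
# Codons are enumerated with the bases in the order T, C, A, G; "*" is a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Count the codons for each amino acid, leaving out the stop codons.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Amino acids that are NOT redundant, i.e. encoded by exactly one codon.
single_codon = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)

for aa, n in sorted(codons_per_amino_acid.items()):
    print(aa, n)
print("Only one codon:", single_codon)
```

The more codons an amino acid has, the more ways the DNA can change without changing the protein.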

Well, this means that we can have the same amino acid in a protein, but different codons in the mRNA. 

So a protein sequence like this: FLAKY

Could be encoded by the DNA sequence: TTTCTTGCCAAATAT
                 or the DNA sequence: TTCCTAGCAAAGTAC

These two sequences are only about 67% identical (10 of 15 bases match), but they code for amino acid sequences that are 100% identical.

Thus, you can see more variation at the level of the nucleotides. 
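Here is a small Python sketch of that comparison, for anyone who wants to try it. It assumes the standard genetic code and sequences of equal length with no gaps; the function names `translate` and `percent_identity` are my own, not from any library. It translates the two DNA sequences above codon by codon and counts how many bases match.

```python
# Standard genetic code (NCBI translation table 1), written as DNA codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna):
    """Translate a DNA sequence in frame 1; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a, b):
    """Percent of positions that match between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / len(a)


seq1 = "TTTCTTGCCAAATAT"
seq2 = "TTCCTAGCAAAGTAC"

protein1 = translate(seq1)
protein2 = translate(seq2)

print(protein1, protein2)
print("Nucleotide identity: %.1f%%" % percent_identity(seq1, seq2))
print("Protein identity: %.1f%%" % percent_identity(protein1, protein2))
```

Every mismatch falls in the third position of a codon, which is where most of the redundancy in the code lives.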

One other thing: you might be wondering why there are T's in these sequences when the virus genome is made of RNA. Well, one reason is that we usually make a DNA copy of the RNA before we do any sequencing. The other reason is that we store almost all sequences in the form of DNA sequences, even when they really did come from RNA.
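In code, that storage convention amounts to swapping U for T. This is a minimal sketch; the function names are mine and the RNA fragment is made up purely for illustration.

```python
def rna_to_dna(rna):
    """Write an RNA sequence with the DNA alphabet (U becomes T)."""
    return rna.upper().replace("U", "T")


def dna_to_rna(dna):
    """Go back the other way (T becomes U)."""
    return dna.upper().replace("T", "U")


# A made-up RNA fragment, not taken from any real virus.
example_rna = "AUGGCUUAA"
stored = rna_to_dna(example_rna)

print(stored)
print(dna_to_rna(stored) == example_rna)
```

Nothing is lost in the conversion, so you can always get back to the RNA letters if you need them.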

Um, could you please post the whole human genetic code? I can't find it and I need it for some research. It would be nice, though, if they made a chart that shows which part of the human genetic code does which job :/ but that's unlikely. It would make work a lot easier, though.

please reply.

By zach barker (not verified) on 12 Apr 2012

Hi Zach,

The entire human genome sequence is way, way too large to post here. It wouldn't be very interesting reading either, since it's all A's, G's, T's, and C's.

There are charts, though, that show what different parts of the genome do. This book has a lot of information about where genes map in the human genome and what they do: http://www.ncbi.nlm.nih.gov/books/NBK22183/

I think it would be a good place to start.