Silent Mutations Continue to Speak Up

Pim van Meurs has a blog post at The Panda's Thumb about the recent paper on translational selection on a synonymous polymorphic site in a eukaryotic gene (DOI link). He points out that this was predicted in a paper from 1987. In short, the rate of translation depends on the tRNA pool -- codons read by abundant tRNAs are translated more quickly than codons read by rare tRNAs. Because protein folding begins during translation, codon usage can influence protein secondary structure: a rare codon could stall translation, allowing the protein to adopt conformations that would not be possible if a common codon were used. What kind of interesting analyses could we perform given this recent finding?

The genetic code is redundant -- most amino acids are encoded by at least two (and as many as six) different trinucleotide codons. In order for those nucleotide sequences to be translated into proteins they must first be transcribed into messenger RNA (mRNA). When the mRNA is fed through a ribosome, transfer RNA (tRNA) matching each codon carries the appropriate amino acid to the ribosome, and the amino acid is incorporated into the growing polypeptide chain. Synonymous codons are often read by different tRNAs, so different tRNAs may be 'charged' with the same amino acid -- this is part of the redundancy of the genetic code.
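To make the redundancy concrete, here is a minimal Python sketch that builds the standard genetic code and counts how many codons specify each amino acid. The names `CODON_TABLE` and `codons_per_amino_acid` are my own, chosen for illustration, and the sketch assumes the standard code (not a mitochondrial or other variant code).

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varies fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_per_amino_acid():
    """Count how many codons specify each amino acid (stop codons excluded)."""
    counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
    return dict(counts)


degeneracy = codons_per_amino_acid()
# Print amino acids (one-letter codes) from most to least redundant.
for aa in sorted(degeneracy, key=degeneracy.get, reverse=True):
    print(aa, degeneracy[aa])
```

Leucine, serine and arginine come out with six codons each, while methionine and tryptophan have only one, which is why the text says "most" rather than "all" amino acids have synonymous codons.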

The different tRNAs carrying the same amino acid may be present at different concentrations within a cell. It has been observed in bacteria and yeast that tRNA abundance is correlated with codon usage in protein coding genes; more abundant tRNAs correspond to more commonly used codons ('major codons'). This observation provides the foundation for all further work on codon usage. In yeast, for example, highly expressed genes tend to use major codons more than genes with lower expression levels. That's because selection for both faster and more accurate translation is stronger in genes that produce more transcripts.
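As a rough illustration of how major codon usage might be measured, the sketch below picks the most frequent codon for each amino acid in a reference set of sequences and then scores a gene by the fraction of its codons that are major. This is only an approximation under stated assumptions: major codons are properly defined with reference to tRNA abundance, whereas here they are inferred from usage in a reference set that you would have to choose (for example, highly expressed genes). The function names and toy sequences are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Met, Trp and stop codons carry no information about synonymous codon choice.
UNINFORMATIVE = {"M", "W", "*"}


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def infer_major_codons(reference_seqs):
    """ASSUMPTION: the most used codon per amino acid in the reference set is 'major'."""
    counts = defaultdict(Counter)
    for seq in reference_seqs:
        for codon in split_codons(seq):
            aa = CODON_TABLE.get(codon)
            if aa is not None and aa not in UNINFORMATIVE:
                counts[aa][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def major_codon_fraction(seq, majors):
    """Fraction of informative codons in seq that are the major codon for their amino acid."""
    used = total = 0
    for codon in split_codons(seq):
        aa = CODON_TABLE.get(codon)
        if aa in majors:
            total += 1
            used += codon == majors[aa]
    return used / total if total else float("nan")


# Toy data, not real genes.
majors = infer_major_codons(["CTGCTGCTGCTAAAAAAAAAG"])
fraction = major_codon_fraction("CTGCTAAAAATG", majors)
print(majors, fraction)
```

Comparing this fraction between genes with high and low expression is the between-gene comparison described above; ties between equally common codons are broken arbitrarily in this sketch.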

The relationship between codon usage and expression represents an average across all genes in a genome. Are there any exceptions? The abstract from the 1987 paper by Purvis et al. makes the following prediction:

We propose that the way in which some proteins fold is affected by the rates at which regions of their polypeptide chains are translated in vivo. Furthermore, we suggest that their gene sequences have evolved to control the rate of translational elongation such that the synthesis of defined portions of their polypeptide chains is separated temporally.

There are multiple ways to "control the rate of translational elongation" (codon usage being one, and mRNA secondary structure being another), but codon usage seems to be the easiest of these to measure on a genome-wide scale. In this case we would be interested in intragenic (within-gene) variation in codon usage, whereas the relationship between codon usage and gene expression described above is a comparison between genes.
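One simple way to look at within-gene variation is a sliding window over the codons of a single gene, recording the fraction of major codons in each window. The sketch below assumes you already have a set of major codons for the organism; the set used here is invented for the example, and the window size is an arbitrary choice.

```python
def window_major_fraction(seq, major_set, window=3):
    """Fraction of major codons in each sliding window of `window` codons.

    seq is an in-frame coding sequence; major_set is a set of codons treated
    as major. Note that any codon not in major_set counts as non-major,
    including ATG and TGG, so include those if you do not want them penalised.
    """
    seq = seq.upper().replace("U", "T")
    codon_list = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    flags = [codon in major_set for codon in codon_list]
    return [
        sum(flags[i:i + window]) / window
        for i in range(len(flags) - window + 1)
    ]


# Hypothetical major codons and a toy gene with a run of non-major CTA codons.
major_set = {"CTG", "AAA"}
toy_gene = "CTG" * 3 + "CTA" * 3 + "AAA" * 3
profile = window_major_fraction(toy_gene, major_set, window=3)
lowest = min(range(len(profile)), key=profile.__getitem__)
print(profile, "lowest window starts at codon", lowest)
```

A dip in the profile marks a cluster of non-major codons, which is the kind of region where slowed elongation might be expected. A dip on its own is not evidence of selection, though, so it would need to be followed up with the comparative approaches below.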

It would be quite interesting to identify sequences that have been under selection for translation rate. If we consider only codon usage, here are some suggestions for identifying genes with signatures of that type of selection:

  • Look for conserved non-major codon usage between taxa. Now that we have many genome sequences from relatively closely related taxa (mammals and Drosophila come to mind, but there are probably others), we can see if any amino acid positions (or gene regions) contain an overabundance of non-major codons that are conserved between species. This would probably have to be performed on genes with high major codon usage. Someone who's crafty with statistics could devise a nifty likelihood algorithm to search for these codons.

  • Codon by codon McDonald-Kreitman analysis. The availability of polymorphism data (human SNP and resequencing data, and assorted sequences from other species) allows us to perform tests for selection that compare polymorphism and divergence in protein coding regions. A similar analysis could be performed on the conserved non-major codons identified as described above. If a non-major codon is selectively favored because of effects on translation rate, we would expect the major codon to be found at low frequencies in natural populations. This analysis would add extra rigor to any claims made about conserved non-major codons.

  • Codon usage in proteins that fold incorrectly in vitro. Purvis et al. point out that some proteins fold incorrectly in vitro but are fine when translated in vivo -- they argue that controls on the rate of translation in vivo allow for proper folding. A detailed analysis of codon usage in proteins whose secondary structure differs depending on whether they are translated in vivo or in vitro may reveal insights into the nature of selection on translation rate.
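As a starting point for the first suggestion, here is a minimal sketch that scans aligned coding sequences for positions where every species has the same amino acid but none uses its major codon. It assumes the sequences are in frame, gap-free and of equal length, and that a major codon table is available for each species; the species names, sequences and tables below are invented. It does not do any statistics, so it flags candidates only and is no substitute for the likelihood method or the polymorphism test suggested above.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def conserved_non_major_sites(aligned, majors):
    """Find codon positions with a shared amino acid and a non-major codon in every species.

    aligned: dict of species -> in-frame, gap-free coding sequence (equal lengths).
    majors:  dict of species -> dict of amino acid -> major codon for that species.
    Returns a list of (codon_index, amino_acid, {species: codon}) tuples.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n_codons = lengths.pop() // 3
    hits = []
    for i in range(n_codons):
        site = {sp: seq[3 * i:3 * i + 3].upper() for sp, seq in aligned.items()}
        amino_acids = {CODON_TABLE.get(codon) for codon in site.values()}
        if len(amino_acids) != 1:
            continue  # amino acid differs between species (or codon is unrecognised)
        aa = amino_acids.pop()
        if aa is None or aa in "*MW":
            continue  # no synonymous alternatives to choose between
        if all(aa in majors[sp] and codon != majors[sp][aa]
               for sp, codon in site.items()):
            hits.append((i, aa, site))
    return hits


# Toy alignment and hypothetical major codon tables.
aligned = {"species_1": "CTGCTAAAA", "species_2": "CTGCTTAAG"}
shared_majors = {"L": "CTG", "K": "AAA"}
majors = {"species_1": shared_majors, "species_2": shared_majors}
hits = conserved_non_major_sites(aligned, majors)
print(hits)
```

Note that "conserved" here means the non-major status is shared, not that the codon itself is identical: in the toy data the two species use different non-major leucine codons at the flagged site. Requiring an identical codon would be a stricter variant.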

Do you have any further suggestions for computational analyses that can be performed to study the role of codon usage in translational selection on a genomic scale?

Error alert: Last sentence of the second paragraph: different tRNAs may be 'charged' with the same amino acid --