Pim van Meurs has a blog post at The Panda’s Thumb about the recent paper on translational selection on a synonymous polymorphic site in a eukaryotic gene (DOI link). He points out that this was predicted in a paper from 1987. In short, the rate of translation depends on the tRNA pool: amino acids whose codons are read by abundant tRNAs are incorporated more quickly than amino acids whose codons are read by rare tRNAs. Because protein folding begins during translation, codon usage can influence protein secondary structure. A rare codon can stall translation, giving the nascent chain time to adopt conformations that would not be possible were a common codon used. What kind of interesting analyses could we perform given this recent finding?
The genetic code is redundant: most amino acids are encoded by at least two (and as many as six) different trinucleotide codons. In order for those nucleotide sequences to be translated into proteins they must first be transcribed into messenger RNA (mRNA). When the mRNA is fed through a ribosome, transfer RNA (tRNA) matching each codon carries the appropriate amino acid to the ribosome, and the amino acid is incorporated into the growing polypeptide chain. Synonymous codons are often read by different tRNAs (although wobble pairing lets some tRNAs read more than one codon), so different tRNAs may be ‘charged’ with the same amino acid. This is part of the redundancy of the genetic code.
The different tRNAs carrying the same amino acid may be present at different concentrations within a cell. It has been observed in bacteria and yeast that tRNA abundance is correlated with codon usage in protein coding genes; more abundant tRNAs correspond to more commonly used codons (‘major codons’). This observation provides the foundation for much of the later work on codon usage. In yeast, for example, highly expressed genes tend to use major codons more than genes with lower expression levels. The usual explanation is that selection for fast and accurate translation is stronger in genes that produce more transcripts.
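To make the idea of ‘major codon usage’ concrete, here is a minimal Python sketch. It rests on a loud assumption: major codons are properly defined from tRNA abundance or from highly expressed genes, whereas this sketch simply takes the most frequent codon for each amino acid in a reference gene set as a stand-in. The function names (`infer_major_codons`, `major_codon_fraction`) are mine, not from any library.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def infer_major_codons(reference_genes):
    """ASSUMPTION: the most frequent codon per amino acid in the reference set
    is treated as the major codon (a proxy for tRNA abundance)."""
    counts = Counter()
    for seq in reference_genes:
        counts.update(c for c in split_codons(seq) if c in CODON_TABLE)
    major = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        if aa in "*MW":  # stop codons, and Met/Trp which have a single codon
            continue
        if aa not in major or n > counts[major[aa]]:
            major[aa] = codon
    return major


def major_codon_fraction(seq, major):
    """Fraction of codons in seq that are the major codon for their amino acid,
    counting only amino acids for which a major codon was inferred."""
    informative = [c for c in split_codons(seq) if CODON_TABLE.get(c) in major]
    if not informative:
        return None
    return sum(c == major[CODON_TABLE[c]] for c in informative) / len(informative)


# Toy usage with made-up sequences.
reference = ["CTG" * 3 + "TTA" + "AAA" * 2 + "AAG"]
major = infer_major_codons(reference)
print(major_codon_fraction("CTG" + "TTA" + "AAA" + "AAG", major))
```

With real data you would infer the major codons from a set of highly expressed genes and then compare the fraction across genes with different expression levels.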
The relationship between codon usage and expression represents an average across all genes in a genome. Are there any exceptions? The abstract from the 1987 paper by Purvis et al. makes the following prediction:
We propose that the way in which some proteins fold is affected by the rates at which regions of their polypeptide chains are translated in vivo. Furthermore, we suggest that their gene sequences have evolved to control the rate of translational elongation such that the synthesis of defined portions of their polypeptide chains is separated temporally.
There are multiple ways to “control the rate of translational elongation” (codon usage being one, and mRNA secondary structure being another), but codon usage seems to be the easiest of these to measure on a genome-wide scale. In this case we would be interested in intragenic (within-gene) variation in codon usage, whereas the relationships between codon usage and gene expression described above concern variation between genes.
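One simple way to look at intragenic variation is a sliding window over the codons of a single gene. The sketch below is only an illustration: it assumes you already have a set of major codons for the organism (the `major_codons` argument is a hypothetical input), and the window size and threshold are arbitrary defaults rather than recommended values.

```python
def windowed_major_fraction(seq, major_codons, window=5, step=1):
    """Return (start_codon_index, fraction_major) for each window along the gene.

    seq is assumed to be an in-frame coding sequence; major_codons is a set
    of codons treated as 'major' (supplied by the caller)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    flags = [c in major_codons for c in codons]
    return [
        (start, sum(flags[start:start + window]) / window)
        for start in range(0, len(flags) - window + 1, step)
    ]


def slow_window_candidates(seq, major_codons, window=5, threshold=0.5):
    """Start positions of windows whose major-codon fraction falls below threshold.

    These are only candidates for slowly translated regions; the threshold
    is an arbitrary illustrative value."""
    return [
        start
        for start, frac in windowed_major_fraction(seq, major_codons, window)
        if frac < threshold
    ]


# Toy usage: a stretch of non-major codons in the middle of a made-up gene.
gene = "CTG" * 4 + "TTA" * 3 + "CTG" * 3
print(slow_window_candidates(gene, {"CTG"}, window=3))
```

A dip in the windowed fraction within an otherwise major-codon-rich gene is the kind of pattern the Purvis et al. prediction would lead us to look for, though a dip on its own says nothing about selection.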
It would be quite interesting to identify sequences that have been under selection for translation rate. If we consider only codon usage, here are some suggestions for identifying genes with signatures of that type of selection:
Look for conserved non-major codon usage between taxa. Now that we have many genome sequences from relatively closely related taxa (mammals and Drosophila come to mind, but there are probably others), we can see if any amino acid positions (or gene regions) contain an overabundance of non-major codons that are conserved between species. This would probably have to be performed on genes with high major codon usage, where a conserved non-major codon stands out against the background. Someone who’s crafty with statistics could devise a nifty likelihood algorithm to search for these codons.
Codon-by-codon McDonald-Kreitman analysis. The availability of polymorphism data (human SNP and resequencing data, and assorted sequences from other species) allows us to perform tests for selection that compare polymorphism and divergence in protein coding regions. A similar analysis could be performed on the conserved non-major codons identified as described above. If a non-major codon is selectively favored because of effects on translation rate, we would expect the major codon to be found at low frequencies in natural populations. This analysis would add extra rigor to any claims made about conserved non-major codons.
Codon usage in proteins that fold incorrectly in vitro. Purvis et al. point out that some proteins fold incorrectly in vitro but are fine when translated in vivo; they argue that controls on the rate of translation in vivo allow for proper folding. A detailed analysis of codon usage in proteins with different secondary structures depending on whether they are translated in vivo or in vitro may reveal insights into the nature of selection on translation rate.
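As a starting point for the first suggestion, here is a rough Python sketch of a scan for conserved non-major codons in a codon alignment of orthologs. It is a naive filter, not the likelihood method imagined above, and it makes several assumptions: the sequences are aligned in frame, each species comes with its own set of major codons (`major_by_species` is a hypothetical input), and a position counts as a hit only if every species uses a non-major codon for the same amino acid. A real analysis would also need a null model for how often this happens by chance.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def conserved_nonmajor_sites(alignment, major_by_species):
    """Find codon positions where all species share an amino acid but each
    uses a codon that is non-major in that species.

    alignment: dict mapping species name -> in-frame aligned coding sequence.
    major_by_species: dict mapping species name -> set of major codons.
    Returns a list of (codon_index, amino_acid, {species: codon})."""
    n_codons = min(len(seq) for seq in alignment.values()) // 3
    hits = []
    for i in range(n_codons):
        column = {sp: seq[3 * i:3 * i + 3].upper() for sp, seq in alignment.items()}
        amino_acids = {CODON_TABLE.get(codon) for codon in column.values()}
        # Skip columns with gaps or ambiguous codons (None), stops,
        # substitutions, and Met/Trp, which have only one codon.
        if len(amino_acids) != 1 or amino_acids & {None, "*", "M", "W"}:
            continue
        if all(codon not in major_by_species[sp] for sp, codon in column.items()):
            hits.append((i, next(iter(amino_acids)), column))
    return hits


# Toy usage with made-up sequences and made-up major codon sets.
alignment = {"sp1": "CTGTTAAAA", "sp2": "CTGCTAAAG"}
major_by_species = {"sp1": {"CTG", "AAA"}, "sp2": {"CTG", "AAG"}}
for index, aa, column in conserved_nonmajor_sites(alignment, major_by_species):
    print(index, aa, column)
```

Positions flagged this way would then be the natural candidates for the polymorphism-based follow-up described in the second suggestion.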
Do you have any further suggestions for computational analyses that can be performed to study the role of codon usage in translational selection on a genomic scale?