Polymorphism and Divergence
This is the eighth in a series of posts about detecting natural selection using molecular data (i.e., DNA sequences). The introduction can be found here. The first post described the organization of the genome, and the second described the organization of genes. The third post described codon-based models for detecting selection, and the fourth detailed how relative rates can be used to detect changes in selective pressure. The fifth post dealt with classical population genetics methods for detecting selection using allele and genotype frequencies. The sixth post described how to calculate nucleotide sequence polymorphism, and the seventh explained how we can use measures of polymorphism to detect signatures of selection. In this entry we'll review how polymorphism and divergence can be used together to detect natural selection (more below the fold).
We have previously discussed how nucleotide divergence between species and polymorphism within populations can be used to detect natural selection on DNA sequences. This entry will detail how divergence can be used to estimate the expected polymorphism at a locus (and vice-versa) if it is evolving according to a neutral model. The approaches I will describe here are based on contingency tables, in this case 2x2 tables. I will describe two approaches, one that is designed for examining two separate loci, and another that is designed for two types of sites within a single region.
Assuming the selective constraint on a particular sequence has been constant since the divergence of two species, the amount of divergence between the two species at that locus should be a good predictor of the amount of polymorphism at the locus. As we discussed previously, positive selection is expected to decrease the amount of polymorphism, while balancing selection will elevate the amount of polymorphism. We will use a locus that we think is not under positive or balancing selection to determine the expected relationship between polymorphism and divergence. The amount of polymorphism and divergence at that locus will be compared to that at another locus to determine whether the polymorphism at the second locus is on par with that expected under neutrality. An example of such data is shown below.
|Sequence 1||Sequence 2|
Number of polymorphic sites (S) and divergent sites, along with the number of nucleotide sites compared, for two D. melanogaster loci. Data taken from Hudson et al. (1987).
We can then compare the polymorphism and divergence at the two loci to determine whether the locus we are interested in (Sequence 2 in the table above) is evolving in a manner similar to the control locus (Sequence 1). If there is a deficiency of polymorphism in Sequence 2, we have evidence for a recent selective sweep within or near this region. The data shown above indicate excess polymorphism in Sequence 2, consistent with balancing selection maintaining polymorphism at this locus. These polymorphism data come from the Drosophila melanogaster alcohol dehydrogenase (Adh) gene, and the divergence estimates come from comparisons with the D. sechellia Adh gene. Sequence 1 is the region flanking the Adh gene, and Sequence 2 comprises the synonymous sites and introns from the Adh gene. D. melanogaster Adh has two different allozyme alleles, and the pattern of nucleotide polymorphism at the locus is consistent with balancing selection maintaining the two alleles.
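To make the two-locus comparison concrete, here is a minimal sketch of the contingency-table logic in Python. It is a simplification, not the full test from Hudson et al. (1987), which fits a neutral model that also accounts for sample sizes and the number of sites compared. The function name `chi_square_2x2` and the counts in the usage lines are hypothetical, chosen only for illustration, and are not the Adh data.

```python
from math import erfc, sqrt

def chi_square_2x2(control, test):
    """Pearson chi-square test of homogeneity on a 2x2 table.

    control, test: (polymorphic_sites, divergent_sites) for each locus.
    Returns (chi2, p_value, expected), where expected holds the cell counts
    predicted if both loci share the same polymorphism-to-divergence ratio.
    """
    table = [list(control), list(test)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    # Expected counts under the null: each cell is row total * column total / N.
    expected = [[row_totals[i] * col_totals[j] / total for j in range(2)]
                for i in range(2)]

    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, expected

# Hypothetical counts (NOT the published Adh values):
control_locus = (10, 100)   # polymorphic, divergent
test_locus = (20, 50)
chi2, p, expected = chi_square_2x2(control_locus, test_locus)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
print(f"expected polymorphic sites at the test locus: {expected[1][0]:.1f}")
```

If the observed polymorphism at the test locus is well above the expected count, that points toward balancing selection; if it is well below, toward a recent sweep. A chi-square approximation is questionable when any expected count is small, which is one reason to treat this only as a sketch.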
The other test I will describe involves examining the polymorphism and divergence within a single coding region. Recall that we can divide a protein coding sequence into synonymous and non-synonymous sites, and we can use the divergence between two sequences at those two classes of sites to infer selection. Tests based on divergence alone lack power to detect positive selection because only a small fraction of sites will be under selection, and their signal will be drowned out by all of the other sites. By incorporating polymorphism we can get an idea of the amount of divergence at synonymous and non-synonymous sites expected under neutrality (or, conversely, the amount of divergence can give us an idea of the expected amount of polymorphism).
Number of polymorphic and divergent synonymous and non-synonymous sites for the Drosophila Adh locus. Data taken from McDonald and Kreitman (1991).
The data shown above are also from the Drosophila Adh locus (a sort of model locus for molecular evolution and population genetics). In this example, we see an excess of divergent non-synonymous sites (relative to non-synonymous polymorphism) suggesting that natural selection has fixed beneficial non-synonymous mutations since these two species diverged. If we had attempted to detect selection using only divergence we would see that there are fewer non-synonymous differences than synonymous differences, providing no evidence for directional selection. Only when we incorporate polymorphism data can we determine that natural selection has fixed beneficial non-synonymous mutations.
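The single-locus comparison can be sketched as a 2x2 test in a few lines of Python. This is an illustration under stated assumptions rather than the exact procedure from the paper: I use a two-sided Fisher's exact test (written out with the standard library) and the neutrality index, a summary statistic that came later. The function names are my own. The counts are the ones usually quoted from McDonald and Kreitman (1991), entered here by hand, so check them against the paper before relying on them.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

def neutrality_index(dn, pn, ds, ps):
    """(Pn / Ps) / (Dn / Ds); values below 1 mean excess non-synonymous divergence."""
    return (pn / ps) / (dn / ds)

# Counts as usually quoted from McDonald and Kreitman (1991); an assumption here.
dn, pn = 7, 2     # non-synonymous: fixed differences, polymorphisms
ds, ps = 17, 42   # synonymous: fixed differences, polymorphisms

p_value = fisher_exact_two_sided(dn, pn, ds, ps)
ni = neutrality_index(dn, pn, ds, ps)
print(f"Fisher exact p = {p_value:.4f}, neutrality index = {ni:.3f}")
```

Note that the non-synonymous fixed differences are still fewer than the synonymous ones, so a divergence-only comparison would not flag this locus; it is the ratio relative to polymorphism that stands out.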
This marks the end of the detecting natural selection series. I would say something deep and meaningful at this point, but there isn't really anything else to say. So, goodbye, I guess.
Hudson, RR, M Kreitman, and M Aguade. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153-159.
McDonald, JH, and M Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652-654.