Looking for Darwin with a Bad Pair of Eyes

Bad tests for natural selection are bad at detecting selection.


Austin Hughes has published a fairly critical review of some methods used to detect natural selection in protein-coding sequences. His attack on current methods for detecting natural selection is threefold. First, he claims that comparing non-synonymous to synonymous substitutions does not allow one to differentiate between adaptive evolution and relaxed selective constraint. Second, he argues that comparing polymorphism and divergence at synonymous and non-synonymous sites does not allow one to differentiate between adaptive and demographic explanations for departures from neutrality. And, third, he declares that both of these approaches are flawed because they assume that selection will act on protein-coding sequences by fixing multiple amino acid substitutions since the divergence of the two sequences being compared.

I don’t think you’ll find disagreement from anyone that comparing non-synonymous and synonymous substitutions per site (dN and dS, respectively) is a pretty poor way to detect selection. I’d argue that it’s poor not because it cannot be used to tell the difference between positive selection and relaxed constraint, but because it has such low power. Technically, dN > dS is evidence for natural selection. There are, however, other implementations of the test that compare dN/dS across multiple branches of a phylogeny, and these would yield false positives if relaxed selective constraint leads to an elevated dN along a particular lineage. And Hughes has a valid point that this test (along with some others) also assumes that natural selection will fix multiple amino acid changes; a violation of this assumption makes the test even less powerful.

Hughes criticizes the conservative dN/dS test as prone to yielding too many false positives, and some of his criticisms of the McDonald-Kreitman (MK) test (which compares dN and dS with non-synonymous and synonymous polymorphism) likewise don’t lead to the conclusions he’d like you to believe. I’ll discuss some of those criticisms in a subsequent post, but I’d like to include one of them here. The MK test measures within-species polymorphism by counting the number of nucleotide sites that vary within the sample. This does not present a complete picture of nucleotide polymorphism; it’s common to also measure the average number of differences between all pairs of sequences. Hughes correctly points out that deleterious mutations may be segregating as rare polymorphisms, which would elevate the amount of non-synonymous polymorphism in the data. Rather than leading to incorrect inferences of natural selection, this would actually make the test more conservative, because it would take an even greater excess of non-synonymous differences between species to reject the null hypothesis and infer natural selection.
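To make the "more conservative" point concrete, here is a minimal sketch of the MK comparison as a 2x2 table tested with a one-sided Fisher's exact test. The counts are invented for illustration and do not come from Hughes's article or any real gene. The function names, the choice of a one-sided test, and the use of the neutrality index as a summary are my own assumptions, not a prescribed implementation.

```python
from math import comb


def mk_test_p_value(dn, ds, pn, ps):
    """One-sided Fisher's exact test on the 2x2 MK table.

    Returns the probability of seeing dn or more non-synonymous fixed
    differences, given the row and column totals of the table.
    """
    fixed = dn + ds            # all fixed differences between species
    nonsyn = dn + pn           # all non-synonymous changes (fixed + polymorphic)
    total = dn + ds + pn + ps
    upper = min(fixed, nonsyn)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    tail = sum(
        comb(nonsyn, k) * comb(total - nonsyn, fixed - k)
        for k in range(dn, upper + 1)
    )
    return tail / comb(total, fixed)


def neutrality_index(dn, ds, pn, ps):
    """(Pn/Ps) / (Dn/Ds); values below 1 mean an excess of non-synonymous divergence."""
    return (pn / ps) / (dn / ds)


# Hypothetical counts, invented for illustration only.
baseline = dict(dn=10, ds=20, pn=3, ps=40)
# Same divergence, but with five extra rare non-synonymous polymorphisms.
inflated = dict(dn=10, ds=20, pn=8, ps=40)

for label, counts in (("baseline", baseline), ("inflated Pn", inflated)):
    p = mk_test_p_value(**counts)
    ni = neutrality_index(**counts)
    print(f"{label}: NI = {ni:.2f}, one-sided p = {p:.4f}")
```

With divergence held fixed, the extra non-synonymous polymorphisms push the neutrality index toward 1 and raise the p-value, so segregating deleterious mutations make it harder, not easier, to reject neutrality in favor of adaptive evolution.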

The Achilles heel of Hughes’s article, however, is that he attacks only a subset of the approaches used to detect natural selection. The article does not mention any tests that use polymorphism data other than the McDonald-Kreitman test. Analyses that look at the site frequency spectrum of DNA sequence polymorphism or at haplotype blocks are able to detect recent selection events even if only a single nucleotide is under selection. By excluding a large swath of tests from his article, Hughes allows himself to attack a straw-man version of current approaches to detecting natural selection. Ironically, he devotes a sizable chunk of his article to defending Kimura’s neutral theory against historical attacks that were based on a misunderstanding of the model.
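As one example of a site-frequency-spectrum statistic, the sketch below computes Tajima's D, which contrasts the two polymorphism summaries mentioned earlier: the number of variable sites and the average number of pairwise differences. Tajima's D is my choice of illustration, since the article under discussion names no specific statistic. The toy alignments are invented, and the function assumes aligned, equal-length sequences with no gaps or missing data.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (S, pi, D) for a list of aligned, equal-length sequences.

    S  = number of segregating (variable) sites
    pi = mean number of pairwise differences
    D  = Tajima's D, or None when there is no variation
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences for a usable variance")

    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    if s == 0:
        return s, pi, None

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return s, pi, d


# Toy alignments, invented for illustration only.
# Every variant is a singleton: an excess of rare alleles.
rare_variants = ["AAAAAAAA", "TAAAAAAA", "ATAAAAAA",
                 "AATAAAAA", "AAATAAAA", "AAAAAAAA"]
# Every variant is at 50% frequency: an excess of intermediate alleles.
common_variants = ["AAAAAAAA"] * 3 + ["TTTTAAAA"] * 3

for label, alignment in (("rare", rare_variants), ("common", common_variants)):
    s, pi, d = tajimas_d(alignment)
    print(f"{label}: S = {s}, pi = {pi:.2f}, D = {d:.2f}")
```

Both alignments have the same number of variable sites, but the sign of D differs: negative when variants are rare and positive when they sit at intermediate frequency. Nothing in the calculation requires the sites to be coding, which is why this family of tests applies to non-coding regions too.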

Hughes concludes that codon-based approaches to detecting selection on DNA sequences are flawed and that we must use new techniques to detect natural selection in non-coding regions. He also includes a fair bit of text defending the importance of transcriptional regulatory regions in adaptive evolution (this part actually offers some solid criticisms of the Hoekstra and Coyne article reviewed here). But if he had performed an adequate survey of techniques that use polymorphism data to detect natural selection, he might have realized that they have the power to identify selection in non-coding regions. Even though it would be cool to be able to use gene expression data to detect natural selection, we’re still missing the appropriate algorithms for such an analysis.

Hughes AL. 2007. Looking for Darwin in all the wrong places: the misguided quest for positive selection at the nucleotide sequence level. Heredity, in press. doi:10.1038/sj.hdy.6801031