'Misunderestimating' Natural Selection

By mikethemadbiologist on April 9, 2007.

From the archives, here's something about how we might be underestimating the strength of natural selection when we look at molecular data:

PZ Myers has a superb summary of a very interesting PLoS paper in which the authors identify genes that have experienced strong selection and thus might be responsible for the chimpanzee-human divergence:

With all the data available from the human genome project and the ongoing chimpanzee genome project, we can start comparing DNA sequences. One parameter that can be assayed is the frequency of synonymous changes in the DNA: these are changes in the nucleotide sequence that produce synonyms in the triplet code, and therefore cause no changes at all in the protein sequence. These changes represent a kind of steady background noise, the rate of random, neutral changes in the genome. Non-synonymous changes, on the other hand, do change the amino acid sequence of the resulting protein, and are presumed to be more likely to have some kind of effect on the phenotype. The ratio of nonsynonymous to synonymous nucleotide changes within a gene, d_N/d_S, is a measure of the history of selection for change in that gene. High d_N/d_S values mean there has been selection pressure for novel forms, while low d_N/d_S values mean selection has been working to conserve the sequence.

So here's the analysis: go through the list of human genes, find each one's homolog in the chimpanzee, compute the d_N/d_S ratio, and rank them in order. What you end up with is a list, with the genes that have experienced the strongest selection for new properties between the two species at the top. Note that you can't tell which of the two species has changed the most from their common ancestor from this analysis (although comparison with an outgroup can help with that), so all we know is which genes have diverged the most.
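The ranking step in the quoted passage can be sketched in a few lines of Python. This is a minimal illustration, not the PLoS authors' pipeline: it assumes per-gene dN and dS estimates have already been computed elsewhere (which is the hard part), and the gene names and values are hypothetical.

```python
def rank_by_dn_ds(estimates):
    """Rank genes from highest to lowest gene-wide dN/dS.

    `estimates` maps a gene name to a precomputed (dN, dS) pair. Genes with
    dS == 0 are skipped, because the ratio is undefined without any
    synonymous changes to compare against.
    """
    ratios = {gene: dn / ds for gene, (dn, ds) in estimates.items() if ds > 0}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical human-chimp estimates; gene names and numbers are made up.
estimates = {
    "geneA": (0.002, 0.010),
    "geneB": (0.015, 0.010),
    "geneC": (0.004, 0.000),  # no synonymous changes: ratio undefined, skipped
}
for gene, ratio in rank_by_dn_ds(estimates):
    print(gene, round(ratio, 2))
```

Each gene is reduced to a single number before ranking, which is the limitation discussed next.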

Here's my problem with the article: this method will miss many, many evolutionarily 'important' genes. Now, this isn't the authors' fault: to paraphrase Rumsfeld, sometimes you have to analyze the genomes you have, not the genomes you wish you had. Note the plural genomes. But I'm getting ahead of myself.

Imagine a gene 300 amino acids long (that's 900 base pairs of DNA; every three bases form one codon, which codes for one amino acid). In many genes, most non-synonymous substitutions will be deleterious (dN/dS at those codons will be very close to zero), some will be neutral (dN/dS = 1), and a few will be beneficial (dN/dS > 1). If you average across the gene, the dN/dS ratio will be much lower than 1. However, this doesn't mean that the gene isn't evolutionarily important: the few beneficial non-synonymous substitutions could be doing evolutionary backflips (dN/dS >> 1), and a gene-wide summary statistic still won't detect selection at this gene, because it averages dN/dS across all sites.
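Here is a toy version of that 300-codon gene in Python. It rests on a simplifying assumption (labeled in the code): every codon has the same synonymous rate and the same number of nonsynonymous sites, so the gene-wide dN/dS is just the unweighted mean of the per-codon values. The per-codon numbers are hypothetical, chosen only to show how averaging hides a handful of positively selected sites.

```python
def gene_wide_omega(site_omegas):
    """Approximate gene-wide dN/dS as the mean of per-codon dN/dS values.

    Simplifying assumption: every codon has the same synonymous rate and the
    same number of nonsynonymous sites, so the gene-wide ratio reduces to an
    unweighted mean. Real estimators weight sites differently.
    """
    return sum(site_omegas) / len(site_omegas)


# Hypothetical 300-codon gene: 285 constrained codons, 12 neutral codons,
# and 3 codons under strong positive selection.
sites = [0.05] * 285 + [1.0] * 12 + [5.0] * 3

print(gene_wide_omega(sites))       # roughly 0.14: looks like a conserved gene
print(sum(w > 1 for w in sites))    # 3 codons still have dN/dS > 1
```

Three codons with dN/dS = 5 barely move the average, so a gene-wide ranking would file this gene under "conserved."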

I'm not arguing a hypothetical case here. I'm currently in the process of submitting a manuscript about a gene in E. coli involved in the ecological divergence between 'harmless' E. coli and those involved in urinary tract infections. In this gene, about 2% of the amino acid sites appear to have a dN/dS ratio > 1.0, and at almost all of the other sites, amino acid substitutions are deleterious (dN/dS ~ 0.1). This gene has a gene-wide dN/dS ratio ~ 0.07, yet we know from functional and experimental studies that this gene is vital in the ecological divergence between the harmless and pathogenic forms. The 'PLoS' ranking system would most likely miss this gene.

Now, if your eyes haven't completely glazed over at this point, you're wondering, "How the hell does he know what's happening at each codon?" Simple. I'm the Mad Biologist. Never, ever doubt the Mad Biologist.

Seriously, there is a method known as the codon substitution method (for the technical details and paper, click here). Essentially, this method allows you to estimate the dN/dS ratio for each codon (that is, each amino acid position), as opposed to the whole gene. I won't get into the technical details here, but for the chimp-human analysis this method would require lots of human and chimp genomes (ideally at least ten of each; two of each is the bare minimum and not very reliable). This is why I said earlier that you analyze the genomes you have, not the genomes you wish you had.
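To see why the number of sequences matters, here is a deliberately crude Python sketch. It is not the likelihood-based codon substitution method (such as the site models in PAML's codeml), which fits per-site dN/dS on a phylogeny; it only counts pairwise synonymous and nonsynonymous differences at each codon of an alignment, using the standard genetic code. The function name and the example sequences are hypothetical. The point it illustrates: with only two sequences, almost every codon shows no differences at all, so there is nothing to estimate a per-site ratio from.

```python
from itertools import combinations, product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def per_codon_differences(seqs):
    """Count pairwise (nonsynonymous, synonymous) differences at each codon.

    `seqs` is a list of aligned, in-frame coding sequences of equal length.
    Only codon pairs differing at exactly one nucleotide are counted, which
    sidesteps the question of which mutational path a multi-hit change took.
    These are raw counts, not normalized by site numbers, so they are NOT
    dN/dS estimates.
    """
    n_codons = len(seqs[0]) // 3
    counts = []
    for i in range(n_codons):
        codons = [s[3 * i:3 * i + 3] for s in seqs]
        nonsyn = syn = 0
        for a, b in combinations(codons, 2):
            if sum(x != y for x, y in zip(a, b)) != 1:
                continue  # identical codons or multiple hits: skipped
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn += 1
            else:
                nonsyn += 1
        counts.append((nonsyn, syn))
    return counts


# Two hypothetical sequences: only one codon carries any information.
print(per_codon_differences(["ATGAAACTG", "ATGAAGCTG"]))

# Five hypothetical sequences: more codons show variation, and variable
# codons get more comparisons to work with.
print(per_codon_differences(
    ["ATGAAACTG", "ATGAAGCTG", "ATGGAACTG", "ATGAAACTA", "ATGAAACTG"]
))
```

Counting differences this way ignores the shared ancestry of the sequences, which is exactly why the real methods fit a model on a tree instead.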

The punchline is that while this is a very interesting paper, I think we might be missing a lot of evolutionarily important genes simply because many, though not all, non-synonymous changes in these 'missed' genes are removed by natural selection. Instead, the PLoS method will be biased towards genes whose amino acid sequences can tolerate a lot of change without degrading function. What this means is that there might be even more genes that are responsible for the chimp-human divide. That's pretty cool.

Note to creationists: If I catch a single one of you using this post to somehow try to 'undermine' the theory of natural selection, I'm going to flame your lame ass. The whole damn point of this post is that we might be underestimating the power of natural selection. In science, as opposed to crackpot theology, we use deduction and induction. Sometimes, in the face of incomplete evidence, we disagree over the particulars.


Well, as a chemist who briefly played around in the protein folding area some 15 years ago during my postdoc, I'm surprised at how low the ratios are. Aside from active site residues, isn't the main purpose of a lot of amino acids just to make the protein fold up right? And can't you do a lot of "conservative" changes in amino acids without messing up the basic fold? Leucine for isoleucine? Tyrosine for phenylalanine, that sort of thing?

If you have different dN/dS ratios, might that just be telling you something about where the amino acids are in the fold? Tightly packed hydrophobic core: low ratio. Hydrophilic surface: high ratio. And these might be right next to each other in an alpha helix.

...and now I'm wondering if the codon-by-codon analysis may be a way of predicting folding of uncharacterized proteins, rather than a way of making bold statements about gene selection pressure....

