The genetic load problem

Dan Graur has written a good summary of genetic load. It's an important concept in population genetics, and everyone should be familiar with it… and this is a nice 2½-page summary with only a little math in it.

I'll try to summarize the summary in two paragraphs and even less math … but you should read the whole thing.

Genetic load is the cost of natural selection. You all understand natural selection (my usual problem is trying to explain that there's more to evolution than just selection), and so you know that you can't have selection without imposing a loss of fitness on individuals that lack the trait in question. As it turns out, when you do the math, the strength of selection drops out, and the cost depends only on the mutation rate, µ, and the number of loci: the mean fitness of a population, w, is (1-µ)^n, where n is the number of loci, or genes, in the genome. What w tells you, basically, is how much the population pays for carrying suboptimal variants; the genetic load itself is 1 - w.
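
For readers who want that step spelled out, here is a short sketch under the simple model described above, assuming each of the n loci independently multiplies fitness by (1 - µ) and that µ is small. The exponential approximation is a standard one, not something taken from Graur's summary:

```latex
w = \prod_{i=1}^{n} (1 - \mu) = (1 - \mu)^{n}
  \approx e^{-n\mu} \quad (\mu \ll 1),
\qquad
L = 1 - w \approx 1 - e^{-n\mu}
```

Written this way, what drives the load is the product nµ, the expected number of new mutations per genome per generation, which is why the number of genes matters so much.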

Notice that (1-µ) is raised to the nth power -- that tells you right away that the number of genes has a significant effect on the cost to a population. As Graur shows by example, using a reasonable estimate of the number of genes and the mutation rate, the human genetic load is easily bearable: if each couple has about 2½ children, losses due to selection will be easily compensated for, and the population size will be stable. But because of that exponent, if n is significantly greater than 20,000-30,000 genes, the cost becomes excessive. If the genome were 80% functional, he estimates we'd each have to have 7 × 10^45 children just to maintain our current population.
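
A minimal sketch of that arithmetic, assuming a per-locus mutation rate of µ = 10^-5 per generation (an illustrative value, not necessarily Graur's) and assuming a couple must have 2/w children for two of them to survive selection and replace the parents. The function names are made up for this example:

```python
import math

def mean_fitness(mu: float, n: int) -> float:
    """Mean fitness w = (1 - mu)^n, computed via log1p for numerical stability."""
    return math.exp(n * math.log1p(-mu))

def required_fertility(mu: float, n: int) -> float:
    """Children per couple needed so that, on average, two survive selection."""
    return 2.0 / mean_fitness(mu, n)

if __name__ == "__main__":
    MU = 1e-5  # assumed per-locus mutation rate per generation (illustrative only)
    for n in (20_000, 30_000, 1_000_000, 10_000_000):
        print(f"n = {n:>10,}: w = {mean_fitness(MU, n):.3g}, "
              f"children per couple = {required_fertility(MU, n):.3g}")
```

Under these assumptions n = 20,000 gives w ≈ 0.82 and about 2.4 children per couple, in line with the 2½ figure, while n = 10,000,000 pushes the requirement past 10^43. The sketch does not try to reproduce the 7 × 10^45 figure, since Graur's exact inputs for the 80% case aren't given here.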

What all this means is that there is an upper bound to the number of genes we can possibly carry, and it happens to be in the neighborhood of the number of genes estimated in the human genome project. We can't have significantly more: at our current mutation rate, genes would break down so often that most of our children would be born dead of lethal genetic errors, or saddled with a swarm of small deficits to their fitness.

What Graur doesn't mention is that this is old news. The concept was worked out by Haldane in the 1930s; it was dubbed "genetic load" by Muller in 1950; Dobzhansky and Crow wrote papers on the topic in the 1950s and 60s. I learned it as an undergraduate biology student in the 1970s. I expect that more advanced, active researchers in the field have this concept well in hand. It's just part of the quantitative foundation of evolutionary biology.

And this is why some of us go all spluttery and cross-eyed at any mention of the ENCODE project. They just blithely postulated orders of magnitude more functioning elements in the genome than could be tolerated by any calculation of the genetic load -- it quickly became clear that these people had no understanding of the foundation of modern evolutionary biology.

It was embarrassing. It was like seeing a grown-up reveal that he didn't know how to use fractions. It's as if NASA engineers plotted a moon launch while forgetting the exponent "2" in F = G·m1·m2/r^2. Oops.

When well-established, 80-year-old scientific principles set an upper bound on the number of genes in your data set, and you go sailing off beyond that, at the very least you don't get to just ignore the fact that you're flouting all that science. You'd better be able to explain how you can break that limit.

(via Sandwalk)

The flaw in Dan's reasoning is that ENCODE, in saying that 80% of the human genome is functional, isn't saying there are 3 million loci of length 1,000 bp with a mutation rate per locus of 10^-5. What they're saying is that there are 10,000 loci (just as Dan does), still with a mutation rate per locus of 10^-5, but that each 'locus' is 300,000 bp long.
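
Taking the commenter's figures at face value (µ = 10^-5 per locus per generation in both cases), a minimal sketch of how the two ways of carving up roughly 3 × 10^9 bp compare under the post's (1-µ)^n model; the helper name and scenario labels are illustrative:

```python
import math

# Per-locus, per-generation mutation rate from the comment, held fixed in both scenarios.
MU_PER_LOCUS = 1e-5

def mean_fitness(n_loci: int, mu: float = MU_PER_LOCUS) -> float:
    """w = (1 - mu)^n under the same simple load model as the post."""
    return math.exp(n_loci * math.log1p(-mu))

# Both partitions cover about 3 x 10^9 bp of functional sequence.
scenarios = {
    "3,000,000 loci x 1,000 bp": 3_000_000,
    "10,000 loci x 300,000 bp": 10_000,
}

for label, n in scenarios.items():
    w = mean_fitness(n)
    print(f"{label}: w = {w:.3g}, children per couple = {2 / w:.3g}")
```

The difference between the two results comes entirely from holding the per-locus rate fixed while the locus grows 300-fold, which is exactly what the comment assumes.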

I'm a bit confused by your math (or maybe just your terminology). I would expect your quantity µ, "mutation rate," to have units of either inverse time, or inverse time-inverse loci (the latter normalizing across genome size). But you use it in an exponential expression, and in particular in the form (1-µ)^n, which tells me that it must be dimensionless. What is the normalization that makes it so?

By Michael Kelsey on 06 Jan 2015

Michael Kelsey wrote (#2; January 7, 2015):
> I would expect […] quantity μ, “mutation rate,” to have units of either inverse time […]

> But […] the form (1-µ)^n, which tells me that it must be dimensionless. What is the normalization that makes it so?

In the absence of any other parameters (none are apparent in PZ Myers' article of January 6, 2015), the reference duration should be the same as the one that applies to the definition of "the mean fitness of a population, w" itself.

Now, the corresponding Wikipedia page suggests that the applicable reference duration unit is "a single generation".

Accordingly, this quantity µ might perhaps more appropriately be called the "specific mutation rate" (of a given species and/or population), whereas "rate" as such is usually understood (especially in physics) as an "absolute rate", whose values are comparable regardless of the particular choice of duration unit.
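
To make that normalization concrete: if ν (a symbol introduced here for illustration, not used in the post) is a mutation rate per locus per unit time and T_gen is the generation time, the per-generation quantity is their product:

```latex
\mu = \nu \, T_{\mathrm{gen}}, \qquad
\underbrace{\frac{\text{mutations}}{\text{locus}\cdot\text{time}}}_{\nu}
\times
\underbrace{\frac{\text{time}}{\text{generation}}}_{T_{\mathrm{gen}}}
=
\underbrace{\frac{\text{mutations}}{\text{locus}\cdot\text{generation}}}_{\mu}
```

Mutations, loci and generations are all counted rather than measured, so µ behaves as a pure number (for small µ, effectively the probability that a given locus mutates in one generation), which is what the form (1-µ)^n requires.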

By Frank Wappler on 07 Jan 2015

While I agree that ENCODE's numbers are wildly overstated for many reasons, I think that in restricting "functional" to be the same as "conserved", a great deal of baby is being thrown out with the bathwater.

There are plenty of phenotypes that are selectively neutral, and the vast majority of evolution is therefore a matter of random drift. For example, I think you'd be hard pressed to find evidence of selective constraint on wet vs dry earwax, on whether earlobes are attached, or on minute differences in the shape of ears/eyes/noses.

Genomic regions that affect these phenotypes are not conserved and evolve neutrally, but in my view can validly be regarded as having a function.

Dan's argument from genetic load is of course valid as an upper bound on the amount of the genome that has a function _that affects reproductive fitness_.

By Peter Ellis on 07 Jan 2015