Dan Graur has written a good summary of genetic load. It's an important concept in population genetics, and everyone should be familiar with it…and this is a nice 2½ page summary with only a little math in it.

I'll try to summarize the summary in two paragraphs and even less math … but you should read the whole thing.

Genetic load is the cost of natural selection. You all understand natural selection (my usual problem is trying to explain that there's more to evolution than just selection), and so you know that you can't have selection without imposing a loss of fitness on individuals that lack the trait in question. As it turns out, when you do the math, the only parameter that matters is the mutation rate, µ, and the mean fitness of a population, w, is (1-µ)^{n}, where *n* is the number of loci, or genes, in the genome. The shortfall of w below the maximum of 1, that is, 1-w, is basically the cost to the population of carrying suboptimal variants.

Notice that (1-µ) is taken to the *n*^{th} power -- that tells you right away that the number of genes has a significant effect on the cost to a population. As Graur shows by example, using a reasonable estimate of the number of genes and the mutation rate, the human genetic load is easily bearable -- if each couple has about 2½ children, losses due to selection overall will be easily compensated for, and the population size will be stable. But if *n* is significantly greater than 20-30,000 genes, because of that exponent, the cost becomes excessive. If the genome were 80% functional, he estimates we'd each have to have 7 x 10^{45} children just to maintain our current population.
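To get a rough feel for what that exponent does, here is a minimal Python sketch of the formula w = (1-µ)^{n}. It is not Graur's own calculation. The function names are made up for illustration, µ = 10^-5 per locus per generation is an illustrative value, and the rule that a couple needs 2/w births to leave two surviving offspring is a simplifying assumption of this sketch.

```python
import math


def mean_fitness(mu: float, n_loci: int) -> float:
    """Mean population fitness w = (1 - mu)^n from the formula in the text."""
    # exp(n * log1p(-mu)) equals (1 - mu)**n but stays accurate
    # when mu is tiny and n_loci is very large.
    return math.exp(n_loci * math.log1p(-mu))


def genetic_load(mu: float, n_loci: int) -> float:
    """Genetic load, taken here as 1 - w (maximum fitness set to 1)."""
    return 1.0 - mean_fitness(mu, n_loci)


def children_per_couple(mu: float, n_loci: int) -> float:
    """Births per couple needed for a stable population.

    Assumption of this sketch: a fraction w of offspring survive selection,
    so a couple needs 2 / w births to replace itself.
    """
    return 2.0 / mean_fitness(mu, n_loci)


if __name__ == "__main__":
    mu = 1e-5  # illustrative per-locus, per-generation mutation rate
    for n in (10_000, 20_000, 30_000, 100_000, 1_000_000):
        w = mean_fitness(mu, n)
        print(f"n = {n:>9,}  w = {w:.3e}  load = {1 - w:.3f}  "
              f"children per couple = {children_per_couple(mu, n):.3g}")
```

Under these assumptions the required family size stays close to two for a few tens of thousands of loci and then climbs exponentially as *n* grows, which is the whole point of the exponent.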

What all this means is that there is an upper bound to the number of genes we can possibly carry, and it happens to be in the neighborhood of the number of genes estimated in the human genome project. We can't have significantly more, or the likelihood of genes breaking down with our current mutation rate would mean that most of our children would either be born dead of lethal genetic errors or be crushed under the burden of a swarm of small deficits to their fitness.

What Graur doesn't mention is that this is *old news*. The concept was worked out in the 1930s by Haldane; it was dubbed "genetic load" in 1950 by Muller; Dobzhansky and Crow wrote papers on the topic in the 50s and 60s. I learned it as an undergraduate biology student in the 1970s. I expect more advanced and active researchers in the field to have this concept well in hand. It's just part of the quantitative foundation of evolutionary biology.

And this is why some of us go all spluttery and cross-eyed at any mention of the ENCODE project. They just blithely postulated orders of magnitude more functioning elements in the genome than could be tolerated by any calculation of the genetic load -- it quickly became clear that these people had no understanding of the foundation of modern evolutionary biology.

It was embarrassing. It was like seeing a grown-up reveal that he didn't know how to use fractions. It's as if NASA engineers plotted a moon launch while forgetting the exponent "2" in F = Gm_{1}m_{2}/r^{2}. Oops.

When well-established, 80-year-old scientific principles set an upper bound on the number of genes in your data set, and you go sailing off beyond that, at the very least you don't get to just ignore the fact that you're flouting all that science. You'd better be able to explain how you can break that limit.

(via Sandwalk)

The flaw in Dan's reasoning is that ENCODE, in saying that 80% of the human genome is functional, isn't saying there are 3 million loci of length 1000 bp and a mutation rate per locus of 10^-5. What they're saying is that there are 10,000 loci (just as Dan does) and that they still have a mutation rate per locus of 10^-5, but the size of each 'locus' is 300,000 bp.

I'm a bit confused by your math (or maybe just your terminology). I would expect your quantity µ, "mutation rate," to have units of either inverse time, or inverse time-inverse loci (the latter normalizing across genome size). But you use it in an exponential expression, and in particular in the form (1-µ)^n, which tells me that it must be dimensionless. What is the normalization that makes it so?

Michael Kelsey wrote (#2; January 7, 2015):

> I would expect […] quantity μ, “mutation rate,” to have units of either inverse time […]

> But […] the form (1-µ)^n, which tells me that it must be dimensionless. What is the normalization that makes it so?

In the absence of any other parameters (none are apparent in PZ Myers' article, January 6, 2015), the reference duration should be the same one that applies to the definition of "the mean fitness of a population, w" itself.

Now, the corresponding Wikipedia page suggests that the applicable reference duration unit is "a single generation".

Accordingly, this quantity µ might perhaps more appropriately be called the "specific mutation rate" (of a given species and/or population);

while "rate" as such is usually understood (especially in physics) as an "absolute rate", whose values are comparable regardless of any particular choice of duration unit.

While I agree that ENCODE's numbers are wildly overstated for many reasons, I think that in restricting "functional" to be the same as "conserved", a great deal of baby is being thrown out with the bathwater.

There are plenty of phenotypes that are selectively neutral, and the vast majority of evolution is therefore a matter of random drift. For example, I think you'd find it hard to find evidence for selective constraint on dry vs flaky earwax, or whether earlobes are attached, or minute differences in the shape of ears/eyes/noses.

Genomic regions that affect these phenotypes are not conserved and evolve neutrally, but in my view can validly be regarded as having a function.

Dan's argument from genetic load is of course valid as an upper bound on the amount of the genome that has a function _that affects reproductive fitness_.