Breeding the breeder's equation

I've talked about "the breeder's equation," $R = h^2 S$, before.

R = response
S = selection differential
$h^2$ = narrow sense heritability

For example, if you have a population where the mean phenotypic value is 100, and you select a subpopulation with a mean value of 125 to breed the next generation, and the heritability is 0.50, then:

R = 0.50 * (125 - 100) = 12.5

In other words, with a selection differential of 25 units in the parental generation, the response to selection would be 12.5 units in the offspring, measured relative to the original population mean. This is because the "narrow sense heritability," the proportion of phenotypic variation attributable to variation in the additive effects of genes, is only 1/2 in this case. There are other components of phenotypic variance, such as environmental variance, which may not be heritable from parent to offspring, and so there is a regression toward the population mean.
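The arithmetic above is small enough to write down as a function. This is just a minimal sketch of the breeder's equation as stated; the function and argument names are my own, not anything standard.

```python
def response_to_selection(h2, parent_mean, population_mean):
    """Breeder's equation: R = h2 * S, where S is the selection differential."""
    selection_differential = parent_mean - population_mean
    return h2 * selection_differential

# Worked example from the text: population mean 100, selected parents 125, h2 = 0.50.
R = response_to_selection(0.50, 125, 100)
print(R)  # 12.5
```

The expected offspring mean is then the original population mean plus R, here 112.5.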

This is all well & good, just as the ideal gas law, pV = nRT, is often very useful. But what does this tell us fundamentally about the nature of the phenomenon in question? The breeder's equation has traditionally been part of the toolkit of the applied geneticist, for whom results matter and the underlying biological principles are secondary. On the other hand, an evolutionary biologist would be interested in exactly how the genetical process plays out from first principles, because for the evolutionary biologist the processes are simply a means toward understanding fundamental aspects of nature. The breeder's equation comes out of applied quantitative genetics, so in a sense it is "gene blind": it simply describes and projects evolutionary processes without any exploration of a deeper framework. But we can obtain the breeder's equation, and ergo quantitative genetical processes, from population genetic theory. Nature is one.

First, let's use a simple "counting" example. Imagine two loci, 1 & 2, with alleles in flavors A and a, and B and b, respectively. Let's assume that A & B each contribute 1 unit of a trait, while a & b contribute 0 units. Additionally, the phenotypic outcomes are additive & independent across and within loci. For example:

AABB = 4 units
AaBb = 2 units
AAbb = 2 units
aaBB = 2 units
Aabb = 1 unit
aaBb = 1 unit
aabb = 0 units

Assume that the two alleles at each locus are equally frequent (since the loci are diallelic, that means p = 0.5). Within the population 50% of the alleles at each locus are "upper case" (A at locus 1, B at locus 2), and the other 50% are "lower case" (a & b). We can construct the following table of genotype frequencies that segregate within the population assuming p = 0.5 for A & a and B & b:

Genotype | Frequency | Phenotype (additive) | Phenotype (dominance)
AABB     | 1/16      | 4                    | 2
AABb     | 2/16      | 3                    | 2
AAbb     | 1/16      | 2                    | 1
AaBB     | 2/16      | 3                    | 2
AaBb     | 4/16      | 2                    | 2
Aabb     | 2/16      | 1                    | 1
aaBB     | 1/16      | 2                    | 1
aaBb     | 2/16      | 1                    | 1
aabb     | 1/16      | 0                    | 0

I've included dominance because we'll discuss this in a bit. In any case, intuitively the proportions should make sense. There is only one way you can obtain a genotype of all "upper case" or all "lower case" alleles, so you just multiply the expectations across each of the four slots, $(1/2)^4$, and you get 1/16. In contrast, there are multiple ways that you can obtain an AaBb genotype. AaBb, aABb, aAbB and AabB are equivalent, so you obtain 4 out of the 16 genotypic conformations.
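If you'd rather not count slots by hand, the table can be rebuilt mechanically. The sketch below assumes Hardy-Weinberg proportions at each locus and independence between the two loci, which is what the table implies; the function names and the way genotypes are keyed (by the count of "upper case" alleles at each locus) are my own choices.

```python
from fractions import Fraction
from itertools import product

P = Fraction(1, 2)  # frequency of the "upper case" allele at each locus

def locus_genotypes(p):
    """Hardy-Weinberg frequencies keyed by the count of upper-case alleles (2, 1, 0)."""
    q = 1 - p
    return {2: p * p, 1: 2 * p * q, 0: q * q}

def genotype_table(p=P):
    """Rows of (A count, B count, frequency, additive phenotype, dominance phenotype)."""
    rows = []
    for (n_a, f_a), (n_b, f_b) in product(locus_genotypes(p).items(), repeat=2):
        additive = n_a + n_b                    # each upper-case allele adds 1 unit
        dominance = min(n_a, 1) + min(n_b, 1)   # one upper-case allele suffices per locus
        rows.append((n_a, n_b, f_a * f_b, additive, dominance))
    return rows

table = genotype_table()
for n_a, n_b, freq, additive, dominance in table:
    print(n_a, n_b, freq, additive, dominance)

# Frequency-weighted means of the two phenotype columns.
mean_additive = sum(freq * add for _, _, freq, add, _ in table)
mean_dominance = sum(freq * dom for _, _, freq, _, dom in table)
print(mean_additive, mean_dominance)  # 2 3/2
```

Using exact fractions rather than floats keeps the output directly comparable to the 1/16ths in the table.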

Now, let's assume that we are selecting a parental population from a subset of the overall population. Genotypes selected are as follows in the proportions given:

Aabb = 2/5
aaBb = 2/5
aabb = 1/5

This obviously is not representative of the overall population, and because of what we know about the genetic architecture we should expect that this will result in a different phenotypic mean within the offspring. In the larger population the mean would be 2 (you can confirm this numerically from the table above, or, just note that it is the mean derived from the binomial distribution). In the parent generation the mean is:

(2/5)(1) + (2/5)(1) + (1/5)(0) = 4/5

We simply took the proportions and used them as weights for the phenotypic values of the genotypes. Since "lower case" alleles are disproportionately represented in the selected population, intuitively we should have expected a value lower than the overall population mean. Let's stipulate that the trait is totally genetically controlled; then the expected mean phenotype in the offspring will be the same as in the parents, 4/5. Going back to the breeder's equation, we substitute:

$(4/5 - 2) = h^2\,(4/5 - 2)$, or
$h^2 = 1.0$

The selected parental phenotype & the offspring phenotype both have a mean value of 4/5, so we remove 2, the original population mean. By basic algebra that means that the narrow sense heritability is 1.0. The way I framed the genetic architecture as additive and independent means that all the variation is due to additive genetic variance, so we should expect that the heritability in the narrow sense should account for the full phenotype range.
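Here is the additive example as a short calculation, so you can swap in other parental proportions. It's a sketch under the same stipulations as the text (fully genetic trait, purely additive effects); the dictionary keyed by (count of A, count of B) is just my own bookkeeping.

```python
from fractions import Fraction as F

# Selected parents from the text: (A count, B count) -> proportion among parents.
parents = {(1, 0): F(2, 5), (0, 1): F(2, 5), (0, 0): F(1, 5)}
population_mean = F(2)

# Additive model: each upper-case allele adds one unit to the phenotype.
parent_mean = sum(w * (n_a + n_b) for (n_a, n_b), w in parents.items())

# Allele frequencies among the selected parents (two alleles per locus).
p_a = sum(w * n_a for (n_a, _), w in parents.items()) / 2
p_b = sum(w * n_b for (_, n_b), w in parents.items()) / 2

# With additive effects the expected offspring mean depends only on the
# allele frequencies: 2p units per locus.
offspring_mean = 2 * p_a + 2 * p_b

# Rearranged breeder's equation: h2 = R / S.
h2 = (offspring_mean - population_mean) / (parent_mean - population_mean)
print(parent_mean, offspring_mean, h2)  # 4/5 4/5 1
```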

So let's move on to the case of dominance. The frequencies & phenotypic outcomes are in the table above. Note now that though the loci are independent across each other in terms of effect, they are not additive in the intralocus context. That is, the "upper case" alleles are dominant so that AA & Aa are phenotypically identical, ceteris paribus. The mean value for the distribution of phenotypic values obtained from the genotypes in the case of dominance is 1.5. You can confirm this numerically, or simply derive it from the expression 2p(1 + q), where q = 1 - p, and in this case q = p (both being 0.5). Now let us select from this overall population parents of the proportion:

AAbb = 1/7
Aabb = 2/7
aaBB = 1/7
aaBb = 2/7
aabb = 1/7

As above, we obtain a mean phenotype of the select parents:

(1/7)(1) + (2/7)(1) + (1/7)(1) + (2/7)(1) + (1/7)(0) = 6/7

Again, this is expected: we selected from the two lowest phenotypic values. Now, what is the mean of the offspring? Recall that I stated above that 2p(1 + q) = μ, where μ is the mean value. We substitute the allelic proportions among the selected parents, collecting A & B as like categories. Counting alleles, the "upper case" frequency at each locus is (2(1/7) + 2/7)/2 = 2/7, so p = 2/7 and q = 5/7:

2(2/7)(1 + 5/7) = 48/49

Now, back into the breeder's equation:

$(48/49 - 1.5) = h^2\,(6/7 - 1.5)$, or
$h^2 = 0.81$

Note: the trait is totally genetically controlled, but the heritability in the narrow sense is now less than 100%! Why? We are selecting phenotypic values, not genotypic ones. In the case of additive inheritance there is a perfect 1:1 linear relation between genotype and phenotype. Not so with dominance, e.g., AaBb & AABB are phenotypically equivalent. Just as in a case where environmental variance can account for some of the exceptional phenotypes selected, dominance results in an imperfect transmission of phenotypes from parent to offspring because of the resegregation of homozygotes in subsequent generations. For example, two populations, one of pure heterozygotes & one of pure dominant homozygotes, might have the exact same mean phenotypic value in the current generation where dominance is operative, but in the subsequent generation the recessive offspring of the heterozygotes will shift the phenotypic distribution even if the attribute is totally genetically controlled.
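The dominance example can be checked the same way. This is a sketch that assumes random mating among the selected parents, so that each locus is in Hardy-Weinberg proportions in the offspring; the per-locus mean 1 - q^2 used below is the same thing as the text's 2p(1 + q) once you add the two loci together with a common p.

```python
from fractions import Fraction as F

# Selected parents from the text: (A count, B count) -> proportion among parents.
parents = {
    (2, 0): F(1, 7), (1, 0): F(2, 7),
    (0, 2): F(1, 7), (0, 1): F(2, 7),
    (0, 0): F(1, 7),
}
population_mean = F(3, 2)

def dominant_phenotype(n_a, n_b):
    """One upper-case allele at a locus is enough to contribute its unit."""
    return min(n_a, 1) + min(n_b, 1)

parent_mean = sum(w * dominant_phenotype(*g) for g, w in parents.items())

# Upper-case allele frequency at each locus among the selected parents.
p_a = sum(w * g[0] for g, w in parents.items()) / 2
p_b = sum(w * g[1] for g, w in parents.items()) / 2

def locus_mean(p):
    """Expected contribution of one locus under dominance: 1 - q^2."""
    q = 1 - p
    return 1 - q * q

offspring_mean = locus_mean(p_a) + locus_mean(p_b)

# Rearranged breeder's equation: h2 = R / S.
h2 = (offspring_mean - population_mean) / (parent_mean - population_mean)
print(parent_mean, offspring_mean, h2, round(float(h2), 2))  # 6/7 48/49 17/21 0.81
```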

[Figure: normal distribution curve with truncation selection]

The examples above were predicated on a discrete counting methodology. This being a post about quantitative genetics, I'm now going to throw in a basic definition in regards to the normal distribution and how it relates to truncation selection. The details are pretty irrelevant, so I'm going to gloss over them and just define the parameters and throw out an expression which we'll use later. The figure shows a normal distribution curve, which models a continuous trait that emerges via the central limit theorem from the combined action of innumerable "random variables," whether they be environmental or genetic (or a combination thereof).

T = the truncation value, below which all phenotypic values are discarded from the parents of the subsequent generation
Z = the height of distribution at T
B = the area under the distribution being selected (i.e., the integral from T to ∞, so if B = 0.35, that means 35% of the population is selected, and that 35% of the population exhibits a phenotypic value above T)

You should also note deviations defined by "a," which indicates the effect of a substitution of one gene, and "d," which represents the dominance at this locus. If d = 0, there is no dominance, while if d = a, full dominance is operative (degree of dominance = d/a). To illustrate, at a locus, 1, AA = midpoint + a, Aa = midpoint + d and aa = midpoint - a, where the midpoint is the value halfway between the two homozygotes. The important point is that we're mapping genetic effects upon the normal distribution with these parameters a & d, and they'll come in handy later. Finally, you should also keep in mind the expression:

$(\mu_s - \mu)/\sigma^2 = Z/B$

$\mu_s$ = mean of the selected population
$\mu$ = mean of the population
$\sigma^2$ = variance of the population
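Since I'm glossing over where this expression comes from, here is a quick numerical sanity check instead. It's only a sketch: the mean, standard deviation and truncation point are made-up illustrative values, the mean of the selected group is obtained by crude midpoint-rule integration, and Z is taken as the height of the actual (unstandardized) normal curve at T.

```python
from statistics import NormalDist

def truncation_summary(mu, sigma, T, steps=20_000):
    """Return (Z, B, mu_s) for truncation selection above T on a normal trait."""
    dist = NormalDist(mu, sigma)
    Z = dist.pdf(T)        # height of the distribution at the truncation point
    B = 1 - dist.cdf(T)    # proportion of the population above T
    # Mean of the selected group: integrate x * pdf(x) from T upward with the
    # midpoint rule, cutting the tail off 10 standard deviations above the mean.
    upper = mu + 10 * sigma
    width = (upper - T) / steps
    midpoints = (T + (i + 0.5) * width for i in range(steps))
    mu_s = sum(x * dist.pdf(x) for x in midpoints) * width / B
    return Z, B, mu_s

# Illustrative values only (not from the post): mean 100, sd 15, truncation at 110.
mu, sigma, T = 100.0, 15.0, 110.0
Z, B, mu_s = truncation_summary(mu, sigma, T)
print((mu_s - mu) / sigma**2, Z / B)  # the two sides should agree closely
```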

With that under our belt we shift back to the world of p's & q's, population genetics. In a previous post I outlined how changes in allele frequency can be modeled both via population & quantitative genetics, and now we'll use these formalisms to show how they connect to the breeder's equation. Here are expressions which represent differences in fitness between the genotypes above:

(fitness AA - fitness Aa) ~ Z(a - d)
(fitness Aa - fitness aa) ~ Z(a + d)
average fitness = B (B is the proportion saved for breeding)

Δp = pq(p(fitness AA - fitness Aa) + q(fitness Aa - fitness aa) )/(average fitness)

substituting from above
Δp =pq(pZ(a - d) + qZ(a + d))/B

since p + q = 1
Δp = (Z/B)pq(a + (q - p)d) or one can substitute 1 - p for q
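If you don't trust my algebra, the two forms of Δp can be compared numerically. The sketch below just transcribes the expressions above; the particular values of p, a, d, Z and B are hypothetical numbers chosen for illustration.

```python
def delta_p_from_fitness(p, a, d, Z, B):
    """Delta p = pq[p(w_AA - w_Aa) + q(w_Aa - w_aa)] / (average fitness)."""
    q = 1 - p
    w_AA_minus_w_Aa = Z * (a - d)
    w_Aa_minus_w_aa = Z * (a + d)
    average_fitness = B  # the proportion saved for breeding
    return p * q * (p * w_AA_minus_w_Aa + q * w_Aa_minus_w_aa) / average_fitness

def delta_p_simplified(p, a, d, Z, B):
    """Delta p = (Z/B) pq (a + (q - p)d), after using p + q = 1."""
    q = 1 - p
    return (Z / B) * p * q * (a + (q - p) * d)

# Hypothetical values, for illustration only.
p, a, d, Z, B = 0.3, 1.0, 0.5, 0.02, 0.25
full = delta_p_from_fitness(p, a, d, Z, B)
short = delta_p_simplified(p, a, d, Z, B)
print(full, short)  # should match up to floating-point rounding
```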

Where there is no dominance (d = 0) you note that the formalism above reduces to (Z/B)pqa. With dominance the ratio of p & q is very important, since that determines the proportion of homozygous recessives which will segregate out per the Hardy-Weinberg equation. Now, the mean phenotype of the next generation can be obtained:

$\mu' = (p + \Delta p)^2(\mu^* + a) + 2(p + \Delta p)(q - \Delta p)(\mu^* + d) + (q - \Delta p)^2(\mu^* - a)$

Here $\mu^*$ is the midpoint between the two homozygotes, and the change in mean phenotype is simply assumed to track the change in allele frequencies within a population in Hardy-Weinberg Equilibrium.

Multiplying out and dropping the term in $(\Delta p)^2$ because it is so small:
$\mu' = \mu + 2(a + (q - p)d)\Delta p$
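To see how little is lost by dropping the squared term, you can compare the exact Hardy-Weinberg mean after the frequency shift with the linear approximation. This is a sketch with hypothetical numbers; working through the multiplication, the discarded piece comes out to exactly -2d(Δp)^2, which the last line checks.

```python
def genotype_mean(p, a, d, midpoint=0.0):
    """Mean phenotype under Hardy-Weinberg: AA, Aa, aa valued midpoint + a, + d, - a."""
    q = 1 - p
    return p * p * (midpoint + a) + 2 * p * q * (midpoint + d) + q * q * (midpoint - a)

def linear_mean_after_shift(p, dp, a, d, midpoint=0.0):
    """Approximate new mean: mu + 2(a + (q - p)d) * dp, ignoring dp squared."""
    q = 1 - p
    return genotype_mean(p, a, d, midpoint) + 2 * (a + (q - p) * d) * dp

# Hypothetical values, for illustration only.
p, dp, a, d, midpoint = 0.3, 0.02, 1.0, 0.5, 10.0
exact = genotype_mean(p + dp, a, d, midpoint)
approx = linear_mean_after_shift(p, dp, a, d, midpoint)
print(exact, approx)
print(exact - approx, -2 * d * dp**2)  # the neglected term
```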

Here we have the change in mean phenotype as a function of the change in allele frequencies (presumably being driven by selection) and the additive & dominance effects. If substitution of the allele has a negative additive effect one can see here that the mean phenotype value decreases, while if the effect is positive the phenotypic value increases. Similarly, we see again that if dominance is operative the proportion of p & q is highly significant, as the frequency of the recessive phenotype is a squared function of the proportion of one of the alleles (i.e., by convention generally $q^2$). The expression above can now bring us to the breeder's equation. I'll jump through the steps briefly:

Move the mean value to the left side
μ' - μ = 2(a + (q - p)d)Δp

Substitute the expression with Z & B for Δp
$\mu' - \mu = (Z/B)\,2pq\,(a + (q - p)d)^2$

Recall that $(\mu_s - \mu)/\sigma^2 = Z/B$, so
$\mu' - \mu = (\mu_s - \mu)\,2pq\,(a + (q - p)d)^2/\sigma^2$

Now, what are the definitions for R & S?
$R = \mu' - \mu$ (response to selection)
$S = \mu_s - \mu$ (selection differential)

$R = S \cdot 2pq\,(a + (q - p)d)^2/\sigma^2$

So now you have R, S and $2pq\,(a + (q - p)d)^2/\sigma^2$. The original breeder's equation is $R = h^2 S$, so
$h^2 = 2pq\,(a + (q - p)d)^2/\sigma^2$
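The chain of substitutions can be followed numerically for a single locus. This sketch computes R by way of Δp and S by way of Z/B, and then confirms that their ratio is the heritability expression; all the input values are hypothetical, and "alpha" is just my shorthand for the recurring factor a + (q - p)d.

```python
from statistics import NormalDist

def single_locus_check(p, a, d, mu, sigma, T):
    """Return (R, S, h2) for one locus under truncation selection above T."""
    q = 1 - p
    dist = NormalDist(mu, sigma)
    Z = dist.pdf(T)          # height of the trait distribution at T
    B = 1 - dist.cdf(T)      # proportion selected
    alpha = a + (q - p) * d  # shorthand for a + (q - p)d
    delta_p = (Z / B) * p * q * alpha
    R = 2 * alpha * delta_p              # mu' - mu
    S = sigma**2 * Z / B                 # mu_s - mu
    h2 = 2 * p * q * alpha**2 / sigma**2
    return R, S, h2

# Hypothetical values, for illustration only.
R, S, h2 = single_locus_check(p=0.3, a=1.0, d=0.5, mu=100.0, sigma=15.0, T=110.0)
print(R, h2 * S)  # should match up to floating-point rounding
```

With a single locus of modest effect against a phenotypic standard deviation of 15, the heritability is tiny, which is why the summation over many loci matters.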

Bingo, we've defined $h^2$ with concrete genetic parameters, p & q (the proportions of the alleles) and a & d (additive and dominance effects)! Finally, since $h^2$ is used in the context of continuous quantitative traits which vary because of the combined effects of loci of small effect, we need to add a summation:
$h^2 = \left(\sum 2pq\,(a + (q - p)d)^2\right)/\sigma^2$

And, since we know that $h^2$ = (additive genetic variance)/(total variance),
additive genetic variance = $\sum 2pq\,(a + (q - p)d)^2$
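As a check that this ties back to the counting example, here is the summation applied to the two additive loci from the start of the post. It's a sketch: the (p, a, d) tuples are my own encoding, with a = 1 and d = 0 because each locus scores 0, 1 or 2 units, and the phenotypic variance of 1 is worked out from the additive column of the genotype table (mean 2).

```python
def additive_variance(loci):
    """Sum of 2pq(a + (q - p)d)^2 over loci given as (p, a, d) tuples."""
    total = 0.0
    for p, a, d in loci:
        q = 1 - p
        total += 2 * p * q * (a + (q - p) * d) ** 2
    return total

def narrow_sense_heritability(loci, phenotypic_variance):
    """h2 = (additive genetic variance) / (total phenotypic variance)."""
    return additive_variance(loci) / phenotypic_variance

# Two additive loci with p = 0.5: genotypes score 0, 1, 2, so a = 1 and d = 0.
loci = [(0.5, 1.0, 0.0), (0.5, 1.0, 0.0)]

# Variance of the additive phenotype column: values 0..4 with frequencies
# 1/16, 4/16, 6/16, 4/16, 1/16 around a mean of 2.
phenotypic_variance = (1 * 4 + 4 * 1 + 6 * 0 + 4 * 1 + 1 * 4) / 16

h2 = narrow_sense_heritability(loci, phenotypic_variance)
print(additive_variance(loci), phenotypic_variance, h2)  # 1.0 1.0 1.0
```

This agrees with the heritability of 1.0 obtained by counting in the additive case.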

Take a breath. If you've made it this far you might wonder why I posted this in the first place. Two reasons:

a) putting this post together meant I had to go over this topic in more detail
b) we see now that population and quantitative genetics are intimately connected, that Mendelian principles can be bridged with the utilitarian methods of applied quantitative genetics

The breeder's equation is a simple formalism. All you need is a heritability and a selection differential and you can generate a prediction of response. But being able to use an algorithm does not imply that one understands the underpinning of that algorithm. A normal distribution is an abstraction which only roughly maps onto continuous quantitative traits which emerge from the combined effects of discrete loci. Analyzing the continuous probability distributions gets us only so far in understanding genetics, because genetics is predicated on biological realities. The breeder's equation is proximately useful, but to explore its boundaries and limitations we need to comprehend how it relates to Mendelian genetics. To understand the nature of evolutionary dynamics we need to understand how quantitative traits are shaped by selection and how they respond to selection. To understand the variation which selection must utilize we must understand the nature of Mendelian genetics. And so on. Science is a contingent system, but that contingency is not based on faith, it is based on induction and deep tedious analysis. In the future I plan to explore evolutionary quantitative genetics, and some comprehension of this post is a necessary precondition.

Note: This post closely follows the treatment in Principles of Population Genetics.



Excellent post.

By Jongpil Yun on 02 May 2007