Gene Expression

Selection on a quantitative trait

On occasion I’ve decided I’ll quickly review some population genetic concepts. These are really “background assumptions,” but sometimes comments make it clear that they’re not in the “common” background. So to the left you see two normal distributions; assume these are quantitative traits. The x-axis is the trait value, while the y-axis is the frequency of that value within a population. As you might note, I’ve labeled the two populations “generation 1” (g1) and “generation 100” (g100). The implication is that the two distributions represent the “same” population shifted in time. Obviously the population exhibits evolution. You will also note that the highest value in g1 is lower than the median value in g100. How could this happen? One could imagine that mutation introduced new variants into the population which changed the trait value, and/or that random genetic sampling processes shifted the allele frequencies over time.

Though it is theoretically possible that new mutations and a random walk genetic process could result in this shift, the likelihood is that what you are seeing is selection (e.g., breeding after truncation of the population each generation). Additionally, new mutant variants aren’t even necessary within the genetic system to shift the trait values so much. All you need to know is that the population, though large (so that the distribution doesn’t look too discrete and the approximation to the Gaussian isn’t that coarse), is finite. Finite populations which are modeled by a continuous distribution by their nature don’t realize the full distribution. There is a theoretical value for the frequency of individuals who deviate extremely far from the median of the population, but unless the population is incredibly large you’re not likely to encounter those who exhibit trait values above a particular threshold. If you have 1 million individuals in your population and are evaluating them on a quantitative trait where at x deviations the proportion beyond that point represents 0.000001% of the total, you obtain an expectation of 0.01 individuals. This is obviously theoretical; “0.01” individuals doesn’t make sense in the real world. There will be cases where individuals exist above this threshold, but on average you would have to look at about a hundred such populations before you found one with this trait value extant within it.
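To make the arithmetic concrete, here is a minimal sketch in Python. The function names are my own, and I’m assuming a perfectly Gaussian trait with individuals drawn independently, which is an idealization. Under that assumption a tail proportion of 0.000001% corresponds to a threshold of roughly 5.6 standard deviations above the mean.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal trait (an idealized assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def expected_beyond(pop_size, tail_prob):
    """Expected number of individuals beyond the threshold."""
    return pop_size * tail_prob

def prob_at_least_one(pop_size, tail_prob):
    """Chance that at least one individual exceeds the threshold,
    assuming independent draws: 1 - (1 - p)^N, computed stably."""
    return -math.expm1(pop_size * math.log1p(-tail_prob))

N = 1_000_000   # population size from the text
p = 1e-8        # 0.000001% written as a proportion

expected = expected_beyond(N, p)   # about 0.01 individuals
p_any = prob_at_least_one(N, p)    # just under 1%

print(f"expected individuals beyond threshold: {expected:.4f}")
print(f"chance a given population has at least one: {p_any:.4%}")
print(f"tail proportion at 5.6 SD: {upper_tail(5.6):.2e}")
```

The second number is the useful one: a chance of a bit under 1% per population is why you would have to look at on the order of a hundred such populations before expecting to see the extreme value even once.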

Now up to this point I haven’t mentioned genes too much. That’s because quantitative genetics is generally focused on the phenotype, the trait, and not a theory about the underlying genetic architecture. But the shift in the trait value to such an extreme extent is easily comprehensible in terms of changes in allele (gene variants) frequencies. If a quantitative trait is generated by the combined action of innumerable genetic loci of small effect, shifting the allele frequencies at each locus changes the expectation that a particular genetic configuration will be realized. To be more concrete, consider this table:

        gene 1   gene 2   gene 3   gene 4   gene 5   probability of configuration
g1      0.1      0.1      0.1      0.1      0.1      0.001%
g100    0.9      0.9      0.9      0.9      0.9      59.05%

Allele frequency at generation 1 & generation 100 at five loci

You see here five genes with proportions for a particular allele (for nerds, assume the architecture is haploid). The allele is obviously different across the genes (structurally, in sequence, etc.), but assume that it has the same effect on the trait. In other words, these five alleles are phenotypically equivalent. If an individual carries the allele at all five genes, then together they contribute a total value, x, to the trait (they are additive and independent). Because the loci are independent, the probability of this configuration is the product of the five allele frequencies: 0.001% in generation 1, and 59.05% in generation 100. And that is how phenotypic values which were once not found within the population at all can become the most frequent at some later generation: extremely low frequency alleles with the same phenotypic effect slowly increase in proportion within the population until their combination crosses the threshold from unlikelihood toward inevitability.
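The configuration probabilities are just a product of allele frequencies, which you can check with a few lines of Python. This is a sketch under the same assumptions as the text (haploid, independent loci); the function name is mine.

```python
import math

def configuration_probability(allele_freqs):
    """Probability that a haploid individual carries the focal allele
    at every locus, assuming the loci are independent."""
    return math.prod(allele_freqs)

g1 = [0.1] * 5     # allele frequencies at the five loci, generation 1
g100 = [0.9] * 5   # allele frequencies at the five loci, generation 100

p_g1 = configuration_probability(g1)
p_g100 = configuration_probability(g100)

print(f"generation 1:   {p_g1:.3%}")
print(f"generation 100: {p_g100:.2%}")
```

Nothing new was added to the gene pool between the two rows; only the frequencies changed, and the five-way combination went from a rarity to the majority state.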

Take home: Evolution doesn’t work with a preexistent essence. Selected populations are genetically subsets of their parent populations, but the change in allele frequencies often results in the generation of new genotypes. Just as the discrete character of Mendelian inheritance does not necessarily mean that the underlying potential variation is lost, the re-assortment of genes and biased sampling due to selection (and the correlation between phenotype and genotype) also mean that the realized variation in terms of traits can explore unknown territory.

Note: Real genetic architectures are not really derived from the central limit theorem; independence, additivity, etc., are idealized assumptions. This is one reason that quantitative traits often exhibit deviations from the Gaussian distribution; for example, the “fat tails.”