Selection in structured populations

By razib on January 3, 2008.

Evolutionary genetics is subject to parameters: forces which push, pull, and shape the nature of dynamic processes over time and space. Population size, mutation rate, migration, selection, etc.; these are all parameters we have to keep in mind when attempting to analyze the nature of evolutionary dynamics. The acceleration paper was predicated on the consequences of changing one parameter, population size, for other parameters such as selection, drift and the number of mutations. Of course I have wondered about the nature of population substructure and what it means for our species. My thinking has been influenced by these maps, which show sharp geographic discontinuities in the frequencies of recently selected alleles. Quotes like this make me wonder: "As long as s << m, the new allele will migrate out of its original deme before it reaches fixation there," where s is the selection coefficient and m is the migration rate (specifically, the probability that an individual in deme x in generation t was a member of some other deme during generation t - 1). Selection coefficients on the order of 0.1 are enormous; those on the order of 0.01 are substantial. A migration probability on the order of 0.1 per individual also seems very high. My own intuition, though, is that the distribution of m exhibits both a greater mean and a greater variance than that of s.

Fixation Probability and Time in Subdivided Populations is the first of several papers I've been reading to gain greater insight into population substructure and how it might have played out in the evolutionary history of our species. From the abstract:

...Population structure changes the effective size of the species, often strongly downward; smaller effective size increases the probability of fixing deleterious alleles and decreases the probability of fixing beneficial alleles. On the other hand, population structure causes an increase in the homozygosity of alleles, which increases the probability of fixing beneficial alleles but somewhat decreases the probability of fixing deleterious alleles. The probability of fixing new beneficial alleles can be simply described by 2hs(1 - F_ST)N_e/N_tot, where hs is the change in fitness of heterozygotes relative to the ancestral homozygote, F_ST is a weighted version of Wright's measure of population subdivision, and N_e and N_tot are the effective and census sizes, respectively. These results are verified by simulation for a broad range of population structures, including the island model, the stepping-stone model, and a model with extinction and recolonization.

This is a technical paper, with diffusion equations, integration, and simulations of stepping-stone and island models. I'll skip over the details, but there are a few general issues that need to be noted. The work in this paper extends and adds granularity to the famous 2s which I have spoken of before: the probability of fixation of a new selectively advantageous allele within a population. In short, if an allele confers a selection coefficient of 0.1, a 10% increase in fitness above the population mean, then it has a probability of fixation of roughly 0.2. This holds within a very large, operationally infinite, population. Why only 0.2 when the allele is favored? Stochastic factors are strong when an allele is at low frequency, present in only a few copies. There is reproductive variance for any given individual (usually assumed to be Poisson distributed), and that variance is not perfectly correlated with an idealized genetic fitness. If a random "Act of God" exterminates a clutch with a very beneficial allele, then so be it.
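The gap between 2s and certainty can be checked with a small calculation. The sketch below is not from the paper; it assumes a single new copy whose carriers each leave a Poisson-distributed number of descendant copies with mean 1 + s in an effectively infinite population, and it solves for the probability that the lineage never dies out. The function name, tolerance and iteration cap are my own choices.

```python
import math


def survival_probability(s, tol=1e-13, max_iter=1_000_000):
    """Probability that a single new copy escapes loss (branching-process sketch).

    Assumption: each copy leaves Poisson(1 + s) descendant copies per
    generation, independently, in an effectively infinite population.
    """
    q = 0.0  # probability the lineage is already extinct at generation 0
    for _ in range(max_iter):
        # Poisson generating function: probability of extinction one generation later
        q_next = math.exp((1.0 + s) * (q - 1.0))
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q


for s in (0.001, 0.01, 0.1):
    print(f"s = {s}: survival = {survival_probability(s):.5f}, 2s = {2 * s:.5f}")
```

For small s the result sits very close to 2s; at s = 0.1 it comes out somewhat below 0.2, a reminder that 2s is an approximation that works best for weak selection.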

In terms of the formalism, N_e and N_tot are the effective and total (census) population sizes. "Effective" basically refers to the fact that not all individuals contribute to the next generation, or contribute to the same extent. Random reproductive variance will typically result in a lower N_e than N_tot. F_ST is basically a measure of how genetic variation is partitioned between and within populations. If most of the variation is partitioned between populations then F_ST is high and approaches 1, but if most of it is extant within populations, it approaches 0 (F_ST across "races" is famously on the order of 0.15, so that 85% of the variance at a single locus is present within a race). h measures the extent of dominance: if the heterozygote is exactly in between the two homozygotes it is 1/2, while if there is perfect dominance it is 1.
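To make the roles of these quantities concrete, here is a minimal sketch that plugs numbers into the abstract's approximation 2hs(1 - F_ST)N_e/N_tot. The function name and the example values are purely illustrative and are not taken from the paper.

```python
def fixation_probability(h, s, f_st, n_e, n_tot):
    """Approximate fixation probability of a new beneficial allele,
    2hs(1 - F_ST) * N_e / N_tot, as quoted in the abstract.

    h * s is the fitness change of heterozygotes relative to the
    ancestral homozygote.
    """
    if not 0.0 <= f_st <= 1.0:
        raise ValueError("F_ST must lie between 0 and 1")
    if n_e <= 0 or n_tot <= 0:
        raise ValueError("population sizes must be positive")
    return 2.0 * h * s * (1.0 - f_st) * n_e / n_tot


# No structure and N_e equal to N_tot: the expression collapses to 2hs.
print(fixation_probability(h=0.5, s=0.002, f_st=0.0, n_e=10_000, n_tot=10_000))

# Hypothetical structured case: F_ST = 0.15 and N_e half the census size.
print(fixation_probability(h=0.5, s=0.002, f_st=0.15, n_e=5_000, n_tot=10_000))
```

With hs = 0.001 the unstructured case returns 2hs = 0.002, while the hypothetical structured case returns about 0.00085.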

There are a few general results from the analyses and simulations within this paper. A great deal of substructure can reduce the power of selection to drive beneficial alleles to fixation within and across demes. It can also result in the fixation of deleterious alleles due to drift. Finally, it can favor the fixation of beneficial recessive alleles, because they are more likely to be expressed as homozygotes within a small, relatively inbred population (drift kicks their frequency up just high enough that many more copies are exposed to selection).

Here's a figure from the paper. By "hard selection," the authors mean that the nature of the genotypes within the deme is relevant to their replication in the next generation ("soft selection" seems to refer more to events which result in variant success across demes uncorrelated with their genotypes, so differences in allele frequencies are driven simply by initial differences across demes, captured by F_ST). Note that the selection coefficient, 0.001, is rather modest. You see that as the rate of migration increases, the probability of fixation quickly converges upon 0.002, which is what one would predict from 2s. This is simply because migration increases the effective population size as the demes are connected together into a larger breeding metapopulation.

This is from figure 5, and it shows the trend for a recessive allele which is only expressed as a homozygote, with an s of 0.002. Note that as the migration rate increases it becomes less and less able to manifest its advantage. This is because the effective population size is increasing and, under Hardy-Weinberg equilibrium, the frequency of homozygotes, q², is getting smaller and smaller.
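The q² argument can be made concrete with a little arithmetic. The sketch below uses the standard textbook expression for homozygote frequency under inbreeding, q² + Fq(1 - q), with F standing in loosely for the excess homozygosity produced by subdivision. It is not a calculation from the paper, and the values of q and F are made up for illustration.

```python
def homozygote_frequency(q, f=0.0):
    """Frequency of recessive homozygotes at allele frequency q.

    f = 0 gives Hardy-Weinberg proportions (q squared); f > 0 adds the
    standard inbreeding excess f * q * (1 - q).
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must lie between 0 and 1")
    return q * q + f * q * (1.0 - q)


q = 0.01  # a rare recessive allele (illustrative value)
for f in (0.0, 0.05, 0.15):
    hom = homozygote_frequency(q, f)
    # Share of all copies of the allele that sit in homozygotes,
    # i.e. the copies whose advantage selection can actually act on.
    exposed = hom / q
    print(f"F = {f:.2f}: homozygotes = {hom:.6f}, copies exposed = {exposed:.3f}")
```

At q = 0.01 only 1% of copies sit in homozygotes under random mating, but with F = 0.15 roughly 16% do, which is the sense in which structure helps a beneficial recessive.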

The whole paper is open access, so I encourage you to read it. At this point, the question I'm asking myself is this: effective population size seems to have been increasing over time, but how has population substructure changed, if at all, over human history? Would small isolated groups such as the Andaman Islanders exhibit a non-trivial number of beneficial recessively expressed alleles because of their insulation from migration? As history progressed and migration increased, have alleles increased in their chance of fixation in part because of the breakdown of population structure? Was the 2s limit reached relatively early on in the course of our history so that, as R.A. Fisher might contend, we can ignore substructure? And how might Shifting Balance dynamics play into this?

Reference: Fixation Probability and Time in Subdivided Populations, Michael C. Whitlock, Genetics 164: 767-779 (June 2003)

A high rate of beneficial mutations may generate many simultaneous, beneficial mutations on the same type chromosome in a population. Recombination together with selection should produce "super fit" chromosomes that have accumulated multiple beneficial alleles. Such "super fit" chromosomes should spread beneficial alleles faster than predicted by "single mutation" models.

From your description this paper only looks at a single mutation model. (Or it assumes mutations are inherited independently.) How would the simulation results change if multiple mutations with recombination were included? Does anyone do that kind of simulation?

"Was the 2s limit reached relatively early on in the course of our history so that as R.A. Fisher might contend we can ignore substructure?"

My guess is that accelerated adaptation together with recombination means that substructure is important. With high levels of selection activity, population substructure and migration should be more important.

Also, it is not clear to me how important fixation time is when looking at a highly dynamic genome. More interesting is the probability of a beneficial allele attaining a moderate frequency in a sub population. At that point recombination and migration become important.

I wonder if our ability to track "single gene" or "few gene" traits distorts our perception of how most human adaptation occurs. A "single gene" trait that provides a significant fitness advantage would appear as a "super wave" spreading across a lake of chaotic ripples. In such cases a "single mutation" simulation might be pretty good. However, many traits will depend on hundreds or thousands of small beneficial mutations. In such cases the "single mutation" model will be inadequate.

Would a population that was expanding and subdividing have faster rates of fixation (within those subdivisions) for a given selection coefficient? Furthermore, wouldn't estimates of selection coefficients overshoot the real numbers unless that process of subdividing was taken into account?

"Recombination together with selection should produce "super fit" chromosomes that have accumulated multiple beneficial alleles."

do you think the rate of recombination is high enough? you're basically talking about a supergene, right?

"How would the simulation results change if multiple mutations with recombination were included? Does anyone do that kind of simulation?"

i don't know yet. i'm digging through the lit right now.

this is plausible. i'll be posting stuff on how selection changes based on structure soon...a lot of it is pretty theoretical.

"However, many traits will depend on hundreds or thousands of small beneficial mutations. In such cases the "single mutation" model will be inadequate."

i think we're getting into shifting balance territory; in which case substructure is of the essence.

"Would a population that was expanding and subdividing have faster rates of fixation (within those subdivisions) for a given selection coefficient?"

i assume that the average will be the same. in fact, subdivision will increase the power of drift and so reduce the rate of fixation (though if the divisions are large enough this might not matter). to get at what i'm thinking here, assume you have 10 large demes which are sampled from an even larger deme. assuming that the 10 demes are pretty good representations of the larger deme (if their sample size is big, then variance shouldn't be an issue) then they'll all have the same fixation probability. in some demes the allele will go extinct, in others it will fix. just like with drift, i assume that the expectation for the number which will fix or not fix is proportional to the probability of fixation.

"do you think the rate of recombination is high enough? you're basically talking about a supergene, right?"

Yes, supergene. "Super chromosome" is misleading since each chromosome will have multiple cross-over events during gamete formation. Loci that are widely separated on a chromosome are essentially inherited independently. The loci should be far enough apart that they will likely be united by an occasional rare recombination event but not so far apart that there is a high probability that common recombination events will separate the loci.

I'm guessing that the recombination rate is high enough to generate high fitness supergenes (when many simultaneous mutations with modest benefit are present in the population) and low enough that the supergenes will be fairly long. (Long supergenes would permit the combination of many modest benefit mutations into one high benefit supergene.) I'd like to see simulations of this process with different recombination rates and different levels of mutations of modest benefit.

Note that a mutation of moderate effect might act as the seed of a supergene. The supergene would slowly spread in a sub population and pick up nearby beneficial mutations of small effect. Over time the fitness of the supergene would increase and it would spread more rapidly. (This same process would filter out slightly harmful mutations.)

My mind kept nagging me about the term "supergene". "Supergene" usually refers to a group of nearby genes that share an epistatic relationship. When considering long term stable genetic patterns this makes sense as the epistatic relationship keeps the linkage from breaking. My usage is slightly different.

The linkage is only maintained while the component genes have a fitness advantage over wild type. Once the supergene sweeps the populace there is no longer a fitness advantage in maintaining the linkage. New beneficial mutations can replace component genes without damaging an epistatic relationship. This type of "supergene" is a temporary "selection" structure, not an epistatically reinforced, stable structure.
