Basic concepts - linkage disequilibrium

Thinking about it today, I realized there is a "Basic Concept" that I think I should touch upon, and that is linkage disequilibrium (LD). Notice the wiki link? I do that whenever I mention LD because it is such an essential concept for some of the evolutionary ideas which I am interested in, but often not necessarily a transparent or clear one to the lay person.

[Image: chrom1.jpg]

Its lack of obviousness isn't due to complexity; LD is pretty simple. Rather, there are particular background ideas which one needs to have firmly in mind before one can easily grasp it. For this reason I've placed an image of a chromosome to the left. LD is not a purely intrachromosomal concept, but I believe a biophysical model is important in understanding it, so I will use this image for illustrative purposes in the following post. So, you know that the human genome is divided physically into chromosomes, and each chromosome consists of two sister strands of DNA, chromatids. As you see to the left, diploid organisms have two copies of a gene, alleles, at each "locus." A locus is obviously an abstract concept; it is basically a synonym for a gene. Assuming we have "gene" under our belts, we can now conceive of a strand of DNA which is saturated with various genomic regions: introns, exons, intra- and intergenic regions, etc. The details aren't particularly relevant to LD; just remember that locus 1 and locus 2 on the same chromosomal strand are a particular physical distance apart.

[Image: chrom2.jpg]

Now look at the image to the left. The numerical and letter notation refers to the locus and chromosomal arm position, respectively. Each of the four "slots" represents one of two diploid copies of the gene inherited from one parent. The + & - script represents, for ease of conception, functional and non-functional copies of the gene. Mendel's Laws tell us that the identity of an allele at locus 1 should not give us any information about the identity of the allele at locus 2 on the same chromatid. In other words, the fact that locus 1 on copy "A" is - should not tell us whether locus 2 on copy "A" is more likely to be + or -.

That is where linkage disequilibrium comes in: LD basically measures the deviation from this expectation of non-association along the genome. As I noted above, though I am using the example of one chromosome, this can apply throughout the genome (my own interest is specifically in physically contiguous genomic regions, more on this below). The mathematical calculation of expectation is simple algebra, and I won't reprise the explanation offered in the Wikipedia entry for "D." But I will point to three cases where LD could exist.
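For readers who want to see the arithmetic anyway, here is a minimal sketch in Python. It assumes two loci with two alleles each (+ and -), and the function name `ld_stats` and its arguments are my own labels for illustration, not anything standard. D is just the observed frequency of the +/+ chromatid minus what you would expect if the two loci were independent; D' and r^2 are two common rescalings of it.

```python
def ld_stats(p_both, p_locus1, p_locus2):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_both   : frequency of chromatids carrying + at BOTH loci
    p_locus1 : frequency of + at locus 1
    p_locus2 : frequency of + at locus 2
    (Names are hypothetical, chosen for this sketch.)
    """
    # D: observed +/+ frequency minus the product expected under independence
    d = p_both - p_locus1 * p_locus2

    # Largest magnitude D could reach given these allele frequencies
    if d >= 0:
        d_max = min(p_locus1 * (1 - p_locus2), (1 - p_locus1) * p_locus2)
    else:
        d_max = min(p_locus1 * p_locus2, (1 - p_locus1) * (1 - p_locus2))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci
    denom = p_locus1 * (1 - p_locus1) * p_locus2 * (1 - p_locus2)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# No association: + at 50% at each locus, +/+ chromatids at 25%
print(ld_stats(0.25, 0.5, 0.5))

# Complete association: every + at locus 1 travels with a + at locus 2
print(ld_stats(0.5, 0.5, 0.5))
```

The first call gives D = 0 (no LD), while the second gives D = 0.25 with both D' and r^2 equal to 1, meaning that knowing one locus tells you the other with certainty.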

1) Consider a circumstance where there is an epistatic interaction between two loci contingent upon the alleles. Imagine an infection which is lethal to individuals with null copies (-) of the alleles above for locus 1 and locus 2 on the same chromatid (imagine that locus 1 & locus 2 enter into cis interactions). This is a case of LD being generated by fitness consequences because of genetic combinations. If the null and functional copies exist at high frequencies (e.g., both start at around 0.5 in generation 1), then you would have a situation where the extant proportion of individuals with genotypes which are shifted toward mixed (+/-) or functional alleles (+/+) along the genome at the loci would be higher than expectation. The presence of a null allele at one locus can immediately tell you that the other locus does not have a null allele, because that combination is lethal. Of course, over time selection would expunge the variation which generated this LD, as the null alleles would decrease in frequency.

2) Consider a circumstance where two populations, previously separated, come into contact (e.g., an isthmus connects two islands). If the populations exhibit alternative alleles to fixation on several loci, and those loci are on the same chromatid, one can envisage a situation where two alleles on two loci exhibit linkage over many generations. The issue here is that synteny takes time to be broken apart by recombination, so the genetic complexes which had fixed in the parents will carry on and be passed into the subsequent admixed generations until crossing over disrupts the physical association. To give a concrete example, imagine a locus for eye color and hair color. Imagine one population is fixed for "white" for both (100%) and another is fixed for "black" (100%). The first generation would be totally heterozygous intralocus, but, each chromatid with a "white" eye color copy would also have a "white" hair color copy, and vice versa. Over the generations recombination would result in swapping of partners and eventually one would not be able to predict whether the downstream gene was "white" or "black" based on its physical partner, but that would take time.

3) Finally, the one I am most interested in because of its evolutionary historical significance, and that is LD generated by selective sweeps. Imagine a table top that is little used. Over time it builds up a layer of dust which disrupts its smooth symmetry. Now, consider someone sliding a towel over its surface. Across the region that the towel traversed the dust will be swept away and a smooth and clear symmetrical surface will now shine, bordered still by expanses of dust. Over time the smooth region will become obscured by dust once more and fade into the background. The analogy I am making is that the dust is genetic variation, while the sweeping towel is a selection event. If a new mutant allele confers great selective benefits then it can rise in frequency precipitously. If this rise is faster than recombination can destroy genetic associations, other alleles in nearby regions can be "swept" along in a hitchhike. If one imagines a scenario where the likelihood of a "break" at which recombination occurs is equally distributed across the genome (this is not true, but accept it for simplification), then decreasing the distance between two genes along a chromatid decreases the likelihood of a recombination event separating them in any given meiosis. Since a subset of the genome is less diverse than the full genome, a selective sweep which favors a coterie of neighboring genes and alleles has a homogenizing effect, generating "long haplotypes," genomic regions cleansed of variation. Of course, this variation eventually reemerges via mutation and recombination, but it takes time.
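To put rough numbers on "it takes time," here is a small Python sketch of how an association like the "white"/"black" one in case 2 fades. It leans on the textbook result that, under random mating in a large population with no selection, D shrinks by a factor of (1 - r) each generation, where r is the chance that recombination separates the two loci in a given meiosis. Those conditions, the function names, and the two values of r are all assumptions for illustration; real populations are messier.

```python
import math


def ld_decay(d0, r, generations):
    """D in generations 0..generations, assuming D_t = d0 * (1 - r)**t.

    d0 : starting D (0.25 if two fixed populations mix in equal parts)
    r  : per-meiosis recombination fraction between the two loci
    """
    return [d0 * (1 - r) ** t for t in range(generations + 1)]


def generations_to_halve(r):
    """Generations needed for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1 - r)


# Unlinked loci (r = 0.5): the association halves every generation
print(ld_decay(0.25, 0.5, 3))
print(generations_to_halve(0.5))

# Close neighbors on a chromatid (hypothetical r = 0.01): decay is far slower
print(ld_decay(0.25, 0.01, 3))
print(generations_to_halve(0.01))
```

With r = 0.5 the association is half gone after a single generation, whereas with r = 0.01 it takes roughly 69 generations to halve. That is the same distance effect that lets close neighbors of a favored allele hitchhike in case 3.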

And so there you have linkage disequilibrium. As you might notice, LD is epiphenomenal. It is a passing fad, but since it erupts periodically it can be an excellent marker for the historically contingent events which are important in evolution.

How about doing LOD scores?

hm. could you make a post out of it though? the reason i posted on LD is that i end up linking to the wiki article in so many posts, so though it isn't a "basic" scientific concept per se it is one which makes this blog intelligible. i haven't spoken much of LOD or tajima's D or all sorts of other stats on this blog....

Apropos of the previous post on maintenance of polymorphism, it might be worth reviewing some of the ways that this happens -- most of the things an interested lay reader is fascinated by are not driven to fixation (I guess we notice contrasts). Why aren't all/most males 6'6 with a 145 IQ, doggedly conscientious work ethic, and 9-inch penis?

Why aren't all/most males 6'6 with a 145 IQ, doggedly conscientious work ethic, and 9-inch penis?

dude, everyone doesn't have a 9-inch penis???

Finally, the one I am most interested in because of its evolutionary historical significance, and that is LD generated by selective sweeps.

I wouldn't say the LD is "generated" by the selective sweep. The LD is generated by the initial mutation, which occurs on a given haplotype (a new mutation is thus in complete LD--D'=1--with variation along the entire chromosome). Over time, recombination breaks up that LD, so for a given polymorphism, the amount of LD surrounding it is an approximate measure of its age. If a polymorphism is under positive selection, it (and the haplotype it's on) moves more quickly than you'd expect to a higher frequency, and you end up with long haplotypes at high frequencies, the signature of selection you're talking about.

you're right. "generated" was the wrong word. what word would concisely describe (e.g., one or two words) the movement of that LD from n = 1 to proportion > 0.9, since that's what i'm really talking about.

second go around, minor point:

linkage = physical association of alleles along the chromatid for one individual

linkage disequilibrium = population wide measure of interlocus allelic associations

at least that's the sort of def. i've read/heard/been told before. so, in this scenario linkage is generated by mutation, but LD is generated by whatever drives/results in allelic associations population wide.

anyway, i don't want to pursue this line too much...as it is kind of like the interminable debates about 'species' or 'genes' (or perhaps i'm misunderstanding the fuzziness of the definitions colloquially).

linkage = physical association of alleles along the chromatid for one individual

linkage disequilibrium = population wide measure of interlocus allelic associations

that's not a minor point!!! :)

one quibble-- linkage is the tendency of alleles at two loci to be inherited together, which implies physical proximity. your definition of LD is right on.

if we have two loci A and B with two alleles 1 and 2 each, and A and B are linked, then someone with haplotypes A1-B2 and A2-B1 will tend to pass on those two haplotypes to their offspring (i.e. not the recombinant haplotypes A1-B1 or A2-B2). Likewise, an individual who has haplotypes A1-B1 and A2-B2 will tend to pass those on (rather than the recombinants A1-B2 and A2-B1). This is independent of the frequency of those haplotypes (i.e. LD)

LD is the population-level measure of association between the alleles, like you said, which depends on the frequency of the haplotypes. A couple definitions: if one of the haplotypes has frequency 0, there is "complete" LD, d'=1. If two of the haplotypes have frequency 0, there is "perfect" LD, r2=1.

When a new mutation arises, the locus where it arises is linked to nearby loci, of course. It's also somewhat trivial to note that in the population-level data, if you consider any other polymorphic locus, there's at least one haplotype that has frequency 0 (since there's only one copy of the new mutation), so d'=1 and there's complete LD.

So the linkage already existed (the mutated locus was already linked to nearby loci before it mutated, though if there were no markers in the area it would be impossible to quantify the recombination fraction between it and other nearby loci), the LD was generated by the mutation.

LD is a measure of association between polymorphisms, linkage is not. even if there weren't any polymorphism, two loci are still inherited together if they're close together (though it takes polymorphic markers to see this in a pedigree).

sorry, this isn't very clear.

sorry, this isn't very clear.

i think the key is to keep in mind the physical image first, and then move outward. but no, i don't think there's any way to be succinct with this because it presupposes a particular abstract (mendel's laws) & biophysical (chromosomes & DNA) understanding of basic genetics.

Why aren't all/most males 6'6 with a 145 IQ, doggedly conscientious work ethic, and 9-inch penis?

dude, everyone doesn't have a 9-inch penis???

Noo! Your line was,

Why would all those things be correlated with a small penis?

So, you know that the human genome is divided physically into chromosomes, and each chromosome consists of two sister strands of DNA, chromatids.

You're conflating sister chromatids and homologous chromosomes.

The issue here is that synteny takes time to be broken apart by recombination, so the genetic complexes which had fixed in the parents will carry on and be passed into the subsequent admixed generations until crossing over disrupts the physical association.

You've managed to butcher synteny worse than any other usage I've seen. (Sorry, dude, but I gots to be honest here.) First off, synteny does not refer to gene order; it refers to the physical presence of genes on the same chromosome (regardless of order). The term "syntenic block" is a misnomer. Ironically, the wikipedia entry cites a paper by a bunch of asshats on chromosomal rearrangements between Drosophila species (honestly, who gives a shit about that crap?) which explicitly states the definition of synteny that I described above -- the drosophilists prefer the term linkage groups to synteny.

But what you've done is suggest that synteny refers to associations between alleles at different loci in separate populations. This is not even wrong.

But what you've done is suggest that synteny refers to associations between alleles at different loci in separate populations. This is not even wrong.

???

You're suggesting synteny refers to associations between alleles. I'm saying synteny refers to the physical linkage (as opposed to genetic linkage) of loci -- ie, genes can be syntenic, not alleles.