I have been describing some recently published work on polymorphic deletions (see here and here for the previous two posts) on the old site. I will conclude that series here at ScienceBlogs with a discussion of linkage disequilibrium and deletions.
In the previous two posts I outlined two different approaches for identifying polymorphic deletions using single nucleotide polymorphisms (SNPs). I also described some of the analysis performed on that data set, which revealed that many of the deletions resulted in the elimination of at least a portion of a gene (some removed complete genes, and others contained multiple genes). Additionally, deletions containing genes appeared to be under purifying selection (natural selection against the deleterious deletions), but the genes within those deleted regions were not a random sample. This suggested that certain types of genes (functional classes) are more prone to deletion, and I gave some explanations as to why.
In this post I will discuss how SNPs can be used to calculate linkage disequilibrium (LD) and how this relates to the deletions. The two papers that deal with SNPs, deletions, and LD can be found here and here. Before we begin, I will provide you with a brief description of LD -- feel free to skip the next paragraph if this is old hat.
Imagine you have two loci (a fancy name for locations, but you can think of them as genes) on a chromosome, called locus A and locus B. Each has two alleles: 'A' or 'a' and 'B' or 'b'. If the alleles at the two loci were randomly assigned to chromosomes, we would expect each of the four possible haplotypes (AB, Ab, aB, and ab) to occur at a frequency equal to the product of the frequencies of its two alleles. This is not always the case, and the measure of the deviation from these expectations is known as linkage disequilibrium. Essentially, LD measures the non-independence of the two loci -- how well the allele at one locus predicts the allele at the other locus -- taking into account the allele frequencies at both loci.
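For readers who like to see the arithmetic, here is a minimal sketch of how LD can be computed from the four haplotype frequencies. The function name `ld_stats` and the choice to report D, D' and r^2 are my own for illustration; the papers discussed here may use different statistics or estimation procedures.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r2) for two biallelic loci.

    Inputs are the frequencies of the four haplotypes and must sum to 1.
    This is an illustrative helper, not code from any of the cited papers.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Allele frequencies are the marginal sums of the haplotype frequencies.
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB

    # D is the observed AB frequency minus its expectation under independence.
    D = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")

    # r^2 is the squared correlation between the alleles at the two loci.
    r2 = D * D / denom

    # D' rescales D by the largest magnitude it could take given the
    # allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0

    return D, d_prime, r2
```

If only the AB and ab haplotypes exist, each at 50%, the allele at one locus perfectly predicts the allele at the other and r^2 is 1. If the haplotype frequencies are exactly the products of the allele frequencies, D is 0 and there is no LD.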
One important consideration when examining the statistical association of two SNPs is when in evolutionary history the mutation event that gave rise to each SNP occurred. This is also true when studying the association between SNPs and deletions. As we have seen, similar deletions seem to have independent origins (the same deletion arose on different lineages), which means certain SNPs will predict the deletion of a certain region in some individuals, whereas other SNPs will predict that same deletion in other individuals. Furthermore, if a deletion occurred recently (it's on the tip of a genealogical lineage), it will not have been around long enough to accumulate SNPs in significant LD. That said, Altshuler's group was able to identify strong LD between SNPs and a deletion for 9 of the 10 deletions they verified with quantitative PCR. They also found that the same SNPs were in LD with each deletion in all three populations (African, Asian, and European), suggesting that the deletions originated prior to the common ancestor of all three populations and have been segregating in those populations ever since. Not so deleterious, eh?
As I have mentioned, many deletions have arisen more than once in human populations, probably due to repeat sequences flanking deleted regions that make those regions more prone to deletion. The third paper in this series (from a group at Perlegen) aims to determine what fraction of deletions are due to a single mutation event and what fraction are due to recurrent mutations. This issue is interesting because many human diseases are due to the deletion of a gene (or multiple genes) in a single individual (a recurrent mutation), but there are also a lot of deletions segregating in natural populations that appear to be very ancient.
The Perlegen group identified a set of deletions using hybridization of genomic DNA from 24 individuals to an array that contained intervals of sequence from 600Mb of the human genome. If they observed a decreased signal in a particular interval, they concluded that the interval was deleted in that individual. They identified 215 deletions (ranging from 70bp to 10kb), and confirmed the deletions using PCR, allowing them to determine if an individual was heterozygous or homozygous for that deletion. 33 of the deletions were only found in a single individual, while 41 were found in at least 10% of the haplotypes sampled.
They then genotyped an ethnically diverse group of 71 individuals (African American, European American, and Han Chinese) at 100 of the deletions. These individuals had already been genotyped at over one million SNPs. Deletions with an allele frequency of at least 10% were selected, and the researchers determined the relationship between those deletions and any SNP within 50kb of the deletion endpoints. The LD between the deletions and the SNPs is on par with that observed between two SNPs at equal spacing and with similar allele frequencies. That means that the genotype at a SNP can predict whether an individual carries a deletion just as well as it can predict the allele at another SNP.
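To make the idea of a SNP "predicting" a deletion concrete, here is a rough sketch of scanning the SNPs within 50kb of a deletion for the one most strongly associated with it. Everything here is an assumption for illustration: the function names, the dictionary layout, the coding of genotypes as 0, 1 or 2 copies of an allele, and the use of the squared correlation between genotype counts as a stand-in for r^2. It is not the method used in the Perlegen paper.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two lists of genotype counts.

    Each list holds one value per individual: 0, 1 or 2 copies of the
    deletion (or of a chosen SNP allele). Returns 0.0 if either site is
    monomorphic in the sample.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("genotype lists must be non-empty and equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_tag_snp(deletion, snps, window=50_000):
    """Return (snp_id, r2) for the SNP most correlated with the deletion.

    `deletion` is a dict with 'start', 'end' and 'genotypes'; each entry of
    `snps` is a dict with 'id', 'pos' and 'genotypes' (hypothetical layout).
    Only SNPs from `window` bp before the start to `window` bp after the end
    are considered. Returns None if no SNP falls in that range.
    """
    best = None
    for snp in snps:
        if deletion["start"] - window <= snp["pos"] <= deletion["end"] + window:
            r2 = genotype_r2(deletion["genotypes"], snp["genotypes"])
            if best is None or r2 > best[1]:
                best = (snp["id"], r2)
    return best
```

A SNP whose genotype counts track the deletion exactly across individuals gets an r^2 of 1, meaning that genotyping the SNP alone tells you the deletion genotype in that sample. With real data you would also want to handle missing genotypes and, ideally, work from phased haplotypes.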
Their findings indicate that many of the deletions arose once in human history because they are in LD with nearby SNPs. Further evidence comes from alignment of the deleted regions with the chimpanzee genome. In almost every case, the sequence was present in the chimpanzee genome, suggesting that the sequence is ancestral and was subsequently deleted within the human lineage. While this may seem to conflict with the previous results I mentioned regarding recurrent deletions, all it says is that most deletions arose only once. There is still some room for recurrent deletions within a model to explain the origins of human deletions (and we know for a fact that they occur).
I'll leave you with a few remarks to wrap up this discussion. First of all, deletions are very common within the human genome (and any eukaryotic genome for that matter). Copy number differences (deletions and duplications) make up a substantial amount of the genomic differentiation between individuals within a species and between species. It may come as a surprise to you, however, that many of these deletions do not appear to be deleterious (even those that contain genes). Some lead to human diseases, but many others appear to be quite common and arose a long time ago (an indicator that they are not very harmful). Recurrent mutations are often blamed for human diseases, but some may simply be neutral variants segregating in a population. Finally, the environment of a gene (whether it is flanked by repetitive sequence or not) is probably under selection to prevent the deletion or duplication of important genes. The strength of this selection is debatable, as many "disease genes" are flanked by repeats making them especially susceptible to deletion.
Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK. 2006. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet. 38:75-81.
Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA. 2006. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet. 38:82-85.
McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S, Gabriel SB, Lee C, Daly MJ, Altshuler DM, & The International HapMap Consortium. 2006. Common deletion polymorphisms in the human genome. Nat Genet. 38:86-92.