I’m talking about coeliac disease:
Coeliac disease is caused by a reaction to gliadin, a gluten protein found in wheat (and similar proteins of the tribe Triticeae, which includes other cultivars such as barley and rye). Upon exposure to gliadin, the enzyme tissue transglutaminase modifies the protein, and the immune system cross-reacts with the small-bowel tissue, causing an inflammatory reaction. That leads to a truncating of the villi lining the small intestine (called villous atrophy). This interferes with the absorption of nutrients, because the intestinal villi are responsible for absorption. The only known effective treatment is a lifelong gluten-free diet. While the disease is caused by a reaction to wheat proteins, it is not the same as wheat allergy.
Coeliac disease isn’t that uncommon; many people know of at least one individual who has been diagnosed. Quite often the disease isn’t diagnosed until late in adulthood, leaving people with years of discomfort. I am thinking about this because of a new paper in Nature Genetics, A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium:
The number and volume of cells in the blood affect a wide range of disorders including cancer and cardiovascular, metabolic, infectious and immune conditions. We consider here the genetic variation in eight clinically relevant hematological parameters, including hemoglobin levels, red and white blood cell counts and platelet counts and volume. We describe common variants within 22 genetic loci reproducibly associated with these hematological parameters in 13,943 samples from six European population-based studies, including 6 associated with red blood cell parameters, 15 associated with platelet parameters and 1 associated with total white blood cell count. We further identified a long-range haplotype at 12q24 associated with coronary artery disease and myocardial infarction in 9,479 cases and 10,527 controls. We show that this haplotype demonstrates extensive disease pleiotropy, as it contains known risk loci for type 1 diabetes, hypertension and celiac disease and has been spread by a selective sweep specific to European and geographically nearby populations.
The first half of the paper is a standard genome-wide association study: they sniff around for loci which might be correlated with variation in the traits of interest. If you want to peruse the alphabet-soup of genes to see if anything jumps out at you, go to the supplements; they put more of that aspect in there than in the paper proper. As noted in the abstract, some of the genes were already implicated in variation in hematological parameters (basically cell counts of blood and whatnot). Others were new discoveries. In both cases, after identifying candidates they also flesh out possible biochemical avenues of action. Here’s their study design in a schematic:
As usual, the molecular biology doesn’t interest me so much as the evolutionary aspect (yes, the title of this weblog has to be taken as poetic license, as I’m going to avoid talking about the glory of mRNA expression). Looking more closely, they found a very long haplotype associated with some traits, in particular diseases. That haplotype had about 10 genetic variants of interest (SNPs), though only one was a standard change in amino acid coding (a nonsynonymous mutation). Here’s a figure which illustrates how crazy this long homogenized region of the European genome is when viewed through Haplotter:
You can separate this region into “ancestral” and “derived” (more recent) variants. The block above carries the derived genetic variants, and can be found at ~40% allele frequency in northern Europeans. It’s new, and it’s rather common. It looks totally ancestral in the HapMap African and East Asian samples. The sweep started ~3,400 years ago. It also turns out that one of the SNPs on the derived haplotype is common in Western Eurasia. Here’s a map courtesy of HGDP selection browser (derived is orange):
Remember, this is just one SNP. The others were not typed in the HGDP data set. In any case, the derived variant is absent in Africa, and at very low frequencies in East Asia and the New World. It is at moderate frequencies in Central & South Asia, and at higher frequencies in the Middle East and Europe. Let’s, for the sake of argument, assume that this SNP can give us a rough gauge of this haplotype block.
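For a rough sense of scale, here is a back-of-the-envelope sketch of how strong selection would have to be for a variant to go from rare to ~40% in ~3,400 years. Only those two figures come from the paper; the 25-year generation time, the 0.1% starting frequency, and the simple genic (haploid-style) selection model are assumptions chosen for illustration, not anything the authors estimated. Under that model the log-odds of the allele frequency rise by ln(1+s) each generation.

```python
import math

def logit(p):
    """Log-odds of an allele frequency p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def selection_coefficient(p0, pt, generations):
    """Per-generation s under genic selection.

    Uses logit(p_t) = logit(p_0) + t * ln(1 + s), which holds for the
    deterministic recursion p' = p(1+s) / (1 + s*p).
    """
    return math.exp((logit(pt) - logit(p0)) / generations) - 1.0

def trajectory(p0, s, generations):
    """Deterministic allele-frequency path (no drift, constant s)."""
    p = p0
    path = [p]
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + s * p)
        path.append(p)
    return path

YEARS = 3400      # sweep age quoted from the paper
P_NOW = 0.40      # approximate frequency in northern Europeans (from the post)
GEN_TIME = 25     # ASSUMPTION: years per human generation
P_START = 0.001   # ASSUMPTION: frequency when the sweep began

generations = round(YEARS / GEN_TIME)
s = selection_coefficient(P_START, P_NOW, generations)
path = trajectory(P_START, s, generations)

print(f"{generations} generations, implied s = {s:.3f}")
print(f"frequency after the sweep: {path[-1]:.2f}")
```

With these assumed inputs the implied advantage is on the order of 5% per generation. A higher starting frequency or a shorter generation time would lower that figure, and the model ignores drift and dominance entirely, so treat it as an order-of-magnitude illustration only.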
So why did this genetic region reach higher frequency in this part of the world? Here comes the answer to the title of the post:
We obtained strong evidence suggesting that the haplotype at 12q24 has arisen from a selective sweep specific to Europeans and nearby populations beginning approximately 3,400 years ago, a period characterized by the expansion of high-density human settlements in this part of the world. The role of this region in T cell-mediated immune response is compatible with the notion of immunity being a strong selective force in human evolution
Why high-density human settlements? Because of the spread of agriculture. In particular, the farming of wheat. Agriculture → population density → disease → immune responses to disease → selection & adaptation. At least that’s the story. Here’s what the derived haplotype is associated with:
The 12q24 haplotype links risk alleles for T1D, CAD and celiac disease (carried on the derived haplotype) as well as a recently identified association with hypertension, thus highlighting a remarkable example of disease pleiotropy at this locus. The functional validation of the effect of the Arg262Trp variant in SH2B3 and other variants on this haplotype will be important to clarify and dissect the underlying causes of such pleiotropy and also to establish whether variation in PLT and/or the Arg262Trp change are causal for CAD or whether they merely reflect a pleiotropic effect due to the persistence of multiple functional variants on the long-range haplotype….
…Further functional assessment and in-depth analysis of the 12q24 region will be required to dissect the pleiotropic effects observed at this locus and, in particular, the causality relationship between platelet counts and CAD risk. We note that the region covered by the long-range haplotype contains a number of other candidate genes that may modify platelet phenotypes. The tyrosine-protein phosphatase non-receptor type 11 encoded by PTPN11 plays a regulatory role in a wide array of cell-signaling events involved in the control of cell functions, such as mitogenic activation, metabolic control, transcription regulation and cell migration. Mutations in PTPN11 are a cause of the mendelian disorder Noonan syndrome, which is characterized by platelet abnormalities and acute myeloid leukemias….
Some parts of this paper degenerate into an alphabet-soup of “more study needed.” But one finding seems to be that the derived allele or haplotype adduced above is correlated with greater risk of acquiring a range of diseases. That makes sense: it’s long, so it covers a lot of bases. If the adaptation-to-infection story is right, then it looks like genetic draft yanked along many other variants in the sweep, and some of those variants have nasty disease likelihoods. CAD = coronary artery disease. PLT refers to blood platelet counts, and obviously deviations in the counts of these sorts of cells can point to underlying pathologies.
Evolution works with what it has, and grabs the best near solution possible. That might not be a very good solution. Sickle cell anemia and cystic fibrosis are two examples of “costs” in the cost vs. benefit analysis of evolution, the benefit being greater relative fitness against the population mean (the aforementioned phenotypes have benefits in the heterozygote genotype). But I think coeliac disease is a really good illustration of the absolute stupidity of evolution (despite its relative genius). Downstream of the invention of agriculture there emerged an adaptation which, as a cost, results in a small minority of the population being allergic to agriculture! (In Western Eurasia agriculture is obviously to a great extent synonymous with wheat cultivation.)
Of course the adaptive story is speculative. And we need to be careful about putting all our eggs in the basket of iHS. It may simply be peculiar population demography. Nevertheless, the clustering of traits is suggestive insofar as it shows how phenotypes can correlate simply due to the physical location in the genome of their underlying genetic variants. In the case of a gene like EDAR, which in East Asians exhibits a variant that results in the characteristic hair form of those populations, we need to note that the primary trait we observe may not be the trait under selection, if selection is truly what is occurring. Though the story of infection is plausible (a disproportionate amount of selection is going to be related to disease resistance), it may be something else. And when it turns out that a novel trait which entices us to an adaptive explanation for its emergence is found to be clustered in the genome with a host of other traits, that should make us reassess the plausibility of our story (after all, they can’t all be under selection simultaneously all the time).
Citation: A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium, Nature Genetics, 11 October 2009, doi:10.1038/ng.467