A problem in genome-wide association studies ("GWAS") is the "missing heritability" issue--identified genetic variation can only account for a small fraction of the estimated genetic contribution to variation in a trait. Razib has a good roundup of several explanations (and I added some speculation about nearly-neutral mutations).
GWAS also have problems accurately characterizing the trait. For example, not all heart diseases (note the plural) are alike, so we have to be certain that we accurately assess the trait of interest. But what is very rarely discussed is the environmental component of heritability in GWAS. In fact, to me, the absence of accurate environmental characterization is potentially a huge problem for GWAS, to the point that I called it something the field has forgotten:
Heritability estimates are always environment-dependent. When the edifice of quantitative genetics was being developed (by Falconer and others), it relied heavily on agriculture (and agriculture was the main 'consumer'). This provided a large body of empirical knowledge that made it impossible to forget that even small differences in environment can affect the strength of heritability estimates (as well as the type of correlations between traits--but that's another argument altogether). Human genetics in particular, which often does a piss-poor job of quantifying the environment (or incorrectly assumes that twin studies control for this effect), seems, to me, vulnerable to this.
(I explain this more here). So I was heartened to read that Bob O'Hara also thinks this is a key factor that could explain missing heritability:
Secondly, these studies are made on "natural" human populations. One thing we sometimes forget when doing these studies is that the results are specific to the environment they are measured in. I have mainly worked with data collected in the laboratory, and we always have to remember that the lab is not the same as the field; it's a different environment. So, even if in the observed environment variation in ADHD prevalence may mainly be genetic, that does not mean that we can't change the environment beyond what was in these studies to reduce the overall prevalence: for example we could reduce poverty and create a more equitable society where all genes will feel equal and will be able to live out their socialist selfish existence. Even if the poor were to be "genetic mud", that wouldn't mean their lives couldn't be improved and Great Things be encouraged to sprout from their fertile genetic soil.
I would take this further and argue that, until we characterize the environment more rigorously--and healthy and diseased isn't going to cut it--we have to treat many of these heritability estimates, and the attempts to explain the underlying contributions of genes, as speculative.
Hi Mike,
Always good to point out that heritability is context dependent: it is the proportion of variance in a trait (measured on a particular scale) explained by genetic factors in a particular environment, at a particular time, in a particular population, etc. etc. As Fisher said, the denominator is a "hotch potch" of various components, including non-genetic factors, measurement error etc. Change one of those, you change the heritability, even though the genetic contribution to the trait may not differ. Hence, in a homogeneous environment, heritability may be high, while in a highly variable environment, it will be low--even though the genetic contribution to the trait has not changed. (As an aside: one often hears that because the heritability differs across different environments--e.g. the heritability of smoking changes with birth cohort or across regions with different anti-smoking policies--this implies there must be "gene-environment interaction." While this is possible, the simpler explanation to me is you've changed the environmental component of the variance.)
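The point about the denominator can be made concrete with a small sketch. It assumes the simplest possible decomposition, total variance = genetic variance + environmental variance, with no gene-environment interaction or covariance and with measurement error folded into the environmental term. The function name and the variance values are made up for illustration.

```python
def heritability(var_genetic, var_environment):
    """Share of total trait variance that is genetic.

    Assumes total variance = var_genetic + var_environment
    (no GxE interaction, no G-E covariance).
    """
    return var_genetic / (var_genetic + var_environment)


# Hypothetical numbers: the genetic variance is identical in both settings;
# only the environmental variance differs.
VAR_G = 1.0
h2_homogeneous = heritability(VAR_G, var_environment=0.25)
h2_variable = heritability(VAR_G, var_environment=4.0)

print(f"homogeneous environment: h2 = {h2_homogeneous:.2f}")  # 0.80
print(f"variable environment:    h2 = {h2_variable:.2f}")     # 0.20
```

The genetic contribution is the same in both cases; only the denominator moved.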
Also good to remind everybody that trait definition and measurement matters. For example, it's becoming clearer that the genetics of estrogen receptor positive and estrogen receptor negative breast cancer are different--and ER status is likely a proxy for some other underlying biological difference in tumors.
THAT SAID, the slipperiness and environmental-dependence of heritability is not a problem with GWAS, it's a problem with the concept of heritability and how we measure it. Except for a few very recent papers that use genome-wide data to infer distant kinship in a sample of "unrelated" individuals, and use this to estimate heritability, GWAS have not been primarily concerned with estimating heritability. GWAS give us markers associated with the traits we're studying, and some estimate of the effect of those particular markers. Granted that those effects are context-specific, and (should) come with all sorts of caveats (e.g. the G allele at rs123456 increases risk of nasty disease X by 20%--among U.S. women of European American ancestry recruited via a professional society or random digit dialing or whatever). That's a matter of gene-environment interaction a.k.a. effect-measure modification, and is a focus of many epidemiologic studies following up on the initial GWAS that identified these risk loci. (Never mind that the effect estimates from the initial studies could be biased for all sorts of design and statistical issues, those studies represent a very narrow sample of humanity, on both the G and E side.)
BUT THIS IS DIFFERENT than the problem you outline. If there is no gene-environment interaction, then the effect is the effect. And there is preliminary information that in some cases anyway (I suspect in many), if you have a marker that is a good proxy for the causal variant in multiple populations, then the genetic effect (as measured by a genotypic relative risk or per allele change in mean or what have you) is constant across changing backgrounds. BUT as I pointed out above THE GENETIC EFFECT (as measured in a GWAS or candidate variant study) CAN BE CONSTANT BUT THE HERITABILITY CAN STILL VARY ACROSS ENVIRONMENTS.
GWAS were never set up to measure heritability. What people have done, post-hoc, in order to "keep score," has been to estimate a genetic contribution to genetic variance using the locus-specific effect estimates. (And here things get hairy, whether you are talking about heritability on the observed-risk scale, or on the liability scale, or comparing the sibling relative risk due to these variants to the sibling relative risk estimated from family data.) What people report then is a proportion of genetic variance explained by the known loci--which, assuming the genetic variance is the same in the GWASed population and the population where the overall genetic variance was estimated (i.e. assuming no gene-environment interaction), is the same as the proportion of heritability explained by the known loci, regardless of the overall trait variance. I.e. if there is no gene-environment interaction, then the PROPORTION of heritability explained by known markers is the same in a population with a high variance in the environmental determinants of a trait as in a population with low environmental variance (even though the heritabilities will be very different in the two populations).
Phew.
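The invariance argument above can be checked with a few lines of arithmetic. This is only a sketch under the same stated assumption (no gene-environment interaction, so the genetic variances are the same in both populations); the function name and all variance values are hypothetical.

```python
def heritability_shares(var_known_loci, var_genetic, var_environment):
    """Return (total heritability, heritability due to known loci,
    proportion of heritability explained by known loci).

    Assumes total variance = var_genetic + var_environment and that
    var_known_loci is the part of var_genetic tagged by GWAS hits.
    """
    total_variance = var_genetic + var_environment
    h2_total = var_genetic / total_variance
    h2_known = var_known_loci / total_variance
    return h2_total, h2_known, h2_known / h2_total


# Hypothetical numbers: same genetic variances, different environments.
for var_e in (0.25, 4.0):
    h2_total, h2_known, share = heritability_shares(
        var_known_loci=0.1, var_genetic=1.0, var_environment=var_e
    )
    print(f"V_E={var_e}: h2={h2_total:.2f}, "
          f"h2 from known loci={h2_known:.2f}, share explained={share:.2f}")
```

The heritability drops from 0.80 to 0.20 as the environmental variance grows, but the share explained by the known loci stays at 0.10, because the total variance cancels out of the ratio.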
To summarize: the loci identified by GWAS clearly have some effect on human traits, including disease risk. This does not mean these traits are genetic in any deterministic sense--in fact almost without exception (and we can probably drop the almost), they are the result of the interplay of both genes and environment, and any attempt to partition causality as N% genetics and (1-N)% environment is not just speculative, but potentially counterproductive and philosophically unjustified. The "missing heritability" problem is not about getting this G-E apportionment wrong, rather it's a guide to folks working in genetics as to how much genetic variance is still out there to be found. And for most traits studied so far, the answer is "a lot." The kerfuffle in the genetics world (spilling over into the mainstream press) is how to go about finding the missing bits.
We in our group feel that gene-environment interactions are important. We have identified many such GxEs with respect to metabolic phenotypes like blood lipid levels, BMI (obesity), blood pressure and such. Furthermore, we are positioned to analyze two GWAS (GOLDN and Framingham Heart Study) where there is extensive data on lifestyle choices of diet, exercise, and alcohol and tobacco use. This will allow us to identify GxE interactions on a large scale. Yes, the statistics will be an issue and we, with collaborators, are working on a solution.
It is not a minor observation that a "risk" allele only shows that risk when an environmental parameter passes a certain threshold.
Mike, you've misunderstood what I was trying to say in what you quoted. I wasn't claiming that the environment would explain the missing heritability. It might be an effect if there was a huge consistent bias in the populations that are used for GWAS and pedigree/twin studies, but I'd be amazed if that was the explanation for the differences: they're too big, and I'm more inclined to believe classical estimates of heritability (or at least think that they're less biased). Actually, with the extreme variable selection going on in GWASs, I suspect there's a reasonable amount of environmental variation in there (for technical reasons: the effect sizes are biased upwards for similar reasons to regression towards the mean).
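The upward bias from selecting on significance (often called the winner's curse) can be illustrated with a toy simulation. This is a sketch, not a model of any real GWAS: every locus is assumed to have the same true effect, estimates are assumed to be normally distributed around it, and the effect size, standard error, and threshold are invented for illustration.

```python
import random
import statistics


def simulate_selected_estimates(true_effect, std_error, z_threshold, n_loci, seed=0):
    """Draw noisy effect estimates and keep only those passing the threshold."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_loci)]
    # Selection step: a locus is "discovered" only if |estimate / SE| is large.
    return [b for b in estimates if abs(b) / std_error > z_threshold]


# Hypothetical settings: small true effect, stringent threshold.
TRUE_EFFECT = 0.05
selected = simulate_selected_estimates(
    TRUE_EFFECT, std_error=0.02, z_threshold=5.0, n_loci=100_000
)

if selected:
    print(f"true effect:              {TRUE_EFFECT}")
    print(f"loci passing threshold:   {len(selected)}")
    print(f"mean estimate among hits: {statistics.mean(selected):.3f}")
```

Because only estimates that happened to land far above the true value clear the threshold, the average among the hits overstates the true effect, and a replication in a fresh sample would be expected to come out lower.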
Having written that, I agree with the general point that heritability could be different because of environmental conditions: I think it would be more accurate to re-frame the problem as a missing additive genetic variance.
Bob O'H wrote: "I think it would be more accurate to re-frame the problem as a missing additive genetic variance."
I agree. This also underscores the fact that gene-gene and gene-environment interactions (i.e. non-additive multi-factor effects) cannot account for the "missing heritability." While such interactions exist (although we can argue about how big they are--the fact that we have not seen a gold rush of gene-environment GWAS papers four years into the GWAS era says something), they cannot account for the "missing heritability," because they are not missing from the heritability, which is the ratio of additive genetic variance to total variance. Twin studies, for example, don't factor in these higher-order effects.
I am not sure that heritability is a useful parameter when it comes to GWAS. What I don't like is that it is a ratio, and so like all ratios you can get the same number with large numerators and denominators as you do with small numerators and denominators, so a heritability does not tell you how much genetic variance to expect.
What I 'love' about the height GWAS saga is that height has a high heritability (h^2 ~ 0.8), but we all know of the massive effect environment has on the final height of an individual - just look at the effect on height when people shift from one country to another, or when someone gets very sick at around 11 years old, or when there is a long war and food is short.
On the other hand, and here is an example from my specialty, agricultural genetics, coat colour in cows has a moderate-to-high heritability (classically around 0.6) but when coded into colours or hues (white, black, red, dun, brown, etc.) there are a limited number of genes with large Mendelian effects and, for example, the cow is red or black due to mutations in the MC1R gene. A similar thing occurs with eye colour in humans (blue, green/hazel, or brown): it has a high heritability (h^2 > 0.8) and little environmental effect on blue/brown colour, with mutations regulating OCA2 basically causing the difference, but a study measuring shades of eye colour can find a host of small effects as well (e.g. Liu PLoS Genet 2010). In this case high heritability comes with small environmental variability, highlighting my assertion that heritability is not a useful parameter for a GWAS.
So there is this disconnect between heritability and what you find in a GWAS and my bias is to ignore the heritability.
It also seems as if GWAS has the same challenge as QTL, namely you find the associations that are anomalously high in your study population. When you go to another population, the associations are lower or zero. (Classic regression!)
It seems there are two directions to go: locus based or phenotype based. When you find an association, you try to find out how and why that locus has the effect in that particular population and why it has less effect in others. That is, you try to partition G, E and GxE for the linked gene for various G and E sets. That line does lead to heritability for some (small) contributor to the phenotype.
In the second, and most popular, you keep doing associations in different populations until you find loci with consistent effects. Since populations don't come cheap, the results are often less than robust.