So if you haven’t read those articles already, go and do so now – when you come back, I want to talk about the potentially worrying implications of this paper for the future of personal genomics.
There are really only two pieces of jargon you need to know to follow this story, and those are the two classes of genetic variants that alter the expression levels of genes: cis and trans variants. To put it simply, cis variants are those that are found close to a gene, and trans variants are those that act on a gene’s expression levels but are found far away in the genome (typically on another chromosome).
The ominous message from this paper is this: by examining the effects of genetic ancestry on local gene expression levels in samples from African-Americans, the study provides the first solid estimate of the proportion of the variance in gene expression that is determined by cis-acting variants – and suggests that this proportion is low (around 12%). In other words, this paper suggests that ~88% of the genetic variation affecting a gene’s expression comes from trans variants found far from that gene.
That’s worrying for two reasons. Firstly, it suggests that it may be even more difficult than expected to untangle the molecular basis of the signals found in recent genome-wide association studies for common diseases. Many of these signals fall outside known genes, and the default hypothesis is that the causative variants underlying these signals somehow affect the expression of nearby genes. If in fact many of these variants exert their effects through regulation of distant genes, it will be much more difficult to nail down the pathways involved, particularly for diseases where it is hard to get samples of the affected tissues from living individuals (e.g. psychiatric diseases).
But the most troubling implications of this finding are for the future of personal genomics; bear with me for a moment, because this will take a little background to explain.
We already knew from the relatively poor yield of genome-wide association studies (which target common variants) that a substantial fraction of the genetic risk of common diseases – especially at an individual level – is likely the result of rare genetic variants of moderate effect. Such variants are completely invisible to the chip-based approach of current personal genomics companies, but they will be detected by rapid, affordable whole-genome sequencing methods that will almost certainly be available within the next five years. The problem is that any individual genome will contain many rare variants, only a fraction of which are actually disease-causing – so the major challenge facing personal genomics right now is developing methods for inferring the functional effects of novel sequence variants.
As I noted above, another of the lessons from recent genome-wide association studies is that many disease-associated variants fall outside protein-coding genes, and are thus likely to increase risk by disrupting patterns of gene expression (rather than altering protein sequences). This is a problem, because our current understanding of the way DNA regulates gene expression is still in its infancy, and we can make only the crudest guesses regarding whether or not a new-found variant will alter gene expression and, if so, in what direction – and that’s even for variants that are found close to a gene. Making de novo predictions about the effect of a novel sequence variant on distant genes will be immensely more challenging – but if this study is to be believed, that’s exactly what will need to be done for the majority of expression-altering variants.
This is all rather ironic given that barely a month ago I was all cheerful about a recent paper showing that cis variants tend to cluster tightly around transcriptional start and end positions, making it much easier to nail down expression-altering variants found close to a gene. Now it seems that this will only apply to a small fraction of the overall bulk of expression-altering variants – a rather depressing revelation.
There are still caveats about the findings of this paper that may provide a glimmer of hope. The authors note one issue at the end of the discussion: the gene expression data are derived from only one tissue (white blood cells), and it will be important to extend this analysis to other tissues involved in common disease (such as pancreatic cells in diabetes). However, while the regulatory variants will differ from tissue to tissue, I’d be surprised if the big picture (in terms of the proportion of cis and trans variants) was strikingly different – unless anyone can think of reasons why proximal regulatory elements would systematically alter in importance from tissue to tissue?
A more interesting caveat is that this study essentially looked at the effect of between-population genetic variation on gene expression, by using the relative proportion of African and European ancestry within each region of the genome in admixed individuals, and it’s unclear to me whether the same effect will necessarily be seen for within-population variation. In the comments over at Gene Expression, G from Popgen ramblings notes that the authors made an effort to address this issue (by looking at the population differentiation of cis and trans variants) and found no striking difference, but it’s still possible that there are a few highly population-specific trans variants acting on many genes that explain a large chunk of the variance. If that’s the case, the fraction of the variation explained by cis variants may turn out to be substantially higher in within-population data – but I’ll admit that it’s a long shot.
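The ancestry-based logic described above can be made concrete with a toy simulation. What follows is a minimal Python/NumPy sketch, not the authors' estimator: the function names, the Beta(8, 2) distribution of genome-wide African ancestry, the effect sizes and the single-gene setup are all hypothetical assumptions chosen for illustration. The idea it shows is that in an admixed individual the ancestry of the chromosome segment carrying a gene fluctuates around that person's genome-wide ancestry, so regressing expression on both lets you separate effects that travel with the local segment (cis) from effects that track ancestry across the rest of the genome (trans).

```python
import numpy as np

# Hypothetical toy model: all parameters are illustrative, not taken from Price et al. (2008).


def simulate_admixed_expression(n_people=5000, beta_cis=0.25, beta_trans=1.5,
                                noise_sd=0.3, seed=0):
    """Simulate one gene's expression in an admixed sample."""
    rng = np.random.default_rng(seed)
    # Genome-wide African ancestry fraction per person (assumed Beta(8, 2), mean 0.8).
    global_anc = rng.beta(8, 2, size=n_people)
    # Local ancestry at the gene: number of African-ancestry copies (0, 1 or 2),
    # each copy drawn independently with probability equal to global ancestry.
    local_anc = rng.binomial(2, global_anc)
    # Cis effects track local ancestry at the gene; trans effects, spread across
    # the rest of the genome, are summarised here by genome-wide ancestry.
    expression = (beta_cis * local_anc + beta_trans * global_anc
                  + rng.normal(0.0, noise_sd, size=n_people))
    return global_anc, local_anc, expression


def fit_cis_trans(global_anc, local_anc, expression):
    """OLS of expression on local and global ancestry.

    Returns (cis coefficient, trans coefficient, cis share of the
    expected all-African vs all-European expression difference).
    """
    X = np.column_stack([np.ones_like(global_anc), local_anc, global_anc])
    (_, b_cis, b_trans), *_ = np.linalg.lstsq(X, expression, rcond=None)
    # Going from all-European to all-African ancestry adds two African copies
    # at the gene (2 * b_cis) and shifts genome-wide ancestry by 1 (b_trans).
    cis_fraction = 2 * b_cis / (2 * b_cis + b_trans)
    return b_cis, b_trans, cis_fraction


if __name__ == "__main__":
    g, l, y = simulate_admixed_expression()
    b_cis, b_trans, frac = fit_cis_trans(g, l, y)
    print(f"cis={b_cis:.2f} trans={b_trans:.2f} cis share={frac:.2f}")
```

With the default (made-up) parameters the true cis share is 2 * 0.25 / (2 * 0.25 + 1.5) = 0.25, and the fitted values should land close to that. Note that a model like this only captures ancestry (between-population) effects, which is exactly the caveat raised above: it says nothing about variants segregating within either ancestral population.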
If this picture of the distribution of cis and trans effects is accurate, then the only way to accurately predict the effects of novel expression-altering variants will be by assembling a genome-wide map of the regions affecting the expression of each disease-relevant gene in each disease-relevant tissue. That’s a feat that will require some very clever biology, and will take far longer than five years – so once again, you will have your genome sequenced long before anyone can tell you what it means.
Citation: Alkes L. Price, Nick Patterson, Dustin C. Hancks, Simon Myers, David Reich, Vivian G. Cheung, Richard S. Spielman (2008). Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans. PLoS Genetics, 4 (12). DOI: 10.1371/journal.pgen.1000294