Genetic Future

Well, it’s a little late, but I finally have a list of what I see as some of the major trends that will play out in the human genomics field in 2009 – both in terms of research outcomes, and shifts in the rapidly-evolving consumer genomics industry.

For genetics-savvy readers a lot of these predictions may seem, well, predictable, so I want to emphasise that my purpose here is not really to make risky forecasts; I’m more interested in laying out what I see as the major big picture trends for the year to come, with a few specific predictions about unknowns thrown in. In any case (as you will see) I’ve gone for the approach of throwing an enormous number of predictions out there in the hope that at least a few of them will come true.

So without further ado, here they are – Genetic Future’s predictions for the emerging (and ongoing) trends of 2009 in the field of human genomics:


2009 will be the year of rare variants for common diseases. Numerous studies will identify variants sitting in the frequency range of 0.1-5% that contribute to the risk of common diseases, and explain a small but important fraction of the “missing heritability” left unexplained by common variants. These studies will take several different approaches: chip-based analysis making use of the few rare variants on existing chips, and new chips offering increased coverage of rare variants; targeted resequencing of genes previously identified as harbouring common variants associated with complex diseases; targeted resequencing of functionally-related genes; and the first attempts at unbiased resequencing of the entire exome (that is, every protein-coding region of the genome). These variants will have a larger effect on disease risk than we have seen for most common variants (odds ratios over 2), but due to their low frequency will each explain only a small fraction of overall risk variance.
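To see why a large odds ratio at low frequency still explains little variance, here is a back-of-the-envelope sketch using the standard additive approximation (a variant’s contribution on the log-odds scale is roughly 2p(1−p)β², with β = ln(odds ratio)); the frequencies and odds ratios below are purely illustrative, not taken from any real study:

```python
import math

def variance_explained(p, odds_ratio):
    """Approximate variance contributed by a biallelic variant on the
    log-odds scale, assuming Hardy-Weinberg proportions and an additive
    per-allele effect: 2p(1-p) * ln(OR)^2."""
    beta = math.log(odds_ratio)
    return 2 * p * (1 - p) * beta ** 2

# A typical common variant: frequency 30%, odds ratio 1.15
common = variance_explained(0.30, 1.15)
# A rare variant of the kind predicted above: frequency 0.5%, odds ratio 2.5
rare = variance_explained(0.005, 2.5)
print(common, rare)  # both under 1% of the variance
```

Under these toy numbers the rare, large-effect variant contributes about as little as the common, small-effect one – the low frequency cancels out the bigger odds ratio.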

Genome-wide association studies of common variants will exceed the 100,000-sample mark, but provide diminishing returns in terms of convincing associations. Studies published in 2008 for traits such as height and serum lipid levels frequently incorporated data from tens of thousands of samples, so the 100,000-sample mark will not come as a shock to anyone. However, the low-hanging fruit is now well and truly plucked for most complex diseases; these new studies will be pushing into the realm of variants with ever-smaller effect sizes, which will prove tougher to replicate and also provide limited predictive power. Nonetheless, they will still generate important insights into the molecular pathways underlying complex diseases.
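The diminishing returns have a simple statistical basis: the sample size needed to detect an effect scales roughly with the inverse square of that effect. A rough power sketch (a crude normal approximation for a per-allele test, at the conventional 5×10⁻⁸ genome-wide significance threshold; the frequencies and odds ratios are hypothetical):

```python
import math
from statistics import NormalDist  # Python 3.8+

def approx_sample_size(p, odds_ratio, alpha=5e-8, power=0.8):
    """Very rough number of samples needed to detect a per-allele effect
    at significance alpha with the given power:
    n ~ (z_{1-alpha/2} + z_{power})^2 / (2p(1-p) * ln(OR)^2)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    beta = math.log(odds_ratio)
    return (z_alpha + z_power) ** 2 / (2 * p * (1 - p) * beta ** 2)

# Halving the effect size roughly quadruples the required sample:
print(round(approx_sample_size(0.3, 1.10)))  # a modest common-variant effect
print(round(approx_sample_size(0.3, 1.05)))  # an even smaller effect
```

This is why each step down the effect-size ladder demands a disproportionately larger consortium.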

The validation of many newly-discovered variants will prove difficult. With common variants of moderate effect it was usually possible to perform a replication study in a reasonably-sized follow-up cohort. For rare variants and common variants with very small effect sizes replication will prove much more troublesome, due to reduced power (and, in the case of rare variants, their restriction to specific populations). Validating such variants will require building up a case for association from multiple sources of information, including functional studies (see below) and incorporation of information about multiple variants from the same gene in different populations.

Isolated populations will provide a powerful source of new genetic variants for common diseases and other complex traits. Populations that have been isolated and subject to substantial inbreeding for long periods of their recent history provide substantial advantages for genetic research: they have increased genetic and environmental homogeneity, simpler genetic architecture for complex traits, and longer stretches of association (linkage) between neighbouring markers. Last year we saw reports of genetic association studies in northern European populations, the Amish and other groups; expect to see many more this year.

Attention will move gradually away from complex diseases and towards the genetic dissection of disease-related traits. So-called “endophenotypes” or “intermediate phenotypes” are specific traits that are associated with broader complex diseases (e.g. serum glucose and insulin levels are an endophenotype associated with type 2 diabetes). There is good reason to expect that – at least in some cases – these traits will prove more amenable to genetic dissection than complex diseases themselves. Last year saw this approach prove fruitful for some endophenotypes (e.g. serum lipid levels) but not for others (e.g. serum insulin), but the overall feeling seems to be optimistic. Expect to see genome-wide association studies performed on an ever-wider array of carefully-defined measurements with clinical relevance; some of the associated markers will also increase disease risk while others will not, and the paradoxes will prove illuminating.

The causal variants underlying some genome-wide association signals will be mapped, while others will prove elusive. Genome-wide association studies have identified many areas of the genome associated with complex diseases, but in most cases the actual causal variant – the precise genetic change that actually underlies the effect on disease risk – is still unknown. Large consortia are currently sequencing the associated regions, and these efforts (along with data released this year by the 1000 Genomes Project) will no doubt make some headway. In some cases there will be obvious candidates: a variant with an obvious effect on the expression levels or the sequence of the protein produced by a nearby gene, for instance. However, in many cases the list will only be reduced to a line-up of a dozen or so genetic changes, any one of which could be the real culprit.

Human genetics will become more intimidatingly complex than ever. The mapping process described above will confirm that many of the genetic variants underlying human disease are not found in protein-coding sequences, and that a disturbingly high proportion fall far away from any known gene, and outside understood functional elements. This will dramatically increase the complexity of interpreting new genetic variants identified by personal whole-genome sequencing.

There will be increasing emphasis on the functional validation of signals identified by genome-wide association studies. High-level journals are already pushing for authors of large-scale genetic studies to also include at least cursory analyses of the potential functional impacts of their identified variants – e.g. effects on gene expression. Such efforts will become increasingly more sophisticated as large consortia look for other high-throughput assays for identifying potential functional impacts for large numbers of putative causal variants. The pay-off will be a much deeper understanding of the molecular pathways underlying disease risk, with implications for biomarker discovery and drug design, although many variants will prove impossible to nail down functionally (see above).

Studies incorporating individuals from multiple populations will be fruitful in mapping causal variants, and uncovering differences in disease genes between human groups. Different human populations display different patterns of linkage disequilibrium (the strength of association between nearby genetic markers). Incorporating data from multiple populations can thus allow researchers to untangle these markers from one another and zoom in on the real offender in an associated region. Expect to see multiple large genome-wide studies in individuals from Africa, Central/Southern Asia and particularly in East Asia (especially China, Japan and Singapore), where the burgeoning genomics infrastructure will allow in-house studies to be performed. There will be studies highlighting both similarities and differences between populations in the genetic architecture of disease risk; naturally, the differences will get much more press than the similarities.
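Linkage disequilibrium between two markers is usually summarised as r², computed from haplotype and allele frequencies. A minimal sketch (the frequencies below are invented purely to illustrate how the same pair of markers can be tightly linked in one population and nearly independent in another):

```python
def r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic markers, from the frequency of the AB
    haplotype and the allele frequencies of A and B:
    D = P(AB) - P(A)P(B);  r^2 = D^2 / (p_A(1-p_A) * p_B(1-p_B))."""
    d = p_ab - p_a * p_b
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical European-ancestry sample: the alleles almost always travel together
print(r_squared(p_ab=0.28, p_a=0.30, p_b=0.30))  # high LD
# Hypothetical African-ancestry sample: much weaker association
print(r_squared(p_ab=0.12, p_a=0.30, p_b=0.30))  # low LD
```

Where r² is high, a chip marker simply tags the causal variant and the two cannot be told apart; in a population where r² between them is low, only the true culprit should retain the association signal.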

Most of the “missing heritability” will stay missing. This one’s only slightly risky: despite the massive genome-wide surveys of common variation described above, and new associations with rare variants and structural variation, the bulk of the heritable risk for most complex diseases will remain unexplained at the end of 2009. There will continue to be little evidence for widespread epistasis (non-additive interactions between risk variants). Attention will slowly turn to more exotic explanations such as epigenetic variation, and even variation in microflora (which is at least partially heritable). However, much of the variance may well rest in a huge number of variants with very small effect sizes, which may prove almost impossible to unravel.

Still, clinically useful genetic tests for some complex diseases will be developed. I guess this is most likely to happen first for auto-immune diseases, which have generally proved somewhat more tractable to the common-variant approach. The possibility of using population screening to identify at-risk individuals for further tests will be actively considered.

Genetic variants underlying complex psychiatric diseases will remain largely elusive. Diseases such as bipolar disorder and schizophrenia have proved frustratingly recalcitrant to genetic studies, and this picture will probably not change markedly in 2009. Expect to see a few new association signals from several large genome-wide studies of schizophrenia, bipolar disorder, autism and depression, but probably nothing that will provide much predictive value in the clinic. There are wild-cards here, though – there may well be rare large-effect variants out there that provide individual predictive utility, and 2009 might be the year they start popping up. We’ll see.

We will start to see “bad” genome-wide association studies. Up until now the expense of large-scale genotyping has meant that genome-wide studies have only been performed by well-funded groups with access to expert statistical support. As the cost of genotyping drops, expect to see a new wave of studies produced by groups without such expertise and using small sample sizes. The end result will be enriched for false positive findings due to chance or bias, which – if they are “sexy” enough – will nonetheless sometimes find their way into high-impact journals. There will be scathing letters to the editor calling for tighter standards, but that won’t stop a gradual dilution of the quality of the literature.

There will be a proliferation of new companies attempting to gain a share of the personal genomics market, most of which will sink rapidly into obscurity. It’s cheap to build a website and license a commercial lab to perform genome-wide genotyping, but fiendishly expensive and difficult to build the knowledge-base required to effectively explain complex genetic data to consumers, and to navigate the regulatory maze. Expect to see “bottom-dwellers” (in Steven Pinker’s terminology) who emerge into the market, blink once or twice in the glare of media attention, and then slowly sink back into the mud.

We will not see a retail complete genome sequence offered for less than $1000. I’d be happy to be proven wrong here, mind you, but I just can’t see prices tumbling this far over the next twelve months – even with the huge competition and rapid technological advances in the DNA sequencing sector. Of course, it depends what you mean by “complete” – it will no doubt be possible to offer a fragmentary, low-coverage genome at this price by the end of the year, but such a product would be almost worse than no information at all. Alternatively, cut-price genome sequences may be offered by companies at a loss, to attract attention and create a more sustainable long-term market.

Mainstream personal genomics companies will offer affordable large-scale (but not whole-genome) sequencing, with disappointing results. I’m guessing that at least one of the big three personal genomics companies will be releasing some kind of large-scale sequencing product this year (“large-scale” meaning reading the sequence of thousands of genes simultaneously). This will either be targeted sequencing of a very large set of disease genes, or whole-exome sequencing (i.e. reading all protein-coding regions). The commercial pressure to be the first company to launch such a product may overwhelm common sense evaluations regarding the still-considerable technical challenges associated with this process. Rushing to market will result in messy data that adds further and unnecessary complexity to the process of making sense of the genome: not only will it be unclear whether a novel variant is disease-causing or not, it will often be unclear whether or not it is even real. There will be negative reviews, apologies, and calls for tighter regulation. Caveat emptor.

Navigenics will not come to dominate the consumer genomics market. I just can’t see this at all – people just aren’t excited about the Navigenics product in the same way they are about 23andMe’s. Unless something astonishing happens, I expect 23andMe to continue to dominate media coverage of personal genomics, and almost certainly to secure the majority of sales (as opposed to free give-aways) in 2009. Navigenics will survive, though, probably drifting steadily towards large-scale clinical diagnostics while doing its best to remind everyone that it is the only “serious” company on the market. Meanwhile, the fate of deCODEme remains uncertain while its parent company attempts to find a way out of its current financial crisis.

The personal genomics industry will begin to blur out of existence. By this I don’t mean that demand for personal genomics will disappear, but rather that the industry will cease to exist as a discrete, monolithic entity. Right now, despite their attempts to distinguish themselves, all three mainstream personal genomics companies are basically offering the same product. That will all change as the field begins to fragment into niches, with various entities staking out claims over distinct but overlapping territories that begin to merge with related industries (particularly clinical diagnostics). The increasing struggle for companies to maintain their identity and territory during this diversification, while being simultaneously buffeted by regulatory changes (see below) and rapid technological advances, will be fascinating to watch.

The regulatory landscape will shift – but in which direction? It seems almost certain that we’ll see major regulatory changes in the consumer genetics industry this year, but I have no real insight to offer regarding what those changes will be. The best-case scenario is a surgical, benign strike that clears out the bottom-feeders and creates clear standards to prevent unethical behaviour in the future, while still allowing direct-to-consumer genetic testing to persist and innovate; the worst-case scenario would be a heavy-handed clamp-down on the whole industry. I guess the actual outcome will be somewhere in between, and may vary markedly between countries – with uncertain consequences for the largely-online business of personal genomics.

There will be an uptick in “it’s all lies” stories about personal genomics, impacting on public perception of the industry. As the novelty of the industry wears off, journalists will find a new way of making headlines out of personal genomics companies – emphasising the inevitable discrepancies between testing results from different companies, and quoting major scientists expressing disquiet about the value of genomic data. In addition, speakers will start to make a name for themselves doing “exposés” of the industry to both lay and scientific audiences. The industry will respond to genuine criticism, but damage will still be done. (Note: the Biopolitical Times has a very similar prediction; I swear I didn’t copy!)

Genetic ancestry will bring in big money, but it will trigger a backlash from indigenous populations. The development of tools for discerning high-resolution estimates of geographical ancestry based on SNP chip data (the same data currently used by personal genomics companies) will make genetic ancestry even more insanely popular than it is already. However, these tests are only as good as their reference populations. Research groups that have been quietly collecting DNA from different ethnic groups for decades will suddenly find themselves the subjects of considerable commercial interest. At the same time, increasingly aggressive efforts to collect samples from indigenous communities will raise hackles and accusations of biopiracy (as have similar sampling projects).

Well, that’s it for now – congratulations to those who made it all the way through. If you violently disagree with my predictions, or want to record your own predictions for posterity, add a comment below.


Comments

  1. Tera Eerkes
    January 19, 2009

    Nice post. This list may be “predictable” but I couldn’t agree more with many of these estimates.

    One thing to add, perhaps: I see clever scientists beginning to look for that “missing heritability” in places like the whole mitochondrial genome, and in very small copy number variants, say 5-50 bp, that will hopefully be cataloged and published soon for the HapMap individuals (I give that a 50/50 chance in 2009).

  2. Andro Hsu
    January 20, 2009

    Of course, I’m sure you will be correcting the significance of your predictions for multiple hypothesis testing.

  3. Daniel MacArthur
    January 20, 2009

    I’ll accept nothing less than genome-wide significance.

  4. Parik T.
    January 30, 2009

    I also expect 2009 to bring a lot of turbulence in the laws governing the commercialization of these technologies developed for personalized medicine.