Limitations of the 1000 Genomes Project

By dgmacarthur on November 18, 2008.

A Nature News article discusses the ongoing 1000 Genomes Project, an international effort planning to sequence 1,200-1,500 human genomes. The discussion springs from project co-chair David Altshuler's update at last week's American Society of Human Genetics meeting on the progress of the project (in brief: 3.8 terabases down, 996.2 terabases to go).

The article provides a generally positive overview of the project's historical context, goals and progress. The one contrary note comes from Duke University's David Goldstein, who has previously publicly expressed skepticism regarding the value of much of the data currently emerging from genome-wide association studies looking for common variants underlying common disease risk. Goldstein is also wary about over-stating the importance of the 1000 Genomes data:

"1000 Genomes will be hugely useful for growing the technology to generate and analyse sequence data," says David Goldstein of Duke University in Durham, North Carolina, adding "But in terms of a catalogue of the variants most important to human biology and disease, it's less clear how important it will be." Goldstein advocates sequencing people with extreme presentations of disease to understand more about common disease pathways.

This is an important point, which isn't well-explained in the article. The 1000 Genomes Project will provide a fairly comprehensive catalogue of genetic variation present at frequencies above 1% (at least within its target populations), but its coverage of even rarer variants will necessarily be incomplete.

This is mind-numbingly obvious: no catalogue of variation would be truly "comprehensive" unless it sequenced every human being on the planet at birth. At the extreme end of the frequency spectrum, all of us carry perhaps a hundred or so variants that are unique to us (in terms of identity by descent, at least) since they occurred in the sperm and egg cells that gave rise to us - sequencing 1000 genomes won't find those variants unless we happen to be one of the study participants. In between those unique variants and the lower end of the range covered by 1000 Genomes (a threshold that probably sits somewhere between 0.1 and 1% frequency) there is a large swath of human genetic variation that will be almost entirely missed by the project.
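To put rough numbers on this, here is a minimal Python sketch (my own back-of-the-envelope model, not anything from the Project's analysis plan) of how likely a variant at a given population frequency is to turn up in a sample of 1,000 people. It assumes each person contributes two independently sampled chromosomes, and that a variant can only be called once it has been seen on at least `min_copies` chromosomes. The function name `prob_detected` and the `min_copies` threshold are illustrative assumptions, and the model ignores sequencing error, depth of coverage and population structure.

```python
from math import comb


def prob_detected(freq, n_people, min_copies=1):
    """Probability that a variant at population frequency `freq` is carried
    on at least `min_copies` of the 2 * n_people sampled chromosomes.

    Simple binomial sampling model (an assumption for illustration only).
    """
    n_chrom = 2 * n_people
    # Probability of seeing fewer than min_copies copies in the sample.
    p_too_few = sum(
        comb(n_chrom, k) * freq**k * (1.0 - freq) ** (n_chrom - k)
        for k in range(min_copies)
    )
    return 1.0 - p_too_few


# Compare a 1% variant with rarer ones in a hypothetical 1,000-person sample.
for freq in (0.01, 0.001, 0.0001):
    once = prob_detected(freq, 1000)
    twice = prob_detected(freq, 1000, min_copies=2)
    print(f"frequency {freq:.2%}: seen at least once {once:.3f}, "
          f"at least twice {twice:.3f}")
```

Under these assumptions a 1% variant is essentially certain to be sampled, while the chance of a variant even appearing in the sample falls away quickly once its frequency drops below about 0.1%, and faster still if more than one copy is needed for a confident call.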

Do these extremely rare variants actually matter, in terms of predictive personalised medicine? Goldstein obviously thinks so, and I agree with him. There are sound theoretical reasons and growing empirical evidence to suggest that the lower end of the frequency spectrum is enriched for large-effect disease risk variants, precisely those variants that will be most useful for making predictions about individual disease risk. These variants will only be identified by deep resequencing of large cohorts of disease patients and controls, with the "extreme cohort" approach advocated by Goldstein representing a particularly powerful strategy. I'll have a lot more to say about hunting rare variants over the next couple of weeks.

In addition to missing extremely rare variants, the short-read sequencing technology providing the backbone of the Project's pipeline will also struggle with the regions representing the real "dark matter" of the genome: the highly repetitive regions, constituting perhaps 10-15% of human DNA, which are largely untouchable by short-read platforms. Improved sequencing technology will dig further into these areas over the next twelve months, but it's likely that many unmappable regions will persist well beyond that. It's currently unclear how much functionally important variation exists in these regions - however, no catalogue of variation that excludes them can realistically be said to be "comprehensive".

That's not to say that the Project is not an important step in the right direction - on the contrary, the data emerging from the Project over the next twelve months will be incredibly useful in many areas of human genetics (e.g. nailing down the causal variants in regions highlighted by genome-wide scans for disease risk, normal variation, and recent natural selection; and cataloguing the "pretty rare" variants between 0.5 and 5% frequency for the next generation of genome scans). But it is important to emphasise that the map of variation generated by the Project will still contain significant dark areas - and that there's plenty of work left for human geneticists to do.

(As an aside, what's with the picture associated with the Nature News article? That's a pretty limited slice of human genetic diversity right there...)


the short-read sequencing technology providing the backbone of the Project's pipeline will also struggle with the regions representing the real "dark matter" of the genome: the highly repetitive regions

Agreed. I think we're going to find that structural variation plays more of a role than anyone imagined 4 or 5 years ago. Besides the obvious deleted, truncated, or fused genes, there are all sorts of more subtle dosage effects to consider. No one technology is going to address all this.
