Major themes from Biology of Genomes meeting

By dgmacarthur on May 8, 2009.

It's difficult to distill down a meeting as data-rich as the Cold Spring Harbor Biology of Genomes meeting, but here's a first-pass attempt.

We're sequencing lots of people
One of the highlights of the meeting was the update on progress from the 1000 Genomes (1KG) Project. I was fortunate enough to have been given a sneak peek at the data (which you can download yourself if you're so inclined) at the 1KG satellite meeting earlier in the week, but it was still impressive to see it all put together in the presentation today by Goncalo Abecasis.

Abecasis reported on the data emerging from the three pilot projects of 1KG: a very high-resolution analysis of six individuals (three individuals each from a European and a West African family); a much lower-resolution scan across the genomes of 180 individuals (60 Europeans, 60 West Africans, and 60 East Asians); and a targeted analysis of 1000 randomly selected genes in several hundred individuals from multiple populations.

The data emerging from the pilot projects are still pretty raw, but the numbers are impressive: the project has already identified over 20 million single-base variants (SNPs), over 11 million of which are completely novel; 40,000 short insertion/deletion polymorphisms; and over 4,000 larger structural rearrangements of DNA.

There's much more to come: the project is scaling up to generate low-coverage sequence data for 1,200 individuals by the end of 2009, and may expand this set of samples to incorporate additional populations in 2010. As befits a project seeking to create a resource for the broader genetics community, the data will be made publicly available as they are generated.

Lots of sequence data is useful
The catalogue of human genetic variants created by the 1KG project will be of immediate benefit to researchers working on the genetic basis of complex diseases. Gil McVean spelled out how this will work by applying early 1KG data to results from the Wellcome Trust Case Control Consortium using the process of genotype imputation.

Genotype imputation starts by using a reference panel with very high-resolution genetic data to define the patterns of association between nearby variants (known as linkage disequilibrium). That information can then be applied to a set of disease cases and healthy controls that have been genotyped at only a small subset of those variants; using the association data from the reference panel it is possible to impute the genotypes of these individuals at many other sites. Like magic, genotype imputation allows you to "see" genotypes at millions of sites throughout the genome that were never directly typed experimentally.
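The core idea can be illustrated with a toy sketch in Python. This is a simplified illustration, not McVean's pipeline or the algorithm used by any real imputation package (those use hidden Markov models over the whole reference panel). It assumes phased haplotypes written as strings of 0s and 1s, finds the reference haplotype that best matches a sample at its typed sites, and copies that haplotype's alleles into the untyped sites. The panel and sample below are hypothetical.

```python
from typing import Sequence


def impute_haplotype(
    reference: Sequence[str],
    typed_sites: Sequence[int],
    typed_alleles: str,
) -> str:
    """Fill in untyped sites by copying from the best-matching reference haplotype.

    Toy illustration only: assumes phased haplotypes written as strings of
    '0'/'1' alleles, all of the same length, and a sample typed at a subset
    of those sites. Real imputation software uses probabilistic (HMM) models.
    """
    def mismatches(ref_hap: str) -> int:
        # Compare the sample with a reference haplotype at the typed sites only.
        return sum(ref_hap[site] != allele
                   for site, allele in zip(typed_sites, typed_alleles))

    # Closest reference haplotype; min() keeps the first one listed on ties.
    best = min(reference, key=mismatches)

    # Keep the observed alleles; take every other site from the best match.
    observed = dict(zip(typed_sites, typed_alleles))
    return "".join(observed.get(i, best[i]) for i in range(len(best)))


if __name__ == "__main__":
    # Hypothetical reference panel: four haplotypes across eight variant sites.
    panel = ["00110100", "11001011", "00111100", "11000011"]
    # A sample genotyped at sites 0, 4 and 7 only.
    print(impute_haplotype(panel, [0, 4, 7], "010"))  # prints 00111100
```

Here the sample's alleles at the three typed sites match the third reference haplotype exactly, so its five untyped sites are filled in from that haplotype. Nearest-match copying keeps the example short; the probabilistic models used in practice can instead stitch together segments from several reference haplotypes.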

I'll talk more about the details of McVean's results later; for now, suffice it to say that he showed that imputation using 1KG sequence data as a reference can add non-trivial value to the results of existing genome-wide association studies - value that will only increase as the number of individuals sequenced in the project increases.

Adding functional information to sequence data
Vast amounts of sequence data will have only limited value unless we can come up with ways of figuring out exactly which sites in the genome are actually functional, and of predicting what effects genetic variation can have on human physical variation and disease risk.

Several talks approached the functional annotation of the human genome from a variety of angles. Stephen Montgomery and Tony Kwan both discussed approaches to pin down the specific genetic variants that affect gene expression levels; such variants are excellent candidates for playing a role in other human traits. David Goode presented an analysis of human genetic variation within regions that are conserved over deep evolutionary time, which suggested that the vast majority of genetic variants with functional effects reside outside the protein-coding regions of genes. Figuring out exactly which sites in the genome are actually under deep evolutionary constraint is non-trivial, but some extremely clever approaches to inferring this using low-coverage genome sequences from 29 mammals were presented by Adam Siepel yesterday.
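As a rough illustration of what pinning down an expression-affecting variant involves, the sketch below uses the textbook single-SNP approach: regress each individual's expression level on genotype dosage (0, 1 or 2 copies of one allele) and rank nearby candidate SNPs by the fraction of expression variance they explain. This is an assumed, simplified method for illustration, not the specific approaches Montgomery or Kwan presented; the SNP names and values are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def eqtl_fit(dosages: Sequence[int], expression: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of expression ~ dosage; returns (slope, r_squared)."""
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP (or constant expression) carries no association signal.
        raise ValueError("dosages and expression must both vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    slope = sxy / sxx                   # change in expression per extra allele copy
    r_squared = sxy ** 2 / (sxx * syy)  # fraction of expression variance explained
    return slope, r_squared


def best_eqtl(snps: Dict[str, Sequence[int]], expression: Sequence[float]) -> str:
    """Return the candidate SNP whose genotypes explain the most expression variance."""
    return max(snps, key=lambda name: eqtl_fit(snps[name], expression)[1])


if __name__ == "__main__":
    # Hypothetical data: six individuals, one gene, two nearby candidate SNPs.
    expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
    snps = {
        "snp_a": [0, 0, 1, 1, 2, 2],
        "snp_b": [0, 1, 0, 2, 1, 2],
    }
    for name, dosages in snps.items():
        slope, r2 = eqtl_fit(dosages, expression)
        print(f"{name}: slope={slope:.2f}, r^2={r2:.2f}")
    print("best candidate:", best_eqtl(snps, expression))
```

In this made-up example snp_a, whose dosages track expression closely, comes out as the best candidate. With real data, nearby SNPs in strong linkage disequilibrium often fit almost equally well, which is exactly why pinning down the causal variant is hard.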

All of these approaches are interesting to anyone thinking about taking full advantage of personal genome sequences: how can we figure out which of the millions of genetic variants present in an individual's genome actually have an impact on function and disease risk? Combining information from multiple sources will be essential to answering this question.


"Aside from the intrinsic coolness of these studies, understanding the genetic architecture of traits in other species gives us insight into our own genome."

That's funny, because I tend to think of all this human genomic work as stuff that might ultimately be of use in annotating the ant genomes that will be coming out in the next few years.

> which suggested that the vast majority of genetic variants with functional effects reside outside the protein-coding regions of genes.

This small sentence will mean a big shift in the way most of us (outside of genomics, anyway) think about how allelic variation affects phenotype. Sounds like Nature's way of keeping biologists employed for a long time to come.

Re: genomic imputation: is this simply one of the ideas behind the HapMap, phrased (perhaps clumsily) as "large blocks of DNA tend to stay together through meiosis, with points of separation marked by recombination hotspots"?

Hey AMac,

Yep, that's pretty much it; because those SNPs tend to be inherited together, you only need to actually look at a couple of them to be able to predict the sequence at other nearby variants.
