Next-generation genome browsers

By dgmacarthur on March 9, 2009.

Jan Aerts discusses the problem of incorporating information on large-scale genomic rearrangements into genome browsers.

Genome browsers such as UCSC and Ensembl are fantastic for presenting many types of genomic information, such as the position and orientation of protein-coding genes or the location of small-scale genetic variants. However, their linear layout makes it very difficult to use them to display large-scale structural variants (duplications, deletions or inversions). This is a serious problem, as these variants are surprisingly common in the human genome, and they will be emerging at a frightening pace over the next year or two from large-scale human sequencing efforts such as the 1000 Genomes Project.

Jan notes one alternative visualisation strategy: de Bruijn graphs, a representation of sequence structure that is also used in the popular genome assembly algorithm Velvet. I've stolen one of Jan's figures to illustrate what these graphs look like [added in edit: I also recoloured the figure to make it a little clearer]:

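[Figure: the same rearrangement drawn two ways, as blocks of sequence (boxes) joined by coloured lines for the red and blue individuals; recoloured from Jan Aerts' original]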

These two images are simply two different ways of presenting the same information. The boxes represent blocks of sequence, and the coloured lines represent alternative arrangements of that sequence that might be present in two different individuals. In both cases, the blue individual has an inversion of the green sequence block relative to the red individual, and also contains an extra chunk of sequence (purple) not present in the red individual. The two pictures differ only in terms of which of the two individuals is selected to be the linear "reference" sequence.
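To make the "choice of reference" point concrete, here is a minimal sketch, assuming each individual can be written as an ordered path through named sequence blocks, each read on the "+" or "-" strand. The block names ("left", "green", "purple", "right") and the function compare_to_reference are hypothetical illustrations, not taken from Jan's figure or any genome browser, and the sketch assumes each block appears at most once per path, so it cannot detect duplications or reorderings.

```python
from typing import Dict, List, Tuple

# A haplotype as an ordered path through sequence blocks: (block name, strand).
# Block names are hypothetical stand-ins for the coloured boxes in the figure.
Path = List[Tuple[str, str]]


def compare_to_reference(reference: Path, other: Path) -> Dict[str, List[str]]:
    """Describe `other` relative to `reference`.

    Simplifying assumption: each block occurs at most once per path, so
    duplications and reorderings are not detected.
    """
    ref_strand = dict(reference)
    other_strand = dict(other)
    return {
        # Present in both paths but read in opposite orientations.
        "inverted": sorted(
            b for b in ref_strand
            if b in other_strand and ref_strand[b] != other_strand[b]
        ),
        # Present only in `other`.
        "insertions": sorted(b for b in other_strand if b not in ref_strand),
        # Present only in `reference`.
        "deletions": sorted(b for b in ref_strand if b not in other_strand),
    }


red: Path = [("left", "+"), ("green", "+"), ("right", "+")]
blue: Path = [("left", "+"), ("green", "-"), ("purple", "+"), ("right", "+")]

print(compare_to_reference(red, blue))
# {'inverted': ['green'], 'insertions': ['purple'], 'deletions': []}
print(compare_to_reference(blue, red))
# {'inverted': ['green'], 'insertions': [], 'deletions': ['purple']}
```

Swapping which individual serves as the reference turns the purple insertion into a deletion while the green inversion is reported either way: the underlying graph is the same, only the linear projection changes.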

Will this approach end up becoming the default visualisation technique for structural variation? I don't know, but certainly we're all going to have to get used to some sort of similar approach to depicting large-scale rearrangements in a genome browser.
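For readers curious how a de Bruijn graph is built from raw sequence, here is a toy sketch: every k-mer adds an edge from its first (k-1) letters to its last (k-1) letters, so two sequences that share flanking sequence but differ in the middle split apart at one node and rejoin at another, forming a "bubble". The sequences and function names are hypothetical, and this reads only the forward strand; real assemblers such as Velvet do considerably more (for example, handling reverse complements and removing sequencing errors).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def de_bruijn_graph(sequences: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; every k-mer adds an edge prefix -> suffix."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return edges


def branch_points(edges: Dict[str, Set[str]]) -> List[str]:
    """Nodes with more than one successor: where sequences diverge."""
    return sorted(node for node, succ in edges.items() if len(succ) > 1)


def merge_points(edges: Dict[str, Set[str]]) -> List[str]:
    """Nodes with more than one predecessor: where sequences rejoin."""
    indegree: Dict[str, int] = defaultdict(int)
    for succ in edges.values():
        for node in succ:
            indegree[node] += 1
    return sorted(node for node, d in indegree.items() if d > 1)


# Hypothetical toy sequences: the variant carries an extra "GG".
reference = "ACGTTGCAAGT"
variant = "ACGTTGGGCAAGT"

graph = de_bruijn_graph([reference, variant], k=4)
print(branch_points(graph))  # ['TTG']
print(merge_points(graph))   # ['GCA']
```

The two paths diverge after TTG (one continues to TGC, the other to TGG) and rejoin at GCA; the detour between them is the inserted sequence.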

The blue individual has an inversion compared to red in each diagram, but I don't think the two blue individuals (or the two red individuals) are identical across diagrams.

I discussed using de Bruijn graphs (and other complexities) in aligners about a year ago on my blog, When stars align. Thanks for the kind comments about my blog last week.

Hi Andro,

The original shading scheme of the first block was unnecessarily confusing (it made it look as though it was in the same orientation in both images, when actually it's inverted) - I've recoloured the figure and added notation to the ends of the blocks to make it easier to follow.