Next-generation genome browsers

Jan Aerts discusses the problem of incorporating information on large-scale genomic rearrangements into genome browsers.

Genome browsers such as UCSC and Ensembl are fantastic for presenting many types of genomic information, such as the position and orientation of protein-coding genes or the location of small-scale genetic variants. However, their linear arrangement makes it very difficult to use them to present large-scale genetic variations (duplications, deletions or inversions). This is a serious problem, as these types of genetic variants are surprisingly common in the human genome, and will be emerging at a frightening pace over the next year or two from large-scale human sequencing efforts such as the 1000 Genomes Project.

Jan notes one alternative visualisation strategy: de Bruijn graphs, a representation of sequence structure that is also used in the popular genome assembly algorithm Velvet. I've stolen one of Jan's figures to illustrate what these graphs look like [added in edit: I also recoloured the figure to make it a little clearer]:

[Figure: cnv_1_recoloured.jpg, two de Bruijn graph views of the same pair of sequence arrangements]

These two images are simply two different ways of presenting the same information. The boxes represent blocks of sequence, and the coloured lines represent alternative arrangements of that sequence that might be present in two different individuals. In both cases, the blue individual has an inversion of the green sequence block relative to the red individual, and also contains an extra chunk of sequence (purple) not present in the red individual. The two pictures differ only in terms of which of the two individuals is selected to be the linear "reference" sequence.
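To make the "same information, different reference" point concrete, here is a minimal Python sketch. It is my own illustration, not Jan's code and not how Velvet or any browser stores things: each individual is just an ordered path through named sequence blocks, each with an orientation. The flanking block names ("A", "B"), the strand labels and the helper `relative_to` are all hypothetical; only the green inversion and the extra purple block come from the description above.

```python
# Hypothetical sketch: sequence blocks are nodes, each individual is an
# ordered path of (block, strand) pairs. Block names "A" and "B" are made up.

def relative_to(reference, other):
    """Describe `other` block by block, using `reference` as the linear axis."""
    ref_strand = dict(reference)  # block name -> strand on the reference
    calls = []
    for block, strand in other:
        if block not in ref_strand:
            calls.append((block, "insertion"))   # absent from the reference
        elif strand != ref_strand[block]:
            calls.append((block, "inverted"))    # present, opposite strand
        else:
            calls.append((block, "same"))
    # Blocks on the reference that the other individual lacks entirely
    other_blocks = {block for block, _ in other}
    calls += [(b, "deletion") for b, _ in reference if b not in other_blocks]
    return calls

red = [("A", "+"), ("green", "+"), ("B", "+")]
blue = [("A", "+"), ("green", "-"), ("purple", "+"), ("B", "+")]

print(relative_to(red, blue))   # red as the linear reference
print(relative_to(blue, red))   # blue as the linear reference
```

With red as the reference, the purple block shows up as an insertion in blue; with blue as the reference, the same block shows up as a deletion in red. The green block is reported as inverted either way. The underlying paths never change, only the choice of which one is drawn as the straight line.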

Will this approach end up becoming the default visualisation technique for structural variation? I don't know, but certainly we're all going to have to get used to some sort of similar approach to depicting large-scale rearrangements in a genome browser.



The blue individual has an inversion compared to red in each diagram, but I don't think the two blue individuals (or the two red individuals) are identical across diagrams.

By Andro Hsu on 09 Mar 2009

Hi Andro,

The original shading scheme of the first block was unnecessarily confusing (it made it look as though it was in the same orientation in both images, when actually it's inverted) - I've recoloured the figure and added notation to the ends of the blocks to make it easier to follow.