Jan Aerts discusses the problem of incorporating information on large-scale genomic rearrangements into genome browsers.
Genome browsers such as UCSC and Ensembl are fantastic for presenting many types of genomic information, such as the position and orientation of protein-coding genes or the location of small-scale genetic variants. However, their linear layout makes it very difficult to present large-scale structural variation (duplications, deletions or inversions). This is a serious problem: variants of this kind are surprisingly common in the human genome, and will be emerging at a frightening pace over the next year or two from large-scale human sequencing efforts such as the 1000 Genomes Project.
Jan notes one alternative visualisation strategy: de Bruijn graphs, a representation of sequence structure that is also used in the popular genome assembler Velvet. I’ve stolen one of Jan’s figures to illustrate what these graphs look like [added in edit: I also recoloured the figure to make it a little clearer]:
These two images are simply two different ways of presenting the same information. The boxes represent blocks of sequence, and the coloured lines represent alternative arrangements of that sequence that might be present in two different individuals. In both cases, the blue individual has an inversion of the green sequence block relative to the red individual, and also contains an extra chunk of sequence (purple) not present in the red individual. The two pictures differ only in terms of which of the two individuals is selected to be the linear “reference” sequence.
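For readers who think more easily in code, here is a minimal Python sketch of the idea in the figure. It is a simplified block graph rather than a true k-mer-based de Bruijn graph, and it is not Jan's implementation or anything taken from Velvet. The block names (`A`, `G`, `P`, `B`), their toy sequences and the position of the purple block are all assumptions made up for illustration; only the inversion of the green block and the extra purple block in the blue individual come from the figure.

```python
from collections import defaultdict

# Hypothetical sequence blocks: names and contents are invented for illustration.
# "G" stands in for the green block, "P" for the purple one.
BLOCKS = {"A": "ACGT", "G": "GGATC", "P": "TTTT", "B": "CAGA"}

# Each individual is a path of (block, strand) steps. A "-" strand means the
# block is read as its reverse complement, i.e. it is inverted.
# Where exactly the purple block sits in the blue path is an assumption.
PATHS = {
    "red": [("A", "+"), ("G", "+"), ("B", "+")],
    "blue": [("A", "+"), ("G", "-"), ("P", "+"), ("B", "+")],
}


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def spell(path):
    """Spell out the linear sequence that a path through the blocks encodes."""
    return "".join(
        BLOCKS[block] if strand == "+" else revcomp(BLOCKS[block])
        for block, strand in path
    )


def build_graph(paths):
    """Map each edge (step, next step) to the set of individuals that use it."""
    edges = defaultdict(set)
    for name, path in paths.items():
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)].add(name)
    return dict(edges)


def describe(paths, reference, other):
    """Describe how `other` differs from whichever path is chosen as reference."""
    ref = dict(paths[reference])  # block -> strand in the reference
    oth = dict(paths[other])      # block -> strand in the other individual
    notes = []
    for block, strand in paths[other]:
        if block not in ref:
            notes.append(f"{block}: insertion")
        elif ref[block] != strand:
            notes.append(f"{block}: inversion")
    for block, _ in paths[reference]:
        if block not in oth:
            notes.append(f"{block}: deletion")
    return notes
```

Calling `describe(PATHS, "red", "blue")` reports an inversion of `G` and an insertion of `P`, while `describe(PATHS, "blue", "red")` reports the same inversion but a deletion of `P`. The underlying graph is identical in both cases; only the choice of reference changes how the difference is described, which is the point the two pictures make.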
Will this approach end up becoming the default visualisation technique for structural variation? I don’t know, but certainly we’re all going to have to get used to some sort of similar approach to depicting large-scale rearrangements in a genome browser.