Genetic Future

Software company 5AM Solutions has just launched a neat little Firefox plug-in for customers of consumer genomics company 23andMe.

The idea is very simple:
  1. Download your raw data from 23andMe (or use one of the files from me or my colleagues at Genomes Unzipped);
  2. Install the plug-in from here and point it to your 23andMe data;
  3. Browse to a website discussing one of the genetic variants included on the 23andMe chip, and you’ll see highlights around the rsID of any variant on the page (rsIDs are unique codes assigned by dbSNP to most of the common variants targeted by personal genomics companies);
  4. Mouse over the rsID and your own genotype for that SNP will appear.
For any 23andMe user who’s ever come across a variant on PubMed and wondered what their own genotype was, then gone through the process of logging into 23andMe and checking, the value of this tool is immediately obvious.
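
For the curious, the steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the plug-in's actual code: it assumes the usual 23andMe raw-data layout (tab-separated rsid, chromosome, position and genotype columns, with comment lines starting with `#`), and the function names and toy rsIDs are made up for the example.

```python
import re

# Matches identifiers such as "rs1111" in free text, case-insensitively.
RSID_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def load_genotypes(lines):
    """Build an rsID -> genotype map from raw-data lines.

    Assumed layout: rsid <TAB> chromosome <TAB> position <TAB> genotype,
    with header/comment lines starting with '#'.
    """
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows rather than guessing
        rsid, _chrom, _pos, genotype = fields[:4]
        genotypes[rsid.lower()] = genotype
    return genotypes


def annotate_rsids(text, genotypes):
    """Return (rsid, genotype) pairs for each distinct rsID found in text.

    Pairs are in order of first appearance; genotype is None when the
    rsID is not in the user's data.
    """
    seen = []
    for match in RSID_PATTERN.finditer(text):
        rsid = match.group(0).lower()
        if rsid not in seen:
            seen.append(rsid)
    return [(rsid, genotypes.get(rsid)) for rsid in seen]


if __name__ == "__main__":
    # Toy data, not real genotypes.
    raw = ["# rsid\tchromosome\tposition\tgenotype", "rs1111\t1\t100\tAG"]
    page = "The paper reports an association with rs1111 and rs2222."
    for rsid, genotype in annotate_rsids(page, load_genotypes(raw)):
        print(rsid, genotype if genotype is not None else "not in your data")
```

A browser plug-in would do the same lookup against the text of the page and show the genotype in a tooltip instead of printing it.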
Here’s a screenshot using my own data:


[Image: SNPtips screenshot]

SNPtips creator Andrew Evans has a blog post up explaining the rationale behind the project. I spoke to Evans by email earlier this week, and he told me that future plans for the tool include development for Chrome, extension to data-sets from other companies such as deCODEme and Navigenics, and provision for viewing data from multiple individuals (which will be useful for those with multiple genotyped family members, or for groups like Genomes Unzipped).
As more people gain access to increasingly comprehensive information about their own genome, online tools will become essential for navigating the data rapids. This is a small but very useful step in that direction.

Comments

  1. #1 Adam
    January 11, 2011

    I love this idea. Is it 23andme only, or does it work with decodeme too?

  2. #2 Daniel MacArthur
    January 11, 2011

    Hey Adam,

    Currently 23andMe only, but they have plans to extend it to other companies soon. Of course, for the impatient and informatics-savvy deCODEme customer it would be easy to reformat your deCODEme file as a 23andMe file, which would work just as well.

    Beyond that – Luke has imputed everyone in Genomes Unzipped onto HapMap 2, and I’m planning to try those files out later this week.

  3. #3 Keith Grimaldi
    January 11, 2011

    Neat tool! Re the AT/GC flip tweeted about, is that referring to the cases where the common use of the SNP is the opposite of the result provided by 23andMe? As in your (coincidental?) example – in dbSNP and most papers the LCT SNP is C>T (so you would be TT, lactose persistent)?

  4. #4 pconroy
    January 11, 2011

    I guess I’ll have to wait till the Chrome extension comes out then.

    Who uses IE or Firefox anymore??

  5. #5 Daniel MacArthur
    January 11, 2011

    Keith: yes, that’s correct. Users will just have to learn how to do strand flipping in their heads – the only problem will be A/T and C/G SNPs, but fortunately these are reasonably rare (0.75% of SNPs on the 23andMe v2 chip and 1.5% of SNPs on the v3 chip). Alternatively, geneticists could agree to adopt a consistent standard for strand representation (cue hysterical laughter).

    “Who uses IE or Firefox anymore??”

    20.6% and 42.0% of ScienceBlogs readers, respectively. Chrome comes in third with a 17.9% market share. But that said, Chrome support is the first thing I asked about. :-)

  6. #6 Neil
    January 12, 2011

    Re: reverse complementing SNPs

    In general, Illumina GWAS chips (unlike Affymetrix’s) purposefully only include A/C, A/G, C/T and G/T SNPs, so the small number of A/T and C/G SNPs reported by 23andMe comes from their custom component on top. The fact that strand flipping is not a problem here is therefore an artifact of the technology used.

    “Alternatively, geneticists could agree to adopt a consistent standard for strand representation (cue hysterical laughter).”

    To take the bait (and as Daniel knows full well) the obvious method would be to report only on the forward strand, but that implies a fixed genome assembly. So, a large chunk of lab work is still done on assembly NCBI36, as that is what HapMap’s data has been reported on. (And people don’t know how to cope with alternate named haplotypes in GRCh37, but that’s a different story).

    The alternative to fixing the assembly is to fix which allele to report first by sequence context – Illumina’s version of that is online.

    Nice ;-)

  7. #7 Daniel MacArthur
    January 12, 2011

    Hi Neil,

    Yes, I should have specified that the AT/GC problem will be more of an issue for other chips (e.g. Navigenics, which last time I checked used an Affy 6.0 array).

    And all fair points on the strandedness issue; strandedness has long been the bane of many a geneticist’s existence, but I agree that’s because it’s a genuinely difficult problem to solve.

  8. #8 Luke
    January 12, 2011

    “I agree that’s because it’s a genuinely difficult problem to solve.”

    No! No it’s not! This isn’t 1999! We’re not talking about the variation in the long-nosed weevil! The vast majority of the human genome does not change strand between builds, and thus a given SNP is very unlikely to change strand. Report on the FORWARD STRAND. Always the FORWARD STRAND. There is never any good reason, in this day and age, to report a variant on the negative strand.

    The solution is certainly not to chuck away a third of human genetic variation because it doesn’t fit your format requirements!

  9. #9 Neil
    January 12, 2011

    Luke @8

    The Illumina GWAS chips do not assay A/T and C/G SNPs because it is cheaper not to – or, if you prefer, they have chosen to double the SNP density by not doing so:

    http://www.illumina.com/documents/products/technotes/technote_iselect_design.pdf

    Bead Type Definition

    Depending on the type of SNP or marker being assayed, the Infinium HD Assay uses one of two probe (or bead type) designs, Infinium I or Infinium II. The Infinium II probe design, which stops at the base before the SNP of interest, uses only one probe per loci (i.e., one probe for both alleles). This probe design is suitable for the majority of loci in most organisms. Infinium I probe design is required for relatively less common A/T and C/G SNPs and requires two probes (or bead types) per SNP because the probe stops at the base representing the SNP of interest (i.e., one probe for each of both alleles).

    Yes, I should get out more.

  10. #10 Andrew Evans
    January 12, 2011

    Hi everyone –

    Great feedback!

    Yes, Chrome is definitely high on the port list, as Daniel points out. So far, it’s the top vote in our informal survey from the release. Safari is a distant second, and IE is last with zero votes.

    On the strand issues – these break down into two categories, basically – which strand the SNP is reported on, and whether the SNP is prone to ambiguity (A/T and C/G). Each probably calls for a different solution. 23andMe data is normalized to the + strand always, so we’ve taken the tack in this first release to just report it as-is, and assume that people will do whatever they normally do with their raw 23andMe data anyway (flip strands mentally, look up in dbSNP, etc.). We are looking at ways, however, to make this friendlier in future releases – at a minimum, we could flag the A/T and C/G SNPs so people have some warning when dealing with them. For strand orientation, we are contemplating ways to show dbSNP orientation alongside the native data, etc. This is a bit tricky because it either requires pre-processing of raw data or real-time API lookup, each of which has disadvantages.

    deCODEme will probably be the platform we support next, just because it’s one we are familiar with (so if you want Navigenics, or any other platform, *please* point us to some good data files to use for testing! email snptips@5amsolutions.com if you don’t mind). deCODEme conveniently puts strand orientation per SNP right in the file, even though not everything is normalized to the + strand.

    Do stay tuned to SNPTips and 5AM Solutions – we have more cool things in the works…

    Best,
    Andrew

  11. #11 Luke
    January 13, 2011

    @Neil 9

    Increasingly, SNPs are included on chips because they are the best tags for known effects, or because they are rare SNPs that cannot be tagged any other way, meaning that the GC/AT rate continues to rise. The Illumina 1M chip has 0.4% GC/AT SNPs, whereas the 2.5M has about 2.5%. Plus, Illumina genotyping is far from the only technology that calls variants, and sequencing calls are going to get increasingly common.

    Now is the time to acknowledge that times have changed – with a stable genome, there is no good reason to not report something on the forward strand, and the future will thank you for doing so.
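
The strand handling discussed in the comments above comes down to two small operations: complementing a genotype to move it to the other strand, and spotting the A/T and C/G SNPs where that flip cannot be detected from the alleles alone. The Python below is a rough sketch under stated assumptions, not anything taken from SNPtips: the function names are hypothetical, and the `strand` argument assumes a per-SNP `+` or `-` flag of the kind Andrew describes for deCODEme files.

```python
# Watson-Crick complements for the four bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(genotype):
    """Complement each allele in an unphased genotype such as 'CT'.

    Allele order carries no meaning for an unphased genotype, so the
    string is complemented without being reversed. Symbols other than
    A/C/G/T (for example '-' for a no-call) pass through unchanged.
    """
    return "".join(COMPLEMENT.get(allele, allele) for allele in genotype.upper())


def is_strand_ambiguous(allele_a, allele_b):
    """Return True for A/T and C/G SNPs.

    These SNPs have the same pair of alleles on both strands, so the
    reported strand cannot be inferred from the alleles themselves.
    """
    return COMPLEMENT.get(allele_a.upper()) == allele_b.upper()


def to_forward_strand(genotype, strand):
    """Report a genotype on the forward strand, given a '+' or '-' flag."""
    if strand == "+":
        return genotype.upper()
    if strand == "-":
        return flip_strand(genotype)
    raise ValueError("strand must be '+' or '-', got %r" % (strand,))


if __name__ == "__main__":
    print(flip_strand("AA"))              # the same genotype on the other strand
    print(is_strand_ambiguous("A", "T"))  # an A/T SNP
    print(is_strand_ambiguous("A", "G"))  # an A/G SNP
    print(to_forward_strand("ag", "-"))   # minus-strand call moved to forward
```

Keith's lactase example is the first case: an AA call on one strand is TT on the other, which is the flip users currently do in their heads. A flag from `is_strand_ambiguous` is the sort of warning Andrew mentions for A/T and C/G SNPs.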
