Christina's LIS Rant

scio10: Science in the Cloud

John Hogenesch, Assistant Professor of Pharmacology – Penn School of Med

Gene-at-a-time work is giving way to genome-wide studies: larger datasets, more collaborative research.

Last year more was added to GenBank than in all previous years combined (wow!); growth exceeds Moore's law.

Academia responds by buying storage and clusters, but that brings its own costs: you need great IT staff, and they're really hard to get and keep (they go to industry); heating and cooling; depreciation; and usage/provisioning (machines end up under- or over-utilized). Larger inter-institutional grids exist, but access is tightly regulated and they are very complex to program for.

Cloud computing comes in three flavors: software as a service (SAAS), infrastructure as a service (IAAS), and platform as a service (PAAS).

They use SAAS for collaboration: Basecamp from 37signals, shared across multiple labs and multiple people. Compare $50/month with no IT support costs to SharePoint: $1k for a server, $500 for a license, and $2k for 5% of an admin's effort.
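As a rough worked version of that comparison, here is a minimal Python sketch. It assumes the SharePoint server and license are one-time purchases and that the $2k admin figure recurs every year; the talk didn't say, so treat both as assumptions.

```python
def basecamp_cost(years: int, monthly_fee: float = 50.0) -> float:
    """Hosted SAAS: a flat monthly subscription, no IT support costs."""
    return years * 12 * monthly_fee


def sharepoint_cost(years: int, server: float = 1000.0, license_fee: float = 500.0,
                    admin_per_year: float = 2000.0) -> float:
    """Self-hosted: one-time server + license, plus ongoing admin effort.

    Assumption: server and license are bought once, and the $2k admin
    figure (5% effort) is an annual cost.
    """
    return server + license_fee + years * admin_per_year


if __name__ == "__main__":
    for years in (1, 3):
        print(f"{years} yr: Basecamp ${basecamp_cost(years):,.0f} "
              f"vs SharePoint ${sharepoint_cost(years):,.0f}")
```

Under those assumptions, one year comes to $600 vs. $3,500 and three years to $1,800 vs. $7,500.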

IAAS for proteomics: for example, searching complex samples against a six-frame translated genome. They provisioned 20 AWS nodes running Windows; the run took 7 days and cost $1400.
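A quick way to sanity-check cloud bills like this one is to convert them to an effective cost per node-hour. This Python sketch assumes every node ran around the clock for the whole period, which the talk didn't state explicitly.

```python
def node_hours(nodes: int, days: float) -> float:
    """Total billed node-hours, assuming every node runs 24 hours a day."""
    return nodes * days * 24


def effective_hourly_rate(total_cost: float, nodes: int, hours: float) -> float:
    """Average cost per node-hour for `nodes` machines running for `hours`."""
    if nodes <= 0 or hours <= 0:
        raise ValueError("nodes and hours must be positive")
    return total_cost / (nodes * hours)


if __name__ == "__main__":
    print(node_hours(20, 7))                                 # 3360 node-hours
    print(round(effective_hourly_rate(1400, 20, 7 * 24), 3))  # about $0.417 per node-hour
```

The same function applies to the BLAT run described in the talk: `effective_hourly_rate(424, 20, 72)` works out to roughly $0.294 per instance-hour.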

In genomics there are lots of recent publications using CloudBurst, Crossbow (?), and Hadoop for BLAST/BLAT/R scripts.
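Hadoop-based tools follow the MapReduce model: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. As a toy, in-memory illustration of that model only (not Hadoop's API, and not what CloudBurst or Crossbow actually compute), this Python sketch counts alignment hits per chromosome from a hypothetical tab-separated hit format of `read_id, chromosome, position`.

```python
from collections import defaultdict
from itertools import chain


def map_hit(line):
    """Map: emit (chromosome, 1) for one hypothetical hit record."""
    _read_id, chrom, _pos = line.rstrip("\n").split("\t")
    yield chrom, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: combine one key's values into a single count."""
    return key, sum(values)


def map_reduce(lines):
    pairs = chain.from_iterable(map_hit(line) for line in lines)
    return dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())
```

In a real Hadoop job the map and reduce steps run on different machines and the framework performs the shuffle; here all three run in one process.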

BLAT on AWS: using CloudCrowd (the New York Times' alternative to Hadoop), they provisioned 20 large-memory Ubuntu instances; 85% of sequences were mapped in ~72 hours for $424. (The experiments themselves cost $30k each with machine time, reagents and all, so over the course of the 30 you can do in a year, that's $600k in savings.)
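CloudCrowd itself is a Ruby library, so the Python sketch below only illustrates the general scatter/gather pattern behind a run like this: split the input sequences into one batch per worker, align each batch in parallel, then merge the results. Threads stand in for the 20 cloud instances, and `toy_align_batch` is a hypothetical stand-in for BLAT that only finds exact matches.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def split_into_batches(items, n_workers):
    """Round-robin split so each worker gets a roughly equal share."""
    batches = [[] for _ in range(n_workers)]
    for i, item in enumerate(items):
        batches[i % n_workers].append(item)
    return [b for b in batches if b]  # drop empty batches when items < workers


def toy_align_batch(batch, reference):
    """Hypothetical stand-in for BLAT: exact substring search, -1 if unmapped."""
    return {read: reference.find(read) for read in batch}


def scatter_gather(reads, reference, n_workers=20):
    """Scatter reads across workers, align each batch, gather one result dict."""
    batches = split_into_batches(reads, n_workers)
    if not batches:
        return {}
    results = {}
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        align = partial(toy_align_batch, reference=reference)
        for batch_result in pool.map(align, batches):
            results.update(batch_result)
    return results


def mapped_fraction(results):
    """Share of reads that found a position (the talk reported 85% mapped)."""
    if not results:
        return 0.0
    return sum(pos >= 0 for pos in results.values()) / len(results)
```

For example, with reference `"ACGTACGTTTGACCA"` and reads `["ACGT", "TTGA", "GGGG", "ACCA"]`, three of the four reads are found, so `mapped_fraction` returns 0.75.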

q: how much programming to get it ready to go on AWS?

a: about 8 hours with a somewhat experienced programmer; a very experienced one could do it in 1-2 hours. The programming is done in Ruby.

PAAS: aggregating clouds. Example: a genome-wide screen for modifiers of the circadian clock, 300 found (Zhang et al., Cell, 2009). Gene-centric data integration means going to each data site, searching for your gene, and then compiling the results, and ID/synonym resolution is hard. BioGPS offers federated search of these gene sources using a URL-based, extensible scheme, and puts the results from different sources in boxes on the BioGPS page. It has a catalog search so you can see if you can buy from Invitrogen (sponsor, thank you!) and others. (http://biogps.gnf.org/circadian)
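To make the ID/synonym problem and the URL-based scheme concrete, here is a minimal Python sketch of gene-centric federated lookup: resolve whatever name the user typed to one canonical symbol, then fill that symbol into a URL template for each data source. The alias table is a tiny illustrative sample and the example.org URLs are hypothetical; this is not BioGPS's actual plugin format.

```python
from string import Template

# Illustrative alias table; a real resolver would query a gene-ID service.
ALIASES = {"ARNTL": "ARNTL", "BMAL1": "ARNTL", "MOP3": "ARNTL"}

# Hypothetical data sources: adding a source is just one more URL template.
SOURCES = {
    "source_a": Template("https://source-a.example.org/gene/$symbol"),
    "source_b": Template("https://source-b.example.org/search?q=$symbol"),
}


def resolve(name: str) -> str:
    """Map any known alias (case-insensitive) to the canonical symbol."""
    try:
        return ALIASES[name.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown gene name: {name!r}") from None


def federated_urls(name: str) -> dict:
    """Build one query URL per source for the resolved gene."""
    symbol = resolve(name)
    return {src: tmpl.substitute(symbol=symbol) for src, tmpl in SOURCES.items()}
```

Resolving first and querying second is what keeps the scheme extensible: each new source only needs a template, not its own synonym handling.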

PAAS use case: publishing in the cloud, e.g., PLoS Currents: Influenza. PMIDs are used for references, Google Knol is used for writing, and moderators decide suitable/unsuitable (this is not peer review). PLoS will consider expanded versions for its journals. ~52 publications so far; one example has been viewed 7k times.

q: biobase – only mammalian?

a: yes, but the code is available (.NET), so you could customize it.

q: small vs. large institutions: does this help people who are under-resourced for equipment?

a: with this we can give you the algorithm and you can run it on the same service, which is different from just sharing algorithms.

q: writing grants, etc.: how does that go with cloud services?

a: capital costs (buying servers) typically come out of a different budget bucket, so this might complicate things. Some in the room have had success with no problems; some have met skepticism. In the UK they're very concerned about the PATRIOT Act provisions.

q: do you need an AWS specialist?

a: they had someone with an MS in bioinformatics and a BS in biology, who picked up how to do the first project in a week; the second was done in 8 hours. They could probably replace that person fairly easily.

q: concern with using a free online service: stability/preservation of data?

a: after you set up an account, test whether you can get your data back out; if the data are super important, host them on your own site.
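One way to run that "can I get my data out?" test is to export everything from the service and compare it byte for byte against your local originals. This Python sketch assumes you already have both copies as directories on disk (how you export depends on the service) and compares SHA-256 checksums file by file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_export(original_dir, exported_dir):
    """Return relative paths that are missing from, or differ in, the export."""
    original_dir, exported_dir = Path(original_dir), Path(exported_dir)
    problems = []
    for src in sorted(p for p in original_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(original_dir)
        dst = exported_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(str(rel))
    return problems
```

An empty list means every original file came back intact; anything listed is either missing from the export or altered.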

q: using these in teaching?

a: Google Wave, PBwiki, Blackboard, the OpenWetWare wiki (I use OneNote), and also Google Docs (they tried wikis first, which didn't fly; Google Docs works well for them).

q: proportion of work done in cloud vs. local computing resources

q: boundaries of the institution

a: now you're either academic or industrial; this will probably allow independent investigators again: rent some lab time, rent some computing time, and then prototype something. You can also use publicly available data; there are always lots more things to find in it and use it for than what the originators foresaw.

Comments

  1. #1 Matthew Putman
    January 16, 2010

    This is an important development. I see the power of information in the cloud extending to all areas of research, and even medical diagnosis. Thanks for the post.
