scio10: Science in the Cloud

By cpikas on January 16, 2010.

John Hogenesch, Assistant Professor of Pharmacology - Penn School of Med

gene-at-a-time is giving way to genome-wide - larger datasets, collaborative research

last year more was added to GenBank than in all previous years combined (wow!) - growth exceeds Moore's law.

Academia responds by buying storage and clusters - but you need great IT staff, and it's really hard to get and keep them (they go to industry). There's also heating & cooling, depreciation, and usage/provisioning (under/over utilized). Larger inter-institutional grids - access is tightly regulated, and they are very complex to program for

Cloud computing: software as a service, infrastructure as a service, platform as a service

They use SaaS for collaboration - Basecamp from 37signals. Collaborating with multiple labs, multiple people. Compare $50/month with no IT support costs to SharePoint: $1k server, $500 license, admin at 5% effort $2k.
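To make that comparison concrete, here is a rough first-year tally using only the figures quoted in the talk. The talk did not say which SharePoint costs recur, so this sketch assumes the server and license are one-time purchases and the $2k admin effort is per year; the function names are just for illustration.

```python
# First-year cost comparison using the figures quoted in the talk.
# Assumption (not stated in the talk): server and license are one-time
# purchases, and the $2k admin effort is an annual cost.
BASECAMP_PER_MONTH = 50
SHAREPOINT_SERVER = 1_000
SHAREPOINT_LICENSE = 500
SHAREPOINT_ADMIN_PER_YEAR = 2_000


def basecamp_first_year(months=12):
    """Hosted subscription: monthly fee only, no IT support costs."""
    return BASECAMP_PER_MONTH * months


def sharepoint_first_year():
    """Self-hosted: hardware + license + one year of admin effort."""
    return SHAREPOINT_SERVER + SHAREPOINT_LICENSE + SHAREPOINT_ADMIN_PER_YEAR


if __name__ == "__main__":
    saas = basecamp_first_year()
    hosted = sharepoint_first_year()
    print(f"Basecamp, first year:   ${saas}")
    print(f"SharePoint, first year: ${hosted}")
    print(f"Difference:             ${hosted - saas}")
```

Under those assumptions the first year comes to $600 versus $3,500.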

IaaS for proteomics - example - search complex samples against a 6-frame translated genome. They provisioned 20 AWS nodes running Windows; the search ran over 7 days at a cost of $1400.

In genomics - lots of recent publications using CloudBurst, Crossbow (?), and Hadoop for BLAST/BLAT/R scripts....

BLAT on AWS - using CloudCrowd (NY Times alternative to Hadoop), provisioned 20 large memory instances of Ubuntu, 85% of sequences were mapped, ~72 hours/$424 (experiments cost $30k with machine and reagents and all - so over the course of the 30 you can do in a year, $600k savings)
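For a sense of scale, the two AWS jobs above can be reduced to a cost per instance-hour. This is a back-of-the-envelope sketch only: it assumes all 20 instances ran for the full quoted duration (7 days for the proteomics search, roughly 72 hours for BLAT), which the talk did not state, and the function name is made up for illustration.

```python
def cost_per_instance_hour(total_cost, instances, hours):
    """Total bill divided by instance-hours consumed.

    Assumes every instance ran for the full duration.
    """
    return total_cost / (instances * hours)


# Proteomics search: 20 Windows nodes, 7 days, $1400
proteomics = cost_per_instance_hour(1400, instances=20, hours=7 * 24)

# BLAT run: 20 large-memory Ubuntu instances, ~72 hours, $424
blat = cost_per_instance_hour(424, instances=20, hours=72)

print(f"proteomics: ${proteomics:.3f} per instance-hour")
print(f"BLAT:       ${blat:.3f} per instance-hour")
```

The two figures are not directly comparable, since the jobs used different operating systems and instance types, but both land well under a dollar per instance-hour.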

q: how much programming to get it ready to go on AWS?

a: about 8 hours with a somewhat experienced programmer - a very experienced one could do it in 1-2 hours - programming is done in Ruby

PaaS - aggregating clouds - genome-wide screen for modifiers of the circadian clock, 300 found (Zhang et al., Cell, 2009). Gene-centric data integration - normally you go to each data site, search for your gene, and then compile the results yourself. ID/synonym resolution is hard. BioGPS - federated search of these gene sources - URL-based scheme, extensible. Puts results from different sources in boxes on BioGPS. Has a catalog search so you can see if you can buy from Invitrogen (sponsor, thank you!) and others. (http://biogps.gnf.org/circadian)

PaaS use case - publishing in the cloud - PLoS Currents Influenza. PMIDs used for references, Google Knol to write, moderators decide suitable/unsuitable - not review. PLoS will consider expanded versions in their pubs. ~52 publications so far. Example has been viewed 7k times.
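The general idea of a URL-based federated search can be sketched in a few lines: resolve whatever name the user typed to one canonical ID, then fill that ID into a URL template per data source. Everything below is hypothetical for illustration (the synonym table, the IDs, the example.org URLs, and the template syntax); it is not BioGPS's actual plugin format or code.

```python
from urllib.parse import quote

# Hypothetical synonym table: every known alias maps to one canonical ID.
SYNONYMS = {
    "gene-a": "GENE0001",
    "alias-a": "GENE0001",
    "gene0001": "GENE0001",
}

# Hypothetical data sources, each described only by a URL template.
# Adding a source means adding one entry, which is what makes it extensible.
SOURCES = {
    "expression": "https://expression.example.org/gene?id={gene_id}",
    "literature": "https://literature.example.org/search?q={gene_id}",
}


def resolve(name):
    """Map a user-supplied gene name or alias to its canonical ID."""
    key = name.strip().lower()
    if key not in SYNONYMS:
        raise KeyError(f"unknown gene name: {name!r}")
    return SYNONYMS[key]


def federated_urls(name):
    """Return one URL per source, all pointing at the same resolved gene."""
    gene_id = quote(resolve(name), safe="")
    return {
        source: template.format(gene_id=gene_id)
        for source, template in SOURCES.items()
    }


if __name__ == "__main__":
    for source, url in federated_urls("Alias-A").items():
        print(f"{source}: {url}")
```

Each resulting URL could then be loaded into its own box on a results page. The hard part noted in the talk, ID/synonym resolution, is reduced here to a dictionary lookup.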

q: biobase - only mammalian?

a: yes, but code is available (.NET) so you could customize

q: small vs. large institutions - does this help people who are under-resourced for equipment?

a: with this we can give you the algorithm and then you could run it on the same service - so this is different from just sharing algorithms

q: writing grants etc. how does that go with cloud services?

a: capital costs (buying servers) are typically out of a different bucket so this might complicate things. Some in the room have had success, no problems. Some have met skepticism. In the UK they're very concerned about the PATRIOT Act provisions.

q: do you need an AWS specialist?

a: they had someone with an MS in bioinformatics and a BS in bio - picked up how to do the first job in a week, second done in 8 hours. Could probably replace that person fairly easily

q: concern with using a free service online - stability/preservation of data

a: test to see about getting data out after you set up an account, if super important then host on your own site

q: using these in teaching?

a: using Wave, using PBwiki, using Blackboard, using OpenWetWare wiki, (I use OneNote), also Google Docs (they tried wikis first, didn't fly, Google Docs works well for them)

q: proportion of work done in cloud vs. local computing resources

q: boundaries of the institution

a: now either academic or industrial - so this will probably allow independent investigators again: rent some lab time, rent some computing time, and then prototype something. Can also use publicly available data - always lots more things to find/use it for than just what originators foresaw

This is an important development. I see the power of information in the cloud extending to all areas of research, and even medical diagnosis. Thanks for the post.
