Will the Cloud Save Genomics?

By mikethemadbiologist on August 10, 2011.

I've blogged before about some of the technical issues surrounding how we can handle the massive increase in the size of genomics datasets. There's also a need to grapple with the analytical aspects of all of these data:

So, from a bacterial perspective, genome sequencing is really cheap and fast--in about a year, I estimate (very conservatively) that the cost of sequencing a bacterial genome could drop to about $1,500 (currently, commercial companies will do a high-quality draft for around $5,000-$6,000). We are entering an era where the time and money costs won't be focused on raw sequence generation, but on the informatics needed to build high-quality genomes with those data.

Titus Brown does the math and then puts the issue very succinctly:

The bottom line is this: when your data cost is decreasing faster than your hardware cost, the long-term solution cannot be to buy, rent, borrow, beg, or steal more hardware. The solution must lie in software and algorithms.
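To make the arithmetic behind Brown's point concrete, here is a minimal sketch. The doubling times are hypothetical placeholders chosen for illustration, not figures from Brown's post, and the function name is my own. It also assumes analysis cost grows linearly with the amount of data, which is the optimistic case if no sub-linear algorithms are available.

```python
def relative_analysis_cost(years, seq_doubling_months=6.0, hw_doubling_months=18.0):
    """Hardware spending needed to analyze what a fixed sequencing budget
    produces, relative to year 0.

    Assumptions (illustrative only):
      - data per sequencing dollar doubles every `seq_doubling_months`
      - compute per hardware dollar doubles every `hw_doubling_months`
      - analysis work scales linearly with data volume
    """
    months = years * 12.0
    data_per_dollar = 2.0 ** (months / seq_doubling_months)
    compute_per_dollar = 2.0 ** (months / hw_doubling_months)
    # More data per dollar raises the workload; more compute per dollar lowers its cost.
    return data_per_dollar / compute_per_dollar


if __name__ == "__main__":
    for year in range(6):
        print(year, round(relative_analysis_cost(year), 1))
```

Under these assumed rates, the same sequencing budget needs 16 times the hardware spending after three years, and the gap keeps widening. Renting the hardware from a cloud provider changes who owns the machines, not the shape of that curve.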

He argues that proponents of cloud computing as our salvation must be relying on something else:

People who claim that cloud computing is going to provide an answer to the scaling issue with sequence, then, must be operating with some additional assumptions. Maybe they think the curves are shifted relative to one another, so that even 1000x costs are not a big deal - although figure 1 sort of argues against that. Like me, maybe they've heard that hard disks are about to start scaling way, way better -- if so, awesome! That might change the curves for data storage, if not analysis. Perhaps their research depends on using only a bounded amount of sequence -- e.g. single-genome sequencing, for which you can stop generating data at a certain point. Or perhaps they're proposing to use algorithms that scale sub-linearly with the amount of data they're applied to (although I don't know of any). Or perhaps they're planning for the shift in Moore's Law behavior that will come when Amazon and other cloud computing providers build self-replicating compute clusters on the moon (hello, exo-atmospheric computing!) Whatever the plan, it would be interesting to hear their assumptions explained.

That could be, but I think it has more to do with the basic reality that very few groups are currently analyzing massive datasets. Simply put, we haven't realized the problems because we're just now blundering into them.

Regardless, Brown is right: we will be spending a lot more money on software and people than sequencing, cloud or not.


I stated as early as 2008 in peer-reviewed science papers (The Principle of Recursive Genome Function, and a Cold Spring Harbor Lab presentation), and popularized in the Google Tech YouTube talk "Is IT ready for the Dreaded DNA Data Deluge" http://www.youtube.com/watch?v=WJMFuc75V_w, the same point that Mike underscores: the bottleneck is NOT Information TECHNOLOGY but Information Theory of fractal iterative recursion of genome regulation. The prediction at the 30:00 mark of the YouTube video, that unregulated cancerous fractal growth is caused by aberrant methylation of intergenic (formerly "junk") supplementary info, lends itself to software-enabling algorithms, based on a crisp mathematical understanding of recursive genome function. With the availability (soon a veritable avalanche) of both cancerous and intact DNA data-sets (showing methylation status e.g. by PacBio sequencing technology), HolGenTech deploys defense-validated High-Performance-Computing platforms in private clouds to run software based on advanced algorithms.

thanks useful entry

Perhaps we'll see a shift in the way sequencing-based science is done. Rather than a "more is better, sequence everything you can" approach, we'll see people sequencing only what they can actually analyze. Which is something I'd like to see anyway. I'm a bit tired of "Well…there's the sequence"-based publications.

Thanks, "jigolo". "Confounding", you are right, and more. Not only can one get "a bit tired" of sequences without analytics, but the entire (mega-billion-dollar) sequencing industry might simply become unsustainable without matching investments in analytics. We are talking about the supply-demand balance of the Industrialization of Genomics. My favorite metaphor is "imagine the nonsense of Ford's assembly line of automobiles - but only a few dirt roads here and there with zero gas stations". The cloud is only a band-aid, a place where the oozing sequences can be stored, but computing ability in itself will never solve the problem of the intrinsic (fractal) mathematics of genome regulation.
