In the cloud, Next Gen DNA sequencing computes

By sporte on January 8, 2010.

These days, DNA sequencing happens in one of three ways.

In the early days of DNA sequencing (like the 80's), labs prepared their own samples, sequenced those samples, and analyzed their results. Some labs still do this.

Then, in the 90's, genome centers came along. Genome centers are like giant factories that manufacture sequence data. They have buildings, dedicated staff, and professional bioinformaticians who write programs and work with other factory members to get the data entered, analyzed, and shipped out to the databases. (You can learn more about this and go on a virtual tour in this nice video from Washington University.)

At the same time, universities expanded their core service laboratories, and these labs began offering a greater number of sequencing services. Today, much of the non-genome-center sequencing happens in core labs. Scientists obtain samples and send them to the core labs. The core lab staff prepare the samples, carry out the sequencing reactions, and deliver data to their customers.

This system worked fine until the Next Generation DNA sequencing (NGS) instruments came along.

The January issue of Nature Biotechnology has two articles (1, 2) that address the role cloud computing can play in helping smaller laboratories cope with the large volumes of data produced by NGS.

As noted in the editorial (2):

Next-generation sequencers produce a prodigious stream of data. A single Illumina instrument, for example, can generate up to 90 billion bases per run. This represents terabytes of raw image data that require at a minimum 4 GB of RAM and 750 GB of local storage capacity to carry out the data handling and analysis.

And:

Whereas genome centers are set up to deal with such gargantuan files, most academic laboratories are in a completely different situation. They have no large central computing pool and data storage capacity. They are more likely to generate data in an ad hoc manner, rather than in a steady stream amenable to an automated data management pipeline. And they often lack sequencing specialists and support staff working under the same roof who can create software tailored to their needs and solve computational problems.

Cloud computing is one answer to that problem.

Although the article (1) places a strong emphasis on security concerns related to cloud computing, it does a good job of describing how Amazon has worked to make the cloud comply with the Health Insurance Portability and Accountability Act (HIPAA). The article also presents a table of cloud service providers.

Interestingly, Geospiza is the only company in the table that offers a software system for handling laboratory information management system (LIMS) needs and analyzing Next Generation DNA sequencing data. All the others are part of the cloud infrastructure.

I can't offer an unbiased opinion because I've worked at Geospiza, but I can attest that the cloud works well. I used an early version of the system last spring when I was writing an article for Current Protocols in Bioinformatics (3). You can even take a look at some of the results in Geospiza's data center. The information for logging in is on that page.

I did the analysis by getting both Illumina and ABI SOLiD data sets from the NCBI. I used Geospiza's web interface to upload the data somewhere in the cloud, selected an alignment algorithm and a reference data set, and waited a few hours for the analysis to complete. It was pretty straightforward. I didn't need to get a new electrical system or new hardware or even write any programs.

I can hardly wait to try out the new stuff for looking at allele-specific expression, SNPs, and mapping splice junctions (see a picture).

References:
1. Sansom, C. 2010. Up in a cloud. Nature Biotechnology 28, 13-15.

2. Gathering clouds and a sequencing storm. 2010. Nature Biotechnology 28, 1.

3. Porter, S., Olsen, N., and T. Smith. 2009. Analyzing Gene Expression Data from Microarray and Next‐Generation DNA Sequencing Transcriptome Profiling Assays Using GeneSifter Analysis Edition. Current Protocols in Bioinformatics. DOI: 10.1002/0471250953.bi0714s27
