The National Center for Biotechnology Information (NCBI) recently announced that it will shut down the Short Read Archive (SRA). The SRA stored semi-processed data from genomics projects, so researchers could go back and examine the raw reads behind a published analysis. The reason NCBI gives is "budget constraints." While I'm saddened by this, I'm not surprised: the volume of data produced by a single genome center is tremendous, to the point where storage and data upload become prohibitive. When several centers were collaborating to test new sequencing technologies, the data were so large that they actually shipped hard drives to each other to compare results. Well, that's what might have to happen to upload data:

If cloud computing is to work for genomics, the service providers will have to offer some flexibility in how large datasets get into the system. For instance, they could accept external disks shipped by mail, the way the Protein Data Bank once accepted atomic structure submissions on tape and floppy disk. In fact, a now-defunct Google initiative called Google Research Datasets once planned to collect large scientific datasets by shipping around 3-terabyte disk arrays.
The other possibility is that the raw data, or even 'first-step' processed data, might no longer be made publicly available at all. Think of this as the physics model:
At some future point it will become simply unfeasible to store all raw sequencing reads in a central archive, or even in local storage. Genome biologists will have to start acting like high-energy physicists, who filter the huge datasets coming out of their detectors for a tiny number of informative events and then discard the rest.
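Just to make that analogy concrete, here is a minimal sketch of what "filter and discard" might look like for sequencing reads. Everything in it is an illustrative assumption on my part (the mean-Phred-quality cutoff, the file names, the whole idea of filtering on read quality alone); it is not any genome center's actual pipeline.

# Toy "filter and discard" for FASTQ reads: keep only reads whose mean
# Phred quality clears a cutoff, so the bulky raw file can be deleted.
# The cutoff and file names below are hypothetical, for illustration.

MIN_MEAN_QUALITY = 30  # assumed cutoff, not a community standard

def mean_quality(qual_line):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding)."""
    scores = [ord(c) - 33 for c in qual_line]
    return sum(scores) / len(scores)

def filter_fastq(in_path, out_path, cutoff=MIN_MEAN_QUALITY):
    """Copy reads that pass the cutoff to out_path; count the rest."""
    kept = dropped = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]  # header, seq, '+', qual
            if not record[0]:
                break  # end of file
            if mean_quality(record[3].rstrip("\n")) >= cutoff:
                fout.write("".join(record))
                kept += 1
            else:
                dropped += 1
    return kept, dropped

if __name__ == "__main__":
    kept, dropped = filter_fastq("raw_reads.fastq", "kept_reads.fastq")
    print(f"kept {kept} reads, discarded {dropped}")

The point of the sketch is just that the filtering step itself is cheap; the hard part is deciding, in advance, which reads count as "informative," since anything discarded is gone for good.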
As genomics and other data-intensive disciplines of biology move towards cloud computing (and I think that move will definitely happen), it will be interesting to see how NIH funding shifts.
Well, now we know how one part of that funding will shift.
While it's commonly called the "short read archive," it is, in fact, the "Sequence Read Archive." I continually fixed that in several of my sequencing white papers, only to have other people change it back on me :p
What's interesting to me is that the sequencing center contract included (strict?) language about depositing sequence data in the SRA and Trace archives. So I guess now all that sequence data just gets deleted or whatever once the assemblies are submitted?
And I wonder how long GEO will be able to take RNA-seq datasets before it runs into the same storage problems. I know it's not a big issue now compared to the Illumina and SOLiD data that sequencing centers are pumping out, but I imagine the rate of experiments being done will keep increasing.
This is such a pity. I guess future bioinformatics students are screwed, never mind those who want to check up on the data themselves.
Presumably those running the sequencing experiments will keep their data for a while... but it's no help if, two years down the line, someone wants to improve the assembly using better software or new partial sequences. Guess they'll just have to resequence from scratch.
Perhaps Google could be convinced to take over the archive?