The Double Standard of Genomic Data Release and the Role of Incentives

By mikethemadbiologist on June 8, 2009.

Should any data, not just genomic data, be held hostage by the grant award process?

Hunh? Let me back up...

By way of ScienceBlogling Daniel MacArthur, I came across this excellent post by David Dooling about, among other things, how different genome centers, based on size, have different release policies (seriously, read his post). Dooling writes (boldface mine):

The more interesting question is: why aren't all data and research released rapidly and freely available? Since the Bermuda Principles were agreed to in 1996, all genome sequencing centers have submitted their data, from raw sequence data to finished sequence to assemblies to annotation, to public repositories as quickly after generation as possible. These principles were reinforced by the Fort Lauderdale agreement in 2003 which added a provision that protected the production centers' right to first publication. But as we have seen recently, that provision of the Fort Lauderdale agreement is not always enforced. As sequencing has moved into medical applications, the sequencing centers have taken great pains to release human sequence data in a responsible manner, but still rapidly. What's more, they now also release the detected variants fully annotated and correlated with phenotypic information in protected access databases available to any researcher. As data that requires more and more analysis and significant human curation are made rapidly available well before publication, the production centers become ever more vulnerable to getting "scooped" on their hard won findings.
As Church and Hillier properly conclude in the above referenced article

Sequence data are now easier to produce, but decisions about timelines for data release, publication, and ownership and standards for assembly comparison and quality assessment, as well as the tools for managing and displaying these data, need considerable attention in order to best serve the entire community. (Emphasis mine)

This conclusion begets many questions. If the rapid release described in the Bermuda Principles still holds true, why does it only apply to large-scale sequencing centers? Many researchers are generating more sequence in a month than the Human Genome Project was able to produce in a year. As they continue to be allowed to perform pre-publication (as opposed to post-generation) data submission, why are they not being held to the same standard as the large-scale sequencing centers?

I agree with Dooling: smaller projects and genome centers should have to release data in a timely manner too. But the double standard exists because of the different funding incentives these groups face (Note: I work at a large sequencing center).

For the large sequencing centers, most of the projects are geared towards genome production. That is, the funding agency assesses whether or not benchmarks for sequence (and assembly and annotation) quality have been met in the time frame expected. To put it more crassly, renewal of funding is not primarily determined by manuscript output. Renewal of funding is determined by genome output. Yes, publications by the center are included in renewals, although 'prestigious' publications by other groups not associated with the center can also matter. And Dooling is right: often the sequencing centers are the only groups with the bioinformatics and analytical resources and know-how to make sense of the data, so, in reality, the centers end up publishing papers using the data.

But for the large centers, this is essentially contract work: the funding agency has determined that a certain amount of genomic data is required to aid other scientists in one or more disciplines, and the center is obligated to deliver these data. That's what pays the bills. There's none of the all-too-typical R01 (or similar grant) progress update "we didn't deliver what we said we would, but we found this other thing that's interesting." If you're supposed to deliver X genomes, you deliver those genomes, period. In fact, some of these arrangements aren't even grants; they're actually federal contracts. Funding agencies learned this the hard way, as too many early sequencing centers resembled 'genomic roach motels': DNA checks in, but sequence doesn't check out.

The smaller centers often do not have these arrangements. The funding agency treats their funding as a typical research grant, with specific aims and hypotheses designed to address a particular research goal. More importantly, these grants are not structured with the expectation that the funded group will rapidly deliver a set of data to a wider community. The incentive structure is that, by the end of the grant, the researchers will have addressed some specific questions. To be crass again, the ability to renew the grant (or leverage it into another grant) is determined by publication output at the time of grant renewal, which can be several years away. This creates an incentive to not share data, often to the detriment of the field as a whole.

Ideally, genomic data, once it passes quality control, would be released regardless of the size or scale of the center producing it. But the current reality is that university researchers associated with smaller centers support their careers and their universities' genome centers through grants that are often awarded based on publications related to generated data.

This, of course, returns us to the question that Dooling poses, "[W]hy aren't all data and research released rapidly and freely available?" Should any data, not just genomic data, be held hostage by the grant award process? As long as U.S. science is structured around small* academic labs engaged in incredibly tough competition for resources, and those resources are allocated based on publication record, I don't see how this will change. After all, if researchers work really hard, only to have 'their' data scooped by another group, then 'open' data release is unfair to them. On the other hand, if they were judged by data production, we could have open data release policies. But that leads to a whole 'nother set of problems, which is how one then gets funding to do analysis....

*The genome center I work at has over 1,000 people. A lab with twenty people is nothing...


Thanks for discussing my post. One small note (no need to approve this comment), my name is David not Daniel. Thanks.
