A discussion of open access data using bird flu and other disease data as examples.

The recent scares over bird flu have led many researchers to investigate the epidemiology, genetics, and disease risks of the virus. The researchers are focused on both preventing the transmission of the virus into human populations and preparing for a potential pandemic. By analyzing DNA sequences from different viral strains, researchers can understand how the virus spreads within and between populations, how it changes over seasons, and what (if anything) we can do to predict its next evolutionary jump.

Before we get further into bird flu data, here is a primer on influenza. Influenza A viruses are classified into subtypes based on two proteins that make up the envelope (surface) of the virion: hemagglutinin (HA) and neuraminidase (NA). There are 16 known HA antigens (named H1 through H16) and 9 NA antigens (N1 through N9). Each antigenic type forms a unique cluster based on both protein sequence and evolutionary history. New subtypes can be formed by either the creation of new HA or NA antigens (rare events) or mixing HA and NA antigens (quite common). H5N1 is the subtype responsible for bird flu.
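To make the subtype arithmetic concrete, here is a minimal sketch that enumerates every HA/NA pairing implied by the counts above (16 HA antigens, 9 NA antigens). The names `HA_TYPES`, `NA_TYPES`, and `all_subtypes` are made up for illustration. The list covers only the combinations that are possible in principle and says nothing about which ones have actually been observed in birds or humans.

```python
from itertools import product

# Antigen counts taken from the primer above: H1..H16 and N1..N9.
HA_TYPES = [f"H{i}" for i in range(1, 17)]
NA_TYPES = [f"N{i}" for i in range(1, 10)]


def all_subtypes():
    """Return every HA/NA pairing as a subtype label such as 'H5N1'."""
    return [ha + na for ha, na in product(HA_TYPES, NA_TYPES)]


subtypes = all_subtypes()
print(len(subtypes))       # 16 * 9 = 144 possible labels
print("H5N1" in subtypes)  # True: the bird flu subtype is one of them
```

Mixing existing antigens only moves a virus around within this fixed grid of 144 labels, whereas a new HA or NA antigen would add a whole row or column to it.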

Many studies have examined DNA sequences from various influenza strains to understand how the virus evolves within and between hosts, within and between populations, and from season to season. The groups that specialize in analyzing these data must collaborate with workers in the field who are responsible for collecting the data. Initial data analysis consisted of categorizing the antigen subtypes of viruses within human populations, but technological as well as analytical advances have allowed researchers to relate antigen evolution to DNA sequence evolution.

Research into the evolution of human pathogens differs from other evolutionary genetics research in that the connections to human health are much more direct. Studying the evolutionary dynamics of copy number polymorphisms within human populations may shed light on the factors underlying genetic diseases. But figuring out the distribution of various strains of a pathogenic virus, or the population-level factors that determine the emergence of pandemics, is necessary for the development and implementation of appropriate vaccination strategies. That’s why it’s disappointing to see certain groups hoarding data.

Aside from saving us from an impending pandemic, disease researchers are also interested in furthering their research careers, whether by earning tenure, getting appointed to an endowed chair, or simply building prestige. This is true for all academics, but the conflict between serving the public and serving their own interests leads to ethical issues that aren’t as pronounced when human health is not involved. Compounding the problem is the hyper-competitive nature of high-profile research areas such as infectious diseases and human genetics.

In the race to be the lab that takes the next big step, ahead of the many other labs working on the same problems, sharing data seems to go against one’s self-interest. Other researchers working on similar questions, but outside the area of human disease, can be more open with their data because they aren’t as likely to be scooped. To take an example from a research area with which I am quite familiar, Michael Eisen set up a wiki for openly sharing analyses from the 12 Drosophila genomes project. By contrast, there are many stories of researchers hoarding data on a particular disease or pathogen, publishing multiple papers over their careers without ever making the data publicly available. One also hears of researchers sitting on samples or data for years on end because they intend to analyze them for publication but have not gotten around to it. When interested individuals request access to the data or samples, permission is often refused. Developing a relationship with labs that have important samples is a necessary step if one hopes to study the evolutionary dynamics of pathogens.

It seems that the areas where collaboration and open access should be encouraged, because of their direct implications for human health, are those in which hoarding and isolation are most prevalent (the evidence here is purely anecdotal). On the other hand, less competitive research disciplines (not all, but many; systematists and naturalists are notorious for hoarding field samples) are more open with their data. Where the public health benefits of the research are greatest, the openness of the data appears to be the worst.

Which brings us to Indonesia. Within the past year, Indonesia has been affected by multiple outbreaks of bird flu (influenza strain H5N1). Indonesian researchers have collected data on these outbreaks, but they are not sharing the data with the international community (Revere and Glyn Moody have both commented). They are treating the samples as a national treasure, not free for the international community to plunder. A discussion of whether their motives are ethical or not leads to interesting debates, but what I find more interesting is whether they are an exception or the norm.

Is the Indonesian hoarding of bird flu sequences any different from researchers hoarding data on measles or other viral samples? Is the Indonesian government committing a much worse “crime” against open access than other scientists? Or is what they’re doing nothing compared to the hoarding of data that goes on in other areas of biomedical research?


  1. Sandra Porter
    March 6, 2007

    I can name another example. HIV. You would think that it would be possible to find trace data for HIV sequences since it’s such a serious problem.

    But there isn’t any in any database that I know of, not the NCBI trace archive, nor anywhere else.