I realize that the typical format for blogging is to find something that pisses you off and then rant about it, but I actually like the recent workshop report by NHGRI, “The Future of DNA Sequencing at the National Human Genome Research Institute” (pdf file). While I’ll have more to say about the report overall, I especially liked the section about the Human Microbiome Project (the goal of the HMP is to use sequencing technologies to understand how the microbes that live on us and in us affect health and disease).
I was happy to see that NHGRI still thinks that it has a role in funding the HMP. It’s never been clear to what extent the various NIH institutes will continue to support microbiome research after the Jumpstart funding runs out in 2013.
I also like the statement of what NHGRI’s microbiome activities should be:
- Analyze microbiomes of many more normal subjects than is now being considered for the Human Microbiome Project (HMP), in order to obtain a fuller appreciation for the range of microbial communities in the human, and how their composition relates to environmental and other factors.
- Include the sequencing of host genomes.
- Integrate microbiome information with other projects, for example 1000 Genomes or GTEx.
- Analyze the microbiome of model organisms, for example to enable experimental analysis.
- Use sequencing to attain a more fundamental understanding of microbe biology, for example microbial communities, gene transfer, and other fundamental biology.
The only thing I would add (and I’ll discuss it in the future) is that there needs to be much more support for developing the bioinformatic and analytical tools to analyze the massive amounts of data. Most of the methods in use were designed in an era when microbiome datasets were orders of magnitude smaller. They often don’t scale up because they are ‘N-squared’ problems: as N, the number of sequences, increases, the number of calculations required grows as N x N (or more). Double the number of sequences and the work roughly quadruples. We simply don’t have really good methods to handle and then analyze hundreds of millions of sequences. Without them, we won’t be able to use the data as well as we could.
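To make the ‘N-squared’ point concrete, here is a minimal Python sketch. It assumes the bottleneck is an all-against-all comparison of sequences (for example, filling in a pairwise distance matrix), which is my illustration rather than a description of any particular tool; the function name `pairwise_comparisons` is made up for the example. Counting each unordered pair once gives N(N-1)/2 comparisons, which is about half of N x N and still grows quadratically.

```python
def pairwise_comparisons(n: int) -> int:
    """Count the distinct unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Illustrative dataset sizes only; these are not figures from the report.
for n in (1_000, 1_000_000, 100_000_000):
    pairs = pairwise_comparisons(n)
    print(f"{n:>11,} sequences -> {pairs:,} pairwise comparisons")
```

Going from a thousand sequences to a hundred million multiplies the number of sequences by 100,000 but the number of comparisons by roughly ten billion, which is why methods that were fine on small datasets stop being practical.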
But, overall, this is very encouraging. I hope NHGRI listens.