genomics

Doug Natelson raises a good question about when data should be made publicly available: How much public funding triggers the need to make something publicly available? For example, suppose I used NSF funding to buy a coaxial cable for $5 as part of project A. Then, later on, I use that coax in project B, which is funded at the $100K level by a non-public source. I don't think any reasonable person would then argue that all of project B's results should become public domain because of 0.005% public support. When does the obligation kick in? This is actually a fairly common problem in…
Nick Loman listed the fifty most sequenced bacterial genomes according to NCBI. A reader at Nick's blog came up with an improved list--one that reflects the genomes for which we actually have data (depending on who is doing the sequencing, a project will be registered with NCBI, often months before any sequencing is done). Here's the 'improved' top twenty: 173 Escherichia coli 82 Salmonella enterica 78 Staphylococcus aureus 69 Propionibacterium acnes 56 Streptococcus pneumoniae 56 Enterococcus faecalis 45 Bacillus cereus 42 Mycobacterium tuberculosis 36 Vibrio cholerae 29 Pseudomonas…
What a fabulous combination. This week, Congress has held hearings on the direct-to-customer ('DTC') genetic testing industry. It appears, based on previous statements by FDA officials, that they have publicly contradicted themselves--or been willfully ignorant--about the larger scientific benefits from DTC testing. This week's hearings also seem to have attracted some serious hyperbolic anti-DTC testimony, even by my standards (these companies are "raping the human genome project"? The HGP was made public domain so everyone, including those who work at companies, could have access to the…
Detroit Industry by Diego Rivera To put this post in larger context, Paul Krugman stirred up quite a ruckus with a column that argued that a lot of jobs for college graduates are being rendered obsolete by technological change. For scientists, this is not a new phenomenon. At a recent celebration type-of-thing, a colleague explained how a Prominent Genomic Researcher realized that the next leap forward in biology was going to happen when biologists would view their science as an information science. The future was not going to involve benches filled with dozens of Ph.D.s furiously…
The National Center for Biotechnology Information (NCBI) recently announced that it will shut down the Short Read Archive (SRA). The SRA stored the semi-processed data for genomics projects, so researchers could examine the raw data for a genomics project. The reason given by NCBI is "budget constraints." While I'm saddened by this, I'm not surprised, since the volume of data produced by a single genome center is tremendous, to the point where the storage and data upload are prohibitive: when several centers were collaborating to test new sequencing technologies, the data were so large,…
Matthew Yglesias writes regarding Moore's Law, which states that CPU transistor counts double every two years: My pet notion is that improvements in computer power have been, in some sense, come along at an un-optimally rapid pace. To actually think up smart new ways to deploy new technology, then persuade some other people to listen to you, then implement the change, then have the competitive advantage this gives you play out in the form of increased market share takes time. The underlying technology is changing so rapidly that it may not be fully worthwhile to spend a lot of time thinking…
I got to spend last week in sunny California. I forgot how wonderful it is to sit and eat lunch outside! I was participating in a workshop held at the Department of Energy's Joint Genome Institute (JGI). The workshop was entitled Microbial Genomics and Metagenomics. Basically I spent the week learning about different tools that are available to help biologists deal with the data flood that has come out (and continues to flow faster and faster) of sequencing technologies that continue to get faster and cheaper. Since microbes are not exactly easy to observe with ones eyes,…
This morning I attended a "bloggers-only" conference call with Dr. Eric Green and the folks from the NIH Human Genome Research Institute (NHGRI) to hear about NHGRI's new strategic plan. The new plan represents a shift away from viewing the genome through a lens marked "for research use only" and towards the goal of making the genome useful as a clinical tool. As a consequence, we will see a greater emphasis on funding activities that support clinical work. For example, it's not always clear how variations in the genome are related to disease. NHGRI might fund projects that help sort and…
By way of Jonathan Eisen, we discover that museums are starting to hire microbiology curators. I'm very excited about this, probably more excited than Eisen (and he's a pretty excitable guy). In part, I've always loved museums and have thought that building microbiological collections for museums would be a neat thing to do. But there are also some vital scientific needs that would be met by museum curation. What makes microbiological curation really exciting to me is the advent of cheap genomic methods. If you're able to culture it, we can sequence its genome, which is a pretty good way…
Matthew Herper rounds up some of the discussion about the decreasing cost of genomics. But one thing that hasn't been discussed much at all is the cost of all of the other things needed to make sense of genomes, like metadata. I briefly touched on this issue previously: A related issue is metadata--the clinical and other non-genomic data attached to a sequence. Just telling me that a genome came from a human isn't very useful: I want to know something about that human. Was she sick or healthy, and so on. These metadata too, will have to be standardized: I can't say one of genome came from…
I had the good fortune on Thursday to hear a fascinating talk on deep transcriptome analysis by Chris Mason, Assistant Professor, at the Institute for Computational Biomedicine at Cornell University.  Several intriguing observations were presented during the talk.  I'll present the key points first and then discuss the data. These data concern the human transcriptome, and at least some of the results are supported by  follow on studies with data from the pigmy tailed macaque. Some of the most interesting points from Mason's talk were: A large fraction of the existing genome annotation is…
...or it won't be much of a revolution. Yesterday, I discussed the difference between a DNA sequencing revolution and a genomics revolution, and how we have a long way to go before there's a genome sequencer in every pot (or something). But let's say, for argument's sake, these problems are overcome--and I think they will be. Then the real trouble begins. The big issue is standardization--without it we will have a genomic Towel of Babel: "There is a growing gap between the generation of massively parallel sequencing output and the ability to process and analyze the resulting data," says…
Last week, Forbes had an article about the advances in genomics, which focused on the Ion Torrent sequencing platform. It's a good overview of genomics and the Ion Torrent technology, albeit a bit much on the cheerleading side. For instance, this: Audaciously named the Personal Genome Machine (PGM), the silicon-based device is the smallest and cheapest DNA decoder ever to hit the market. It can read 10 million letters of genetic code, with a high degree of accuracy, in just two hours. Unlike existing DNA scanners the size of mainframes and servers, it fits on a tabletop and sells for only $…
Mary Carmichael has a great video (and associated post) about the rise of genetic denialism--ridiculous arguments that 'genes don't cause disease.'* Carmichael offers two reasons why the argument is flawed; I'll offer a third in a bit, but I do want to note one minor point of disagreement with Carmichael. If you go to roughly the ten minute mark, Carmichael has a summary slide that lists several phenomena that could account for the 'missing heritability'--the observation that currently identified genomic variation can't account for much the expected heritability as determined by various…
To paraphrase Mark Twain, biological understanding may not repeat itself, but it does rhyme. There's a recent commentary, "The Great DNA Data Deficit: Are Genes for Disease a Mirage?", from the Bioscience Resource Project which, if Twitter is any guide (and how could Twitter possibly not be?), is leading to some angry rebuttals, even as I write this. The piece discusses Genome Wide Association Studies ('GWAS'), and argues that GWAS is overhyped in that many common diseases have a very limited genetic basis. Before I deal with the piece (which has good and bad points), I would note that I'm…
I don't see the need to redescribe the recent paper about the discovery of bacteria that can might replace, in extremis, phosphorus with arsenic, which was overhyped by NASA, was poorly covered by most journalists, and which has compromising methodological problems (for good coverage, read here, here, and here; and snark). But what the paper does demonstrate is the importance of culturing microorganisms, knowledge about which is becoming rapidly lost by younger scientists. With the advent of DNA-based, culture-independent techniques, where we can look at the DNA and RNA of microbes without…
I guess. Over at the Motley Fool, Brian Orelli asks, "Which Is a Better Buy: Complete Genomics or Pacific Biosciences?" While I agree that PacBio is probably the better bet (and bet is the operative word), I don't think the reasoning is right. Orelli: If you're interested in trying to catch the boom and get out before the bust, both Complete Genomics and PacBio look like a good choice to benefit from an exponential increase in DNA sequencing. It's too early to make a definitive call, but of the two, I like PacBio better because I'm not fond of the low-cost, high-volume business model. Sure…
Gabe Rudy, blogging at our 2 snps, has a really good introduction to sequencing technology and its history. It's worth the read, but I don't entirely agree with the reason given for why ABI SOLiD lost out to Illumina: Coming to market at the same time, but seeming to have just missed the wave, was the Applied Biosystems (ABI) SOLiD system of parallel sequencing by stepwise ligation. Similar to the Solexa technology of creating extremely high throughput short reads cheaply, SOLiD has the added advantage of reading two bases at a time with a florescent label. Because a single base pair…
Last week, the U.S. Department of Justice filed an amicus brief which argued that naturally-occurring DNA sequences can't be patented: Reversing a longstanding policy, the federal government said on Friday that human and other genes should not be eligible for patents because they are part of nature.... "We acknowledge that this conclusion is contrary to the longstanding practice of the Patent and Trademark Office, as well as the practice of the National Institutes of Health and other government agencies that have in the past sought and obtained patents for isolated genomic DNA," the brief…
A problem in genome-wide association studies ("GWAS") is the"missing heritability" issue--identified genetic variation can only account for a small fraction of the estimated genetic contribution to variation in that trait. Razib has a good roundup of several explanations (and I added some speculation about nearly-neutral mutations). GWAS also have problems accurately characterizing the trait. For example, not all heart diseases (note the plural) are alike, so we have to be certain that we accurately assess the trait of interest. But what is very rarely discussed is the environmental…