...ok, I'll stop. But 99 E. coli commensal genomes will be sequenced. And that's really cool.
I'm always wary of counting chickens before they hatch, but I'm fairly certain at least 99 E. coli genomes will be sequenced (NIAID will continue funding, and more than fifty are already in various stages of sequencing).
And to a considerable extent, it's the Mad Biologist's fault. I'm the crazy bastard who thought of this and proposed it (obviously, many, many others are involved from sequencing to funding, and, last but not least, strain collections).
What makes this project unique is that these are all commensal isolates--that is, not associated with clinical disease. They are from a variety of sources: humans, non-human animals, and environmental sources.
Here's what we will be sequencing (and if the numbers don't sum up exactly, that's because I don't have the list in front of me):
- 8-15 members of the five major E. coli groups: A, B1, B2, D, and E*. This is to get a handle on the breadth of diversity in E. coli. As an aside, originally, we thought about sequencing a bunch of genomes to answer a very specific question (e.g., do beta-lactam [an antibiotic] resistant E. coli differ from sensitive ones?), but then we realized that we simply didn't know enough about E. coli genomes to choose genomes intelligently, which meant that figuring that out should be one of the primary goals.
- Three outgroup strains from two species, E. fergusonii and E. albertii. I'm tired of using Salmonella as an outgroup, and one outgroup strain isn't enough as far as I'm concerned.
- Two different environmental clades (groups related by a common ancestor) of E. coli. These are interesting because a clinical lab would run standard biochemical tests on these strains and conclude that they're run-of-the-mill E. coli. However, genetically, they're quite divergent from the major groups. They also can form blooms in pristine freshwater bodies. They will be very interesting.
- 26 isolates from three common commensal clones. Within each clone, the strains differ by less than ~1/3,500 bases (~0.03% sequence divergence) based on partial sequencing of seven loci. The goal here is to ask if commensalism has similar adaptations and patterns of evolution. Also, one of the clones is one of the most common urinary tract infection clones (as well as being a good commensal). There's some recent evidence that different components of primary metabolism are emphasized in the urinary tract versus the gastrointestinal tract--I wonder if this is reflected in the genomes. We'll see...
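For anyone who wants to check the within-clone arithmetic: one difference per 3,500 bases is a fraction of about 0.0003, which is roughly 0.03 percent. Here is a minimal sketch of that conversion, plus a back-of-the-envelope extrapolation to a whole genome. The function names are made up for illustration, and the 5,000,000 bp genome length is a round-number assumption, not a figure from the project. The extrapolation also assumes the seven loci are representative of the rest of the genome, which may well not hold.

```python
def divergence_percent(differences: int, bases: int) -> float:
    """Return sequence divergence as a percentage of the bases compared."""
    if bases <= 0:
        raise ValueError("bases must be positive")
    return 100.0 * differences / bases


def expected_differences(divergence_pct: float, genome_length: int) -> float:
    """Scale a percent divergence up to a genome of the given length (bp)."""
    return divergence_pct / 100.0 * genome_length


# Upper bound from the post: fewer than ~1 difference per 3,500 bases.
pct = divergence_percent(1, 3500)
print(f"{pct:.4f}% divergence")  # 0.0286% divergence

# ASSUMPTION: a round 5 Mb genome, used only to illustrate the scale.
ASSUMED_GENOME_BP = 5_000_000
print(round(expected_differences(pct, ASSUMED_GENOME_BP)))  # 1429
```

Since the post gives "less than ~1/3,500", both numbers are best read as rough upper bounds rather than estimates.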
What's exciting about this is that this sample (not to mention the other ~50 sequenced E. coli genomes, most of which are pathogens) will enable us to use all of the population genetic and phylogenetic techniques we've been using with single genes on a genomic scale. Kinda proud of that.
By the way, I don't know when these will be done (so don't ask, we're working on it**), but the good news is that, as soon as the genomes are completed and pass quality control, we release them to the public (and, of course, you are going to follow the Fort Lauderdale Protocol regarding fair use, right? We approve of such things...).
Finally, it's nice to see the daylight at the end of the tunnel after putting considerable effort into this. So with that, I bring you "Daylight" by Matt and Kim (not only is it apropos, but it makes me happy):
*There actually is an explanation why there is no group C, but, trust me, it's not that interesting.
**Really, I don't know right now. Don't ask.
What sequencing technology are you using? 454 or SOLiD or Solexa or?? I'm just assuming that you're not planning on using Sanger sequencing for this :p
What level of coverage and assembly are you aiming for?
Currently, we're using 454. Some genomes will have at least 10x coverage, others 30x. Our minimum assembly metrics are those used by the Human Microbiome Project (although we always exceed those).
one of the clones is one of the most common urinary tract infection clones (as well as being a good commensal).