'Counterintuition', the Human Microbiome, and Why Fluency in Math Matters

A while ago, I talked about some things biologists should learn, and the glaring omission was mathematical fluency. I bring this up because one of the things the Mad Biologist does is work on the Human Microbiome Project (between that, and fighting evil, we are very busy...). The part of the Human Microbiome Project ('HMP') that I'm involved with is a consortium of four sequencing centers and an informatics center, whose goal is to sequence the microbes associated with 18 different body sites from 250 people. And math is vital to what we do.

Before I get to the reason why math matters, there's one more bit of information you need. One component of the project is to PCR-amplify the 16S rRNA gene--a gene that is found in every bacterium--and use this gene as a 'barcode' to determine what organisms are there and at what frequencies. In other words, this is the molecular microbial ecology of the human body.

On to why math matters. A couple of weeks ago, I presented to the group some estimates of how many sequences we need in order to observe every species in different body sites. (Note: the technical term is 'OTU', or operational taxonomic unit, which is a set of sequences that are all similar to each other above some threshold; we use OTUs whose sequences are > 97% similar. For simplicity's sake and reader familiarity, I will refer to OTUs as species.) The four centers together have sequenced ~1000 sequences from every body site, so I could estimate how many species should appear in each body site based on the number (and distribution) of species we observed. I could also estimate how many new, unobserved species we should see as we add more sequences. Rubbing the previous two sentences together, it's possible to figure out how many sequences we need to see each species once (yes, there are confidence intervals attached to these estimates...).
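To make "estimate how many species should appear" concrete, here is a minimal sketch of one common richness estimator, bias-corrected Chao1. This is an assumption for illustration only: the post does not say which estimator was actually used, and the function name `chao1` and the example counts below are hypothetical.

```python
from collections import Counter


def chao1(otu_counts):
    """Bias-corrected Chao1 estimate of total species richness.

    otu_counts: number of reads assigned to each observed species (OTU).
    NOTE: Chao1 is an illustrative choice, not necessarily the
    estimator used for the HMP figures discussed in the post.
    """
    observed = [c for c in otu_counts if c > 0]
    freq_of_freq = Counter(observed)
    singletons = freq_of_freq[1]  # species seen exactly once
    doubletons = freq_of_freq[2]  # species seen exactly twice
    # Many singletons relative to doubletons suggests many unseen species.
    unseen = singletons * (singletons - 1) / (2 * (doubletons + 1))
    return len(observed) + unseen


# Hypothetical read counts for 11 observed species (made-up numbers).
example_counts = [400, 250, 120, 80, 40, 5, 2, 2, 1, 1, 1]
print(chao1(example_counts))
```

The idea is that the species seen only once or twice carry the information about what has not been seen yet, which is why the distribution of observed species matters and not just their number.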

Since this part of the HMP is a collaborative effort, I sent around my figures and methods before our weekly phone conference. I had a sneaking suspicion that I would get a lot of questions, and I was right. Over and over, I was told that these estimates were 'counter-intuitive.' Why? Sites that had relatively few species required nearly as much sequencing as (and, in some cases, more than) sites that had lots of species. Now, being a probability theory dork (and trained as an ecologist), I didn't find this counterintuitive at all, but to my colleagues, the idea that a less complex (that is, less species-rich) community would require as much sequencing didn't make sense.

But how deep you need to sequence depends on the frequencies of the rarest species. To put this another way, if in a community of 20 species the two rarest species each occur at a frequency of 1/10,000, you will have to sequence much more to see them than you would in a community of 100 species, all of which occur at equal frequencies (1/100). Or, from the other direction, in the skewed twenty-species community, adding more sequence is much less likely to reveal new species than in the evenly distributed community of 100 species.
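The two communities in this example can be compared with a few lines of arithmetic. The sketch below assumes each read is an independent draw from the community, so a species at frequency p is missed by all n reads with probability (1 - p)^n. The 95% confidence level, the function names, and the even split of the remaining 18 species in the skewed community are assumptions made for illustration.

```python
import math


def reads_to_see(freq, confidence=0.95):
    """Reads needed to see a species of relative frequency `freq`
    at least once with probability >= `confidence`.
    Assumes independent draws (sampling with replacement)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


def expected_species_seen(freqs, n_reads):
    """Expected number of distinct species observed in n_reads reads."""
    return sum(1.0 - (1.0 - p) ** n_reads for p in freqs)


# Skewed community: two rare species at 1/10,000 each; the other 18
# split the remainder equally (an assumption; the post does not say).
rare = 1 / 10_000
skewed = [rare, rare] + [(1 - 2 * rare) / 18] * 18

# Even community: 100 species, each at frequency 1/100.
even = [1 / 100] * 100

for name, community in (("skewed, 20 species", skewed),
                        ("even, 100 species", even)):
    depth = reads_to_see(min(community))
    seen = expected_species_seen(community, 1000)
    print(f"{name}: ~{depth} reads for the rarest species; "
          f"{seen:.2f} of {len(community)} species expected in 1000 reads")
```

Under these assumptions, the rarest species in the 20-species community needs roughly a hundred times more reads than any species in the 100-species community, even though the second community has five times as many species.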

Now, my colleagues are very smart people, and they got it once I explained it to them. But this example demonstrates why mathematics as a way of thinking is vital for biologists.

Probability theory shouldn't be counterintuitive.

How counterintuitive something is depends on your intuition.

mathematics as a way of thinking is vital for biologists

Two words too many.

By D. C. Sessions (not verified) on 21 Apr 2009

Seven words too many, I'd say.

Dear Jorge. Could you teach me how to translate your information? I would like to read it in Spanish. I love you, my eternal and wonderful son.

Great article. As a young researcher, I see myself as a valuable asset because I started school in computer science and ended with a degree in microbiology. The people I work with are amazed at my comprehension of math (albeit remedial, in my opinion). This article reassures me that all that time learning calculus and statistics was worth it. Thank you for the insight.
-K

I suspect your background in ecology was more important than fluency in math. What you are saying is, of course, common knowledge among ecologists. I suspect that there are a fair number of people doing "modern biology" who have come into the field with very little broad knowledge of biology. It sounds like you have people trying to do microbial community ecology with no background in what they are trying to do. They are fortunate to have you involved.

By Jim Thomerson (not verified) on 22 Apr 2009

You are completely correct.

Most people (myself included) get into biology because their math skills are too weak to understand physics. This must change. Many biological problems are possibly more mathematically challenging than physics, due to the inherent fuzziness and poor understanding of the fundamentals of biological systems.

Through a strange career path I became involved in bioinformatics and taught myself maths and computer science and I am still not great at either. Unfortunately many of my non-informatics colleagues are much worse. The kinds of errors this leads to are pretty bad. Many have no idea what to do with genomic scale data. Worse, their lack of math leads to poor initial design so the resulting data may be unusable.

Another skill that is badly lacking is database management. My rule of thumb is that if your spreadsheet takes up more than one screen, you are going to make a serious mistake and not notice. Any dataset that is more than about 60*12 (R*C) needs to be in a database. Use Access if you have to; it's not great, but it's better than Excel.

Could you comment on why (if there is any pattern) less complex systems might show a different overall distribution? (I gather you are saying that there is consistent skewing towards rare species in simpler systems.) I am an ecologist and also a stats geek, and I wasn't familiar with this pattern elsewhere... hadn't thought about it, actually. I was wondering if this would also be reflected in a non-microbial system, e.g., plants in a forest ecosystem.

By Craig Holley (not verified) on 23 Apr 2009

There is an attitude prevalent in the sciences that the less mathematical a science is, the "softer" it is. The perceived pecking order goes something like: Physics, Chemistry, Biology, Sociology. There is an idea out there that the mathematics in physics is the hardest, and one doesn't need much in the way of math by the time one gets to sociology. In fact, the reverse is true. In terms of the actual mathematics required to describe the systems being studied, physics is the easiest. Because the math in physics is easier, the equations required to describe physical systems have been more fully developed than those in biology. It used to be that you could get a Ph.D. in biology without knowing much mathematics not because the math in biology is not complicated, but because so little was known. In my opinion, that has been changing over the past couple of decades. In biological fields like genetics, a fair amount of mathematical skill is now required, and I think that any current Biology Ph.D. student who does not have strong mathematical training will have difficulty finding long-term success. I also expect that we will see a division in Biology like the one that has occurred in Physics between theoretical physicists, who mostly work out the equations of physics, and experimental physicists, who still have to know lots of math but focus more on working in the lab.

Craig,

I'm not sure why we see less skew (e.g., a higher Shannon's index) in the more species-rich, human-associated microbial communities than in the species-poor ones. It would be interesting to find out what the distribution of species is in other species-poor microbial communities (e.g., Yellowstone hot springs) to see if this is a general pattern. Maybe 'harsh' environments are more likely to be skewed?

I was very good at math and physics in high school and early college, but I gave it all up for a BA in biology and psychology and decided to focus on genetics (in an era when genetics was one gene in a fly mated to another, and phenotypes were as quantitative as +/- or +++). Now I work in molecular diagnostics, and I regret abandoning my math and computational skills so early. I wish to remediate my math skills; where do I begin? Are there any recommended resources or courses I should take? Should I go back and take all the core courses in math, comp sci, physics, and physical chemistry that I skipped with the BA route? I was in college in the early 90's, and I stopped with calculus I and II, physics I and II, and statistics I. Where do I begin today to rebuild my math and computational skills without going back for a new major in college 16 years later? I now tell everyone to be fluent in math. It is truly the universal language understood by scientists in all nations. I used to believe in the statement "without data, you are just a man with an opinion"; now it should read "without QUANTITATIVE data . . . " Somebody save me.

As for me, my teenage interest in mathematics was not the least of the factors driving me to get a biology major (counterintuitive again?). Those two fields of interest did not really merge for years, but now I enjoy the explosion of available biological data, and all my senses are finally satisfied :)