The NY Times is touting a computer simulation of Mycoplasma genitalium, the proud possessor of the simplest known genome. It's a rather weird article because of the combination of hype, peculiar emphases, and cluelessness about what a simulation entails, and it bugged me.
It is not a complete simulation; I don't even know what that would mean. What it is, rather, is a model of a real cell complex enough to uncover unexpected interactions between components of the genome, and that is a fine and useful thing. But as always, the first thing you should discuss about a model is its caveats and limitations, and this article does no such thing.
I'd like to know how fine-grained the model is; I get the impression it's an approximation of interactions between molecular components based on empirically determined properties of those elements. Again, I don't think the authors have claimed otherwise, but the NY Times implies that we now have an electronic simulation we can plug variables into and get cures for cancer and Alzheimer's, without ever again having to dirty our hands with real cells and animals.
That's nonsense. Everything in this model has to be a product of analyses of molecules from living organisms; they certainly aren't deriving the functions and interactions of individual proteins from sequence data and first principles. We can't do that yet! The utility of a model like this is that it might generate hypotheses: if upregulating gene A in the model leads to downregulation of gene Z, a gene far removed from A, that gives us a preliminary clue about indirect ways to modulate genes of interest. The next necessary step would be to test potential drug agents in real, living cells. This model will have a huge mountain of assumptions built into it, and you can only build on those speculations so far before it becomes necessary to cross-check against reality.
Also, isn't it a bit of a leap from a single-celled, parasitic organism like M. genitalium to human cancers and brain disease? Yet there it is in the second paragraph, a great big bold exaggeration.
And then there's the really weird stuff. Some people need to step back and learn some biology.
“Right now, running a simulation for a single cell to divide only one time takes around 10 hours and generates half a gigabyte of data,” Dr. Covert wrote. “I find this fact completely fascinating, because I don’t know that anyone has ever asked how much data a living thing truly holds. We often think of the DNA as the storage medium, but clearly there is more to it than that.”
What the hell…? Look, I could (if I had the skills) build an hourglass simulator that calculated the shape, bounciness, and stickiness of every grain of sand and stored the trajectory of each as it fell, and by storing enough data for each grain, generate more than half a gigabyte of data. So? That doesn't mean an hourglass is a denser source of information than a cell. The storage requirements for a program's output do not tell us "how much data a living thing truly holds"; that statement makes no sense.
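To make the hourglass point concrete, here is a minimal sketch of a toy falling-grain simulation. Everything in it is a hypothetical illustration (the function name `simulate_hourglass`, the crude physics, the choice to log each grain's x, y, and vertical velocity as 8-byte doubles), not anything from the paper. It shows that the size of a simulation's output depends on how often, and how finely, you choose to record it, not on how much "information" the simulated object holds.

```python
import random
import struct


def simulate_hourglass(n_grains, n_steps, log_every, seed=0):
    """Toy falling-grain simulation (hypothetical, not a real physics model).

    Returns the serialized trajectory log as bytes. Each logged snapshot
    stores (x, y, vy) for every grain as three little-endian 8-byte doubles.
    """
    rng = random.Random(seed)
    dt, g = 0.01, -9.81
    # Initial state [x, y, vy]: grains scattered in the upper bulb, at rest.
    grains = [[rng.uniform(-1.0, 1.0), rng.uniform(1.0, 2.0), 0.0]
              for _ in range(n_grains)]
    log = bytearray()
    for step in range(n_steps):
        for grain in grains:
            grain[2] += g * dt          # update vertical velocity
            grain[1] += grain[2] * dt   # update height
            if grain[1] < 0.0:          # crude floor: grain comes to rest
                grain[1], grain[2] = 0.0, 0.0
        if step % log_every == 0:       # only record every log_every-th step
            for x, y, vy in grains:
                log += struct.pack("<3d", x, y, vy)
    return bytes(log)


# Same hourglass, same physics; only the logging interval differs.
fine = simulate_hourglass(n_grains=100, n_steps=1000, log_every=1)
coarse = simulate_hourglass(n_grains=100, n_steps=1000, log_every=100)
print(len(fine), len(coarse))  # 2400000 24000
```

The fine log is a hundred times larger than the coarse one, yet both describe the identical toy hourglass, and either log can be regenerated exactly from a handful of parameters and a random seed.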
As for "We often think of the DNA as the storage medium, but clearly there is more to it than that"…jebus, does a professor of bioengineering really need to go back and take some introductory cell biology courses, or what? Heh. "More to it than that." I'm glad to see that someone needed an elaborate computer simulation to figure that out.
I am, for some reason, reminded of the time I attended a seminar by a computer scientist on an exciting new simulation of the genetic behavior of viruses that I was told would have great predictive power for epidemiology. One of the first things the speaker carefully explained to us was how they'd incorporated sexual reproduction into the model. I wish she'd waited until the end to say that, because it meant I sat through the whole hour-long talk with absolutely no interest in any other details.
Please write more pieces like this. Many more. Most popular-press accounts of computer simulations of life (or brain) processes are as naive as this one, but virtually no one besides you calls them out. (Especially when they are in the New York Times, which most educated people assume gets science right. The Times may get Higgs-style physics right, but it does an awful job with computers and AI.) Your analysis a few years back of the nonsense spouted by the Singularity crowd remains a classic, but there are far too few pieces like it. I certainly don't want to take anything away from your forceful writing against creationism, but there are (I'm sure you will agree) a number of equally qualified scientists making many of the same points in their own books or blogs. There is no one with your training, though, who regularly works this other turf, the routine hyping of what computers can do, despite the massive misinformation being spread. Save us, professor!
Most of the experimental data they used to build their model didn't come from the organism they're modeling. Here's what I think they did: they created a sub-model for each of the 28 'cellular processes' using whatever data was available (probably mostly from E. coli) and tweaked it as necessary to get the components to work together. Then they connected the cellular-process models into the big model and retweaked all the values until they had a big model that worked.
From their Data S1 file: "...because the 28 cellular processes were trained using different experimental data obtained by different investigators under different conditions using different techniques and different model organisms, we refined the values of the sub-model parameters to make the processes mutually consistent."
This accomplishment tells us that their modeling skills are superb. Given all the tweaking needed to get it working, the concern is that the model may reflect the tweaks more than it does the reality of the cellular processes.
For those who wish to read the original paper, it can be found here (official paywalled copy), or here.