Meanderings and messages

So, it seems that 44 is the median age of depression. Old news, or at least it is for me. Although for 44 to be the median age of depression for me, I'd have to live until my late 70s. Right now, after a week of working on a grant application and dealing with my son returning to school, I'm not a happy camper, let me tell you. So blogging has been a low priority, and is likely to remain so.

But this caught my eye. Some researchers have reconstructed the ancestral DNA of bacteria and worked out that it is (physically) adapted to higher temperatures. Or have they?

There is a mathematical procedure in information theory called minimum message length [see also here], which takes various messages from the same source and attempts to reconstruct the original message. It does this by a mixture of Shannon entropy and Bayesian likelihoods. It effectively provides the formal underpinning for such attempts to reconstruct the evolutionary past, and it shows rather well how limited and impermanent the conclusions reached this way are.

Take two "messages" (from the last link):



string A= TAATACTCGGC

string B= TATAACTGCCG

What was the common ancestral sequence? Well, we have reason, from Occam's Razor, to think that whatever is shared is likely to have been in the common ancestor's sequence. So we get

Anc= TA**ACT****

which is not so informative. What were the ancestral sequences? Chris Wallace developed MML to work out the likelihoods, but not the certainties. So his student Lloyd Allison, whose paper I am cribbing from, offers this table:

[Table image (mml.png) from Allison's paper, not reproduced here]



While Allison was considering the relatedness of the sequences, the diagonal here gives the likely ancestral sequence (which is formally the same thing) based on the likely mutations. The asterisks mark the most likely sequence (they are closest to the MML). As you can see, for three of the last four symbols, the sequence is unclear.
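For the curious, the naive Occam's Razor step used earlier to get Anc (keep whatever strings A and B share at the same position) can be sketched in a few lines of Python. This is only the position-by-position comparison, not Allison's MML calculation: it ignores insertions and deletions entirely, and the function name `shared_positions` is made up for illustration.

```python
def shared_positions(a: str, b: str, unknown: str = "*") -> str:
    """Keep the symbols two equal-length strings share at the same position.

    Hypothetical helper for illustration only; positions where the strings
    disagree are marked with `unknown`.
    """
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return "".join(x if x == y else unknown for x, y in zip(a, b))


string_a = "TAATACTCGGC"
string_b = "TATAACTGCCG"

anc = shared_positions(string_a, string_b)
print(anc)  # TA**ACT****

# Assumption: each unresolved position could be any of the four bases.
unresolved = anc.count("*")
candidates = 4 ** unresolved
print(unresolved, candidates)  # 6 4096
```

So even for two eleven-letter strings, the shared positions alone leave thousands of candidate ancestors on the table, which is why some way of weighing likely mutations is needed at all.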

OK, so now let's do this for 16 species of bacteria, each of which has a history of mutations, back mutations, and so on for the sequence. How likely is it that we have reconstructed anything?

I'm not rejecting this approach, merely the hyperbole that all too often comes from these sorts of exercises. The LCA of bacteria may well have been thermophilic, and there is other evidence to support this, but it is not even very certain that we can know this, let alone that we do. Knowledge of the past is often based on degraded or missing information, and here we see a case of that.

This is why, to segue into another mode - that of Darwin Day, and the praising of all things Darwin - the method used by Darwin, based on Lyell's uniformitarian geology, is the safest epistemology for the historical sciences. We should rely on factors that we know apply today, and not infer beyond what we safely can about the past. Contrary to what many popularly think, for instance, Darwin relied almost not at all on fossil evidence, instead making his inferences on the basis of modern organisms, and leaving that which he could not infer on that basis to speculation. Sometimes the past is simply inaccessible. At best our researchers have suggested that thermophilic lifestyles may have been an ancestral ecological niche, but then again, there are too many pathways, all of them plausible on the basis of modern biology, by which these sequences could have developed this way.

It's philosophy - you end up a skeptic even when all around you are quite happy with some conclusions. And that makes me depressed...


"As you can see, for three of the last four symbols, the sequence is unclear."

So you get lots of possible ancestral sequences. How many of those sequences code for functional proteins? ;)

If the Maynard Smith/Kaufmann/Gavrilets account is correct, probably pretty well all of them. Any neighbour of a currently functional protein is likely to be relatively functional itself (it's not all-or-nothing).

"Any neighbour of a currently functional protein is likely to be relatively functional itself"

not at the DNA sequence level...