Using Good Math to Study Evolution Using Fitness Landscapes

Via [Migrations][migrations], I've found out about a really beautiful computational biology paper that very elegantly demonstrates how, contrary to the [assertions of bozos like Dembski][dembski-nfl], an evolutionary process can adapt to a fitness landscape. The paper was published in the journal *PLoS Computational Biology*, and it is titled ["Evolutionary Potential of a Duplicated Repressor-Operator Pair: Simulating Pathways Using Mutation Data"][plos].

Here's their synopsis of the paper:

>The evolution of a new trait critically depends on the existence of a path of
>viable intermediates. Generally speaking, fitness decreasing steps in this path
>hamper evolution, whereas fitness increasing steps accelerate it.
>Unfortunately, intermediates are hard to catch in action since they occur only
>transiently, which is why they have largely been neglected in evolutionary
>studies.
>
>The novelty of this study is that intermediate phenotypes can be predicted
>using published measurements of Escherichia coli mutants. Using this approach,
>the evolution of a small genetic network is simulated by computer. Following
>the duplication of one of its components, a new protein-DNA interaction
>develops via the accumulation of point mutations and selection. The resulting
>paths reveal a high potential to obtain a new regulatory interaction, in which
>neutral drift plays an almost negligible role. This study provides a
>mechanistic rationale for why such rapid divergence can occur and under which
>minimal selective conditions. In addition it yields a quantitative prediction
>for the minimum number of essential mutations.

And one more snippet, just to show where they're going, and to try to encourage you to make the effort to get through the paper. This isn't an easy read, but it's well worth the effort.

>Here we reason that many characteristics of the adaptation of real protein-DNA
>contacts are hidden in the extensive body of mutational data that has been
>accumulated over many years (e.g., [12-14] for the Escherichia coli lac
>system). These measured repression values can be used as fitness landscapes, in
>which pathways can be explored by computing consecutive rounds of single base
>pair substitutions and selection. Here we develop this approach to study the
>divergence of duplicate repressors and their binding sites. More specifically,
>we focus on the creation of a new and unique protein-DNA recognition, starting
>from two identical repressors and two identical operators. We consider
>selective conditions that favor the evolution toward independent regulation.
>Interestingly, such regulatory divergence is inherently a coevolutionary
>process, where repressors and operators must be optimized in a coordinated
>fashion.
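
To make "consecutive rounds of single base pair substitutions and selection" concrete, here is a minimal sketch of that kind of walk. It is not the paper's model: the paper uses measured repression values as its fitness landscape, while `toy_fitness`, the target sequence, and the accept-only-improvements rule below are all assumptions made up for illustration.

```python
import random

BASES = "ACGT"


def toy_fitness(seq, target):
    """Hypothetical fitness: fraction of positions that match a target.

    A stand-in for a measured landscape, not the paper's repression data.
    """
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def single_substitutions(seq):
    """Yield every sequence that differs from seq by exactly one base."""
    for i, old in enumerate(seq):
        for new in BASES:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def adaptive_walk(start, target, rng, max_rounds=100):
    """Alternate mutation and selection, keeping only fitness-increasing steps."""
    path = [start]
    current = start
    for _ in range(max_rounds):
        f = toy_fitness(current, target)
        better = [s for s in single_substitutions(current)
                  if toy_fitness(s, target) > f]
        if not better:
            break  # no single substitution improves fitness
        current = rng.choice(better)
        path.append(current)
    return path


rng = random.Random(0)
path = adaptive_walk("AAAAAAAA", "ACGTACGT", rng)
for step in path:
    print(step, toy_fitness(step, "ACGTACGT"))
```

Every sequence in `path` is a viable intermediate in the sense of the synopsis: each one is a single substitution away from the previous one and strictly fitter under the toy landscape.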

This is a gorgeous paper, and it shows how to do *good* math in the area of search-based modeling of evolution. Instead of the empty refrain of "it can't work", this paper presents a real model of a process, shows what it can do, and *makes predictions* that can be empirically verified to match observations. This, folks, is how it *should* be done.

[migrations]: http://migration.wordpress.com/2006/07/12/duplication_and_coevolutionar…
[dembski-nfl]: http://scienceblogs.com/goodmath/2006/06/dembski_and_no_free_lunch_with…
[plos]: http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10…

Intuitively, people think that it's easy to be caught in a local optimum, and prevented from making progress to the global optimum (no "path of viable intermediates"). I know this because I did my dissertation on nonlinear least-squares fitting of entire molecular absorption bands, thirty-plus adjustable parameters, fit to 2,000+ observations. We were always getting this objection. Once people tried it, though, it became the standard method.

In fact, it was never a problem. The more complex the system -- the more parameters -- the less it's a problem. To be trapped, you have to be trapped in every direction in parameter space. The more parameters, the less likely that is. (I first saw this observation in a report from a group at Los Alamos that had developed an early lens design optimizing program. They said, if I recall correctly, that they had not seen a single case of a local optimum in two thousand tries.)
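
A rough way to see why "trapped in every direction" gets less likely with more parameters is a deliberately crude toy model. The assumption here, which is not from the comment or the paper, is that a point and each of its neighbours get independent, uniformly random fitness values; real landscapes are correlated, so this is only a sketch. In that model the chance that a point beats all `n` neighbours is `1 / (n + 1)`, and the function name `fraction_local_optima` is hypothetical.

```python
import random


def fraction_local_optima(n_neighbours, trials, rng):
    """Estimate how often a point is fitter than all of its neighbours.

    Assumes i.i.d. uniform fitness values (an uncorrelated toy landscape).
    """
    hits = 0
    for _ in range(trials):
        own = rng.random()
        # The point is a local optimum only if every neighbour is less fit.
        if all(rng.random() < own for _ in range(n_neighbours)):
            hits += 1
    return hits / trials


rng = random.Random(1)
for n in (1, 3, 10, 30):
    estimate = fraction_local_optima(n, 20000, rng)
    print(n, round(estimate, 3), "vs 1/(n+1) =", round(1 / (n + 1), 3))
```

The estimate falls as the number of neighbouring directions grows, which is the direction of the effect described above, though the toy model says nothing about how strong it is in a real fitting problem.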

The only way I ever saw to get trapped in a local optimum was to collapse the problem to one dimension. You have to make one parameter so important that the others aren't significant. And the only way I ever did that was by mistake.

Intuition is not a reliable guide in n-dimensional parameter space.

By Bob Hawkins on 14 Jul 2006

An analogous situation, and the problem with intuition, seem to be discussed by Tellgren on results contradicting Dembski's NFL booziness ( http://talkreason.org/articles/nfl_gavrilets6.pdf ):

"Because the number of mutational neighbours increases with increasing dimensionality, so does the number of potential paths and consequently the probability that any two high-fitness genotypes are connected must also increase. Therefore the well-connected "noodle soup" structure becomes easier to obtain the higher the dimensionality of the genotype space."

"I want to emphasize the scaling in the size and dimensionality of the search space, because it goes against a common intuition that finding points with high fitness gets more difficult as the search space becomes larger."

By Torbjörn Larsson on 14 Jul 2006

i came here from bad math about ID (did you lose yours? most people need one). i wanted to see your opinion of what is 'good math' on evolution. (i personally only use the 'best' stuff myself, but i am tolerant. )

it seems funny that a computer scientist might not even mention the work by von neumann, ulam, etc on 'automata' in this context (you know, like the origin of life). john conway comes from there. (i like conway). but i guess you were doing computations. (by the way, bacteria i think have less than 1 10*6 genes; humans only have about ? 130? 40? i forget. Collins, the christian, knows. )

there is also a fairly large literature going back to Fisher which does the math similar to the computer simulation you cite. it's not like this is a new problem.
the approaches overlap via genetic algorithms.

i find it interesting that Dembski has a PhD from Chicago (the place where Mac Lane started category theory---of which I am skeptical, since there already existed logic and recursion theory so i don't see exactly why a new bottle is needed to put old problems in, but whatever---sometimes something new may appear, as Pasteur showed using von Neumann's machine concerning spontaneous generation).
I do think it may be interesting if you learn chinese and see if this blog written that way takes a new turn.

i also find it interesting that among AIDS denialists there is a math biologist (whose math appears to be generic ) and a PhD in algebraic number theory, not to forget S Lang. to an extent, since their arguments appear to ignore evidence, as theoretical arguments they may be plausible. "there may be aleph 3 angels on the head of a pin". we have the BYU 911 physicist. etc.
B Josephson (who actually seems logically consistent).

perhaps it's more interesting to work on the fringes than in a 'corporate job' (some of which can be pretty loony. 'global warming? a farce'.).

i note that the PhD in math mentioned above (darin somebody of serge lang fame) has a new post 'deconstructing' correlations in a JAMA paper on some blog. i guess the best thing to do is ignore it.
it might also be interesting to look at his thesis to see if it's plagiarized or something.