Upstream plasticity and downstream robustness in evolution of molecular networks


In developmental biology, and increasingly in evolutionary biology, one of the most important fields of study is deciphering the nature of regulatory networks of genes. Most people are familiar with the idea of a gene as a stretch of DNA that encodes a protein in a sequence of As, Ts, Gs, and Cs, and that's still an important part of the story. Most people may also be comfortable with the idea that mutations are events that change the sequence of As, Ts, Gs, and Cs, which can lead to changes in the encoded protein, which then causes changes in the function of the protein. These are essential pieces in the story of evolution; we do accumulate variations in genes and gene products over time.

There's more to evolution than just that relatively straightforward pattern of change, however. Consider humans and chimpanzees. We're both made of mostly the same stuff: the keratin that makes up our hair and the organization of hair follicles is nearly identical, and our brains each contain the same structures. The differences are in regulation. We both have the same kinds of hair, but chimps have more of it turned on all over the place, while we've mostly down-regulated it everywhere except a few places. The differences in our brains may be mostly differences in select timing: our brains are switched on to grow for longer periods of time in development, and there are almost certainly specific regions and patterns of connectivity that are tweaked by adjusting different levels of different gene products in different places at different times.

The really important bits of information that generate macroevolutionary differences are probably not the protein-encoding sequences of the genes, but the pieces of DNA that surround them (the regulatory elements that act as on/off switches for gene expression) and the molecular and cellular interactions that occur during development to change their status (the fancy word we use for that is epigenesis.) Another way to think of it is that much of the history of this new science of molecular biology has focused on puzzling out the spelling and vocabulary of the genome. The next step is to work out the genomic grammar. Proteins and their sequences are the words of the language, while organisms are whole novels—and obviously the differences between a Shakespeare and a Bulwer-Lytton aren't so much in the words available to them, but in how they are arranged in a sentence, paragraph, and page.

You must understand that genes don't stand alone, but every gene is an actor in a complex regulatory network. Each gene has a set of other genes that can influence whether that gene is off or on (these are called upstream elements.) A gene produces a protein product that affects multiple other genes (the downstream elements); this effect can be direct, if the gene is a regulatory gene itself, or indirect, if the gene product is part of a complex of cytoplasmic regulators. (I'm trying to keep this simple, or I'd go into greater detail on the fact that there are also multiple levels of regulatory interaction, that there is a great cloud of things called transcription factors that interact directly with DNA, and there is another great cloud of signal transduction factors and regulatory proteins working away in the cytoplasm.)

That long-winded introduction brings me to this paper by Maslov et al. (2004). The premise is straightforward: let's look at a lot of genes, and in addition to comparing sequence similarity, let's also compare the sets of upstream elements and the set of downstream elements and see how they differ. The authors are examining not just how the words of the language are spelled, but seeing if similar words are used in similar ways within sentences. In particular, they are looking at the results of gene duplication in evolutionary history. When a gene is duplicated, it is often going to also duplicate all of its regulatory elements; it is initially going to have all of the same upstream and downstream elements as the original copy. It can then diverge in its sequence, and also in what other genes regulate it and what downstream genes and proteins it affects. The question they are asking is, how fast does regulation change compared to sequence information? The answer is in the abstract:

Background: Gene duplication followed by the functional divergence of the resulting pair of paralogous proteins is a major force shaping molecular networks in living organisms. Recent species-wide data for protein-protein interactions and transcriptional regulations allow us to assess the effect of gene duplication on robustness and plasticity of these molecular networks.

Results: We demonstrate that the transcriptional regulation of duplicated genes in baker's yeast Saccharomyces cerevisiae diverges fast so that on average they lose 3% of common transcription factors for every 1% divergence of their amino acid sequences. The set of protein-protein interaction partners of their protein products changes at a slower rate exhibiting a broad plateau for amino acid sequence similarity above 70%. The stability of functional roles of duplicated genes at such relatively low sequence similarity is further corroborated by their ability to substitute for each other in single gene knockout experiments in yeast and RNAi experiments in a nematode worm Caenorhabditis elegans. We also quantified the divergence rate of physical interaction neighborhoods of paralogous proteins in a bacterium Helicobacter pylori and a fly Drosophila melanogaster. However, in the absence of system-wide data on transcription factors' binding in these organisms we could not compare this rate to that of transcriptional regulation of duplicated genes.

For all molecular networks studied in this work we found that even the most distantly related paralogous proteins with amino acid sequence identities around 20% on average have more similar positions within a network than a randomly selected pair of proteins. For yeast we also found that the upstream regulation of genes evolves more rapidly than downstream functions of their protein products. This is in accordance with a view which puts regulatory changes as one of the main driving forces of the evolution. In this context a very important open question is to what extent our results obtained for homologous genes within a single species (paralogs) carries over to homologous proteins in different species (orthologs).


The simple answer is that regulatory networks change relatively rapidly. They have measured a parameter, Ω or overlap, that describes how similar the upstream regulators of two genes are; in this cartoon to the left, the two paralogous genes share two upstream regulators out of five, and so have an Ω value of 2/5 or 0.4. In the graph below we can see that the fraction of shared regulators changes faster than the sequence similarity.
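To make the overlap measure concrete, here is a minimal Python sketch. It assumes, as the figure caption below puts it, that Ω is the number of shared regulators divided by the number of distinct regulators of either one or the other paralog, i.e. the size of the intersection over the size of the union of the two regulator sets. The regulator names (TF1 to TF5) and the gene variables are hypothetical placeholders chosen to match the cartoon, not data from the paper.

```python
def overlap(regulators_a: set[str], regulators_b: set[str]) -> float:
    """Fraction of regulators shared by two genes: |A & B| / |A | B|."""
    union = regulators_a | regulators_b
    if not union:
        # Neither gene has a known regulator: overlap is undefined, report 0.
        return 0.0
    return len(regulators_a & regulators_b) / len(union)


# Hypothetical paralog pair matching the cartoon: 5 distinct regulators, 2 shared.
gene_a = {"TF1", "TF2", "TF3"}
gene_b = {"TF2", "TF3", "TF4", "TF5"}
print(overlap(gene_a, gene_b))  # 0.4
```

Dividing by the union rather than by either gene's own count keeps Ω symmetric in the two paralogs and bounds it between 0 (no shared regulators) and 1 (identical regulator sets).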

The PID (percent identity) dependence of the average regulatory overlap Ωreg normalized by the total number of regulators of either one or the other paralog. Relative error bars are estimated by the inverse square root of the total number of shared regulators in a given PID bin. The solid line is the best fit to the exponential form: Ωreg ~ exp(γ PID) with γ = 0.03 (3% change in Ωreg for every 1% change in PID.) The dashed horizontal line at 0.015 is a null-model expectation of the normalized overlap of two randomly selected proteins (not necessarily paralogs).
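The exponential form in that caption can be sanity-checked with a little arithmetic: if Ωreg ~ exp(γ PID) with γ = 0.03, a one-point change in PID multiplies Ωreg by exp(0.03) ≈ 1.03, which is roughly the quoted 3% change. The sketch below is only an illustration, not the authors' fitting procedure: it recovers γ by an unweighted least-squares line through log Ω versus PID, and computes the relative error bar as the inverse square root of the number of shared regulators in a bin. The bin values are synthetic, generated from the model itself (the prefactor 0.01 is arbitrary), not the paper's data.

```python
import math


def fit_exponential_rate(pids: list[float], omegas: list[float]) -> float:
    """Least-squares slope of log(omega) vs PID, i.e. gamma in omega ~ exp(gamma * PID)."""
    logs = [math.log(w) for w in omegas]
    n = len(pids)
    mean_x = sum(pids) / n
    mean_y = sum(logs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(pids, logs))
    var = sum((x - mean_x) ** 2 for x in pids)
    return cov / var


def relative_error(n_shared: int) -> float:
    """Relative error bar for one PID bin: 1 / sqrt(number of shared regulators)."""
    return 1.0 / math.sqrt(n_shared)


# Synthetic bins generated from omega = 0.01 * exp(0.03 * PID); not real measurements.
pid_bins = [30.0, 45.0, 60.0, 75.0, 90.0]
omega_bins = [0.01 * math.exp(0.03 * p) for p in pid_bins]

gamma = fit_exponential_rate(pid_bins, omega_bins)
print(round(gamma, 4))            # 0.03
print(round(math.exp(gamma), 4))  # 1.0305, i.e. about 3% per 1% PID
print(relative_error(25))         # 0.2
```

With real, noisy bins one would probably weight each point by its error bar; the unweighted fit is kept here only for brevity.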

What this means is that duplicated genes diverge rapidly at the level of the regulatory network, losing and adding different upstream regulators; this is the "upstream plasticity" referred to in the title of the paper. What about the downstream elements?

One slight flaw in the paper (acknowledged by the authors) is that their measures of upstream and downstream overlap aren't entirely comparable: the upstream measure is of protein-DNA interactions, while downstream they are looking at protein-protein binding. Still, they see something interesting: downstream interactions also change, but more slowly, and the proteins tend to retain a number of shared interaction partners even with significant changes in sequence. This maintained redundancy confers robustness on the network.

Divergence of downstream functions of duplicated genes in the baker's yeast S. cerevisiae. A. The average value of the interaction overlap Ωint, the number of physical interaction partners shared by a pair of paralogous proteins, as a function of the similarity of their amino acid sequences. The physical interaction data are taken from the set of Uetz et al. (open circles), the core dataset of Ito et al. (diamonds), and the non-redundant combination of the two (filled circles). Note the apparent plateau for PIDs between 70% and 100% in all three datasets. Solid lines are guides for the eye. A randomly selected (usually non-paralogous) pair of proteins in the combined dataset on average has Ωint around 8×10⁻³ (off-limits in this figure). All data points at all PIDs are significantly above this null-model value.
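Both figures compare paralogs against a null model: the average overlap of two randomly selected proteins (0.015 for regulatory overlap, around 8×10⁻³ for interaction overlap). A minimal sketch of how such a baseline could be estimated by sampling random pairs is below. It assumes the network is stored as a dictionary mapping each protein to its set of partners and uses shared-over-distinct normalized overlap; the toy network, protein names, and the `null_overlap` helper are hypothetical, and this is not claimed to be the authors' exact procedure.

```python
import random


def overlap(a: set[str], b: set[str]) -> float:
    """Shared partners divided by distinct partners of either protein."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def null_overlap(network: dict[str, set[str]], n_samples: int, seed: int = 0) -> float:
    """Average overlap of randomly chosen (not necessarily paralogous) protein pairs."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    proteins = sorted(network)
    total = 0.0
    for _ in range(n_samples):
        p, q = rng.sample(proteins, 2)  # two distinct proteins
        total += overlap(network[p], network[q])
    return total / n_samples


# Hypothetical toy interaction network: protein -> set of binding partners.
toy_network = {
    "P1": {"X1", "X2"},
    "P2": {"X2", "X3"},
    "P3": {"X4"},
    "P4": {"X5", "X6"},
}
baseline = null_overlap(toy_network, n_samples=10_000)
print(round(baseline, 3))
```

In this toy network only P1 and P2 share a partner (overlap 1/3), so the exact average over the six possible pairs is 1/18 ≈ 0.056, which the sampled estimate should approach. A paralog pair's overlap would then be judged against this baseline, as in the figures.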

The work has promise for explaining some features of evolution.

Our results also indicate that the genetic regulation of paralogous proteins changes faster than both their amino acid sequences and the set of their protein interactions partners. It is tempting to extend this observation to pairs of homologous proteins in different species (orthologs) that diverged from each other as a result of a speciation (as opposed to a gene duplication) event. This would help to explain how species with very similar gene contents can evolve novel properties on a relatively short timescale. However, such an inter-species comparison of molecular networks has to wait for the appearance of whole-genome data on molecular networks in closely related model organisms.

Maslov S, Sneppen K, Eriksen KA, Yan K-K (2004) Upstream plasticity and downstream robustness in evolution of molecular networks. BMC Evolutionary Biology 4:9-21.


Does the analogy of words and syntax go as far as being useful for analytic purposes?

Specifically, if there were algorithms used by linguists to analyse texts for changes in language use (something which I think I read about recently, possibly at Lambda the Ultimate), could they be adapted to analyse the genome?

Thanks for the informative discussion on evo-devo. This is a very interesting field, and it shows that there's more to evolution than the simplistic genetic reductionism of GC Williams and JM Smith.

I'm going to have to butt in and defend JMS's honour. He had this to say in 1982 (from the introduction to Evolution and the Theory of Games):

"The long-term consequences [of the separation between developmental and evolutionary biology] have been less happy, because most biologists have been led to suppose either that the problems of development are not worth bothering with, or that they can be solved by a simple extension of the molecular biology approach which is being so triumphant in genetics. My own view is that development remains one of the most important problems of biology, and that we shall need new concepts before we can understand it."

I hope you don't mind if I open a few cans of worms...

I had run into this paper before and I have little doubt that its results are significant. I notice that the authors are specifically focusing on the role of the transcription factors. Some authors argue as if changes in the coding sequences of transcription factors themselves are perhaps the main driving force of molecular evolution.

Alternatively, Sean Carroll and others have tended to focus on the role of the promoters, particularly since relatively few changes in a small promoter may have profound effects upon its logic and thus transcription. Moreover, Carroll seems to think that changes in the sequences of the transcription factors would be too pleiotropic, whereas changes in the promoter regions would tend to be far more modular and less deleterious. In either case, it appears to be an ongoing debate.

Oddly enough, in terms of this specific paper, the loss of shared transcription factors by duplicated genes would itself seem to suggest that it is the promoter regions which are changing and thus providing the sort of plasticity which drives evolutionary change. First, the authors are focusing on the difference between transcription and the actual coding, and arguing that transcription accounts for much more of the change in terms of protein expression. Second, the authors are focusing on the distinction between transcription (from DNA to RNA) and translation (from RNA to protein), and arguing that transcription ("upstream", which would include the transcription factors, promoters, enhancers and suppressors), which takes place in the nucleus, is the origin of much of the change, rather than what occurs "downstream" during translation in the cytoplasm. Third, if the genes are duplicated, one should typically expect them to be roughly in the same region, or so I would presume, which means that their promoters will be exposed to the same transcription factors. As such, the divergence will not be in the transcription factors themselves (which are typically regulatory proteins, although sometimes structural proteins are involved as well); if the expression of duplicated genes changes, the change should be in the promoter regions.

In any case, for one good example of a paper that analyzes the role of transcription factors and promoters in Drosophila and argues for the role of changes in the promoters, people might try:

The Evolution of Transcriptional Regulation in Eukaryotes
Gregory A. Wray, Matthew W. Hahn, Ehab Abouheif, James P. Balhoff, Margaret Pizer, Matthew V. Rockman and Laura A. Romano
Mol. Biol. Evol. 20(9):1377-1419. 2003

Moreover, this sort of approach would seem to be more in line with the approach to modular gene regulation that Michael Lynch has been developing and which you have done us the pleasure of introducing.

For a paper which examines the relative importance of changes in the promoter regions and changes in the transcription factors, one might try:

Evolutionary changes in cis and trans gene regulation
Patricia J. Wittkopp, Belinda K. Haerum & Andrew G. Clark
Nature 430, 1 July 2004

At one point, they state:

"Differences in gene expression are central to evolution. Such differences can arise from cis-regulatory changes that affect transcription initiation, transcription rate and/or transcript stability in an allele-specific manner, or from trans-regulatory changes that modify the activity or expression of factors that interact with cis-regulatory sequences [1,2]. Both cis- and trans-regulatory changes contribute to divergent gene expression, but their respective contributions remain largely unknown [3]. Here we examine the distribution of cis- and trans-regulatory changes underlying expression differences between closely related Drosophila species, D. melanogaster and D. simulans, and show functional cis-regulatory differences by comparing the relative abundance of species-specific transcripts in F1 hybrids [4,5]. Differences in trans-regulatory activity were inferred by comparing the ratio of allelic expression in hybrids with the ratio of gene expression between species. Of 29 genes with interspecific expression differences, 28 had differences in cis-regulation, and these changes were sufficient to explain expression divergence for about half of the genes. Trans-regulatory differences affected 55% (16 of 29) of genes, and were always accompanied by cis-regulatory changes. These data indicate that interspecific expression differences are not caused by select trans-regulatory changes with widespread effects, but rather by many cis-acting changes spread throughout the genome."
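The hybrid comparison described in that abstract can be written as a short calculation. In a common formulation of this design (an assumption here, not a quotation from the paper), the log ratio of the two species' alleles within F1 hybrids measures the cis-regulatory difference, because both alleles sit in the same trans environment; whatever part of the between-species log ratio is not explained by cis is attributed to trans. The `cis_trans` helper and the expression ratios below are hypothetical, for illustration only.

```python
import math


def cis_trans(parental_ratio: float, hybrid_allele_ratio: float) -> tuple[float, float]:
    """Split a between-species expression difference into cis and trans parts (log2 scale).

    parental_ratio: expression in species 1 / expression in species 2.
    hybrid_allele_ratio: species-1 allele / species-2 allele within F1 hybrids.
    """
    total = math.log2(parental_ratio)
    cis = math.log2(hybrid_allele_ratio)  # both alleles share one trans environment
    trans = total - cis                   # remainder attributed to trans effects
    return cis, trans


# Hypothetical gene: 4-fold higher in species 1, but only 2-fold allelic bias in hybrids.
cis, trans = cis_trans(parental_ratio=4.0, hybrid_allele_ratio=2.0)
print(cis, trans)  # 1.0 1.0
```

A gene whose hybrid allelic ratio matches the parental ratio would get a trans component of zero, i.e. its expression difference would be explained entirely by cis-regulatory change.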

As if this didn't complicate matters enough, there is a paper from 2002 which argues that transcription factor binding sites may be gained and lost (turn over) even when the function of a regulatory region is conserved, and that research which simply relies upon the alignment of homologous promoter sequences may be misleading.

Please see:

Evolution of Transcription Factor Binding Sites in Mammalian Gene Regulatory Regions: Conservation and Turnover
Emmanouil T. Dermitzakis and Andrew G. Clark
Molecular Biology and Evolution 19:1114-1121 (2002)

"Although regulatory regions are not under the same constraints as coding sequences, alignments of regulatory regions of human and rodent genes often reveal blocks of highly conserved sequences (Hardison, Oeltjen, and Miller 1997; Jareborg, Birney, and Durbin 1999; Leung et al. 2000; Wasserman et al. 2000). Observation of such strong sequence conservation suggests conserved function, thereby generating testable hypotheses that have often been confirmed (Leung et al. 2000; Wasserman et al. 2000). However, studies in Drosophila have revealed compensatory changes in gene enhancers (Ludwig et al. 2000), illustrating that conservation of function can be maintained in the face of fluidity in the exact composition of regulatory regions."

Does this have any effect on the amount of information that evolution can accumulate over a generation? I seem to remember that somebody has done calculations on the number of bits that can be accumulated (on average) over a generation.

Mikko wrote:

Does this have any effect on the amount of information that evolution can accumulate over a generation? I seem to remember that somebody has done calculations on the number of bits that can be accumulated (on average) over a generation.

I am just an amateur, and I haven't run across such figures myself. However, I would hesitate to think of this except in terms of the population, and I would suspect that a great deal of this would be dependent upon population size.

In smaller populations, you are going to see a great deal of near-neutral evolution (see Ohta) resulting in the gradual increase in genomic complexity due to gene duplication, segmental duplication and the like (see Ohno). However, in larger populations, you should expect sexual recombination to have a far stronger effect, resulting in more efficient natural selection.

But the effective population size should lag considerably behind the actual (census) population size, since it is approximately equal to the harmonic mean of the population sizes of successive generations; this gives near-neutral mutations more time to be adapted to new functions. As a result, periodic bottlenecks, if sufficiently spaced, should result in increased organismal complexity.
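The harmonic-mean point is easy to see numerically: a single bottleneck generation drags the effective size far below the arithmetic average of the census sizes. A minimal sketch, using the standard approximation Ne ≈ t / Σ(1/N_i) over t generations; the `harmonic_mean_ne` helper and the census sizes are made up for illustration.

```python
def harmonic_mean_ne(census_sizes: list[float]) -> float:
    """Approximate effective population size over several generations (harmonic mean)."""
    return len(census_sizes) / sum(1.0 / n for n in census_sizes)


sizes = [1000, 10, 1000]  # hypothetical census sizes with one bottleneck generation
print(round(harmonic_mean_ne(sizes), 1))  # 29.4
print(sum(sizes) / len(sizes))            # 670.0
```

Because each generation contributes 1/N_i to the sum, the smallest population dominates: the one generation of 10 individuals accounts for almost all of the denominator.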

I personally like to think of this as a complexity ratchet, with periods of tightening and relaxation.

Of course, the accumulation of information happens over population. The question I had in mind is whether the upper limits on the accumulation of information are still valid, or whether they need to be rethought. Unfortunately, I'm not an expert in evolutionary biology either (though I do know quite a bit about theory of computation and information theory), hence the question.

The closest thing I can think of that is related to what you are asking is Haldane's dilemma and the question of how many mutations can become fixed in a given number of generations. However, Haldane's dilemma is of course ancient history by now, and there are a variety of ways to respond to it.

In particular, Bruce Wallace argued that there was a fairly simple mistake made in terms of the mathematics due to a mathematical convention. Additionally, there is the difference between soft and hard selection. For more on this particular topic, I would recommend "Fifty Years of Genetic Load." More recently, there have been questions raised as to whether sexual recombination will actually be more efficient than point mutation when one considers the fixation of several genes rather than just one, but I haven't followed that up as of yet and somewhat distrust the results.

With respect to the issue that you raise, it sounds like (just in part as a matter of language) an early in silico result, and I suspect that it does not take into account the modularity which we have discovered in terms of gene regulation or bodyplan, and the implications for a population which is itself divided into subpopulations where there is reduced gene flow between the subpopulations. However, I remember one in silico study in which they subjected the "organisms" to a variable environment and found that the organisms evolved modularity.

Another point which I would be curious about is the extent to which in silico studies have taken into account the distinction between regulatory DNA and coding DNA. I know of at least one case in which they have, specifically in terms of the analysis of promoter regions. It was found that with relatively small promoter regions it was possible for only a small number of mutations to radically modify the logic by which transcription factors determine the extent to which a gene gets expressed.

In any case, I may try and find the original article and what articles reference it. Typically, if you have a popular article, it gives you enough clues to find the original technical one, and then you can find the later referencing articles -- via the web.


I took a look at your blog, and the mention of just "ID" was a little off-putting as it made me think Dembski/Behe etc., but I realize now that it has nothing to do with that kind of "Intelligent Design." I also see that you are rather accomplished in your own field. Anyway, I haven't found anything as of yet along the lines of what you were looking for, but I will dig some more. If either one of us finds something perhaps we can bounce it back to the other.

Take care.

Tim, thanks! The nickname goes way back and refers to many things, perhaps primarily to Freud. I think it was hilarious to see the same shorthand being touted by the creationists, since I have a family history of having been one. Past tense, since I take science seriously as opposed to ID proponents. Thanks for all the good stuff here, it really is entertaining and educational.

Just one more quick addition. The reference to Freud, of course, is a long-standing joke rather than any actual reference to me believing in Freud's theories.

To Mikko,

Like you, I enjoy learning. Incidentally, I should note that the results involving the smaller promoter regions simply involved point mutations, also known as SNPs (single nucleotide polymorphisms). Of course larger mutations (such as transpositions or retrotranspositions) could do things more quickly. In fact, we know of at least one case in which a promoter region appears to be the result of the retrotransposition of the original gene.

As for the ID and your views regarding Freud, I figured as much. He had some insights, but I think he has been largely dustbined by now.