I commented a couple of days ago on a news item about a journal article on the evolution of gene expression in primates that had yet to be published. Well, the article has been published, and I've read it (Nature has also published a news and views piece on the study by Rasmus Nielsen). I have a few comments on why this research is unique, what the researchers found, and the implications of this research below the fold.
WHY THIS STUDY IS UNIQUE: This is the first large scale study to examine gene expression in primates using species specific probes. Gene expression can be measured by hybridizing RNA to microarrays that contain complementary DNA sequences. In previous studies of gene expression the DNA sequences were derived from human genes. This meant that different levels of hybridization could be due to either changes in gene expression or changes in the coding sequences of the genes (or a combination of both). By using probes derived from the sequences of the four species studied (human, chimpanzee, orangutan, and rhesus macaque) the authors controlled for the effects of coding sequence divergence. In this study, they examined the expression of 907 genes from each of these species.
WHAT THEY FOUND: I have divided the authors' findings into four sections: pairwise differences in gene expression, purifying selection on expression, evolution along the human lineage, and relationships between expression and coding sequence.
Pairwise differences in gene expression: The authors determined which genes are differentially expressed between each pair of species (see table below). The authors claim that, under a neutral model, they expect a linear relationship between expression differences and divergence time; they see no such relationship in their data. The lack of such a relationship leads them to postulate that the evolution of gene expression is probably not a neutral process. I see two problems with this conclusion. First, the sample is both small and made up of non-independent measures of divergence. With so few data points (and the noise expected around the mean), a real trend could easily go undetected. Second, a common phenomenon in sequence divergence is the saturation of sites. Models of nucleotide and protein sequence substitution were developed, in part, to correct for the underestimates of sequence divergence that result from using percent identity alone. Similar corrections may need to be derived for expression divergence.
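To make the saturation point concrete, here is a minimal sketch of the classic Jukes-Cantor correction for nucleotide sequences. This is the standard sequence-level example, not anything from the paper, and no equivalent correction for expression divergence is implied to exist; the function name is just illustrative.

```python
import math

def jukes_cantor_distance(p):
    """Corrected substitutions per site, given the observed proportion
    of differing sites p (Jukes-Cantor model: d = -3/4 * ln(1 - 4p/3))."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) under the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# The gap between observed and corrected divergence grows as sites saturate.
for p in (0.01, 0.10, 0.30, 0.50):
    print(f"observed {p:.2f} -> corrected {jukes_cantor_distance(p):.3f}")
```

At low divergence the correction barely matters, but as the observed difference climbs the raw value increasingly underestimates the true amount of change, so a plot of raw differences against time flattens even under a perfectly neutral process.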
Inter-species differentially expressed genes

             Chimpanzee   Orangutan   Macaque
Human           110          128        176
Chimpanzee       -           150        141
Orangutan        -            -         129
The authors focus mostly on differences between humans and chimpanzees. One of the things they note (from the table above) is that approximately 12% of the genes in their sample are differentially expressed between humans and chimps -- half have higher expression in humans, and half have higher expression in chimps. Using the orangutan and macaque as outgroups, they show that 45 genes have increased expression along the human lineage and 43 have increased expression along the chimpanzee lineage. From these data it appears that there is no difference in the rate of gene expression evolution along the two lineages.
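As a quick sanity check on those numbers, here is a rough sketch of an exact two-sided binomial (sign) test on the 45 versus 43 split, along with the 110-out-of-907 fraction. This is my own back-of-the-envelope check, not an analysis from the paper, and it assumes each gene is an independent coin flip between the two lineages.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_binom_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed one."""
    observed = binom_pmf(k, n, p)
    total = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)

human_up, chimp_up = 45, 43          # counts quoted in the post
p_value = two_sided_binom_p(human_up, human_up + chimp_up)
fraction_de = 110 / 907              # human-chimp differentially expressed genes

print(f"fraction differentially expressed: {fraction_de:.1%}")
print(f"two-sided p-value for 45 vs 43: {p_value:.2f}")
```

A split this even gives a p-value nowhere near significance, which is consistent with the two lineages accumulating expression changes at about the same rate.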
Purifying selection on gene expression: The authors were also interested in genes whose expression is conserved amongst primates. They postulate that the genes with the most expression conservation would have similar expression levels between species and low within species variation in expression. Most of the genes in their sample meet these criteria, which means that changes in the expression of most of the genes are deleterious. To support this claim the authors point out that some of the genes with conserved expression are associated with human diseases when misexpressed.
Evolution of gene expression along the human lineage: A lot of folks are interested in determining what makes us different from other apes. To identify expression changes along the human lineage the authors looked for genes whose expression levels are not significantly different among non-human primates, but are significantly elevated or reduced in humans. They claim that the expression of such genes is/was under directional selection in the human lineage, but I think these changes could also be due to relaxed selective constraint. The authors provide no way to distinguish between the expectations of relaxed selective constraint and positive selection. Regardless of whether genes differentially expressed in humans are under positive selection, the authors found some interesting trends. Fourteen genes are upregulated in humans, whereas only five have lower expression in humans. This only represents a subset of genes with differential expression in humans because the authors only examined liver tissue and did not consider different developmental stages.
Five out of twelve of the upregulated genes are transcription factors, while none of the downregulated genes are transcription factors. Transcription factors only make up 10% of the genes in their array, so they are overrepresented amongst the human upregulated genes. They also relaxed their criteria for identifying differential expression along the human lineage and found that nine of thirty upregulated genes are transcription factors, and there are no transcription factors in the nineteen downregulated. As a point of contrast, 9% of both upregulated and downregulated genes along the chimpanzee lineage are transcription factors. These numbers jibe well with the frequency of transcription factors on the microarray. Transcription factors have also been shown to evolve rapidly at the coding sequence level along the human lineage. This leads me to wonder whether the cis regulatory regions for transcription factors also evolve rapidly or if their upregulation is due to changes in trans factors. I also wonder whether the upregulation of multiple transcription factors is due to the upregulation of a single transcription factor (or a small number of them) that acts upstream of, and regulates the expression of, other transcription factors. This would lead to a domino or snowball effect in which multiple transcription factors are upregulated because of the upregulation of an upstream gene. The flaw in this explanation is that we would expect that other genes would be upregulated in the human genome, a pattern we do not see in the data.
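To get a feel for how unusual that enrichment is, here is a rough sketch using a one-sided binomial tail with the numbers quoted above (five of twelve, nine of thirty, and a 10% background). This is a simplification of my own, not the authors' test: a hypergeometric test would be more appropriate, but it needs the exact number of transcription factors on the array, which I have not quoted here.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

background = 0.10  # approximate share of transcription factors on the array

# Strict criteria: 5 transcription factors among 12 upregulated genes.
p_strict = binom_upper_tail(5, 12, background)
# Relaxed criteria: 9 transcription factors among 30 upregulated genes.
p_relaxed = binom_upper_tail(9, 30, background)

print(f"strict criteria:  P(X >= 5 of 12) = {p_strict:.4f}")
print(f"relaxed criteria: P(X >= 9 of 30) = {p_relaxed:.4f}")
```

Under either set of criteria, drawing that many transcription factors by chance from a 10% background would happen well under 1% of the time.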
Relationship between the evolution of expression and amino acid sequence: The authors divided the genes into three groups: those that have significantly different expression in humans, the 100 genes with the most conserved expression amongst the four species, and all other genes examined. They used synonymous and non-synonymous polymorphism and divergence to detect signatures of positive selection in the genes in their sample. One quarter of the genes with differential expression in humans had signatures of positive selection in their coding sequences. This contrasts dramatically with the small frequency of genes with signatures of positive selection among genes with conserved expression (4%) and all other genes in their sample (6%). If the authors' claim that differential expression is indicative of positive selection is correct, then it looks like selection operates similarly on gene expression and protein coding sequence.
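For readers unfamiliar with how polymorphism and divergence data get combined, here is a minimal sketch of the logic behind a McDonald-Kreitman style comparison. I am not claiming this is the exact procedure used in the paper, and the counts below are hypothetical numbers invented purely for illustration.

```python
def neutrality_index(dn, ds, pn, ps):
    """Neutrality index: (Pn/Ps) / (Dn/Ds).
    Values below 1 indicate an excess of non-synonymous divergence,
    the pattern expected under positive selection."""
    return (pn / ps) / (dn / ds)

def alpha(dn, ds, pn, ps):
    """Estimated proportion of non-synonymous substitutions
    attributable to positive selection (1 - neutrality index)."""
    return 1.0 - neutrality_index(dn, ds, pn, ps)

# HYPOTHETICAL counts for a single gene (not data from the study):
dn, ds = 20, 10   # fixed non-synonymous / synonymous differences between species
pn, ps = 5, 10    # non-synonymous / synonymous polymorphisms within a species

ni = neutrality_index(dn, ds, pn, ps)
print(f"neutrality index = {ni:.2f}, alpha = {alpha(dn, ds, pn, ps):.2f}")
```

The idea is that synonymous sites act as a neutral yardstick: if non-synonymous changes are more common between species than the within-species polymorphism would predict, some of those fixed differences were probably driven by selection.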
WHAT THIS MEANS: A previous model postulated that divergence in gene expression (rather than protein coding sequences) is responsible for the phenotypic differences between humans and chimpanzees. The findings in this paper support that model in two ways. First, the authors show that transcription factors (one of the elements responsible for determining the level at which a gene is expressed) are themselves expressed at different levels between humans and chimps. Second, the protein sequences of transcription factors appear to be evolving under positive selection, which may lead to differential expression of the genes whose expression they regulate.
Look for something similar from Kevin White involving five to ten Drosophila genomes in the next couple of years (which will actually encompass a larger evolutionary time scale than this study). It'll be something like this, only with more divergent species and species specific chips. I don't have any inside information to back up this claim; it's just idle speculation.
Gilad, Y, A Oshlack, GK Smyth, TP Speed, and KP White. 2006. Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature. 440: 242-245.
How well does this paper mesh with some of the recent articles in PLoS (using SNPs) - I'm thinking of Wang et al and Voight et al?
I think the coding sequence analysis was based on Bustamante et al's paper. Previous analyses of rapidly evolving genes tend to identify genes that fall into these classes: olfaction, chemosensation, gametogenesis, and cell adhesion / fertilization.
I also think it's important to note a couple more things:
1. they measured gene expression in the liver, using an array designed with probes biased towards genes known to be expressed in the human liver ("only" 907 genes). Another interesting place to look for differential gene expression in humans is obviously the brain.
2. in the way they designed their array, they excluded genes known to be in segmental duplications, which are known to affect expression levels.
so this is far from the end of the story.
BTW, I hear Kevin White is moving to the University of Chicago this year. could take him a while to get settled in. Scoop his ass, RPM!
"A previous model postulated that divergence in gene expression (rather than protein coding sequences) is responsible for the phenotypic differences between humans and chimpanzees. The findings in this paper support that model in two ways. First, the authors show that transcription factors (one of the elements responsible for determining the level at which a gene is expressed) are themselves expressed at different levels between humans and chimps. Second, the protein sequences of transcription factors appear to be evolving under positive selection, which may lead to differential expression of the genes whose expression they regulate."
Yes, when I was teaching histology to Med students, I was struck by how similar all mammals (and vertebrates) are at the cellular level. It was apparent that what differed was the arrangement of cells within a tissue.
I guess my bias (as a cell biologist) is that most encoded proteins go into the making of various cell types. Turning on these cell-specific genes together takes a few transcription factors (like myoD in muscle cells). Then manipulating these master transcription factors, via higher order transcription factors, regulates tissue composition and organization. And these differences (tissue organization) are the major source of inter-species differentiation.
"The authors provide no way to distinguish between the expectations of relaxed selective constraint and positive selection"
though in the case of relaxed selective constraint, the within-species variance should be larger in humans (less selective constraint = more room to explore), right? in the genes they show, that doesn't seem to be the case (admittedly with only five individuals) -- the variance in each species seems to be about the same, with humans having a significantly higher or lower expression.