I commented a couple of days ago on a news item about a journal article on the evolution of gene expression in primates that had yet to be published. Well, the article has been published, and I’ve read it (Nature has also published a News and Views piece on the study, written by Rasmus Nielsen). I have a few comments on why this research is unique, what the researchers found, and the implications of this research below the fold.
WHY THIS STUDY IS UNIQUE: This is the first large-scale study to examine gene expression in primates using species-specific probes. Gene expression can be measured by hybridizing RNA to microarrays that contain complementary DNA sequences. In previous studies of gene expression the DNA sequences were derived from human genes. This meant that different levels of hybridization could be due to either changes in gene expression or changes in the coding sequences of the genes (or a combination of both). By using probes derived from the sequences of the four species studied (human, chimpanzee, orangutan, and rhesus macaque) the authors controlled for the effects of coding sequence divergence. In this study, they examined the expression of 907 genes from each of these species.
WHAT THEY FOUND: I have divided the authors’ findings into four sections: pairwise differences in gene expression, purifying selection on expression, evolution along the human lineage, and relationships between expression and coding sequence.
Pairwise differences in gene expression: The authors determined which genes are differentially expressed between each pair of species (see table below). The authors claim that, under a neutral model, they expect a linear relationship between expression differences and divergence time; they see no such relationship in their data. The lack of such a relationship leads them to postulate that the evolution of gene expression is probably not a neutral process. I see two problems with this conclusion. First of all, the sample size is both small and made up of non-independent measures of divergence. It’s possible that this small sample size (in addition to the noise expected around the mean) means that the trend is undetectable. Second, a common phenomenon in sequence divergence is the saturation of sites. Models of nucleotide and protein sequence substitution were developed, in part, to correct for underestimates in sequence divergence using only percent identity. It is possible that such corrections must be derived for expression divergence.
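To make the saturation point concrete, here is a minimal sketch of the simplest such correction for nucleotide sequences, the Jukes-Cantor model. It is only an analogy: nothing like it is applied to expression data in the paper, and the function name and example values are my own. The observed proportion of differing sites increasingly underestimates the corrected distance as divergence grows.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site).

    p is the observed proportion of sites that differ between two
    aligned nucleotide sequences. The correction is undefined once
    p reaches 0.75, the value expected for fully saturated sequences.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Illustrative values only: the gap between observed and corrected
# divergence widens as sites saturate.
for p in (0.05, 0.30, 0.60):
    print(f"observed {p:.2f} -> corrected {jukes_cantor_distance(p):.3f}")
```

An analogous correction for expression divergence would need its own model of how expression levels change over time, which is exactly what is missing.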
Table: Inter-species differentially expressed genes (columns: Chimpanzee, Orangutan, Macaque)
The authors focus mostly on differences between humans and chimpanzees. One of the things they note (from the table above) is that approximately 12% of the genes in their sample are differentially expressed between humans and chimps: half have higher expression in humans, and half have higher expression in chimps. Using the orangutan and macaque as outgroups, they show that 45 genes have increased expression along the human lineage and 43 have increased expression along the chimpanzee lineage. From these data it appears that there is no difference in the rate of gene expression evolution along the two lineages.
Purifying selection on gene expression: The authors were also interested in genes whose expression is conserved amongst primates. They postulate that the genes with the most conserved expression would have similar expression levels between species and low within-species variation in expression. Most of the genes in their sample meet these criteria, which suggests that changes in the expression of most of the genes are deleterious. To support this claim the authors point out that some of the genes with conserved expression are associated with human diseases when misexpressed.
Evolution of gene expression along the human lineage: A lot of folks are interested in determining what makes us different from other apes. To identify expression changes along the human lineage the authors looked for genes whose expression levels are not significantly different among non-human primates, but are significantly elevated or reduced in humans. They claim that the expression of such genes is/was under directional selection in the human lineage, but I think these changes could also be due to relaxed selective constraint. The authors provide no way to distinguish between the expectations of relaxed selective constraint and positive selection. Regardless of whether genes differentially expressed in humans are under positive selection, the authors found some interesting trends. Fourteen genes are upregulated in humans, whereas only five have lower expression in humans. This only represents a subset of genes with differential expression in humans because the authors only examined liver tissue and did not consider different developmental stages.
Five of the fourteen upregulated genes are transcription factors, while none of the downregulated genes are transcription factors. Transcription factors only make up 10% of the genes in their array, so they are overrepresented amongst the human upregulated genes. They also relaxed their criteria for identifying differential expression along the human lineage and found that nine of thirty upregulated genes are transcription factors, and there are no transcription factors in the nineteen downregulated. As a point of contrast, 9% of both upregulated and downregulated genes along the chimpanzee lineage are transcription factors. These numbers jibe well with the frequency of transcription factors on the microarray. Transcription factors have also been shown to evolve rapidly at the coding sequence level along the human lineage. This leads me to wonder whether the cis regulatory regions for transcription factors also evolve rapidly or if their upregulation is due to changes in trans factors. I also wonder whether the upregulation of multiple transcription factors is due to the upregulation of a single transcription factor (or a small number of them) that acts upstream of, and regulates the expression of, other transcription factors. This would lead to a domino or snowball effect in which multiple transcription factors are upregulated because of the upregulation of an upstream gene. The flaw in this explanation is that we would expect other genes to be upregulated in humans as well, a pattern we do not see in the data.
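For a rough sense of how surprising the transcription factor counts are, here is a back-of-the-envelope sketch. It takes the counts of human upregulated genes given above (14 under the strict criteria, 30 under the relaxed criteria) and treats each gene as having a 10% chance of being a transcription factor. The binomial model is my simplifying assumption, not necessarily the test the authors used; drawing genes from a fixed array is strictly a hypergeometric problem.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed background frequency of transcription factors on the array.
TF_FREQUENCY = 0.10

# Strict criteria: 5 transcription factors among 14 upregulated genes.
strict = binomial_upper_tail(5, 14, TF_FREQUENCY)
# Relaxed criteria: 9 transcription factors among 30 upregulated genes.
relaxed = binomial_upper_tail(9, 30, TF_FREQUENCY)

print(f"strict criteria:  P(X >= 5 of 14) = {strict:.4f}")
print(f"relaxed criteria: P(X >= 9 of 30) = {relaxed:.4f}")
```

Under this approximation both tail probabilities fall below 1%, which is consistent with calling transcription factors overrepresented.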
Relationship between the evolution of expression and amino acid sequence: The authors divided the genes into three groups: those that have significantly different expression in humans, the 100 genes with the most conserved expression amongst the four species, and all other genes examined. They used synonymous and non-synonymous polymorphism and divergence to detect signatures of positive selection in the genes in their sample. One quarter of the genes with differential expression in humans had signatures of positive selection in their coding sequences. This contrasts dramatically with the low frequency of signatures of positive selection in genes with conserved expression (4%) and in all other genes in their sample (6%). If the authors’ claim that differential expression is indicative of positive selection is correct, then it looks like selection operates similarly on gene expression and protein coding sequence.
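One common way to combine synonymous and non-synonymous polymorphism and divergence is a McDonald-Kreitman-style comparison, summarized by the neutrality index. The sketch below is illustrative only: I am assuming this style of test, the function name is mine, and the counts are made up rather than taken from the paper.

```python
def neutrality_index(pn, ps, dn, ds):
    """Neutrality index: (Pn / Ps) / (Dn / Ds).

    pn, ps: non-synonymous and synonymous polymorphisms within a species
    dn, ds: non-synonymous and synonymous fixed differences between species
    Values below 1 indicate an excess of non-synonymous divergence, the
    pattern expected under positive selection.
    """
    if ps <= 0 or dn <= 0 or ds <= 0:
        raise ValueError("ps, dn and ds must be positive")
    return (pn / ps) / (dn / ds)


# Hypothetical counts for a single gene (not data from the study).
ni = neutrality_index(pn=2, ps=10, dn=8, ds=10)
print(f"neutrality index = {ni:.2f}")
```

In practice the four counts would also go into a 2x2 contingency test to judge whether the departure from 1 is significant.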
WHAT THIS MEANS: A previous model postulated that divergence in gene expression (rather than protein coding sequences) is responsible for the phenotypic differences between humans and chimpanzees. The findings in this paper support that model in two ways. First, the authors show that transcription factors (one of the elements responsible for determining the level at which a gene is expressed) are themselves expressed at different levels between humans and chimps. Second, the protein sequences of transcription factors appear to be evolving under positive selection, which may lead to differential expression of the genes whose expression they regulate.
Look for something similar from Kevin White involving five to ten Drosophila genomes in the next couple of years (which will actually encompass a larger evolutionary time scale than this study). It’ll be something like this, only with more divergent species and species-specific chips. I don’t have any inside information to back up this claim; it’s just idle speculation.
Gilad, Y, A Oshlack, GK Smyth, TP Speed, and KP White. 2006. Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature. 440: 242-245.