Gene Expression Differences between Populations

Phenotypic differences between populations, species, or any other taxonomic classification can be attributed to genetic and environmental causes. The genetic differences can be divided into sequence divergence of transcribed regions, copy number divergence, and expression divergence. These categories are hardly independent -- expression divergence results from the evolution of the protein coding sequences of transcription factors and cis regulatory regions of transcribed sequences.

An article in press in Nature Genetics (news item here) reports on differences in expression of 4,197 genes between Asians and Europeans (60 residents of Utah of European ancestry, 41 Han Chinese, and 41 Japanese). There were over 900 differentially expressed genes between Europeans and Chinese, over 700 between Europeans and Japanese, and only 27 between Chinese and Japanese. Because the two Asian samples were so similar, the researchers combined the data from the Chinese and Japanese individuals and found over 1,000 genes with different expression levels between Europeans and Asians -- 35 with at least a twofold difference in expression level.

One gene, UGT2B17, has 22 times higher expression levels in Europeans compared to Asians. Why? Because this gene lies within a polymorphic deletion, and the deletion is more common in the Asian sample than the European sample. With more individuals homozygous for the deletion, the expression level of UGT2B17 in Asians is much lower than in Europeans.

The tissue samples used to determine gene expression levels were also analyzed at known single nucleotide polymorphisms (SNPs) by the HapMap project. The researchers performed a statistical analysis to determine if any SNPs were significantly associated with the observed expression differences. They classified SNPs as cis or trans based on their position relative to the gene of interest -- SNPs located within 500 kb of the expressed gene were cis, all others were trans. In the European sample, 10 expressed genes had cis associated SNPs and 94 had trans associated SNPs. In Asians, 23 genes had cis associated SNPs and 66 had trans associated SNPs.
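To make the cis/trans rule concrete, here is a minimal Python sketch of a distance-based classifier. The 500 kb window is from the paper as summarized above; everything else is my assumption for illustration: the function name, the coordinate arguments, and the choice to measure distance from the nearest gene boundary (the summary above does not say whether the authors measured from the gene boundaries or from some other reference point). A SNP on a different chromosome is treated as trans.

```python
# Window size reported in the paper; the rest of this sketch is hypothetical.
CIS_WINDOW_BP = 500_000


def classify_snp(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                 window=CIS_WINDOW_BP):
    """Label a SNP 'cis' or 'trans' relative to one expressed gene.

    Assumption: distance is measured to the nearest gene boundary, and a
    SNP inside the gene has distance zero.
    """
    if snp_chrom != gene_chrom:
        return "trans"  # different chromosome, so never within the window
    if snp_pos < gene_start:
        distance = gene_start - snp_pos
    elif snp_pos > gene_end:
        distance = snp_pos - gene_end
    else:
        distance = 0  # SNP lies within the gene itself
    return "cis" if distance <= window else "trans"


# Made-up coordinates, purely for illustration.
print(classify_snp("chr4", 1_000_000, "chr4", 1_200_000, 1_300_000))
print(classify_snp("chr4", 100_000, "chr4", 1_200_000, 1_300_000))
print(classify_snp("chr7", 1_250_000, "chr4", 1_200_000, 1_300_000))
```

With a fixed window like this, the "trans" category lumps together distant SNPs on the same chromosome and SNPs on other chromosomes.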

They then looked at 11 genes with the same SNP significantly associated with expression levels in both Asians and Europeans; these SNP associations are less likely to be due to chance alone. All 11 were cis associated. For five of the 11 genes, at least 50% of the expression variation could be attributed to the associated SNPs. In these cases, the differences in expression levels between populations appear to be due to allelic differences in cis regulatory elements.
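One common way to express "percent of expression variation attributed to a SNP" is the R² from regressing expression level on genotype, coded as the number of copies of one allele (0, 1, or 2). The sketch below assumes that approach; I am not claiming it is the exact statistic the authors used. The function name and the toy data are hypothetical.

```python
def r_squared(genotypes, expression):
    """Fraction of expression variance explained by a linear fit on genotype.

    genotypes: allele counts (0, 1, or 2) per individual.
    expression: expression level per individual, in the same order.
    For a single predictor, R^2 equals the squared Pearson correlation.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    cov = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(genotypes, expression))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_e = sum((e - mean_e) ** 2 for e in expression)
    if var_g == 0 or var_e == 0:
        return 0.0  # no variation in genotype or expression to explain
    return (cov * cov) / (var_g * var_e)


# Toy data, not from the paper: expression rises with allele count, plus noise.
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.4, 2.1, 1.9, 3.2, 2.8]
print(r_squared(toy_genotypes, toy_expression))
```

A value of 0.5 or more from a calculation like this would correspond to the "at least 50%" threshold mentioned above.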

There were four genes with significantly associated trans SNPs in both populations, but the SNPs differed between the two populations. This suggests that transcription factors may differ between the two populations, but this is somewhat speculative. The authors do not report whether these trans associated SNPs map to known transcription factors.

The paper concludes with the typical "gene expression differences explain phenotypic differences better than protein coding differences" position -- this time in relation to the genetics of complex disease. The authors argue that "variants in coding regions of candidate genes do not account for a large proportion of disease susceptibility", therefore gene expression differences are responsible. I'm not sure if I'd put all my eggs in the gene expression basket -- copy number polymorphism may also be important.


Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG. 2007. Common genetic variants account for differences in gene expression among ethnic groups. Nature Genet. In press. DOI: 10.1038/ng1955

I haven't read the paper yet. But since they used HapMap SNPs, how does ascertainment bias affect their association study for gene expression?

From what I understand, ascertainment bias with HapMap SNPs mostly affects popgen analysis. I think association studies aren't affected as much.

And GNXP has more.

But since they used HapMap SNPs, how does ascertainment bias affect their association study for gene expression?

how would the ascertainment of SNPs bias an association study?

how would the ascertainment of SNPs bias an association study?

Because HapMap SNPs were identified in a small panel and then genotyped in a larger sample, rare SNPs have a good chance of being missed. This is especially problematic when performing popgen analysis (Tajima's D depends on the site frequency spectrum, in which the frequency of rare polymorphisms is important), but I'm not sure how much of a role this would play in association studies. There is a chance that some rare SNPs may be associated with expression differences, so ascertainment bias could be a problem.

but I'm not sure how much of a role this would play in association studies

yeah, that's my question :)

in popgen, missing rare SNPs would cause a bias in the test statistic, maybe causing false positives. But the test statistic in an association study would not be affected by missing rare SNPs. You'd probably have low power to detect certain associations, but I don't see why there'd be a bias.