Copy Number Polymorphism

I have a little bit of an infatuation with copy number polymorphism (CNP), which describes the fact that individuals within a population can differ from each other in gene content. Some genes, such as olfactory receptors (ORs), are present as many related copies in any animal genome. New copies spring up via duplication events (a type of mutation), so one could imagine that individuals from a single population differ in the number of copies of these genes. In fact, this can be true of any gene or gene family (a group of related genes) in the genome: duplications may be segregating as CNPs. A few recent publications deal with this phenomenon, and I summarize some of the findings below the fold.

  • Nadeau and Lee review CNP (subscription required): They point out that the number of copies of particular genes can be used to determine predisposition to certain diseases. In the example they use, deletion of a second copy of the Fcgr3 gene in rats makes them more susceptible to glomerulonephritis. Humans with fewer copies of the gene are also at a higher risk for the disease. Interestingly, existing duplications can encourage new duplications because of the repeat-mediated molecular mechanism responsible for generating segmental duplications. This means that genes that are already duplicated have a higher probability of being duplicated again than those that have never been duplicated.

  • George Zhang's group shows that gene loss along the human lineage may be under positive selection: They focus on the CASPASE12 gene and argue that a null allele has nearly fixed in humans due to selection for resistance to sepsis. After identifying pseudogenes (genes that are no longer functional) in the human genome, they used SNP data to determine whether the pseudogenes are common to all humans or unique to the few individuals who contributed their DNA to the genome projects. They identified 36 OR pseudogenes out of 67 total pseudogenes despite the fact that OR genes make up only 2% of all functional genes in the human genome (i.e., these genes get duplicated and encourage more duplicates, many of which are non-functional). The CASPASE12 gene is part of a gene family that contains 11 functional genes in the human genome. A null allele of CASPASE12 has fixed in non-African populations and is at a frequency of 89% in individuals of African descent. Nucleotide sequence polymorphism in the region flanking the mutation that makes the protein non-functional is lower among null alleles than among functional alleles, suggesting that directional selection has favored the recently arisen null allele. Most summaries of CNP focus on the evolutionary importance of gaining extra copies of genes, but gene loss may also serve a purpose in adaptation to the environment.

  • Chris Ponting's group presents an analysis of selection on human CNPs: They found that telomeres and centromeres are enriched for CNPs, which is not surprising considering that these regions tend to be loaded with repetitive sequences, which encourage duplications. Their data set included the entire genome (both protein-coding and non-coding sequences), but they found that CNPs contained more genes than expected if they were duplications of random sequences. Genes associated with Mendelian diseases are underrepresented in CNPs, whereas genes involved in immunity and olfaction are overrepresented (the OR genes pop up again). Genes segregating as CNPs have accumulated more non-synonymous mutations relative to synonymous mutations than other genes, indicating that they are under either relaxed selective constraint or positive selection. CNPs that are the result of a deletion (rather than a duplication), however, do not show this pattern. They also analyzed CNPs in mouse and found none of the trends I just described.
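
The OR pseudogene numbers from the Zhang paper (36 of 67 pseudogenes, against a 2% share of functional genes) lend themselves to a quick back-of-the-envelope check. The sketch below is my own illustration, not the analysis from the paper: it assumes every functional gene is equally likely to become a pseudogene and uses a simple binomial null, and the function and variable names are made up for this example.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Numbers quoted in the summary above.
or_pseudogenes = 36        # OR pseudogenes identified
total_pseudogenes = 67     # all pseudogenes identified
or_share_of_genes = 0.02   # ORs as a fraction of functional genes

# Assumption (mine, not the paper's): each pseudogene is an independent
# draw from the functional genes, so ORs should account for about 2% of them.
observed_share = or_pseudogenes / total_pseudogenes
expected_count = total_pseudogenes * or_share_of_genes
fold_enrichment = observed_share / or_share_of_genes
p_value = binomial_upper_tail(or_pseudogenes, total_pseudogenes, or_share_of_genes)

print(f"Observed OR share of pseudogenes: {observed_share:.1%}")
print(f"Expected OR pseudogenes under the null: {expected_count:.2f}")
print(f"Fold enrichment: {fold_enrichment:.1f}x")
print(f"Binomial upper-tail probability: {p_value:.2e}")
```

Under this crude null you would expect only one or two OR pseudogenes, so 36 is a large excess. The independence assumption is clearly too simple for a gene family that expands through repeat-mediated duplication, which is more or less the point.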

We have known that gene duplication plays an important role in evolution for some time now. It's neat to see people examining the population genetics of duplicated genes and CNPs to understand how they evolve.
